A Guided Tour of the Analytics Database

Version 3

    Let's start at the very beginning - where do analytics events come from?


    The Analytics Listener


    Analytics events don't come from the application database.  Certain actions trigger the analytics listener to serialize an event to the analytics database.  A listener is a program that waits for events to happen.  When they happen, they trigger the listener to perform an action - in this case, recording the event in the analytics database.


    This means that if you disable analytics, it will turn off the listener and stop collecting data.  There isn't any way to recover data if analytics is disabled. 


    It also means that increasing the log levels to 'debug' level for analytics will record a huge amount of information.  This could negatively affect performance if left on for too long.


    The Stage Table


    The jivedw_activity_stage table is where all of the data first goes when it is serialized to the database.  The stage table uses the same IDs as the application database.


    All of the most basic data about the event will be here.  This shows the columns of this table:

    dane_4_5_6_0_analytics=# \d jivedw_activity_stage

                  Table "public.jivedw_activity_stage"

            Column        |            Type             | Modifiers


    activity_ts          | timestamp without time zone | not null

    user_id              | bigint                      | not null

    direct_object_type   | integer                     | not null

    direct_object_id     | bigint                      | not null

    indirect_object_type | integer                     |

    indirect_object_id   | bigint                      |

    activity_type        | integer                     | not null

    container_type       | integer                     |

    container_id         | bigint                      |

    metadata             | character varying(3500)     |


    All analytics data is similar to a sentence in English.  At activity_ts-o'clock, this user_id did activity_type to direct_object.


    Sometimes there will be an indirect_object, like when Karl Cyr comments on a document.  Here, Karl is creating (activity_type) a comment (direct object) on the document (indirect object).


    Other times, there will be a container_id, like when Vinh Jones starts a discussion.  He'll create (activity_type) a discussion (direct object) in a group/space (container).


    Usually there will either be an indirect object or a container, but not both.  If there's an indirect object and you need to know what container its in, you'll need to find an event where that indirect object is the direct object.  This also means that you'll need to trace back through several events to figure out the container's container.

    Running the ETLs (Extract, Transform, Load)

    But, there's a lot more to the analytics database than a flat table of events.  Next, the ETL data transfer processes will run and move the data out of the stage table.  By default, these run at 2am in the site's locale.  You can run them manually in the Admin Console under Reporting > Settings > Data Load > Run.   (Don't do this during peak hours on production, though!)  These will always run on the master node, so check there for any logging you need.

    There are a lot of ETL processes that run to sort the stage data into the rest of the database. Broadly, some take the stage table and move the data into the various jivedw_activity tables.  Others import data from the application database so we can join to it when we write reports.

    The ETLs will change the object IDs.  The object_ids are only unique within each object_type.  These are the IDs that match the application database.  So that each piece of content has its own unique ID (regardless of object type) the analytics database mostly uses dw_object_ids. DW stands for data warehouse. You can connect the object_id and the dw_object_id in the jivedw_object and jivedw_container tables.

    The Activity Tables

    The jivedw_activity_fact table most closely resembles the stage table.



    • The fact table has no container_type (join to the jivedw_container table for this.)
    • It does have a day_id in addition to the activity_ts
    • The IDs in the fact table are dw IDs instead of regular object IDs.


    The fact table still has one row for every event in the analytics database.  It is the backbone of the activity information.


    dane_4_5_6_0_analytics=# \d jivedw_activity_fact

                   Table "public.jivedw_activity_fact"

            Column         |            Type             | Modifiers


    activity_ts           | timestamp without time zone | not null

    day_id                | integer                     | not null

    user_id               | bigint                      | not null

    activity_type         | integer                     | not null

    direct_object_type    | integer                     | not null

    direct_dw_object_id   | bigint                      | not null

    indirect_object_type  | integer                     |

    indirect_dw_object_id | bigint                      |

    dw_container_id       | bigint                      |

    metadata              | character varying(3500)     |

    When writing reports, it can be easier to query the aggregate tables.  There are three regular aggregate tables:





    These each count the number of times that the same activity happens in a day, week, month.  They have an activity_count column with this number.  They join to the jivedw_day, jivedw_week, and jivedw_month tables respectively.


    dane_4_5_6_0_analytics=# \d jivedw_activity_agg_day

       Table "public.jivedw_activity_agg_day"

            Column         |  Type   | Modifiers


    day_id                | integer | not null

    user_id               | bigint  | not null

    activity_type         | integer | not null

    direct_object_type    | integer | not null

    direct_dw_object_id   | bigint  | not null

    indirect_object_type  | integer |

    indirect_dw_object_id | bigint  |

    dw_container_id       | bigint  |

    activity_count        | integer | not null

    There are three agg_user tables:





    These tables take out the user_id column, so they count the number of times an activity happened regardless of who performed it.  These tables can be a bit more confusing, so I rarely use them.



    The Meta Tables


    Just looking at the activity tables, we're still missing some information.  What do all of these numbers mean? To answer this, we used to rely heavily on documentation: Analytics Data Model.  But it can be tough to keep documents like this manually updated, making it confusing when we added new activity or object types.


    4.5.6 and 5.0 added two new tables, jivedw_activity_meta and jivedw_object_meta. These list the activity and object types:

    dane_4_5_6_0_analytics=# \d jivedw_activity_meta

             Table "public.jivedw_activity_meta"

        Column     |          Type           | Modifiers


    activity_type | integer                 | not null

    activity_name | character varying(100)  | not null

    description   | character varying(2000) | not null


    dane_4_5_6_0_analytics=# \d jivedw_object_meta

            Table "public.jivedw_object_meta"

       Column    |          Type          | Modifiers


    object_type | integer                | not null

    object_name | character varying(100) | not null


    Note on custom content types:  The object type IDs are the same for the application and analytics databases.  So if you are trying to find the object type for a custom type, it will be the same long value as the one in the jiveObjectType table from the application database.  Keep in mind that not all plugins and customizations trigger the analytics listener, though.  For example, the Ideation plugin does not currently include analytics.



    Data from the Application Database


    Most of the other tables in the analytics database copy data out of the application database.  Since some database management systems (including Postgres) don't support joins between databases, this helps flesh out our reports with meaningful information about users, places, objects, etc.   Additionally, when something is deleted in the application, we usually delete it from the application database.  For analytics, we want to keep historical information about deleted data so we can make sense of older information in the activity tables.


    Many of the tables available are listed in the analytics schema.  You can read the full schema for your Jive version in analytics.xml.


    ETL Tables


    You can see whether the ETLs are completing successfully in the Admin Console under Reporting > Settings > Data Load > ETL History.  Sometimes it's faster to check this in the analytics database. This information comes from jivedw_etl_job

    As mentioned before, there are many individual ETL tasks.  Sometimes, knowing that all of them are completing is not enough information.  You need to see when a specific ETL task ran most recently.  In this case, check jivedw_etl_task.  Not all of the ETLs are listed in this table.  For the ones that are, you can see the timestamp of when the ETL last completed (the lastprocmodificationdate column).