3 Replies Latest reply on Jul 14, 2010 10:38 AM by mark.williams@jivesoftware.com

    Jive - Analytics Data Migration - Best Practice


      My current project is planning to migrate a Jive 3.0 installation hosted by Jive into a Jive 4.0/4.5 installation hosted onsite later this year.  We will be moving from PostgreSQL to Oracle.  For migration planning I am trying to determine the following about migrating the existing analytics data from the 3.0 install, and whether Jive has a recommended best practice/approach:


      1. Should we bother to export the analytics database from the 3.0 installation and import into the Jive 4.0/5 installation?  Or just run the Jive 4.0/5 installation ETL process against the converted application database to repopulate the analytics data warehouse (after the application database conversion is complete)?


      2. If the preferred approach is to migrate the data from the existing 3.0 installation, then how will the new objects available with the later analytics install be populated (e.g. JIVEDW_COMMUNITY, JIVEDW_SGROUP, etc.; these do not currently exist in our analytics warehouse)?  Will these fill in seamlessly if the older analytics data is converted?


      3. If the preferred approach is not to migrate the data but rather to run the ETL processes from scratch, are there any risks with this approach in terms of data integrity and lining up with the data currently being reported on in the 3.0 install?


      Thanks for the help!



        • Re: Jive - Analytics Data Migration - Best Practice

          Simply re-running the ETL dataload process will not recover all of the data - the ETL process only copies dimension data (e.g. attributes about the content). It does not copy any activity data since this is only collected and stored in the analytics database.


          These are the steps I believe should work:


          1. Upgrade SBS to 4.5. This will not only upgrade the SBS database but will also upgrade the analytics database (both on Postgres).

          2. Migrate SBS database to Oracle.

          3. Start SBS and in the admin console configure analytics to point to the new empty Oracle instance. This will create the analytics schema in Oracle.

          4. Stop SBS.

          5. Export data from the previous analytics Postgres database to the new Oracle database.

          6. Reset Oracle sequence numbers in the analytics database.
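
          Step 5 has no official tooling (as noted later in this thread), so one hedged approach is to dump each analytics table from Postgres and replay it into Oracle as plain INSERT statements. A minimal Python sketch of that idea; the table and column names here (JIVEDW_ACTIVITY_FACT and its columns) are assumptions for illustration, not confirmed schema:

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (NULL, number, or quoted string)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape embedded single quotes by doubling them, per the SQL standard.
    return "'" + str(value).replace("'", "''") + "'"

def rows_to_inserts(table, columns, rows):
    """Generate one INSERT statement per row, for replay into Oracle."""
    col_list = ", ".join(columns)
    return [
        "INSERT INTO {} ({}) VALUES ({});".format(
            table, col_list, ", ".join(sql_literal(v) for v in row))
        for row in rows
    ]

# Hypothetical fact-table shape, for illustration only:
stmts = rows_to_inserts(
    "JIVEDW_ACTIVITY_FACT",
    ["ACTIVITY_FACT_ID", "DW_OBJECT_ID", "ACTIVITY_TYPE"],
    [(1, 42, "view"), (2, 43, None)],
)
```

          In practice you would fetch the rows from Postgres (e.g. via a COPY export) and run the generated statements through SQL*Plus or a JDBC loop; for large fact tables a bulk loader would be faster, but the mapping logic is the same.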


          The last step is the annoying one. Analytics uses sequence numbers in Postgres and Oracle to generate surrogate ids for some of the tables. When a new schema is created in Oracle the sequences will be initialized at 1. When the ETL data load processes run they will use these numbers, but will receive duplicate key exceptions when inserting records because the ids are already in use. Thus the sequences must be advanced to a number greater than the last id used.
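
          One way to script that reset is to find the highest id already used in each migrated table, then recreate the matching Oracle sequence just past it (older Oracle versions have no ALTER SEQUENCE ... RESTART, so drop-and-recreate is the usual workaround). A small sketch that emits the DDL; the sequence name below is hypothetical, since the actual sequence list isn't reproduced here:

```python
def sequence_reset_ddl(max_ids):
    """Given {sequence_name: highest id already used}, emit DDL that
    recreates each Oracle sequence starting one past that id.
    (Drop-and-recreate, since older Oracle lacks ALTER SEQUENCE ... RESTART.)"""
    ddl = []
    for seq, max_id in sorted(max_ids.items()):
        ddl.append("DROP SEQUENCE {};".format(seq))
        ddl.append("CREATE SEQUENCE {} START WITH {};".format(seq, max_id + 1))
    return ddl

# Hypothetical sequence name, for illustration only:
statements = sequence_reset_ddl({"JIVEDW_OBJECT_SEQ": 500})
```

          You would feed each `max_id` from a `SELECT MAX(<id column>) FROM <table>` against the freshly imported data, then run the emitted DDL as the analytics schema owner.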


          Here are the sequences and the columns that use them:




          I have never tested this approach so some validation is required.

            • Re: Jive - Analytics Data Migration - Best Practice

              Mark - Thanks for the response; I believe those are the same steps you had originally sent me.  There was a little bit of confusion about what I was asking for this time around.  Here are a handful of questions I had based on the information you provided below (these are the ones I had recently sent Priya):


              <Original Note Sent To Priya>

              From our conversation Wednesday, we discussed converting the analytics data from our Jive Pilot install into our local install.  As I mentioned, I have already had some questions answered by Mark Williams (see below) and now that we are getting around to finally testing this process I have encountered a couple additional questions:


              1. For Mark’s step #5 - Does Jive have any tools available to convert the Postgres database format to Oracle?  Walt has been testing converting the Jive application database using the conversion tools from Jive, but I am not sure if there is anything available from Jive for converting the analytics database tables to Oracle.  If there is not a readily available tool, does Jive have a recommended approach for getting the data out of Postgres into Oracle?


              2. Mark's approach below will convert all tables from the Pilot analytics database, but really all I need to convert is the activity fact table.  Would it be safer if I went through the trouble to convert only the fact table by:

              1. Turning on analytics in our new Oracle environment
              2. Running the ETL process for the first time to populate the dimensions from the new Oracle application source
              3. Using some temp tables from the old analytics source to map object ids and object types to the new DW_OBJECT_IDs, and the same for DW_CONTAINER_ID, etc.
              4. This approach would be a little more complicated (maybe) but would limit the data that I am forcing into analytics to just the historical activity data and not the dimensions themselves.  I am curious to get Mark's or Jive's take on this.
              5. I am concerned that the process above will not work if dimensions were deleted from the application and will not come over, so maybe I should just stick with the process below.





                • Re: Jive - Analytics Data Migration - Best Practice

                  1. We don't have any tools for migrating from Postgres to Oracle, at least not for analytics. We did have a migration tool that allowed people to migrate to Oracle or Postgres from other database platforms on an upgrade from 2.5 to 3.0, but I'm not sure we support that anymore. Also, this only operated on the main OLTP schema.


                  2. This approach is definitely more complicated and would be fairly difficult to implement. The DW_OBJECT_IDs and DW_CONTAINER_IDs are system-generated ids from a sequence, so you would need to figure out which object ids and object types they mapped to first, but I assume this is what you meant when you refer to the temp tables. You would also need to make sure that the TIME_ID columns map correctly too, since these are system-generated when the TIME_ETL task runs and will not match up with the original source database. It's certainly a doable approach; you will just need to be extra careful in making sure things align.
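
                  The remapping being described can be sketched in Python: build a lookup from (object id, object type) to the new warehouse's DW_OBJECT_ID, a similar one from calendar date to the regenerated TIME_ID, and rewrite each old fact row through them. This is a sketch under assumed column names (the real fact-table columns may differ), and it deliberately sets aside rows whose dimension was deleted from the application, since those have no new surrogate key to map to:

```python
def remap_fact_rows(old_rows, object_map, time_map):
    """Rewrite activity-fact rows from the old warehouse so their surrogate
    keys point at the new warehouse's dimension rows.

    object_map: {(object_id, object_type): new DW_OBJECT_ID}
    time_map:   {date string: new TIME_ID}  (TIME_IDs are regenerated
                by the TIME_ETL task, so old values cannot be reused)

    Rows with no mapping (e.g. the dimension was deleted from the
    application) are returned separately rather than silently dropped."""
    remapped, orphans = [], []
    for row in old_rows:
        key = (row["object_id"], row["object_type"])
        if key not in object_map or row["date"] not in time_map:
            orphans.append(row)
            continue
        new_row = dict(row)
        new_row["dw_object_id"] = object_map[key]
        new_row["time_id"] = time_map[row["date"]]
        remapped.append(new_row)
    return remapped, orphans
```

                  The orphan list is exactly the deleted-dimension concern raised in point 5 above: inspecting it tells you how much historical activity would be lost under the fact-table-only approach.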