4 Replies Latest reply on Apr 7, 2014 3:26 PM by bperkinsatmarsdd

    Data Migration Question (or, how to bypass activity engine)

    bperkinsatmarsdd

      Sorry for the long post but this issue needed some background for context.

       

      We currently have a custom content type plugin based on 5.0 that we’ve re-architected in order to move to 7.0 for our on-prem Jive instance.  Instead of holding the data inside the Jive database as part of the custom content type, we’ve moved it to a separate external database and will surface it to users with a custom tile that connects to a web service for the data (thanks Ryan Rutan and Ryan King for sowing the seeds of this solution during a chat at JiveWorld 11).

       

      One thing we would have given up by moving away from the custom content type is Jive’s native functionality such as search and @mentions.  We’re addressing this by publishing the content type’s text as a native Jive HTML document.  This gives us back native search and @mention functionality and looks like it will work quite well in practice.

       

      However, migrating the 17,000+ custom content type records is looking tricky.  Getting the data into the separate database is straightforward, but we’d also like to publish all of those records as HTML documents.  Additionally, we’d like to retain original authorship and publish dates.  We also need to store a reference to that HTML document (the documentID and contentID) in our external database, making the XML document migration framework unhelpful for this task.
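A minimal sketch of the reference table described above, which ties each migrated record to the Jive document created for it. SQLite stands in for the external database here, and the table name `jive_document_map`, its columns, and the helper functions are hypothetical names chosen for illustration, not part of our actual schema.

```python
import sqlite3


def create_mapping_table(conn):
    """Create the (hypothetical) table linking legacy records to Jive documents."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jive_document_map (
               legacy_record_id INTEGER PRIMARY KEY,
               document_id      TEXT NOT NULL,
               content_id       TEXT NOT NULL
           )"""
    )
    conn.commit()


def record_mapping(conn, legacy_record_id, document_id, content_id):
    """Store or overwrite the Jive identifiers for one migrated record."""
    conn.execute(
        "INSERT OR REPLACE INTO jive_document_map VALUES (?, ?, ?)",
        (legacy_record_id, document_id, content_id),
    )
    conn.commit()


def lookup_mapping(conn, legacy_record_id):
    """Return (document_id, content_id) for a record, or None if not migrated."""
    return conn.execute(
        "SELECT document_id, content_id FROM jive_document_map "
        "WHERE legacy_record_id = ?",
        (legacy_record_id,),
    ).fetchone()
```

Keying the table on the legacy record ID means a re-run of the migration overwrites a row rather than duplicating it, which also makes it easy to see which records still lack a Jive document.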

       

      We think that the REST API can almost get us there.  Although we don’t have access to the publish date until 3.6, we can hack around this by updating that field directly in the database after the REST API has created the record (we need a reference to it anyway for our external database).
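A rough sketch of the two pieces of the REST step that can be written without a live instance: building the JSON body for an HTML document and pulling the identifiers out of the response. The field names (`type`, `subject`, `content`, `id`, `contentID`) and the security prefix that Jive prepends to JSON responses are assumptions based on the Core v3 API as I understand it, so verify them against the API version actually installed. The function names are hypothetical.

```python
import json

# Assumption: Jive prepends this string to JSON responses; strip it before parsing.
SECURITY_PREFIX = "throw 'allowIllegalResourceCall is false.';"


def build_document_payload(subject, html_body):
    """Build the JSON body for creating a document with HTML content."""
    return {
        "type": "document",
        "subject": subject,
        "content": {"type": "text/html", "text": html_body},
    }


def extract_ids(response_text):
    """Return (documentID, contentID) from a create-document response body.

    Assumes the response is a JSON object with "id" and "contentID" fields.
    """
    text = response_text.strip()
    if text.startswith(SECURITY_PREFIX):
        text = text[len(SECURITY_PREFIX):]
    created = json.loads(text)
    return created["id"], created["contentID"]
```

The payload would be serialized with `json.dumps` and posted to the contents endpoint of the instance; the pair returned by `extract_ids` is what gets written back to the external database.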

       

      However, the gotcha is that if we post all those documents via the REST API, the activity engine will pick them up and flood users’ streams with all those new documents.  I can disable the activity engine nodes during the migration, but Jive happily queues the events up until a node comes back online. This otherwise good design is throwing a wrench into our plan.

       

      My question: Is there some way to “flush” this queue so that notifications do not go to the activity engine during the migration?  And if so, will this also prevent the digest emails from picking up the new documents? We’re not averse to database surgery if that’s what it takes… we would just need to know which records need to be purged.  Thanks in advance to anyone who can provide some insight into this.

       

      Brent

        • Re: Data Migration Question (or, how to bypass activity engine)

          Features to assist migrations like this (in particular, maintaining published date and skipping activity stream entries) are planned for the upcoming cloud release, but have not been backported to 7.0.x.

           

          You can maintain authorship, however, by performing the migration as a user with admin privileges, and using the Core V3 API - Run-As Feature & Signed Add-Ons to set the correct author.
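A small sketch of how the run-as request headers might be assembled so that each document is created on behalf of its original author. The header name `X-Jive-Run-As` and the `<kind> <identifier>` value format are assumptions drawn from the run-as documentation as I remember it; confirm both against the linked document before relying on them. `run_as_headers` is a hypothetical helper.

```python
# Assumption: these are the identifier kinds the run-as feature accepts.
ALLOWED_KINDS = {"email", "uri", "userid", "username"}


def run_as_headers(identifier, kind="username"):
    """Build request headers asking Jive to act as the given user.

    The request itself must still be authenticated as a user with
    sufficient (admin) privileges.
    """
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unsupported run-as kind: {kind!r}")
    return {
        "X-Jive-Run-As": f"{kind} {identifier}",
        "Content-Type": "application/json",
    }
```

For example, `run_as_headers("jdoe")` would be merged into the headers of the create-document request for a record originally written by the user `jdoe`.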

            • Re: Data Migration Question (or, how to bypass activity engine)
              bperkinsatmarsdd

              Thanks, Craig. I saw that the newest API would do this but was told that backporting it to 7.x was months away (as in, it won't actually come out until 8.x).  We'd need to migrate sooner than that (like in the next month).  If you have info to indicate sooner availability, I'd love to hear it.  However, if it's a ways off, is there a hack that allows me to bypass the stream entries now? For example (and I'm making this up):

              1. turn off activity engines
              2. create documents using 3.4 REST API
              3. do magic X
              4. update creation dates in database via SQL (would need to figure out what tables to update)
              5. restart activity engines

              ... where magic X is anything from flushing in-memory queues with a restart, to truncating tables that may contain messages waiting for the activity engine. I'm not averse to doing minor database surgery if I know where to safely poke the right thing.

               

              Thanks for any insight on this (or guidance if you think we're heading into badness).

                • Re: Data Migration Question (or, how to bypass activity engine)

                  I'm not familiar with any plans to backport these changes -- they were pretty invasive.  And the "do magic X" part of your formula makes me really nervous that something else might get accidentally broken in the process.

                   

                  I know our Professional Services organization assists in quite a few migration scenarios, and they are pretty expert at database-to-database migrations (while Jive is down), which would have the effects you are after (maintain author and published date, skip activity stream creation).  They would also know how to make sure that the HTML content actually gets into the search index (which is normally one of the things that the activity engine does), which I would assume you'd also want.  It might be worth talking with them to see how big a task this would be.