Version 2

    Summary

    This document describes the contents of the "Jive Vital Signs" page for the Jive application. The page is enabled via the system property vitals.enabled = true.

     

    Rationale

    The idea is to provide a single web service which returns a list of key/value pairs indicating the status of various internal health checks. These checks could include connectivity to (and response time of) various services such as EAE, Recommender, Analytics and Search. The benefits of this approach include:

    • Providing an end-to-end check of all internal Jive services.
    • Simpler configuration as new services are added to the Jive stack. Each additional Jive service introduces another key/value pair to the page, which is captured by simply adding a data point to the monitoring system's template for the health check page.
    • Health checks are displayed in context, providing more meaningful information to administrators for troubleshooting problems.

     

    Configuration

    By default, the vitals service is disabled. Set the system property vitals.enabled to true to enable the Vitals page. Changing this does not require a restart of Jive. In addition, each test can be enabled or disabled individually. For example, to disable a test, set its "vitals.<testname>" system property to false; the property name for each test is listed in the table below.
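    As an illustration of the naming scheme, the following minimal sketch builds the set of system properties needed to enable the page while disabling selected tests. The property names are copied from the table below; the helper name vitals_properties and the idea of generating the settings from a script are assumptions for illustration only, and how the properties are actually applied depends on your installation.

```python
# Property that disables each test, copied from the table in this document.
DISABLE_PROPERTY = {
    "Database Connectivity": "vitals.DatabaseConnectivity",
    "Cache Server(s) Connectivity": "vitals.CacheServerConnectivity",
    "Document Conversion Status": "vitals.DocumentConversionStatus",
    "Activity Engine Connectivity": "vitals.ActivityEngineConnectivity",
    "Analytics Connectivity": "vitals.AnalyticsConnectivity",
    "Cluster Message Queue Depth": "vitals.ClusterMessageQueueDepth",
    "Search Service Client Status": "vitals.SearchServiceClientStatus",
    "Event Dispatcher Queue Depth": "vitals.EventDispatcherQueueDepth",
    # Note: this one does not follow the vitals.<testname> pattern.
    "Cloud Analytics Status Test": "vitals.cloud.analytics.test.enabled",
}


def vitals_properties(disabled_tests):
    """Return the property settings (hypothetical helper) that enable the
    vitals page and switch off the named tests."""
    props = {"vitals.enabled": "true"}
    for name in disabled_tests:
        props[DISABLE_PROPERTY[name]] = "false"
    return props


for key, value in vitals_properties(["Document Conversion Status"]).items():
    print(f"{key} = {value}")
```

    Note that the Cloud Analytics test uses a differently shaped property name, so a lookup table is safer than deriving the name from the test name.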

     

    Note: No authentication is required. Any user can call the service once it is enabled (even guests, and even if disallowGuest is set to true). On-premise or hosted customers who want to use this feature should control access to the web service URL via IP restrictions in Apache.

    Output

    The web service is located at /api/core/v3/vitals. The service does not require authentication and responds to a GET request with a JSON-formatted block that looks like this in Jive 7:

     

    throw 'allowIllegalResourceCall is false.';

    {
      "Event Dispatcher Queue Depth" : [ {
        "results" : {
          "Queue Depth" : "0"
        }
      } ],
      "Cluster Message Queue Depth" : [ {
        "results" : { }
      } ],
      "Cache Server(s) Connectivity" : [ {
        "results" : { }
      } ],
      "Database Connectivity" : [ {
        "results" : {
          "Connectivity" : "1"
        }
      } ],
      "Purposeful Places Integration" : [ {
        "results" : {
          "Tile Installation" : "0",
          "Api Gateway Connectivity" : "-1"
        }
      } ],
      "Search Service Client Status" : [ {
        "results" : {
          "Directory Service" : "1",
          "Tenancy Service" : "1",
          "Query Service" : "1",
          "Activity Ingress Service" : "1"
        }
      } ],
      "Apps market connectivity test" : [ {
        "results" : {
          "Jive Url connectivity" : "1",
          "Apps market connectivity" : "1",
          "Gateway Connectivity" : "1"
        }
      } ],
      "Document Conversion Status" : [ {
        "results" : {
          "Enabled" : "0"
        }
      } ],
      "Activity Engine" : [ {
        "results" : {
          "RemoteIndexableQueue depth" : "0",
          "ReshredQueue depth" : "0",
          "RecommendationRequestQueue depth" : "0",
          "Connectivity" : "1",
          "RecommendationQueue depth" : "0",
          "Server host/port" : "jarvie-z800:7020",
          "ProcessingQueue depth" : "0",
          "Recommender Available" : "0",
          "DeletionQueue depth" : "0",
          "UpgradeQueue depth" : "0",
          "RemoteIndexableRetryQueue depth" : "0",
          "CommandQueue depth" : "0",
          "ShredderQueue depth" : "1",
          "MostlyOrderedQueue depth" : "0"
        }
      } ],
      "Cloud Analytics Status Test" : [ {
        "results" : {
          "Enabled" : "1",
          "Last Activity Send Success" : "1",
          "Cloud Analytics Connectivity" : "1"
        }
      } ],
      "Analytics Connectivity" : [ {
        "results" : {
          "Enabled" : "1",
          "Database Connectivity" : "1"
        }
      } ]
    }
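
    The sample response begins with a throw 'allowIllegalResourceCall is false.'; line ahead of the JSON, so a plain JSON parser will reject the raw body. The following is a minimal sketch of one way to handle that, assuming that everything before the first "{" is a prefix and not part of the data. The function names parse_vitals_body and flatten_vitals are hypothetical, and the flattened "test/index/key" naming is only one possible convention for feeding a monitoring system.

```python
import json


def parse_vitals_body(body: str) -> dict:
    """Parse a vitals response body into a dict.

    Assumption: any text before the first '{' (such as the
    "throw 'allowIllegalResourceCall is false.';" line in the sample)
    is a prefix and can be discarded.
    """
    start = body.find("{")
    if start == -1:
        raise ValueError("no JSON object found in vitals response")
    return json.loads(body[start:])


def flatten_vitals(vitals: dict) -> dict:
    """Flatten the nested structure into '<test>/<index>/<key>' -> value.

    The index distinguishes multiple result entries for the same test.
    """
    flat = {}
    for test_name, entries in vitals.items():
        for index, entry in enumerate(entries):
            for key, value in entry.get("results", {}).items():
                flat[f"{test_name}/{index}/{key}"] = value
    return flat


sample = """throw 'allowIllegalResourceCall is false.';
{
  "Database Connectivity" : [ { "results" : { "Connectivity" : "1" } } ],
  "Cluster Message Queue Depth" : [ { "results" : { } } ]
}"""

flat = flatten_vitals(parse_vitals_body(sample))
print(flat)  # {'Database Connectivity/0/Connectivity': '1'}
```

    Tests with empty results, such as the cluster queue depth on a non-clustered install, simply contribute no data points.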

     

     

     

    Where a result represents a Boolean, it is returned as an integer (quoted as a string in the JSON): 1 = true, 0 = false, and -1 = error (the check threw an exception).
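
    A minimal sketch of how a client might decode these values follows. The function names are hypothetical, and using None to represent the error case is a choice made for this example, not something the service defines. Which keys are Boolean and which are plain integers has to come from the table below.

```python
def interpret_boolean(value: str):
    """Map "1"/"0"/"-1" to True/False/None (None = the check threw an exception)."""
    mapping = {"1": True, "0": False, "-1": None}
    if value not in mapping:
        raise ValueError(f"not a Boolean vitals value: {value!r}")
    return mapping[value]


def errored_keys(results: dict) -> list:
    """Return the return keys whose value is the error marker "-1"."""
    return [key for key, value in results.items() if value == "-1"]


# Values taken from the "Purposeful Places Integration" entry in the sample output.
results = {"Tile Installation": "0", "Api Gateway Connectivity": "-1"}
print(interpret_boolean(results["Tile Installation"]))  # False
print(errored_keys(results))  # ['Api Gateway Connectivity']
```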

     

    | Test name | System property to disable | Return key | Interpret return value as | Notes |
    |---|---|---|---|---|
    | Database Connectivity | vitals.DatabaseConnectivity | Connectivity | Boolean | |
    | Cache Server(s) Connectivity | vitals.CacheServerConnectivity | (Cache server host name) | Boolean | Tests connectivity to each cache server (there can be more than one return key for this test). Returns empty results if there are no cache servers or if there is only one web application node. |
    | Document Conversion Status | vitals.DocumentConversionStatus | Enabled | Boolean | |
    | | | Failed conversions in the last 48 hours | Integer | The number of documents that failed to convert in the last 48 hours. Not displayed if conversion is disabled. This number is capped at 50. |
    | Activity Engine Connectivity | vitals.ActivityEngineConnectivity | Connectivity | Boolean | |
    | | | Recommender Enabled | Boolean | Will only display false; will not be present if the recommender is enabled. |
    | | | Recommender Available | Boolean | Will only display if Recommender Enabled is true. |
    | | | RecommendationRequestQueue depth | Integer | Anything above about 100 likely means that we are having trouble communicating with the recommender and requests for new recommendations are backing up. |
    | | | RemoteIndexableQueue depth on | Integer | Applies only to >= 6.0 installations running a multi-node EAE configuration. If this value grows larger than 100-500 items (depending on site traffic), the EAE nodes are either having trouble communicating or one or more of them are overloaded. |
    | | | CommandQueue depth | Integer | A value > 20 here would be considered large. This queue holds requests to reshred or rebuild activities and usually doesn't fill up at all. |
    | | | RemoteIndexableRetryQueue depth | Integer | Applies only to >= 6.0 installations running a multi-node EAE configuration. Any value here indicates that we're having trouble communicating between EAE nodes. |
    | | | ShredderQueue depth | Integer | A value > 500 here means that we aren't shredding activities fast enough to keep up with the incoming flow or that we aren't able to connect to the recommender. |
    | | | ProcessingQueue depth | Integer | A value > 100 items here means that new activity won't be showing up in the streams. This is the main processing queue for all new activity. |
    | | | UpgradeQueue depth | Integer | A value > 0 here means that an upgrade is happening, which also means a rebuild of the EAE Lucene index via the firehose. This is normal just after an upgrade, but while there are items in this queue users will likely notice either a slowdown in new activity or a slight slowness viewing streams. |
    | | | RecommendationQueue depth | Integer | Anything above about 100 likely means that we are having trouble communicating with the recommender and requests for new recommendations are backing up. |
    | | | ReshredQueue depth | Integer | Will only have a value > 0 if a reshred is currently running (which should be fairly rare). A positive value here will probably indicate some site slowness due to the shredding. |
    | | | MostlyOrderedQueue depth | Integer | Any value > 100-500 here (depending on site traffic) would be worrisome and probably indicates that users are seeing weird read-tracking issues in their inbox. This queue handles activities that should be processed quickly (reads/unreads/follows/unfollows/etc). |
    | | | Server host/port | String | The host:port that this results package refers to. |
    | Analytics Connectivity | vitals.AnalyticsConnectivity | Enabled | Boolean | |
    | | | Database Connectivity | Boolean | Will only display if analytics is enabled. |
    | | | Last ETL State | Integer | 1 means Running, 2 means Completed, 3 means Failed, 4 means Interrupted. If no ETLs have run, this element is omitted. |
    | | | Hours ETL Running | Integer | The number of hours the ETL has been running. Shown only if Last ETL State is 1 and at least one ETL has run. |
    | Cluster Message Queue Depth | vitals.ClusterMessageQueueDepth | Member (name) message queue depth | Integer | (name) is a GUID. This test measures how many cluster messages are stacking up - a large number indicates the install is dying. There will be an entry for each cluster member. Returns empty results if there is no cluster configured. |
    | Search Service Client Status | vitals.SearchServiceClientStatus | Tenancy Service | Boolean | Did the last request from webapp to backend service complete successfully? |
    | | | Directory Service | Boolean | Did the last request from webapp to backend service complete successfully? |
    | | | Activity Ingress Service | Boolean | Did the last request from webapp to backend service complete successfully? |
    | | | Query Service | Boolean | Did the last request from webapp to backend service complete successfully? |
    | Event Dispatcher Queue Depth | vitals.EventDispatcherQueueDepth | Queue Depth | Integer | The number of events waiting to be processed. A large number indicates the event dispatcher is blocked up - poor performance or an outage can result. |
    | Cloud Analytics Status Test | vitals.cloud.analytics.test.enabled | Enabled | Integer | 0 is False, 1 is True. |
    | | | Cloud Analytics Connectivity | Integer | 0 is Failure, 1 is Success. Only available if "Enabled". |
    | | | Last Activity Send Success | Integer | 0 is Failure, 1 is Success. Only available if "Enabled". |
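
    The queue-depth guidance in the Notes column can be turned into simple alert rules. The sketch below is one possible encoding, assuming the approximate limits quoted in the notes are used as hard thresholds; the names QUEUE_LIMITS and queue_warnings are hypothetical. RemoteIndexableQueue and MostlyOrderedQueue are left out because their limits (100-500) depend on site traffic and need to be chosen per installation.

```python
# Limits taken from the Notes column; a depth strictly above the limit is flagged.
QUEUE_LIMITS = {
    "RecommendationRequestQueue depth": 100,
    "RecommendationQueue depth": 100,
    "CommandQueue depth": 20,
    "ShredderQueue depth": 500,
    "ProcessingQueue depth": 100,
    "RemoteIndexableRetryQueue depth": 0,  # any value indicates trouble
    "UpgradeQueue depth": 0,               # > 0 means an upgrade is in progress
}


def queue_warnings(results: dict) -> list:
    """Return a message for each Activity Engine queue above its limit.

    `results` is the "results" object of one Activity Engine entry;
    values arrive as strings, so they are converted with int().
    """
    warnings = []
    for key, limit in QUEUE_LIMITS.items():
        raw = results.get(key)
        if raw is None:
            continue  # key absent from this response
        depth = int(raw)
        if depth > limit:
            warnings.append(f"{key}: {depth} exceeds {limit}")
    return warnings


sample = {
    "CommandQueue depth": "0",
    "ShredderQueue depth": "1",
    "ProcessingQueue depth": "250",
}
print(queue_warnings(sample))  # ['ProcessingQueue depth: 250 exceeds 100']
```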