Search in 4.x, 5.x, 6.x, 7.x and Cloud

     

    Introduction

     

    This document describes aspects of the search mechanisms in Jive and the differences in functionality between various Jive releases and some implications for search behavior.

     

    Content Search

     

    Mechanisms Prior to 5.0

     

    Each document has three sections: a title, a body and zero or more tags.  In releases prior to 5.0, the text from all three of these sections is combined into a single collection of words for indexing.  Note that when searching with multiple terms there is an implied AND between them; that is, each term must be present to make a match.  This means that one impact of combining the terms from the title, body and tags together is that a document can match a query even if none of the three sections contains all the search terms, as long as when the title, body and tags are combined, they include all the terms.   So if a document has title "Foo", its body includes "Bar" and "Baz", and it has the tag "Gar" then it would match searches like "Foo Bar" and "Foo Baz Gar".
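
    The pre-5.0 matching rule can be illustrated with a minimal sketch. The function names and the tokenizer here are assumptions made for illustration, not Jive's actual implementation; the sketch only shows how pooling title, body and tags lets an implied-AND query match across sections.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def combined_terms(title, body, tags):
    """Pool title, body and tags into one set of terms (pre-5.0 style)."""
    terms = set(tokenize(title)) | set(tokenize(body))
    for tag in tags:
        terms |= set(tokenize(tag))
    return terms

def matches(query, title, body, tags):
    """Implied AND: every query term must appear somewhere in the pooled terms."""
    terms = combined_terms(title, body, tags)
    return all(term in terms for term in tokenize(query))

# The example document from the text above.
doc = {"title": "Foo", "body": "Bar and Baz", "tags": ["Gar"]}
print(matches("Foo Bar", **doc))
print(matches("Foo Baz Gar", **doc))
print(matches("Foo Qux", **doc))
```

    No single section contains all of "Foo Baz Gar", yet the query matches because the pooled terms do; "Foo Qux" fails because "Qux" appears nowhere.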

     

    Another, potentially less intuitive impact of combining the three sections of a document for indexing is that matching search terms in the title or tags is no different for search relevance than matching terms in the body; title and tags are not treated as more important than any other words.  If I search for "Foo", a document titled "Foo" is treated the same as a document whose body contains "Foo".   Which one shows up first in the search results is largely determined by which one has a higher percentage of matching words.  So a very long document that includes "Foo" once, even if it is in the title, will show up lower in the search results than a short document that also includes "Foo" once.

     

    Understanding the effect of the size of a document on search results is important when thinking about how Comments and Bookmarks are treated in search.  (Though the treatment of Bookmarks changed in 4.5.6 -- see below.)   Comments and Bookmarks are treated as separate, searchable entities.  For Comments, the title of the document that the Comment relates to is included with the text of the Comment for indexing.  With Bookmarks, the title of the document the Bookmark points to is combined with the notes in the Bookmark and its tags and then indexed.  Since Comments, and especially Bookmarks, tend to be very short but also include the title of the parent document, searches using terms from this title are likely to return Bookmarks and Comments ahead of the Document itself.   In fact, a Bookmark for a Document can appear much higher than the Document itself in certain searches.
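
    The length effect can be sketched with a simple ratio. The scoring function below is an assumption (the fraction of a text's words that are query terms), standing in for whatever length normalization the pre-5.0 index actually applied; the sample title and filler text are made up.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def match_ratio(query, text):
    """Fraction of the words in `text` that are query terms (assumed scoring)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    query_terms = set(tokenize(query))
    hits = sum(1 for token in tokens if token in query_terms)
    return hits / len(tokens)

title = "Quarterly Sales Report"
# A long document: its title followed by 200 unrelated words.
document_text = title + " " + "filler " * 200
# A Bookmark: the parent title plus a short note.
bookmark_text = title + " worth reading"

doc_score = match_ratio("sales report", document_text)
bookmark_score = match_ratio("sales report", bookmark_text)
print(doc_score < bookmark_score)
```

    Both texts contain the two query terms exactly once, but the Bookmark is far shorter, so under this kind of scoring it ranks above the document it points to.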

     

    Change in 4.5.6

     

    One result of the handling of Bookmarks in search as described above is that when a document has many bookmarks, it can happen that a search might return a large list of bookmarks all pointing to the same document. In the case of Spotlight search, Bookmarks may be the only thing returned. For example:

     

     

    [Image: bookmarks.png]

     

    To avoid this behavior, in 4.5.6 Bookmarks are excluded from results in Spotlight search.  (There is no change to the behavior of full search.)   This has a side effect.  A document that would not otherwise rank high for a query may previously have appeared in Spotlight results only through a highly ranked Bookmark (technically, it is the Bookmark that ranked high, not the document).  Such a document may now not appear in the Spotlight search results at all.

     

    Mechanisms in 5.0

     

    Search was substantially revamped in 5.0 in response to several goals.

     

    1. Make matches to the title or tags of a document result in a high search ranking, independent of the content or length of the body, while still enabling multi-term matches that combine words from more than one of the three sections (title, body and tags).
    2. Change the handling of Comments and Bookmarks so that searches for a document don't return Comments or Bookmarks instead.
    3. Improve performance and scalability (not discussed here)

     

    The first goal was achieved by associating with each document four separate collections of words:

    1. The title
    2. The body
    3. The set of all tags
    4. The union of 1-3 plus the notes and tags from any Bookmarks made on the document (so doc title+doc body+doc tags+bookmark notes+bookmark tags)

     

    Each of these four collections is indexed separately and, when a search is made, a score is assigned to each collection based on matches, and then these scores are rolled up into a final score.  This final score is the total relevance for the document.   With this scheme, an exact match on the title of a document is very likely to return that document at the top.  Similarly, matching tag terms will tend to lead to a high ranking.  At the same time, because of the fourth collection, a search can still combine terms from the title and the tags, for example, just as in pre-5.0 search, and can also match terms from Bookmark notes and tags.
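
    The four-collection scheme can be sketched as follows. The weights and the per-collection score (fraction of query terms present) are hypothetical; the source does not say how the individual scores are computed or rolled up, so this only illustrates the shape of the idea.

```python
import re

# Hypothetical weights: the actual roll-up used by Jive is not documented here.
FIELD_WEIGHTS = {"title": 3.0, "tags": 2.0, "body": 1.0, "all": 1.0}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def build_fields(title, body, tags, bookmark_notes=(), bookmark_tags=()):
    """Build the four term collections described above."""
    title_terms = set(tokenize(title))
    body_terms = set(tokenize(body))
    tag_terms = set(tokenize(" ".join(tags)))
    bookmark_terms = set(tokenize(" ".join(bookmark_notes)))
    bookmark_terms |= set(tokenize(" ".join(bookmark_tags)))
    return {
        "title": title_terms,
        "body": body_terms,
        "tags": tag_terms,
        "all": title_terms | body_terms | tag_terms | bookmark_terms,
    }

def field_score(query_terms, field_terms):
    """Assumed per-collection score: fraction of query terms found."""
    if not query_terms:
        return 0.0
    return len(query_terms & field_terms) / len(query_terms)

def relevance(query, fields):
    """Roll the per-collection scores up into one weighted total."""
    query_terms = set(tokenize(query))
    return sum(
        FIELD_WEIGHTS[name] * field_score(query_terms, terms)
        for name, terms in fields.items()
    )

title_hit = build_fields("Big Bear Mountain", "Trip notes", [])
body_hit = build_fields("Trip notes", "We hiked Big Bear Mountain", [])
print(relevance("big bear mountain", title_hit))
print(relevance("big bear mountain", body_hit))
```

    With these assumed weights, the document whose title contains all the query terms scores higher than the one that only mentions them in its body, which is the behavior the first goal describes.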

     

    Given this much more elaborate model for finding a document, it was determined that it would be simpler to exclude Bookmarks from search altogether (both Spotlight and full search).   Bookmarks are not searchable entities, though they can impact search results through the inclusion of their notes and tags with the document.  Comments remain searchable entities, but the title of the document the Comment is attached to is no longer added to the Comment text before it is indexed.  The result is that it is still possible to search for a Comment by supplying terms that appear in the Comment itself, but a search for a Document is very unlikely to return a Comment on that document ahead of the document itself.

     

    Mechanisms in 6.x

    For on-premise search in 6.x, the mechanisms are very similar to those in 5.x outlined above.

     

    For cloud search (available as an option in 6.x and as the norm for Jive Cloud), some other improvements have been made:

    Status Updates

    Since status updates have no subject field, we only incorporate the relevance match of the body index for status updates. This prevents them from dominating search results.

    Exact word matches in the subject

    We've added additional weight to exact subject matches so that they are ranked higher.

    Word proximity

    When the query terms are next to each other (and in the same order) in a document, that document will be ranked higher.

     

    Jive Find

    Additionally, cloud search introduces the concept of Jive Find, which is described in more detail in "Search in Jive 6 FAQs".

     

    Mechanisms in 7.x

     

    Rank search results by word proximity

    When search terms are next to or close to one another, and in the same order in the subject, the content will be ranked higher in search results. This is particularly useful for users who are finding a particular document by its subject.
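
    As a rough illustration of the "next to one another, and in the same order" case, the check below looks for the query terms as a contiguous run of words in a subject. The function name is made up, and the sketch deliberately ignores the looser "close to one another" case, whose exact rules are not given here.

```python
def has_adjacent_phrase(query, subject):
    """True if the query terms appear side by side, in order, in the subject."""
    query_words = query.lower().split()
    subject_words = subject.lower().split()
    n = len(query_words)
    return n > 0 and any(
        subject_words[i:i + n] == query_words
        for i in range(len(subject_words) - n + 1)
    )

print(has_adjacent_phrase("big bear", "My Adventure on Big Bear"))
print(has_adjacent_phrase("bear big", "My Adventure on Big Bear"))
```

    A ranking function could use a check like this to add a boost when the terms appear in order, while still matching documents where they are scattered.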

    Rank search results by exact word match

    Content with an exact match for your search terms in the title or body of the content will be ranked higher in search results. For example, a search for “parking” will still match documents containing either “parking” or “park”, but those containing “parking” would be boosted higher.
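
    The "parking" versus "park" example can be sketched with a toy stemmer. Both the suffix-stripping rule and the boost value are assumptions for illustration; the actual analyzer and weights are not described in this document.

```python
EXACT_MATCH_BOOST = 0.5  # hypothetical extra weight for an exact word match

def stem(word):
    """Toy stemmer (an assumption): strip a few common English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def term_score(query_term, doc_words):
    """Score 1.0 for a stemmed match, plus a boost if the exact word appears."""
    query_term = query_term.lower()
    words = [word.lower() for word in doc_words]
    score = 0.0
    if any(stem(word) == stem(query_term) for word in words):
        score += 1.0
    if query_term in words:
        score += EXACT_MATCH_BOOST
    return score

print(term_score("parking", ["park", "rules"]))
print(term_score("parking", ["parking", "rules"]))
```

    Both documents match, but the one containing the exact word "parking" receives the higher score.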

    Rank search results by their outcomes

    When content is associated with an outcome, its ranking score will be adjusted accordingly. Content marked as “outdated” will be ranked lower. Content marked with other outcomes will be ranked higher, and those marked as “official” or “final” will get the strongest boost.

    Rank search results by recency

    As content ages, as determined by its last modification date, Search will reduce its relevance score.
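
    Outcome and recency adjustments can be pictured as multipliers applied to a base relevance score. Every number below (the boost factors and the one-year half-life) is a made-up placeholder; only the direction of each adjustment comes from the descriptions above.

```python
from datetime import datetime, timedelta

# Hypothetical multipliers: only their direction is taken from the text.
OUTCOME_FACTORS = {"outdated": 0.5, "official": 2.0, "final": 2.0}
OTHER_OUTCOME_FACTOR = 1.25
HALF_LIFE_DAYS = 365.0  # assumed: the score halves for each year of age

def adjusted_score(base_score, outcome, last_modified, now):
    """Apply the outcome factor and an age-based decay to a base score."""
    if outcome is None:
        outcome_factor = 1.0
    else:
        outcome_factor = OUTCOME_FACTORS.get(outcome, OTHER_OUTCOME_FACTOR)
    age_days = max((now - last_modified).total_seconds() / 86400.0, 0.0)
    recency_factor = 0.5 ** (age_days / HALF_LIFE_DAYS)
    return base_score * outcome_factor * recency_factor

now = datetime(2014, 1, 1)
fresh_official = adjusted_score(1.0, "official", now, now)
old_outdated = adjusted_score(1.0, "outdated", now - timedelta(days=730), now)
print(fresh_official > old_outdated)
```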

    Rank search results by content type

    Since different content types have different characteristics, they are now treated slightly differently. For example, status updates are no longer ranked as high.

    Better Tag handling

    When a tag is separated by underscores, it is tokenized during search in order to improve search relevancy. For example, a search for “catalog” will now match tag “product_catalog”.
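
    A minimal sketch of underscore tokenization for tags follows. The helper names are illustrative only; the point is that splitting "product_catalog" into its parts lets a search for "catalog" find it.

```python
def tokenize_tag(tag):
    """Split a tag on underscores into lowercase tokens."""
    return [part.lower() for part in tag.split("_") if part]

def tag_matches(term, tag):
    """True if the search term equals one of the tag's tokens."""
    return term.lower() in tokenize_tag(tag)

print(tokenize_tag("product_catalog"))
print(tag_matches("catalog", "product_catalog"))
```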

     

     

    Multi-Locale Support

    Jive 7.0 has enhanced multi-locale support for search.

    Thai analyzer

    Thai is now officially supported by Jive. This allows content to be indexed and searched in a Thai-specific manner.

    Diacritics

    For English content, accents are removed during indexing and search. For example, searching for “protégé” will match content containing “protege”.
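
    Accent removal of this kind is commonly done by Unicode decomposition followed by dropping combining marks. The sketch below uses Python's standard unicodedata module to show the idea; it is not a description of the analyzer Jive actually uses.

```python
import unicodedata

def fold_accents(text):
    """Decompose characters (NFD) and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def accent_insensitive_equal(a, b):
    """Compare two strings after folding accents and case."""
    return fold_accents(a).lower() == fold_accents(b).lower()

print(fold_accents("protégé"))
print(accent_insensitive_equal("protégé", "Protege"))
```

    Applying the same folding at both index time and query time is what makes the two spellings meet.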


    Mentioning

    Starting from Jive 7.0, the “mentioning” function is implemented in the same way as spotlight search, for a better user experience.

     

    Some Comments on SEO

     

    Based on the mechanisms outlined here, it is possible to identify techniques for making a particular document more likely to appear highly ranked in response to a specific search query.   Prior to 5.0, Bookmarking a document could have a major impact on search results (except for Spotlight search in 4.5.6), as could adding short comments.   On the other hand, the title and the tags on a document, while included in the index, did not have nearly as dramatic an impact.

     

    In 5.0 and 6.0, things are very different.  Adding a Bookmark can make a difference but only through its notes and tags; the fact of the bookmark alone does not matter in 5.0 search.   Also, adding a Comment is unlikely to change the search results for a document.  The title, however, is very important in 5.0 and 6.0 search.  But the title must contain ALL the search terms.  A document called "My Adventure on Big Bear" may not match highly with a search for "Big Bear Mountain".  But it definitely will if the title is "My Adventure on Big Bear Mountain".   Also, tags become a dramatically more powerful tool for search engine optimization in 5.0.  But as with title, it is important that ALL the search terms be included in the tags.  If the document "My Adventure on Big Bear" had one tag "Mountain" then it would match the search "Big Bear Mountain" but it would typically be ranked much lower than if the tags were "#Big_Bear" and "#Mountain".

     

    People Search

     

    At a high level, people search includes all terms from profile fields.  There are, however, some exceptions.  First, certain fields can be configured to be searchable (or not).  Non-searchable fields are not included in the search index.   A more complex situation arises when a user has declared certain profile fields to be private.  In this case, a searcher who does not have access to a particular profile field will not receive a match on text from that field, but a user with access will get a hit.  Note that the functionality to hide profile fields and respect that in search was new in 4.5.

     

    The people search feature can make matches based on similar sounds rather than just literal string matching. With this feature, a search for "Rumelhardt" would match a person named "Rumelhart," whereas plain string matching would not.  In 4.5 this phonetic matching is used by default for people search, though it can be disabled through configuration.  In 5.0 phonetic matching is off by default, though it can be enabled for an individual search on the main people search page.  Furthermore, in 5.0 synonyms were added for people search.  This means that people search results will include matches for common variations on names, such as "Bob" for "Robert."
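
    Phonetic and synonym matching can be sketched as below. The document does not say which phonetic algorithm Jive uses; American Soundex is swapped in here purely as a well-known example, and the synonym table is a tiny hypothetical stand-in.

```python
# Soundex digit for each consonant group.
SOUNDEX_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit

# Hypothetical synonym table: canonical name -> common variations.
NAME_SYNONYMS = {"robert": {"bob", "rob", "bobby"}}

def soundex(name):
    """American Soundex: first letter plus three digits, padded with zeros."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        if ch not in "HW":  # H and W do not separate repeated codes
            previous = digit
    return (code + "000")[:4]

def name_matches(query, name, phonetic=False):
    """Exact or synonym match; optionally fall back to a phonetic match."""
    q, n = query.lower(), name.lower()
    if q == n:
        return True
    if q in NAME_SYNONYMS.get(n, set()) or n in NAME_SYNONYMS.get(q, set()):
        return True
    return phonetic and soundex(q) == soundex(n)

print(name_matches("Rumelhardt", "Rumelhart"))
print(name_matches("Rumelhardt", "Rumelhart", phonetic=True))
print(name_matches("Bob", "Robert"))
```

    The `phonetic` flag mirrors the 5.0 behavior described above, where phonetic matching is off unless enabled for a given search, while synonyms apply regardless.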

     

    Places Search

     

    Places search is pretty simple: the name, tags and description of a place are combined and indexed.  A search can match against any combination of terms with relevance determined by the number of hits relative to the amount of text.  This is unchanged from 4.5 to 5.0 to 6.0.

     

    Spotlight Search

     

    In Spotlight search, a wildcard, *, is added to the end of the search query.  This means that matches will be found for the exact string typed or for any string that starts with what has been typed.  The idea is that results can appear before the user has finished typing.  Spotlight searching will not begin until the user has typed at least three characters.
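
    The Spotlight behavior can be sketched as a query rewrite plus a prefix match. The function names are hypothetical, and treating earlier terms as exact matches with only the last term as a prefix is an assumption about how the trailing wildcard interacts with multi-term queries.

```python
MIN_SPOTLIGHT_CHARS = 3  # Spotlight does not search until three characters are typed

def spotlight_query(typed):
    """Append the trailing wildcard, or return None if the input is too short."""
    typed = typed.strip()
    if len(typed) < MIN_SPOTLIGHT_CHARS:
        return None
    return typed + "*"

def spotlight_matches(typed, titles):
    """Return titles matching the wildcard query (last term is a prefix)."""
    query = spotlight_query(typed)
    if query is None:
        return []
    *exact_terms, prefix = query[:-1].lower().split()
    results = []
    for title in titles:
        words = title.lower().split()
        if all(term in words for term in exact_terms) and any(
            word.startswith(prefix) for word in words
        ):
            results.append(title)
    return results

titles = ["Parking policy", "Spare parts", "Budget"]
print(spotlight_query("pa"))
print(spotlight_query("par"))
print(spotlight_matches("par", titles))
```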