58 Replies Latest reply: Jan 7, 2013 1:36 PM by Tracy Maurer

    Want to improve search relevancy in Jive?  Here's your chance!

    Ryan Rutan

      You take the good...you take the bad...you take them both and there you have, the facts of .... Search????

      (High-five for anyone who can name the TV show without Google)

      That's right.  In a way, Search is just like that: crawling and indexing content, both good and bad, and trying its best to provide fast and relevant results.  But, as in most scenarios, there are always trade-offs.  In this case, it's usually relevancy vs. speed.  Over the past major iterations of the Jive platform, Jive's engineers have kept a constant focus on delivering faster and more relevant search results.  Most recently, in Jive 5, the search architecture was completely revamped to set the stage for conversations such as this one:

       

      How can we make search relevancy better?

       

      Problem: Search relevancy is often considered a qualitative metric.  In order for Jive engineers to properly tune and tweak the search relevancy algorithms that ship with the product, we need use-cases that we can test against and, most importantly, those use-cases need a common data-set, which is very difficult to share in most cases.

      • If only there were a way we could articulate use-cases on a common data-set.
      • If only feedback could be received prior to product release using a live system, and not mock data.

       

      Solution: The Jive Community.  It's a live Jive 5 instance with a large data sample, and houses many of the same use-cases customers see in their own instances.

      Instructions on How to Help using the Jive Community:

      We are asking customers to share their search relevancy grievances with Jive in this conversation, and asking you to provide the following detail(s), if available.

       

      1. How are you searching?  (@mentions, spotlight search, search page, other)
      2. Example of the Search Query Typed
      3. Example of the Data Scenario
        1. For example, where were you expecting the match to hit?  Subject? Body? Attachment? Tags? Comments? Binary Contents? Inline Comments? Hidden Meta-Data?
        2. Also, specifics about the context.  Were you searching across containers? Single Container? Filters? etc...
      4. Example of the Search Results Returned
        1. For example, providing a screenshot of the Jive Community results (redacted if need be) is extremely helpful.
      5. What is your "result sentiment"?  Are the results returned:  (fair, unexpected, right on, random)
        1. Please also indicate if this is negative or positive search behavior.

       

      Note: If a search use-case is already listed, please Like it so we can gauge the reach of a use-case.

      How will this feedback be used?

      For each piece of reproducible feedback that our engineers receive that is articulated in terms of the Jive Community, Jive will add the scenario to a suite of regression tests that Jive can use to validate future relevancy tuning initiatives.
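
As a rough illustration of what one such regression case could look like, here is a minimal sketch. It is not Jive's test suite: `RelevancyCase`, `failing_cases`, and the canned `fake_search` backend are hypothetical names made up for this example, and the document ids are invented.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RelevancyCase:
    """One reported use-case: a query and the document it should surface."""
    query: str
    expected_id: str
    max_rank: int  # worst acceptable 1-based position in the results


def failing_cases(search: Callable[[str], List[str]],
                  cases: List[RelevancyCase]) -> List[RelevancyCase]:
    """Return the cases whose expected document is missing or ranked too low."""
    failures = []
    for case in cases:
        top = search(case.query)[:case.max_rank]
        if case.expected_id not in top:
            failures.append(case)
    return failures


# Hypothetical stand-in for a real search backend, for demonstration only.
def fake_search(query: str) -> List[str]:
    canned = {
        "business": ["space-business-unit", "space-business"],
        "admin essentials plugin": ["doc-admin-essentials-plugin"],
    }
    return canned.get(query.lower(), [])


cases = [
    RelevancyCase("Business", "space-business", max_rank=1),
    RelevancyCase("Admin Essentials Plugin", "doc-admin-essentials-plugin", max_rank=1),
]
failed = failing_cases(fake_search, cases)
```

Re-running a list like this after every tuning change would show both newly fixed cases and regressions in searches that used to work.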

       

      If you have any questions about this effort, please feel free to ask either me or Karl Rumelhart (who is leading this effort from Engineering).  We look forward to your feedback.

        • Re: Want to improve search relevancy in Jive?  Here's your chance!
          Ryan Rutan

          To start this off, I'll share one of my use-cases:

          1.) @mentioning or Spot-Light Search

          2.)  "@Business"

          3.) The name of the intended target is Business and should match on the Subject.  Global search, no filters.

          4.)

          Screen Shot 2012-01-05 at 11.33.10 PM.png

          5.) Not ideal.  There is nothing else for me to type to get a match.  It would be ideal if exact matches were promoted. 

           

          Existing Workaround:  If you find yourself in a similar situation, you can usually append an _ (underscore) as the next character and this will relay a stop word to the search engine, and it usually matches...but this is somewhat non-intuitive.  The idea is to not have to leave the Context to do the search. 
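
To make the "promote exact matches" request concrete, here is a small sketch of one way a result list could be re-ranked. It only illustrates the idea and says nothing about how Jive actually scores results; `promote_exact_matches` and the sample titles and scores are hypothetical.

```python
from typing import List, Tuple


def promote_exact_matches(query: str, results: List[Tuple[str, float]]) -> List[str]:
    """Put titles equal to the query (ignoring case) first, then order by score."""
    q = query.strip().lower()
    # Sort key: exact matches get False (sorts first), then higher scores first.
    ranked = sorted(results, key=lambda r: (r[0].strip().lower() != q, -r[1]))
    return [title for title, _ in ranked]


# Hypothetical (title, relevance score) pairs as a search backend might return them.
results = [
    ("Business Unit Reports", 0.9),
    ("Business Development", 0.8),
    ("Business", 0.5),
]
reordered = promote_exact_matches("Business", results)
```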

          • Re: Want to improve search relevancy in Jive?  Here's your chance!
            Karl Rumelhart

            I just want to jump in here to emphasize how valuable it is for us to get this feedback.   We are going to put considerable effort into search relevance work over the coming months but we need test cases!   I know that many folks have examples in their own communities of searches that aren't giving them what they would like.  If we can get equivalent examples in the Jive Community -- where we have access to the data set -- we can actually put tests in place where we tweak the search algorithms until they produce the desired results against the actual data set. 

             

            So please keep the examples coming!! 

             

            By the way, it is also great to get examples of searches that DO give what you want.  When we make changes to search to improve in cases where it isn't working as well, we need to make sure that we don't start to do worse in places where it is working great.  So if you do a search and get exactly what you want, let us know about that too.

            • Re: Want to improve search relevancy in Jive?  Here's your chance!
              mrowbory

              My request is for the search system to actually return the number of search results, and preferably a breakdown of the number of items of each content type within the search listing.

               

              This is related to relevancy in that it makes it easier to tell how likely it is that the search contains the result I'm looking for.  For example, if I get 25 results, I know it's worth paging through a couple of pages to see if I can find what I'm looking for.  If I get 2000 results, then I know straight away that I should probably add some more terms to refine the search.

               

              By having the content type filters show the result count breakdown I know that if I'm looking for a blog result and there are 10 blog results out of 1000 total, I can refine simply by clicking on blogs and likely find the result I need.

               

              An example:

               

              There were 250 results for your search 'jive'

               

              documents (120)

              discussions (25)

              events (0)

              status (12)

              etc

              etc

              etc

               

              Clicking on the filter name would filter by that content type, a check box would allow you to select multiple content types.

               

              This is fairly standard functionality on most search systems and I've certainly never seen one that doesn't give you the number of results that were found.
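
A minimal sketch of the header described above, assuming the result list has already been filtered down to what the current user may see. `result_summary` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def result_summary(query: str, results: List[Dict[str, str]]) -> str:
    """Build a 'total plus per-type breakdown' header for a result list."""
    counts = Counter(r["type"] for r in results)
    lines = [f"There were {len(results)} results for your search '{query}'"]
    # most_common() lists the largest content types first.
    lines += [f"{ctype} ({n})" for ctype, n in counts.most_common()]
    return "\n".join(lines)


# Hypothetical, already permission-filtered results.
results = (
    [{"type": "documents"}] * 3
    + [{"type": "discussions"}] * 2
    + [{"type": "status"}]
)
summary = result_summary("jive", results)
```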

                • Re: Want to improve search relevancy in Jive?  Here's your chance!
                  Karl Rumelhart

                  My request is for the search system to actually return the number of search results, and preferably a breakdown of the number of items of each content type within the search listing.

                  A great question, actually.   It would be awesome to show the number of results.  This is something that those of us on the search team would really like to provide.  But to be open about it, this is still a ways out. To give you some insight into why this is harder than it sounds, the challenge is related to permissions.  We know how many items match the query.  What is harder is to show how many items match the query that you have permission to see.  

                   

                  Please keep the ideas and use cases coming!

                    • Re: Want to improve search relevancy in Jive?  Here's your chance!
                      mrowbory

                      I know the background for the reasons; I just hoped that the architecture issues might have been addressed, or at least planned with some more urgency.  This affects paging on all the widgets and browse-all pages, etc.

                       

                      I actually wrote a proof of concept that took the Lucene results and passed them to the DB to do a nice set-based join with the permissions tables. It worked quite well and was pretty fast.
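
The proof of concept itself is not shown in this thread. As a rough sketch of the general idea only, the example below loads the matching ids into a temporary table and lets the database join them against permission data. The schema (`content`, `container_access`), the helper `count_visible`, and the use of SQLite are all assumptions for illustration and do not reflect Jive's real tables.

```python
import sqlite3
from typing import Iterable


def count_visible(conn: sqlite3.Connection, hit_ids: Iterable[int], user_id: int) -> int:
    """Count search hits that sit in a container the user may read."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS search_hits (content_id INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM search_hits")
    conn.executemany("INSERT OR IGNORE INTO search_hits VALUES (?)", [(i,) for i in hit_ids])
    # One set-based join instead of a permission check per hit.
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT h.content_id)
        FROM search_hits AS h
        JOIN content AS c ON c.id = h.content_id
        JOIN container_access AS a ON a.container_id = c.container_id
        WHERE a.user_id = ?
        """,
        (user_id,),
    ).fetchone()
    return row[0]


# Hypothetical data: five content items in three containers; user 7 can read two containers.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE content (id INTEGER PRIMARY KEY, container_id INTEGER);
    CREATE TABLE container_access (user_id INTEGER, container_id INTEGER);
    INSERT INTO content VALUES (1, 10), (2, 10), (3, 20), (4, 30), (5, 30);
    INSERT INTO container_access VALUES (7, 10), (7, 30);
    """
)
visible = count_visible(conn, [1, 2, 3, 5], user_id=7)
```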

                       

                      We've also started indexing a flag to say if the content is 'public'. This allows us to do a simple search and know that all the results are valid for a user who isn't logged in. We use this to provide an API-style feed to our other websites so they can see if there is related community content.

                       

                      I've been trying to think of some sort of hash or other encoding you could index in Lucene that would convey the permissions somehow. I would expect that in most setups any particular user wouldn't have specific access to such a great number of containers. I'll give this some more thought.

                  • Re: Want to improve search relevancy in Jive?  Here's your chance!
                    Ryan Rutan

                    Another use-case,

                     

                    Spot-light or standard Search

                    filtering content by Gia Lyons Authored or Participated

                    Term business unit (matches community which is in almost everything she writes) ... I try with "business unit" and no results.

                     

                    Intended content:  Example User Activities by Business Unit

                    • Re: Want to improve search relevancy in Jive?  Here's your chance!
                      Ryan Rutan

                      Another exact match candidate in spotlight, "Admin Essentials Plugin" result #1 (awesome) ... in @mention, not on the page.  It has been there for a while now, so not sure why it has been bumped.  But exact matches should be #1 result...general feedback. =)

                      • Want to improve search relevancy in Jive?  Here's your chance!
                        Tracy Maurer

                        To your original point, the answer is "The Facts of Life"

                        • Re: Want to improve search relevancy in Jive?  Here's your chance!
                          JohnSchwiller

                          Spotlight search, typing "Chris Taylor" trying to find Chris Taylor doesn't find him in the displayed 'top hits'

                          • Re: Want to improve search relevancy in Jive?  Here's your chance!
                            Tracy Maurer

                            I believe that Jive already has a synonym dictionary that we, as admins, are able to modify. Perhaps making this something that more people can easily add to would assist in the search relevancy? For example, say a group or space admin is doing a search. The first search doesn't return the item they want, so they try something different. If at that point they could mark the word they used to find the document as a synonym of the first word they tried, that might help other people get better results. Or as a sys admin, I could be more proactive about adding to the synonyms because it is in the workflow (front end, easy) instead of me having to bounce over to the Admin Console, remember where the dictionary was, etc.
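
A small sketch of the workflow idea, assuming a simple in-memory synonym table. It does not describe how Jive's synonym dictionary is stored or applied; `add_synonym`, `expand_query`, and the sample words are hypothetical.

```python
from typing import Dict, List, Set


def add_synonym(synonyms: Dict[str, Set[str]], first_try: str, worked: str) -> None:
    """Record that two search words should be treated as equivalent (both directions)."""
    a, b = first_try.lower(), worked.lower()
    synonyms.setdefault(a, set()).add(b)
    synonyms.setdefault(b, set()).add(a)


def expand_query(terms: List[str], synonyms: Dict[str, Set[str]]) -> List[Set[str]]:
    """For each query term, return the set of words that may satisfy it."""
    return [{t.lower()} | synonyms.get(t.lower(), set()) for t in terms]


synonyms: Dict[str, Set[str]] = {}
# The admin searched "vacation" first, then found the document with "holiday".
add_synonym(synonyms, "vacation", "holiday")
expanded = expand_query(["Vacation", "policy"], synonyms)
```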

                            • Re: Want to improve search relevancy in Jive?  Here's your chance!
                              LG .

                              How are you searching?

                              That's a good question, without a search report it's hard to say. 1st wish: Offer search reports.

                              Our users enter one or two words. One can create a very basic search report with:

                              cat access_log|grep ' /search.jspa'|tr '[& ]' '\n'|grep '^q='|cut -d= -f2|sort|uniq -c|sort -rn|sed 's#\(\+\|%20\)# #g'|head
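
For anyone who prefers a script to a pipeline, a roughly equivalent sketch follows. It assumes an Apache-style access log in which the request path appears as a whitespace-delimited token; the sample log lines are made up. It reads the `q` parameter wherever it appears in the query string and decodes `+` and `%20` as spaces.

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import parse_qs, urlsplit


def top_queries(log_lines: Iterable[str], limit: int = 10) -> List[Tuple[str, int]]:
    """Count the q= values of /search.jspa requests, most frequent first."""
    counts: Counter = Counter()
    for line in log_lines:
        for token in line.split():
            if token.startswith("/search.jspa"):
                # parse_qs decodes '+' and percent-escapes for us.
                for q in parse_qs(urlsplit(token).query).get("q", []):
                    counts[q] += 1
    return counts.most_common(limit)


# Hypothetical log lines for demonstration.
sample_log = [
    '10.0.0.1 - - [05/Jan/2012:23:33:10 +0000] "GET /search.jspa?q=business+unit&view=content HTTP/1.1" 200 512',
    '10.0.0.2 - - [05/Jan/2012:23:34:02 +0000] "GET /search.jspa?view=people&q=business%20unit HTTP/1.1" 200 498',
    '10.0.0.3 - - [05/Jan/2012:23:35:44 +0000] "GET /search.jspa?q=jive HTTP/1.1" 200 731',
    '10.0.0.3 - - [05/Jan/2012:23:36:01 +0000] "GET /index.jspa HTTP/1.1" 200 1024',
]
report = top_queries(sample_log)
```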
                              
                              Example of the Search Query Typed

                              You may provide some. If you ask admins or power users here, then you will likely get queries that real users never use.

                              Does anyone really use wildcard search or "... AND ..."? Do users really change the "?view=content" to "People" or "Places"?

                              where were you expecting the match to hit?

                              I do not care, the most relevant document should be returned first.

                              Creating the Search Experience: Introduction - Google Search Appliance - Google Code contains some features which one may want to have (Filters, KeyMatch, Related queries, Remove URLs, Dynamic result clusters, Query expansion, Result biasing)

                              In order for Jive engineers to properly tune and tweak the search relevancy algorithms that ship with the product, ...

                              Allow the admins to modify the factors which have influence on relevancy.

                               

                              I want only one results page that contains content, people and places, with an option to drill down / cluster into threads, documents, people, ...

                              Why "only" one page with all results? Consider looking up someone's phone number. Search for "CTO" or "Matt Tucker" and expect a link to Matt Tucker as the first result. Of course there's no phone number on the profile/summary page here, but internal instances may show one. Currently one needs to click on "people" to see it. So one would save one click when looking up a phone number.

                              • Re: Want to improve search relevancy in Jive?  Here's your chance!
                                Ryan Rutan

                                One thing I've noticed has to do with search relevancy being impacted by whether or not the author is disabled.  It appears that if a user is disabled, search relevancy either ignores content matches or at least applies a different path to relevance.   Regardless of a person's status (enabled/disabled), content should be found the same way.  I don't have a strong use-case for this one, but I'm pretty sure that this can be replicated.

                                • Re: Want to improve search relevancy in Jive?  Here's your chance!
                                  Cathy Woodard

                                  This is a great thread.  It looks like you are constantly improving the search relevance logic.  (including things like # of times the search criteria are in the title, content and tags, # of views, rating?, documents higher than blogs?, other?)

                                   

                                  Can someone provide me with a "user friendly" explanation on how Jive 5.0 search relevance works?

                                  • Re: Want to improve search relevancy in Jive?  Here's your chance!
                                    Tracy Maurer

                                    Karl, Ryan - I was just responding to something else and an idea struck me. What if there was a tool that would allow either a sysadmin or a content creator to check the findability of their document? There are SEO tools that help you check whether you've done a good job optimizing an article for certain keywords. Something similar for content in Jive would be a win-win!

                                    • As a content author, I could validate whether or not I was likely to reach my audience.
                                    • If the content author didn't bother to do that and there were complaints, the sysadmin could check to see why the content wasn't being found by the audience.

                                     

                                    The tool should not just say whether or not the piece of content would be found using the keyword, but offer suggestions as to how to improve the relevance. Because each Jive instance is closed, you could even offer suggestions for improvement based on the existing body of content. For example, if I asked whether X document would be found if I searched for "keyword", the result could say, "This piece of content would be ranked #25 for relevance. Your selected keyword is mentioned in the content body once. Increase the number of mentions to 10 for it to reach the top 5 results."

                                      • Re: Want to improve search relevancy in Jive?  Here's your chance!
                                        LG .

                                        "each Jive instance is closed" - What does this mean? There are a lot of public instances.

                                        Adding hints will either confuse the user or lead to silly documents. If the document contains "example" only once in the body, then it is likely not as relevant as other documents. If users try to push their documents by adding random or useless keywords, nothing is gained.

                                        Most content authors do not know which search terms users enter, so it can be really hard to use the right keywords. If the users search for "paradigm" instead of "example", then the document will likely never be found, unless the list of synonyms contains both words.

                                         

                                        Checking whether the title matches the body and the tags/keywords would be a great help.

                                      • Re: Want to improve search relevancy in Jive?  Here's your chance!
                                        mnevill

                                        One use case that I see is showing results if a configurable percentage of the terms show up in the title, body, or tags, rather than all of them needing to be present.  As a real-world example, if my search string was "configure xauth vpn on router 2055" and some content lived in Jive that contained vpn, 2055, and xauth, I would want that to show up in the results.  I see the value in having AND be the default logic as it gets you better results, but it feels like the more terms you search with, the greater the chance of missing relevant content.  Our search admin uses a Solr parameter (on our non-Jive website) to set configurable thresholds, e.g. if 2 out of 3 or 3 out of 5 terms appear in the content, include it in the results.  It would be nice if we could tap into that somehow from the admin panel, or give people who are Lucene ninjas access to set things like that and other parameters that can alter the "under the hood" operation of Lucene/Solr within Jive.
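
The "N out of M terms" rule can be sketched as below. This is only an illustration of the matching logic, not Jive or Solr code. It resembles what Solr's `mm` (minimum should match) parameter is for, although the post does not say which parameter is meant. `matches_enough_terms` and the sample content are hypothetical.

```python
import math


def matches_enough_terms(query: str, text: str, min_fraction: float = 0.5) -> bool:
    """True if at least min_fraction of the distinct query terms occur in the text."""
    terms = set(query.lower().split())
    if not terms:
        return False
    words = set(text.lower().split())
    required = math.ceil(min_fraction * len(terms))
    return len(terms & words) >= required


query = "configure xauth vpn on router 2055"
content = "xauth vpn setup notes for the 2055"

relaxed = matches_enough_terms(query, content, min_fraction=0.5)  # 3 of 6 terms needed
strict = matches_enough_terms(query, content, min_fraction=1.0)   # plain AND behaviour
```

With `min_fraction=1.0` the rule degenerates to the default AND logic, so a single setting covers both behaviours.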

                                        • Re: Want to improve search relevancy in Jive?  Here's your chance!
                                          Nick Howe

                                          One of my biggest frustrations (at least in our 4.5.6.3 instance) is the apparent lack of weighting on title/subject.  I don't know how many times I've used quick search to try to find a document using words in the title, and have a list of quick search results appear that have no visible relevance to the words I typed.  I KNOW the document has "Academy Feb Meeting" as three of the four words in the title.  Why doesn't it appear in the quick hit list????

                                           

                                          I'll try to find similar results in this community and post them (or maybe it is better in 5.0?)

                                          • Re: Want to improve search relevancy in Jive?  Here's your chance!
                                            karnold

                                            A one-line document (a placeholder, containing no useful information) in our internal Jive instance was returned first in the search results for a particular term, despite having the lowest star-rating, no "like"s and no tags. Someone added a comment to the document complaining about this, and now the comment is the top search result!

                                             

                                            To the user who pointed this out, I responded that the 13-word document contains 4 words that are in the search term and this high concentration may contribute to the document being highly ranked. Also hardly any docs on our site are rated, so ratings probably don't contribute much to rankings. His reply was very interesting and I hope it will help to improve search relevancy:

                                            I don’t think concentration alone should be the measure of quality. If you tell me you’ve got two articles which contain the search term once, one 20 characters long and the other 200, I’m pretty sure the 20 character article is going to be useless.

                                             

                                            This algorithm also favours comments over articles. [User cites a second example from our internal Jive site, where] a comment on the article that I am looking for is ranked higher than the original article. Both have one instance of the string "[search term]" (although my article also has an attachment called "[search term]"). I don’t think that is correct.

                                             

                                            I think there are a couple things that could be changed to improve the results:

                                             

                                            • prioritise articles over comments (I’m not sure I ever want to see comments in my results, but I appreciate that some might feel differently)
                                            • include absolute article size as another parameter, perhaps with an asymptotic value function so that this only affects small entries (this might be sufficient to solve the point above)
                                            • use rating as a search parameter (if you build it, they will come – this could be a good way of ensuring that good content bubbles up to the top of search results).
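
To illustrate the second suggestion, here is one possible "asymptotic" size factor. It is a sketch under assumptions: the formula `n / (n + half_point)`, the `half_point` of 200 words, and the sample hit counts are invented for this example and are not Jive's or Lucene's scoring.

```python
def length_factor(word_count: int, half_point: int = 200) -> float:
    """Rises from 0 towards 1 as the entry grows; equals 0.5 at half_point words."""
    return word_count / (word_count + half_point)


def raw_score(term_hits: int, word_count: int) -> float:
    """Pure concentration: share of words that are search terms."""
    return term_hits / word_count if word_count else 0.0


def damped_score(term_hits: int, word_count: int, half_point: int = 200) -> float:
    """Concentration scaled down for very small entries only."""
    return raw_score(term_hits, word_count) * length_factor(word_count, half_point)


# A 13-word placeholder with 4 hits vs. a hypothetical 400-word article with 20 hits.
placeholder_raw, article_raw = raw_score(4, 13), raw_score(20, 400)
placeholder_damped, article_damped = damped_score(4, 13), damped_score(20, 400)
```

On concentration alone the placeholder outranks the article. With the size factor applied the order flips, while long entries are barely affected because the factor approaches 1.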
                                            • Re: Want to improve search relevancy in Jive?  Here's your chance!
                                              karnold

                                              A new use case that I don't think anyone's covered yet...

                                              1. Using spotlight search on Jive community.

                                              2. Query is:

                                                 search engine ranking.

                                              3. Expecting to find https://community.jivesoftware.com/thread/161107.

                                              4. Spotlight results after typing "search" (without quotes) includes one of the messages in the thread I was looking for (ranks it #1).

                                                 Results after typing "search engine"  did not include anything from the thread I was looking for.

                                              spotlight.png

                                                 Results at the end of typing "search engine ranking" included a different message from the thread I was looking for.

                                              5. My reaction to these results?
                                                 At 2 out of 3 stages I was provided with a link that would have got me to the right place, so on the whole not bad.

                                               

                                              How could this search have gone better? Firstly there's no point including messages in results. Better to link to the top of the thread, because people usually need to read the whole thing anyway to understand the context of the message that the search returned. Secondly why did the correct result disappear and then reappear? Some of our users have noticed this tendency and it's unnerving. (On 4.5, just typing a space makes a difference - but on the Jive community it doesn't, so I assume that changed in Jive 5.)

                                               

                                              A related and even stranger case is the behaviour of the search box for choosing where to move content to on Jive 4.5. Start typing the name of a space that you know exists, and in the beginning it returns reasonable results that include the space. However if you continue typing the space name and it contains a short word (e.g. "for", "and") there are suddenly no results displayed at all! Finish typing the name, and the correct space reappears. (Does this still happen on Jive 5?)

                                                • Re: Want to improve search relevancy in Jive?  Here's your chance!
                                                  Karl Rumelhart

                                                  Thanks for the post.  This is exactly the sort of example that is super helpful!

                                                   

                                                  Let me make a couple comments that may be useful in interpreting what could be going on.

                                                   

                                                  First, regarding returning replies or comments rather than the main document:  as you can imagine, often a particular search query will return multiple results from the same thread.  Rather than returning a ton of closely related results, the system tries to decide which is the "closest" result from the thread and only display that one.  Always returning the parent would be a reasonable policy, but currently the highest relevance result is the one selected.
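
The two policies mentioned here can be sketched side by side. This is a simplified illustration, not the actual implementation; `Hit`, `collapse_by_thread`, and the sample ids and scores are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    thread_id: str
    message_id: str
    score: float


def collapse_by_thread(hits: List[Hit], return_parent: bool = False) -> List[str]:
    """Keep one entry per thread, ordered by that thread's best-scoring message."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.thread_id)
        if current is None or hit.score > current.score:
            best[hit.thread_id] = hit
    ranked = sorted(best.values(), key=lambda h: h.score, reverse=True)
    if return_parent:
        # Alternative policy: always link to the top of the thread.
        return [h.thread_id for h in ranked]
    # Policy described above: show the highest-relevance message itself.
    return [h.message_id for h in ranked]


hits = [
    Hit("t1", "t1-m3", 0.7),
    Hit("t1", "t1-m1", 0.9),
    Hit("t2", "t2-m1", 0.8),
]
closest_messages = collapse_by_thread(hits)
thread_tops = collapse_by_thread(hits, return_parent=True)
```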

                                                   

                                                  Regarding the issue of correct results appearing and then disappearing: there have been reports of this, largely in the context of at-mention.  I believe that what is happening is the following: every time you type a character, a search query is issued.  E.g. if you at-mention  mike  first there is a query for m* then for mi*  then for mik* and so forth.  As it happens, shorter queries are actually much more costly for the search infrastructure, so it can happen that a search for m* takes much longer than a search for mike* and, especially if you are typing fast, the results come back out of order.  Unfortunately, there is a problem with the UI where it doesn't properly account for that possibility, and so it can happen that the results of one of the older queries will overwrite those of a newer one.   This is supposed to be handled better in an upcoming maintenance release.  Of course, I don't know if this is necessarily what is going on in your case, but it would be interesting if you could play around with it a bit and see if the phenomenon you experience is consistent with this hypothesis.
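
One common way to guard against this kind of race is to number each query and ignore any response older than the one already shown. The sketch below illustrates that general technique only; it is not the fix planned for the maintenance release, and `TypeaheadResults` and the sample names are hypothetical.

```python
from typing import List


class TypeaheadResults:
    """Tracks which query's results are on screen and rejects stale responses."""

    def __init__(self) -> None:
        self._issued = 0      # sequence number of the most recent query sent
        self._shown_seq = 0   # sequence number of the results currently shown
        self.shown: List[str] = []

    def issue_query(self) -> int:
        """Call once per keystroke; returns the sequence number to send along."""
        self._issued += 1
        return self._issued

    def on_response(self, seq: int, results: List[str]) -> bool:
        """Display results unless something newer is already displayed."""
        if seq <= self._shown_seq:
            return False  # stale: a later query has already been rendered
        self._shown_seq = seq
        self.shown = results
        return True


box = TypeaheadResults()
seq_m = box.issue_query()      # m*
seq_mi = box.issue_query()     # mi*
seq_mike = box.issue_query()   # mike*

# The cheap, specific query returns first; the expensive m* query arrives late.
accepted_new = box.on_response(seq_mike, ["Mike Smith"])
accepted_old = box.on_response(seq_m, ["Mary Jones", "Matt Lee", "Mike Smith"])
```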

                                                • Re: Want to improve search relevancy in Jive?  Here's your chance!
                                                  Gabriel Bruck

                                                  One situation I find difficult is when I'm searching for a top level space like ebusiness, but in the spotlight search I get results like ebusiness product management, ebusiness XYZ project and other groups that have ebusiness and some modifier behind it. But, just plain ebusiness doesn't show up. When I hit enter and look at the full results, I have to go to the places tab and scroll down quite a ways to find it.

                                                   

                                                  It would be fine if the ebusiness product management group didn't show up; I could add the words "product management" in my search to find it. But there's nothing I can do with just "ebusiness." I think groups and spaces like this should appear higher up in the results.

                                                  • Re: Want to improve search relevancy in Jive?  Here's your chance!
                                                    LG .

                                                    The search engine should learn. Or at least remember the last 5 search terms and the last result which was clicked. So if one searches for "Jive 6" and clicks Jive 6 Has Arrived, then the next search (likely on another day) should return Jive 6 Has Arrived as the first result.

                                                    Of course it would be better if it would learn and remember which result was clicked for a specific search term, both user specific and also for the whole community using different weights.
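
A minimal sketch of such click-based boosting, with a larger weight for the user's own clicks than for clicks by the rest of the community. The class `ClickMemory`, the weights, and the sample scores are all invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ClickMemory:
    """Remembers which result was clicked for a query and boosts it next time."""

    def __init__(self, user_weight: float = 2.0, community_weight: float = 0.1) -> None:
        self.user_weight = user_weight
        self.community_weight = community_weight
        self._by_user: Dict[Tuple[str, str, str], int] = defaultdict(int)
        self._by_all: Dict[Tuple[str, str], int] = defaultdict(int)

    def record_click(self, user: str, query: str, doc_id: str) -> None:
        q = query.strip().lower()
        self._by_user[(user, q, doc_id)] += 1
        self._by_all[(q, doc_id)] += 1

    def rerank(self, user: str, query: str, scored: List[Tuple[str, float]]) -> List[str]:
        """Order results by base score plus weighted click counts."""
        q = query.strip().lower()

        def boosted(item: Tuple[str, float]) -> float:
            doc_id, score = item
            return (score
                    + self.user_weight * self._by_user.get((user, q, doc_id), 0)
                    + self.community_weight * self._by_all.get((q, doc_id), 0))

        return [doc_id for doc_id, _ in sorted(scored, key=boosted, reverse=True)]


memory = ClickMemory()
base = [("jive-6-faq", 1.0), ("jive-6-has-arrived", 0.8)]  # hypothetical base ranking
memory.record_click("alice", "Jive 6", "jive-6-has-arrived")

for_alice = memory.rerank("alice", "jive 6", base)
for_bob = memory.rerank("bob", "jive 6", base)
```

After one click, the clicked result moves to the top for that user, while another user still sees the base order until enough community clicks accumulate.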

                                                    • Re: Want to improve search relevancy in Jive?  Here's your chance!
                                                      Tracy Maurer

                                                      BTW, answer to your High-5 is "Facts of Life".