    Searching in ClearSpace returns irrelevant (or poorly relevant) search results

      We use ClearSpace 1.8 with Oracle.

      An example: I posted an organization chart in PDF format in a Space. I tagged it and gave it a very descriptive name and a verbose description. The document is in a Space that has no access restrictions.

      Document Name: Information Technology and Product Development Organization Chart
      Description: This organization chart represents the current IT structure as of October 2007
      Tags: it, org_chart, organization, organization_chart, visio, it_organization, 2007

      When I search for "Org chart", "organization chart", "visio", "IT org chart", or any number of other combinations, ClearSpace does not return this document in the search results. Instead, it returns what I'd consider very irrelevant results. In fact, it ranks documents that merely contain "org" inside a URL (for example, a document that mentions a URL with the substring "org" somewhere in it) above files that actually contain the whole word "org".
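      One plausible explanation for the URL matches, assuming the indexer splits text on punctuation (an assumption; the thread does not describe ClearSpace's actual analyzer), is that a URL such as the hypothetical `http://wiki.example.org/it/home` is broken into separate words, one of which is a standalone `org`. The sketch below shows that effect with a hypothetical `tokenize` helper: the URL produces a whole-word `org` token, while the document title only contains `organization`.

```python
import re

def tokenize(text):
    # Hypothetical analyzer: lowercase, then split on any run of
    # non-alphanumeric characters (so "/", "." and ":" all act as separators).
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

url_doc = "See http://wiki.example.org/it/home for details"
title_doc = "Information Technology and Product Development Organization Chart"

# Under this analyzer the URL yields a standalone "org" token...
print("org" in tokenize(url_doc))    # True
# ...while the title only contains "organization", which is a different token.
print("org" in tokenize(title_doc))  # False
```

If the real analyzer behaves like this, the engine would treat the URL fragment as the whole word "org", which would explain why those documents rank for that query.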

      This is VERY FRUSTRATING because the search box is the first place people go when looking for something that doesn't have an obvious place on our wiki.  I have other examples but this one really highlights the issue.

      What can we do to improve the relevancy of search results? What is Jive doing to improve this?

          When searching, tag names must be an exact match, so searching for "org chart" won't match the tag "org_chart". Searching on the exact tag names, or on multiple words from the document's description, name, or contents, should return the document more consistently. Also, searching on "org" by itself shouldn't match any of the descriptive information you've provided for the document you described.
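          To illustrate the exact-match rule for tags, here is a minimal sketch that treats each tag as a single, unsplit term, so `org_chart` is one term rather than `org` plus `chart`. The `tag_hits` helper is hypothetical and is not ClearSpace's implementation; it only models the behavior described above.

```python
TAGS = {"it", "org_chart", "organization", "organization_chart",
        "visio", "it_organization", "2007"}

def tag_hits(query, tags):
    # Each whitespace-separated query word must equal a tag exactly;
    # underscored tags are never split into their component words.
    return {word for word in query.lower().split() if word in tags}

print(tag_hits("org chart", TAGS))   # set(): neither "org" nor "chart" is a tag
print(tag_hits("org_chart", TAGS))   # {'org_chart'}
print(tag_hits("Visio", TAGS))       # {'visio'}
```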

          Have you reviewed the search tips linked near the search box on the search results pages? They mention a few useful features that will help you get more specific results.

          It looks like you may already be familiar with this, judging from the searches in your posts, but it's worth mentioning that searching with quotes, like this:

              "product development organization chart"

          as opposed to without quotes, like this:

              product development organization chart

          should return more specific results.
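          The difference between the quoted and unquoted searches can be sketched with a simple model: an unquoted query matches any document sharing at least one word with it (OR semantics, which is an assumption about the default operator), while a quoted query requires the words to appear consecutively and in order. The helper names and the second document are hypothetical; real engines also score and rank matches rather than just filtering them.

```python
import re

def tokens(text):
    # Lowercase and keep runs of letters/digits as words.
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_any(query, doc):
    # Unquoted query: the document matches if it shares any word with the query.
    return bool(set(tokens(query)) & set(tokens(doc)))

def matches_phrase(query, doc):
    # Quoted query: the query words must appear consecutively, in order.
    q, d = tokens(query), tokens(doc)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

docs = [
    "Information Technology and Product Development Organization Chart",
    "Product roadmap for the development team",
]
query = "product development organization chart"
print([matches_any(query, d) for d in docs])     # [True, True]
print([matches_phrase(query, d) for d in docs])  # [True, False]
```

The second document shares only "product" and "development" with the query, so it matches the unquoted form but not the quoted phrase.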

              If I search for the file's exact title, Information Technology and Product Development Organization Chart, without quotes, the file does not appear in the search results. If I put the title in quotes, which should return exactly one result (there is only one file with that exact name), I get zero results. If I search for any of the tags verbatim, the file does not appear either. It's as if this file has disappeared from the search engine, yet it is still there, in a Space with no access restrictions. When I originally posted this file it appeared in search results, but over time it simply stopped appearing. This looks like a bug, and we have other examples of the same thing happening.

              Hi Lee,

              Given the document information you describe above, a search for 'org chart' shouldn't rank that document as highly relevant, because you haven't included the word 'org' (by itself) anywhere in the document (we don't tokenize underscored tags into separate words AFAIK). If you search for 'organization chart', it should be one of the top results, since 'organization chart' appears in both the title and the body of the document. If you search for 'visio', it should appear in the results, but possibly not near the top, because you only tagged the document with 'visio' and didn't include the word in the title or description. The last search, 'IT org chart', might be a hard one for our search engine: I'd bet the word 'IT' is filtered out as a stop word, which leaves 'org chart', and we've already established that the word 'org' doesn't appear anywhere in the document.
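              The reasoning in this reply can be sketched with a toy scorer. It assumes (1) a stop-word list that includes `it` (Lucene's classic English stop set does, but whether ClearSpace used that list is an assumption), (2) underscored tags kept as single terms, and (3) hypothetical per-field weights in which title and description matches count more than tag matches. The weights, `analyze`, and `score` are illustrative only and do not come from ClearSpace.

```python
import re

STOP_WORDS = {"a", "an", "and", "in", "is", "it", "of", "the", "to"}  # illustrative subset
FIELD_WEIGHTS = {"title": 3.0, "description": 2.0, "tags": 1.0}       # hypothetical boosts

doc = {
    "title": "Information Technology and Product Development Organization Chart",
    "description": "This organization chart represents the current IT structure as of October 2007",
    "tags": "it org_chart organization organization_chart visio it_organization 2007",
}

def analyze(text, keep_underscores=False):
    # Lowercase, split into words, and drop stop words.
    # Tags keep underscores, so "org_chart" stays a single term.
    pattern = r"[a-z0-9_]+" if keep_underscores else r"[a-z0-9]+"
    return [t for t in re.findall(pattern, text.lower()) if t not in STOP_WORDS]

def score(query, doc):
    terms = analyze(query)
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        field_terms = analyze(doc[field], keep_underscores=(field == "tags"))
        # Count each occurrence of each query term in this field, scaled by the field weight.
        total += weight * sum(field_terms.count(t) for t in terms)
    return total

print(analyze("IT org chart"))           # ['org', 'chart']: "it" is dropped as a stop word
print(score("organization chart", doc))  # 11.0
print(score("visio", doc))               # 1.0: tag match only
print(score("org chart", doc))           # 5.0: only "chart" matches; "org" is never a term
```

Under these assumptions 'organization chart' scores well above 'visio' and 'org chart', matching the ranking behavior described above.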


              With all that said, we've heard from a number of customers that they want search to work 'just like google', which I take to mean "make it work fast" and "make it return relevant results".  We're definitely working to make it faster and return more relevant results.