

3 Posts authored by: britchie

We just released Clearspace 1.0.3 (download, README, changelog).  Like version 1.0.2, this is a bug fix release in which we fixed 32 bugs and made some very minor improvements. Overall, work continues on new features and we'll detail them soon.


One of the areas we focussed on for 1.0.3 was LDAP integration. We fixed quite a few issues uncovered by customers and ourselves. We've also recently made the Clearspace issue tracker public, so feel free to browse the issues and vote for any that may be affecting you.


Wiki syntax challenges

Posted by britchie Jan 19, 2007

Out of all the things that I've had a hand in working on for Clearspace (and that's a lot), the one that I've found most challenging and frustrating to implement is the new wiki syntax support. As many of you are aware, wiki syntax is a very useful (and fast) way to mark up text to create things like lists, tables, links and styles without having to type HTML. We looked at a variety of existing wiki markup syntaxes and decided on a syntax that was as common as possible. It's very fast to use once you know the syntax, but it is potentially a deal breaker for new users who don't want to learn it or don't have the time, no matter how simple it may be. Thus in Clearspace we decided it would be best to allow documents to be created in one of two ways: using a graphical editor (IE or Firefox) or a plain text editor with preview functionality.




Since users will have the ability to switch between the two editors, we needed a way to convert the HTML content generated by the GUI editor into the wiki syntax supported by the plain text editor, and vice versa. It was this requirement that proved to be the source of many frustrating hours stepping through code trying to isolate yet another bug.


Not surprisingly, we couldn't find an open source wiki syntax implementation that met all of the requirements, so we had to roll our own. Our existing rendering solution found in Jive Forums proved not to be up to the task, so we had to design one from scratch to meet the new requirements. Before that, though, we had to evaluate and test a variety of HTML parsers to facilitate the conversion from the HTML generated by the GUI editor back into wiki syntax. We settled on a flexible and extremely fast parser from the open source SiteMesh project, which we modified slightly to suit the requirements. Spending the time to research and choose the right tools proved to be a life saver: it allowed us to handle the "wikification" of the HTML in a very clean manner.


Once the tools were chosen, we created the API for the new render system and started writing the implementation. As you can see from the following code, the general use case for generating HTML from the stored wikified text is fairly straightforward:

// Render the stored wiki text of a document body to HTML for display.
RenderManager rManager = jiveContext.getRenderManager();
String htmlText = rManager.render(doc, RenderType.DOCUMENT_BODY, wikifiedText);

Going the other way, from HTML to wiki text, requires a bit more work but is also fairly straightforward:

// Convert HTML from the GUI editor back to wiki text using a plain-text render strategy.
RenderStrategy strategy = new RenderStrategy(
        RenderTarget.TARGET_PLAIN_TEXT, RenderStrategy.RENDER_ALL);
RenderManager rManager = jiveContext.getRenderManager();
String wikifiedText = rManager.render(doc,
        RenderType.DOCUMENT_BODY, strategy, htmlText);

Getting it all right, though, required a lot of time and a large number of test cases to work out the bugs. Without these test cases, I'm sure some of the more subtle bugs that cropped up would never have been caught. I'm very proud of the outcome of all this work: it's going to be a great feature as well as the basis of quite a bit of new functionality in future releases of Clearspace.
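To give a feel for the kind of test case that helps with a two-way conversion, here is a minimal round-trip sketch. It is not the Clearspace render API: `ToyWiki`, `toHtml`, `toWiki` and `roundTrips` are hypothetical names, and the `*bold*` and `_italic_` markers are assumed purely for illustration.

```java
import java.util.List;

// Hypothetical toy converter for illustration; not the Clearspace render API.
final class ToyWiki {
    // Wiki -> HTML: *bold* becomes <b>bold</b>, _italic_ becomes <i>italic</i>.
    static String toHtml(String wiki) {
        return wiki
                .replaceAll("\\*([^*]+)\\*", "<b>$1</b>")
                .replaceAll("_([^_]+)_", "<i>$1</i>");
    }

    // HTML -> wiki: the inverse mapping for the same two tags.
    static String toWiki(String html) {
        return html
                .replaceAll("<b>([^<]*)</b>", "*$1*")
                .replaceAll("<i>([^<]*)</i>", "_$1_");
    }

    // A round trip should hand back exactly the text the user typed.
    static boolean roundTrips(String wiki) {
        return toWiki(toHtml(wiki)).equals(wiki);
    }

    public static void main(String[] args) {
        for (String sample : List.of("plain text", "some *bold* text", "*a _b_ c*")) {
            System.out.println(sample + " -> " + toHtml(sample)
                    + " (round trips: " + roundTrips(sample) + ")");
        }
    }
}
```

The simple samples survive the round trip, but the nested `*a _b_ c*` does not, because the toy HTML-to-wiki step cannot see past the inner tag. That is the sort of subtle bug that only shows up once a test exercises both directions together.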


Searching in Clearspace

Posted by britchie Jan 10, 2007

One of the areas of research that has always fascinated me is searching and linguistics, specifically machine comprehension of human questions. The current level of technology in this field is both very advanced (e.g. Google) and yet at the same time very limited (machines do not really understand the question; it's all algorithms, patterns and the like). In Clearspace we've updated our search code to take advantage of as much of the available search technology as we had time to incorporate.


As is true with all our products, our core search technology is based upon the excellent Lucene search library, which we updated to the latest release to gain some new features and benefits, as well as the usual set of bug fixes. New in Clearspace, however, is a completely redesigned API around Lucene which provides some clear benefits over what we had available previously. I'd like to highlight a few features that we've added that I think are noteworthy.


Combined search API: This means that you can search blog posts, forum messages and wiki documents at once in the same call to the API (or any combination of those). While this may not seem like much of an improvement, it is in fact quite an improvement over how searching was accomplished in the Integrated Server product (our only other product that had multiple content types to search over). When you execute a general search in the Integrated Server you are in fact executing multiple searches at the same time over two separate Lucene indexes: one for kb content, one for forum content. This approach has consequences for performance and limits flexibility in how search results can be displayed. The new approach is faster and lets you simply execute a search and display the results regardless of the content type of each result.
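As a rough illustration of the single-index idea, the sketch below keeps every item in one index and treats the content type as a filter on the query. All names here (`ContentType`, `IndexedItem`, `CombinedIndex`) are hypothetical, and a simple in-memory scan stands in for Lucene; this is not the Clearspace search API.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical types for illustration; not the Clearspace search API.
enum ContentType { BLOG_POST, FORUM_MESSAGE, WIKI_DOCUMENT }

final class IndexedItem {
    final ContentType type;
    final String text;

    IndexedItem(ContentType type, String text) {
        this.type = type;
        this.text = text;
    }
}

final class CombinedIndex {
    private final List<IndexedItem> items = new ArrayList<>();

    void add(ContentType type, String text) {
        items.add(new IndexedItem(type, text));
    }

    // One call over one index; the content type is just a filter on each item.
    List<IndexedItem> search(String term, Set<ContentType> types) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<IndexedItem> hits = new ArrayList<>();
        for (IndexedItem item : items) {
            if (types.contains(item.type)
                    && item.text.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(item);
            }
        }
        return hits;
    }

    // Convenience overload: search every content type at once.
    List<IndexedItem> search(String term) {
        return search(term, EnumSet.allOf(ContentType.class));
    }
}
```

Because every hit comes back from the same call in the same list, the caller can display results without caring which content type each one is, or narrow the search by passing a smaller set of types.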




Find Similar searches: We've built into the new API the ability to query on any blog post, message or document to find other content in the system that is similar to the source content object. This is a feature that we've taken full advantage of in many places in the UI to display 'More Like This' type links, helping to automatically link content together.
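One simple way to think about "similar" content is term overlap. The sketch below scores two pieces of text by Jaccard similarity (shared terms divided by all distinct terms) and picks the best-scoring candidate. This is only an assumed stand-in to show the idea; `SimilarContent` and its methods are hypothetical names, and the scoring Clearspace actually uses is not described here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not the Clearspace search API.
final class SimilarContent {
    // Lower-cased set of word terms in the text.
    static Set<String> terms(String text) {
        Set<String> result = new HashSet<>(
                Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+")));
        result.remove(""); // split can yield an empty token
        return result;
    }

    // Jaccard similarity: shared terms divided by all distinct terms.
    static double similarity(String a, String b) {
        Set<String> union = terms(a);
        Set<String> shared = new HashSet<>(union);
        Set<String> other = terms(b);
        shared.retainAll(other);
        union.addAll(other);
        return union.isEmpty() ? 0.0 : (double) shared.size() / union.size();
    }

    // Returns the candidate most similar to the source, or null if there are none.
    static String mostSimilar(String source, List<String> candidates) {
        String best = null;
        double bestScore = -1.0;
        for (String candidate : candidates) {
            double score = similarity(source, candidate);
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }
}
```

A 'More Like This' link would then amount to calling something like `mostSimilar` with the current document as the source and the rest of the content as candidates.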



Pluggability: While we do our best to make the searching that is built into Clearspace the best that we can make it, we understand that corporations often have existing search implementations that they will want to use. Thus we've adopted two approaches that we feel will cover most requirements in this regard. The first is that searching is web service enabled, which allows corporations to easily search Clearspace content from external applications. The second is that the whole core implementation of search in Clearspace is completely pluggable, so that if you had a Google search appliance it's quite possible (with some coding, of course) to replace the built-in Lucene implementation with one which hooks into your Google appliance.
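In general terms, a pluggable design like this comes down to coding against an interface and swapping the implementation behind it. The sketch below shows that shape only; `SearchProvider`, `BuiltInProvider`, `ExternalProvider` and `SearchService` are hypothetical names, not the actual Clearspace extension points.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical extension point; the real Clearspace interface is not shown here.
interface SearchProvider {
    List<String> search(String query);
}

// Stand-in for a built-in implementation: a scan over in-memory titles.
final class BuiltInProvider implements SearchProvider {
    private final List<String> titles;

    BuiltInProvider(List<String> titles) {
        this.titles = titles;
    }

    @Override
    public List<String> search(String query) {
        String needle = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String title : titles) {
            if (title.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(title);
            }
        }
        return hits;
    }
}

// Stand-in for an adapter to an external engine such as a search appliance.
final class ExternalProvider implements SearchProvider {
    @Override
    public List<String> search(String query) {
        // A real adapter would call the external system here.
        return List.of("external result for: " + query);
    }
}

// Callers depend only on the interface, so the provider can be replaced.
final class SearchService {
    private SearchProvider provider;

    SearchService(SearchProvider provider) {
        this.provider = provider;
    }

    void setProvider(SearchProvider provider) {
        this.provider = provider;
    }

    List<String> search(String query) {
        return provider.search(query);
    }
}
```

The rest of the application keeps calling `SearchService.search` unchanged while the provider behind it is replaced.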


Distributed searching: The search implementation in Clearspace has been written so that in the future customers will be able to set up the search system with a separate server (or servers) dedicated solely to searching. Or, if they do not want to do that, they'll be able to have search queries executed in the normal cluster by whichever server happens to be the least busy at the moment. While I had only a hand in this work (most credit for this must go to Gaston Dombiak, who is probably best known for his work on the Wildfire XMPP server), I think this is perhaps one of the most technically interesting features we've added. Unfortunately, given time constraints, distributed searching will not make it into the initial release of Clearspace; look forward to word of it in future releases.
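The "least busy server" idea can be sketched very simply, assuming each node reports how many queries it is currently running. `SearchNode`, `activeQueries` and `NodePicker` are hypothetical names for illustration; how Clearspace will actually measure load or route queries is not described here.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical view of a cluster node; the load metric is an assumption.
final class SearchNode {
    final String name;
    final int activeQueries;

    SearchNode(String name, int activeQueries) {
        this.name = name;
        this.activeQueries = activeQueries;
    }
}

final class NodePicker {
    // Picks the node with the fewest active queries, or empty if there are no nodes.
    static Optional<SearchNode> leastBusy(List<SearchNode> nodes) {
        return nodes.stream()
                .min(Comparator.comparingInt((SearchNode n) -> n.activeQueries));
    }
}
```

A dedicated search server setup would be the degenerate case where the list only ever contains the nodes set aside for searching.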


We have a lot of ideas for future improvements we can make to searching in Clearspace - hopefully I'll be able to find the time to blog about some of those in the future!

