In part 2 of this series we outlined the desired features for the next generation of Jive SBS's caching technology. Today we'll go over why we selected the solution that we did as well as what we did to overcome a couple of issues that threatened to derail the project.
The prototype and decision phase was perhaps the most challenging aspect of the project as opinions differed as to the whether benefits of one solution outweighed it's limitations compared to one of the other solutions. Once preliminary prototyping was complete it was apparent that Ehcache was not going to be the solution that was going to work for us. While Ehcache works great as an in-process cache we decided that it would require too much work to implement the desired cache server functionality. We did look into both Ehcache's cache server and decided against it for performance reasons as we didn't believe (and tests proved out) that a REST-based solution would perform to the standards we required. Very promising but finally decided against was the Ehcache/Terracotta integration that had been recently released which was rejected on a number of factors including (but not limited to) cost, complexity and solution maturity for our specific use-case. With Ehcache out of the running the bake off proceeded between the highly regarded Memcached distributed caching solution and Voldemort. In the end both proved to be stable, very high performance distributed systems.
Why we selected Voldemort
The choice of caching solution came down to features and expected ease of implementation and integration. In those metrics Voldemort won for the following reasons:
- Voldemort's design incorporates versioning directly in it's architecture allowing for resolving eventual-consistency related conflicts in a simple manner using vector clocks and read-repair. There is no concept of versioning in Memcached beyond what you yourself implement using key naming tricks.
- Voldemort's implementation is pluggable in many places. In fact, it's only because of this fact that Voldemort was even considered in the first place as it was a very simple matter to create a custom caching store to use in place of other persistent stores that Voldemort ships with. This means that in the future if there is a piece of Voldemort that we would like to change it's likely that there is a mechanism available to do that without having to modify the source code of the application.
- Having the ability to have separate stores meant that we could have a store per cache allowing for an easy implementation of clear(). Memcached does not have this and clients have to resort to key tricks to work around this limitation
- A strong Java client supporting pluggable serialization, client-side routing of requests and the ability to specify required and preferred values for how many servers to read and write to for any particular request.
- A strong developer community with timely releases incorporating highly desired functionality (like the newly released rebalancing feature in Voldemort 0.70). This helps offset the fact that Voldemort is still a work in progress and doesn't incorporate all the features outlined in Amazon's original Dynamo paper.
Deciding on using Voldemort over memcached on features alone was one thing; overcoming internal scepticism that a (normally) persistent key-value store was a better solution then a dedicated caching solution like Memcached was another thing altogether. The discussion generally was resolved once it was understood that while Memcached is an excellent distributed cache it's extremely simple. Any additional functionality is left to clients to implement and the available Java clients did not nearly match in terms of desired features compared to Voldemort.
The issues that almost derailed the project
Finalizing the choice of the caching server that we would use for the next generation of caching for Jive SBS helped us move forward to solving the next set of issues. Two main issues arose shortly thereafter that threatened to derail the project. The first issue that arose was that the caching store implemented for the Voldemort prototype was shown to have some highly undesirable behaviour under certain circumstances. The second issue was that we didn't have a serialization mechanism in place beyond Java's built in serialization and performance was suffering significantly because of that.
The first issue - the Voldemort cache store issue - took some understanding of the causes of the problem. The initial implementation had used the google collection framework to create a soft reference based concurrent map. It was thought that using soft references would be enough as Java's garbage collector would only clean up soft references only when memory had to be made available. That worked well until the system happened to run a full GC (potentially because of memory fragmentation) which immediately caused all soft references to be dropped effectively emptying every cache in the system. To work around this highly undesirable behaviour a new solution was introduced where memory-based evictions would only start to occur once the used heap size reached a certain percentage of the maximum heap size. With some tuning of code and JVM arguments the system stabilized such that garbage collection tended to occur just about the same time that memory based evictions started occurring. This naturally leads to a fairly balanced state where the amount of memory used for caches stays fairly steady and the garbage collection system is happy and rarely if ever runs has to run a full GC (which would cause all cache operations to stop on that node till the GC operation completes). This was tested under heavy load over many days with well over a 100 million cache operations to verify the long term stability of this design.
The second issue - the serialization issue - proved to be vexing. We didn't want to use Java's serialization mechanism for performance reasons and we didn't want to have to resort to requiring cacheable objects to implement Externalizable with it's problems as outlined above. We took a long look at quite a few of the available serialization libraries include protocol buffers, thrift, avro among others however none of them provided a simple and transparent object graph serialization that we were looking for. It was only after a lucky happenstance that we discovered a serialization library called Kryo that we found a solution that we thought would work. Kryo is an easy to use serialization library for Java that has little to no mindshare in the Java community (at least not yet). it was risky to use an immature library for such an important portion of the caching solution but Kryo had a few things going for it. First, the codebase was small and the design was excellent. Secondly, the primary developer of the library was very open to bug reports, improvements and feature discussions. Thirdly, it was very simple to incorporate and use the library in the new caching solution. Lastly, the performance of the library was excellent and the size of serialized objects was very small. During the implementation phase we did uncover bugs in Kryo which were promptly fixed however this was expected and was factored into the timelines for the project.
The requirement that didn't make the cut
Unfortunately serialization performance, while excellent, was still not fast enough to be used for local and near caches. We uncovered this in the extensive performance testing we did as part of the 4.5 release and we were understandably disappointed that we had to abandon one of the requirements for the new cache system. We did however put into place mechanisms so that developers could change one setting and reenable serialization for all caches thus allowing for serialization testing without having to start a cluster.
With the cache work complete and being incorporated into Jive SBS 4.5 we can reflect on the benefits that the new caching solution will provide for us and our customers
- Increased stability of Jive SBS clusters driven by the cache server changes
- Reducing administration requirements and better cache utilization
- Increased developer productivity with a reduction or elimination of serialization bugs
- Simpler cache configuration yet at the same time being flexible and expandable
- Virtualizing Jive SBS clusters is now possible
We're very excited to have our customers upgrade and enjoy the benefits of the new system!