
Jive Release Blog


In part 2 of this series we outlined the desired features for the next generation of Jive SBS's caching technology. Today we'll go over why we selected the solution we did, as well as how we overcame a couple of issues that threatened to derail the project.

 

The prototype and decision phase was perhaps the most challenging aspect of the project, as opinions differed as to whether the benefits of one solution outweighed its limitations compared to the others. Once preliminary prototyping was complete it was apparent that Ehcache was not going to be the solution that would work for us. While Ehcache works great as an in-process cache, we decided that it would require too much work to implement the desired cache server functionality. We did look into Ehcache's cache server but decided against it for performance reasons, as we didn't believe (and tests bore out) that a REST-based solution would perform to the standards we required. Very promising, but ultimately rejected, was the recently released Ehcache/Terracotta integration; it was ruled out on a number of factors including (but not limited to) cost, complexity and solution maturity for our specific use case. With Ehcache out of the running, the bake-off proceeded between the highly regarded Memcached distributed caching solution and Voldemort. In the end both proved to be stable, very high performance distributed systems.

 

Why we selected Voldemort

 

The choice of caching solution came down to features and expected ease of implementation and integration. In those metrics Voldemort won for the following reasons:

 

  • Voldemort's design incorporates versioning directly in its architecture, allowing eventual-consistency conflicts to be resolved in a simple manner using vector clocks and read-repair. There is no concept of versioning in Memcached beyond what you implement yourself using key naming tricks.
  • Voldemort's implementation is pluggable in many places. In fact, it is only because of this that Voldemort was considered in the first place, as it was a very simple matter to create a custom caching store to use in place of the persistent stores that Voldemort ships with. It also means that if in the future there is a piece of Voldemort we would like to change, it's likely that there is a mechanism available to do so without having to modify the application's source code.
  • Having the ability to have separate stores meant that we could have a store per cache, allowing for an easy implementation of clear(). Memcached does not have this, and clients have to resort to key tricks to work around the limitation.
  • A strong Java client supporting pluggable serialization, client-side routing of requests and the ability to specify required and preferred values for how many servers to read from and write to for any particular request (see the sketch following this list).
  • A strong developer community with timely releases incorporating highly desired functionality (like the newly released rebalancing feature in Voldemort 0.70). This helps offset the fact that Voldemort is still a work in progress and doesn't incorporate all the features outlined in Amazon's original Dynamo paper.
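
To make the client and versioning points concrete, here is a minimal sketch of what using the Voldemort Java client looks like, with versioned values coming back from reads. The bootstrap URL and store name are placeholders, and the method names follow the publicly documented client API of that era, which may differ slightly between Voldemort releases:

    // A minimal sketch of the Voldemort Java client; names and endpoints are illustrative.
    import voldemort.client.ClientConfig;
    import voldemort.client.SocketStoreClientFactory;
    import voldemort.client.StoreClient;
    import voldemort.client.StoreClientFactory;
    import voldemort.versioning.Versioned;

    public class VoldemortClientSketch {
        public static void main(String[] args) {
            // Bootstrap against a (hypothetical) local Voldemort node.
            StoreClientFactory factory = new SocketStoreClientFactory(
                    new ClientConfig().setBootstrapUrls("tcp://localhost:6666"));

            // Each cache maps to its own store, which is what makes clear() cheap to implement.
            StoreClient<String, String> client = factory.getStoreClient("exampleCacheStore");

            // Writes are versioned by the cluster; reads return the value plus its vector clock.
            client.put("someKey", "someValue");
            Versioned<String> versioned = client.get("someKey");
            if (versioned != null) {
                System.out.println(versioned.getValue() + " @ " + versioned.getVersion());
            }
        }
    }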

 

Deciding on Voldemort over Memcached on features alone was one thing; overcoming internal scepticism that a (normally) persistent key-value store was a better solution than a dedicated caching solution like Memcached was another thing altogether. The discussion was generally resolved once it was understood that while Memcached is an excellent distributed cache, it is extremely simple: any additional functionality is left to clients to implement, and the available Java clients came nowhere near Voldemort's in terms of the features we wanted.

 

The issues that almost derailed the project


Finalizing the choice of the caching server that we would use for the next generation of caching for Jive SBS helped us move forward to solving the next set of issues. Two main issues arose shortly thereafter that threatened to derail the project. The first was that the caching store implemented for the Voldemort prototype was shown to have some highly undesirable behaviour under certain circumstances. The second was that we didn't have a serialization mechanism in place beyond Java's built-in serialization, and performance was suffering significantly because of that.

 

The first issue - the Voldemort cache store issue - took some work to understand. The initial implementation had used the Google Collections library to create a soft-reference-based concurrent map. It was thought that using soft references would be enough, as Java's garbage collector only clears soft references when memory has to be made available. That worked well until the system happened to run a full GC (potentially because of memory fragmentation), which immediately caused all soft references to be dropped, effectively emptying every cache in the system. To work around this highly undesirable behaviour a new approach was introduced in which memory-based evictions only start to occur once the used heap size reaches a certain percentage of the maximum heap size. With some tuning of code and JVM arguments the system stabilized such that garbage collection tended to occur at about the same time that memory-based evictions started. This naturally leads to a fairly balanced state where the amount of memory used for caches stays fairly steady and the garbage collector is happy and rarely, if ever, has to run a full GC (which would cause all cache operations on that node to stop until the GC operation completes). This was tested under heavy load over many days, with well over 100 million cache operations, to verify the long-term stability of the design.
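
The sketch below illustrates the general idea behind that threshold-based approach; the class, threshold value and eviction hook are hypothetical, not the actual Jive implementation:

    // Illustrative only: start memory-based evictions once used heap crosses a threshold,
    // rather than relying on the garbage collector to clear soft references.
    public class HeapWatcher {

        private static final double EVICTION_THRESHOLD = 0.75; // hypothetical fraction of max heap

        public static boolean shouldEvict() {
            Runtime rt = Runtime.getRuntime();
            long used = rt.totalMemory() - rt.freeMemory();
            return used > (long) (rt.maxMemory() * EVICTION_THRESHOLD);
        }

        public static void maybeEvict(EvictableCache cache) {
            // Evict entries (for example in LRU order) until usage drops back under the threshold.
            while (shouldEvict() && cache.evictOne()) {
                // keep evicting
            }
        }

        // Minimal interface a cache store would expose for the purposes of this sketch.
        public interface EvictableCache {
            boolean evictOne(); // returns false when there is nothing left to evict
        }
    }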

 

The second issue - the serialization issue - proved to be vexing. We didn't want to use Java's serialization mechanism for performance reasons, and we didn't want to have to resort to requiring cacheable objects to implement Externalizable, with the problems outlined in part 1 of this series. We took a long look at quite a few of the available serialization libraries, including Protocol Buffers, Thrift and Avro among others, however none of them provided the simple and transparent object graph serialization that we were looking for. It was only by lucky happenstance that we discovered a serialization library called Kryo and found a solution we thought would work. Kryo is an easy-to-use serialization library for Java that has little to no mindshare in the Java community (at least not yet). It was risky to use an immature library for such an important portion of the caching solution, but Kryo had a few things going for it. First, the codebase was small and the design was excellent. Secondly, the primary developer of the library was very open to bug reports, improvements and feature discussions. Thirdly, it was very simple to incorporate and use the library in the new caching solution. Lastly, the performance of the library was excellent and the size of serialized objects was very small. During the implementation phase we did uncover bugs in Kryo, which were promptly fixed; this was expected and was factored into the timelines for the project.
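
To give a feel for why Kryo was so easy to adopt, here is a minimal round-trip sketch. It is written against the newer Output/Input style of the Kryo API (the version in use at the time differed), and the cached class is purely illustrative:

    import com.esotericsoftware.kryo.Kryo;
    import com.esotericsoftware.kryo.io.Input;
    import com.esotericsoftware.kryo.io.Output;
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.util.ArrayList;
    import java.util.List;

    public class KryoSketch {

        // An illustrative cacheable object; note there are no hand-written serialization methods.
        public static class CachedComment {
            long id;
            String author;
            List<String> tags = new ArrayList<String>();
        }

        public static void main(String[] args) {
            Kryo kryo = new Kryo();
            kryo.setRegistrationRequired(false); // simplest setting for a sketch; registering classes is faster

            CachedComment original = new CachedComment();
            original.id = 42L;
            original.author = "britchie";
            original.tags.add("caching");

            // Serialize the whole object graph transparently.
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            Output output = new Output(bytes);
            kryo.writeObject(output, original);
            output.close();

            // Deserialize it back.
            Input input = new Input(new ByteArrayInputStream(bytes.toByteArray()));
            CachedComment copy = kryo.readObject(input, CachedComment.class);
            input.close();

            System.out.println(copy.author + " " + copy.tags);
        }
    }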

 

The requirement that didn't make the cut

 

Unfortunately serialization performance, while excellent, was still not fast enough to be used for local and near caches. We uncovered this in the extensive performance testing we did as part of the 4.5 release, and we were understandably disappointed that we had to abandon one of the requirements for the new cache system. We did, however, put mechanisms in place so that developers can change one setting and re-enable serialization for all caches, allowing serialization testing without having to start a cluster.

 

[Image: cache-illustration.png]

 

With the cache work complete and incorporated into Jive SBS 4.5, we can reflect on the benefits that the new caching solution will provide for us and our customers:

 

  • Increased stability of Jive SBS clusters driven by the cache server changes
  • Reduced administration requirements and better cache utilization
  • Increased developer productivity with a reduction or elimination of serialization bugs
  • Simpler cache configuration that is nonetheless flexible and expandable
  • The ability to virtualize Jive SBS clusters

 

 

We're very excited to have our customers upgrade and enjoy the benefits of the new system!

In part 1 of this series we outlined the issues we had with the 4.0 and earlier cache system design and implementation. Today we'll outline the desired features for the next generation of Jive SBS's caching technology.

 

The desired features

 

After some debate the major desired features were narrowed down to the following list:

 

  • Caches should be run in their own process. No longer should clustered caches run in the main application server process.
  • Caches should be able to be distributed across multiple cache servers to eliminate any single point of failure.
  • High availability of cache data. Often this means replicating cache data across multiple nodes such that losing a single node doesn't mean losing all the cached data on that node.
  • Fault tolerant. If the whole caching system goes down the application can still run, though performance will understandably suffer under these circumstances.
  • Reliable. The caching solution should be reliable in the face of server and network issues and can handle data consistency issues that can arise from server or network flap, write races, etc.
  • The cache API should not be Map based. Instead, cache operations should largely be limited to get(key), put(key, value), remove(key) and clear() (a sketch of such an interface follows this list).
  • Serialization should be transparent in the majority of cases and shouldn't require developers to implement methods for every single class that is to be cached.
  • Individual caches shouldn't require a maximum size to be specified. Instead, the cache server should just figure out which objects to evict across all caches based on normal cache eviction policies such as LRU (Least Recently Used), second chance FIFO, etc.
  • Determining the size of cached objects, if required at all, should be handled by the caching system itself and not by the classes to be cached.
  • All cache operations must operate on serialized data.
  • Eventually consistent. As Jive SBS grows and scales up having every cache be coherent across the cluster is not something that we see as consistent with our performance goals.
  • Source available. Having source code available when troubleshooting an issue can be of tremendous help in solving the issue in a timely manner.
  • TCP based. While UDP based caches work well for many other systems and applications it is our belief that a TCP based solution will work best for Jive SBS. This is especially true when Jive SBS is run within a virtualized environment. This also means no multicast for node discovery.
  • Dynamic cluster membership. While adding a node is typically a rare operation, having to restart a whole cluster to do it isn't desirable.
  • Performance should be as good as, if not better than, the current implementation in Jive SBS 4.
  • If possible, the cache server should be Java based. Our expertise is primarily in Java, and it's highly desirable for us that any developer could jump in and quickly understand the code without having to learn a new programming language.
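
As a rough illustration of the kind of deliberately narrow cache API described in the list above (the names are illustrative, not the actual Jive SBS interfaces), the whole surface area can be as small as this:

    // Illustrative only: a deliberately small cache API instead of a full java.util.Map.
    public interface Cache<K, V> {

        V get(K key);             // returns null on a cache miss

        void put(K key, V value);

        void remove(K key);

        void clear();             // cheap when each cache is backed by its own store
    }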

 

With this list of requirements defined, the process of implementing the new cache system began. Among the first things decided was that, since we didn't know at the time which distributed caching solution we would use (the list of potential solutions was initially quite long), we would take a provider-based approach for the main caching functionality and use the decorator pattern to layer additional behaviour on top of that cache. This allowed us to quickly prototype a number of potential solutions to find the one that best fit our requirements.
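
As a sketch of that provider-plus-decorator approach (hypothetical names, not the actual Jive SBS classes), a decorator wraps whichever provider-backed cache is in use and layers behaviour such as statistics on top, without the provider knowing about it:

    // Illustrative decorator that layers hit/miss statistics over any Cache provider
    // (Cache is the small interface sketched earlier).
    public class StatisticsCacheDecorator<K, V> implements Cache<K, V> {

        private final Cache<K, V> delegate; // the provider-backed cache being wrapped
        private long hits;
        private long misses;

        public StatisticsCacheDecorator(Cache<K, V> delegate) {
            this.delegate = delegate;
        }

        public V get(K key) {
            V value = delegate.get(key);
            if (value == null) { misses++; } else { hits++; }
            return value;
        }

        public void put(K key, V value) { delegate.put(key, value); }

        public void remove(K key) { delegate.remove(key); }

        public void clear() { delegate.clear(); }

        public long getHits() { return hits; }

        public long getMisses() { return misses; }
    }

Swapping the underlying provider only requires a new base implementation of the interface; decorators like this one stay unchanged, which is what made prototyping several candidates quick.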

 

The solutions we prototyped

 

Memcached, being the most popular distributed caching solution in common use, was an easy choice as one of the solutions to prototype. After much research and discussion it was decided that we would also prototype two other solutions: Ehcache, and a rather unorthodox use of Voldemort, a Java-based persistent key-value store developed by LinkedIn. All the potential solutions were open source and in production use at various companies.

 

In part 3 of this series we will outline the reasoning for our solution choice as well as what we did to overcome a couple of issues that threatened to derail the project.

At Jive we strive to satisfy our customers with the best social business software application that we can build. We have a constant drive to improve what we already have, in addition to pushing the envelope and providing unique social-based solutions to common business problems. Our software, Jive SBS, has been a study in constant evolution ever since its roots in Jive Forums, with new functionality introduced continuously and major architectural changes made as required. One aspect of the application that hasn't seen a lot of change over the years, however, is the caching layer used by Jive SBS. Like most high performance server applications, Jive SBS relies heavily on in-memory caching to improve performance and reduce load on the underlying database. Without this caching layer our application could not do nearly as much as it does, nor scale anywhere near the loads seen every day in production by our customers. It's a testament to the initial design choices that the cache implementation has stood the test of time as well as it has. That being said, over time it became apparent that there were limitations and problems that needed to be addressed.

 

The problem with map based caches

 

Caching (and clustering) in Jive SBS 4 is implemented using Oracle Coherence, with each application server in the cluster acting not just as an application server but also as a cache server. The caches defined in Jive SBS (of which there are well over a hundred) are based on Coherence's distributed map architecture and both benefit and suffer from that choice. Having caches be map based leads developers to treat the caches no differently than they would any other map. This lets new developers get up to speed faster, but it can lead to performance issues, heavier than expected network traffic and other problems that tend to show up only when systems are under heavy load. More importantly, design choices which worked well historically have proven to be problematic, which led us to reevaluate those choices and indeed the full requirements for what Jive SBS's caching layer should look like. What we came up with in that reevaluation was a design that on the surface looks similar to what was in Jive SBS 4, but on closer inspection is significantly improved.
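
To illustrate the kind of innocent-looking, map-style code that becomes expensive when the "map" is actually a distributed cache, consider the sketch below; the domain class and method are hypothetical:

    import java.util.Map;

    public class MapStyleCacheUsage {

        // Hypothetical domain type, purely for illustration.
        public static class Document {
            boolean published;
            public boolean isPublished() { return published; }
        }

        // Looks like ordinary Map code, but when the map is a distributed cache every entry
        // touched here can mean a network hop and a deserialization on a remote node.
        public static int countPublished(Map<Long, Document> documentCache) {
            int published = 0;
            for (Map.Entry<Long, Document> entry : documentCache.entrySet()) {
                if (entry.getValue().isPublished()) {
                    published++;
                }
            }
            return published;
        }
    }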

 

Why we redesigned the cache

 

As part of the cache system reevaluation we started by going over the concerns we had with the 4.0 and earlier design, so that we would have a basis against which to compare any solution we came up with. Outlined below is a sampling of the issues we discussed:

 

  • Having caches exist only in the same process as the application server has proven to be an issue. Caches by their very nature use a lot of memory, and if not configured correctly they can quickly consume all available memory in the JVM, causing the dreaded OutOfMemoryError. This puts a strain on Java's garbage collector, which has to balance normal app server traffic - a constant stream of many small but largely short-lived objects - against caches, which tend to hold onto objects for much longer. While having caches in-process was a great design decision historically, it was not without significant risks and issues.
  • Every cache is sized independently. While this worked fine when there were just a few caches in the application, as the number of caches increased various measures had to be put into place to make administration easier. These measures, including system-wide defaults for small, medium and large installations, helped but did not solve the underlying problem. Having individually sized caches meant that administrators had to experiment a fair bit to come up with a set of cache sizes that gave the best cache usage patterns they could get while at the same time not pushing overall cache usage above the recommended percentage of the maximum memory given to the JVM. This balancing of cache sizes was a common exercise for administrators and led to some caches being undersized while others were oversized, resulting in what amounted to 'wasted' memory.
  • The behaviour of caches was different for a single node install versus one with multiple nodes. The behaviour differed in that cached objects in a single node install were not serialized, whereas objects in a multiple node installation most often were. This had two repercussions. The first was that developers developing on a single node would not see serialization problems until very late in the development process, when they started testing in a cluster - or in some cases not at all if an object was rarely cached. The second was that the number of objects that could be stored in caches was dramatically different depending on whether the system was clustered or not, a rather surprising situation for many customers.
  • While important for the best performance of the system, cache operations should not block web requests unnecessarily. Unfortunately, all too often this was something that we saw in production when one node in a cluster had memory issues (a memory leak or improper configuration causing a full GC to occur). This tended to have a domino effect where problems on one node would cause other nodes to back up, resulting in a cluster-wide failure. There are possible mitigations for this in Coherence, including timeouts (which throw exceptions) and increasing the replication factor (so cached data is stored on multiple nodes), however neither has been used. Having a replication factor greater than one is a great idea, but the increased load on the network always seemed to be an issue - especially on networks which were already strained and/or running on less than stable hardware.
  • Manual serialization. The serialization mechanism used in Jive SBS 4 and earlier is Coherence's version of the Externalizable interface in Java. It's extremely fast, however it's prone to developer errors and omissions. This is especially true when classes underwent changes due to refactoring, feature improvements or other work that introduced new fields into the class but did not add them to the readExternal/writeExternal methods used to serialize and deserialize the object. This had the unfortunate behaviour in which updated code seemed to work fine - it serialized fine - however when deserialized it was missing the data corresponding to the newly introduced fields (see the sketch after this list).
  • Manual determination of serialized object size. Originally, with just a few classes that would be serialized, it was a fairly simple exercise to determine how large a serialized version of an object would be without actually having to serialize it. This information was used in part to determine the sizes of caches and when cache eviction would start to occur. The problem with determining this manually is twofold: firstly, it suffers from the same omission problem described in the previous issue; secondly, and possibly more importantly, it often returned a size that wasn't close to the real size of the object. This resulted in caches thinking they were smaller than they really were, leading on occasion to the dreaded OutOfMemoryError.
  • Caches were not transaction aware. While the vast majority of transactions commit successfully, there are always transactions that for one reason or another fail and have to be rolled back. With caches not being transaction aware, any cache puts made during a method call whose transaction was rolled back were already visible to every node in the cluster. This invalid data then had to be manually rolled back out of the cache, a very tedious and error-prone process which was often overlooked.
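
To make the 'Manual serialization' pitfall above concrete, here is a sketch using the standard java.io.Externalizable interface (Coherence's variant has the same shape; the class and fields are illustrative): a field added during later refactoring but forgotten in readExternal/writeExternal silently comes back as null after a round trip.

    import java.io.Externalizable;
    import java.io.IOException;
    import java.io.ObjectInput;
    import java.io.ObjectOutput;

    // Illustrative class: the 'summary' field was added later but never wired into the
    // externalization methods, so it silently disappears when the object is cached.
    public class BlogPost implements Externalizable {

        private String title;
        private String body;
        private String summary; // new field, forgotten below

        public BlogPost() {
            // public no-arg constructor required by Externalizable
        }

        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeObject(title);
            out.writeObject(body);
            // Oops: summary is never written.
        }

        public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
            title = (String) in.readObject();
            body = (String) in.readObject();
            // Oops: summary is never read, so it is null after deserialization.
        }
    }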

 

These and other issues led us to the conclusion that a redesign was in order. Tomorrow, in part 2 of this series, we will go over the features we wanted in the next generation of Jive SBS's caching technology and outline some of the solutions we prototyped and evaluated.
