Today we're announcing the Open Source release of Tasmo, a key part of Jive's cloud architecture. An early snapshot is available now on Github. We believe Tasmo is an important step forward in web-scale architectures, so we have made Tasmo available under the Apache 2.0 license and look forward to growing a vibrant Tasmo community.


As part of the continual evolution of Jive's core infrastructure, we've been heavily investing in new service-oriented technologies. The engine at the heart of this evolution is a service we call Tasmo. Think of it as a replacement for a relational database, and for many use cases it offers some significant advantages.

Jive has followed the same general evolution as much of the industry with respect to how we store and retrieve data in our applications:

Stage 1 --- Store and retrieve all data from a relational database.

Stage 2 --- Make more and more extensive use of sharding and distributed caching.

Stage 3 --- Use purpose specific, non relational storage technologies. In our case one of them is HBase, now in several of our production systems.

Tasmo is at the center of stage 4 of our systems evolution. It allows developers to declare their logical data model and handles the details of the underlying storage in HBase tables. Equally important, Tasmo effectively performs at write time the selects and joins that would traditionally be done at read time in a traditional database-backed web application. This translates into far less work being done while loading a page and therefore vastly improves application response time. Essentially, Tasmo performs the same function as materialized views in a relational database -- just without the database.


Our motivation for Tasmo stemmed from the knowledge that in our web application we display the same interconnected data in quite a few different ways, and that the ratio of reads to writes in our application is at least 90/10 in typical installations. Before Tasmo, the many different views of the same data resulted in a lot of different joins performed at read time. The high read to write ratio resulted in performing those joins redundantly many times over. It seemed obvious that in a read heavy system it would make sense to shift work to the write side and optimize for reading.




Tasmo based applications declare a set of events and views, where the events are the form data will be written and the views are the multiple forms it will be read. A particular view constitutes the joining and filtering of event data that is maintained at write time. Here are a couple event declarations and a view declaration:




A given event declaration is just a type name and a set of fields.  These fields can either hold literal values or references to other events. A group of event declarations which point to each other via reference fields form a structural graph. A view declaration roots on one event type and selects whatever fields of the root event are needed. A view declaration also spans reference fields in order to fold in data from related events. In other words, declaring a view involves choosing paths through the structural graph and selecting fields of the nodes at the ends of those paths. This is all performed at application design time.


How It Works


At runtime, applications fire instances of events to Tasmo. An event instance conforms to some declared type and carries the identity of the object to modify and one or more populated fields which hold only the updated state. The Tasmo service itself ingests batches of these events and writes down their values in HBase tables - both the literal values and the references. It then consults the view declarations to see what view instances it needs to create or update to reflect the changes made by the event. In order to do this, Tasmo traverses the stored object graph and writes down the modified view information path by path into the HBase table which holds the final output of the write processing.




When it comes time to read a view instance, an application just supplies the type of view to read and the identity of the root object to a library which knows how to read the view table. That library does a single column scan of a single row which holds the full tree of data that is the view instance. Under load, this is often orders of magnitude faster and more scalable than performing the equivalent database queries we used before. The resulting code path is also noticeably simpler - with no distributed caches sitting on top of the data and no relational to object mapping code.


Check It Out


Tasmo is available now as an early snapshot. There are modules which bootstrap all the parts together for writing and reading, as well as a module which allows you to just run it all in one process. Feel free to check out the code in our repo and examples and explanations in our wiki. Over the next several weeks we will be building out documentation, examples, as well as continuing to push many bug fixes and improvements. There is still a long road to version 1.0, but we'd welcome feedback and contributions!


Also, check out the SlideShare presentation for more details: