Abstract Your Key Value Store

This is a blog about the development of Yeller, the Exception Tracker with Answers.

Lots of applications these days work directly with a key value store. Riak, redis, zookeeper and so on can all serve pretty well as key value stores. There is a lot of architectural value in abstracting over which key value storage you're using: your software can migrate between storages more easily, it can run tests without needing the real storages available, and it can combine storages to build new abstractions (or existing ones). Of course, as with any decision like this, there are tradeoffs: you limit your code to the common subset of operations that your storage implementations provide.

Abstracting key value storage is a big win for the software you ship, and a reasonable win for you as the author of that software.

Yeller uses liza to abstract away key value storages. Liza just provides a set of intentionally small protocols. Storage backends can implement as many of them as they can support; the protocols are split apart so that storages that can't support particular properties aren't required to. The ideas in this post aren't particular to liza at all, though: they're about the huge wins you can get from abstracting your key/value storage.
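To make the "small, split-apart protocols" idea concrete, here is a minimal Python sketch (liza itself is a Clojure library). The protocol names `ReadableBucket`, `WritableBucket` and `DeletableBucket` and the `AppendOnlyStore` backend are hypothetical, chosen for illustration; they are not liza's actual protocols. The point is that a backend which can't support deletes simply doesn't implement that protocol, and callers can check which capabilities it has.

```python
from typing import Any, Optional, Protocol, runtime_checkable

# Hypothetical protocol names for illustration only; not liza's real API.


@runtime_checkable
class ReadableBucket(Protocol):
    def get(self, key: str) -> Optional[Any]: ...


@runtime_checkable
class WritableBucket(Protocol):
    def put(self, key: str, value: Any) -> None: ...


@runtime_checkable
class DeletableBucket(Protocol):
    def delete(self, key: str) -> None: ...


class AppendOnlyStore:
    """A hypothetical backend that can read and write but cannot delete."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value


store = AppendOnlyStore()
# runtime_checkable protocols check only that the methods exist.
print(isinstance(store, ReadableBucket))   # True
print(isinstance(store, WritableBucket))   # True
print(isinstance(store, DeletableBucket))  # False
```

Splitting the protocols this way means code that only needs reads can accept any `ReadableBucket`, so the widest possible range of backends can satisfy it.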

Compositional Storage

What I've found most useful about abstract key value storage is the ability to compose storages, or wrap them in particular ways. For example, Yeller's api stores active auth tokens in riak (so that looking them up is relatively cheap), but even that puts too much pressure on riak at high throughput. So there's a bucket wrapper that wraps an underlying bucket with a guava cache (with metrics for the cache). That wrapper can then be used with any underlying storage without much difficulty.
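The caching wrapper can be sketched in a few lines. This is a Python stand-in, not Yeller's code: Yeller uses a guava cache, whereas this sketch uses a simple `OrderedDict` LRU with hit/miss counters as the "metrics". `DictBucket` and `CachingBucket` are hypothetical names, and the `get`/`put` interface is an assumption.

```python
from collections import OrderedDict
from typing import Any, Optional


class DictBucket:
    """Hypothetical in-memory stand-in for the underlying store (e.g. riak)."""

    def __init__(self) -> None:
        self.data: dict = {}
        self.reads = 0  # counts reads that actually reach the underlying store

    def get(self, key: str) -> Optional[Any]:
        self.reads += 1
        return self.data.get(key)

    def put(self, key: str, value: Any) -> None:
        self.data[key] = value


class CachingBucket:
    """Wraps any bucket with a bounded LRU cache plus hit/miss counters."""

    def __init__(self, underlying, max_size: int = 1000) -> None:
        self.underlying = underlying
        self.max_size = max_size
        self._cache: "OrderedDict[str, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[Any]:
        if key in self._cache:
            self.hits += 1
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        self.misses += 1
        value = self.underlying.get(key)
        if value is not None:  # only cache keys that exist
            self._cache[key] = value
            if len(self._cache) > self.max_size:
                self._cache.popitem(last=False)  # evict least recently used
        return value

    def put(self, key: str, value: Any) -> None:
        self.underlying.put(key, value)
        self._cache.pop(key, None)  # drop the stale entry; next get refetches


tokens = DictBucket()
tokens.put("tok-123", {"project": "demo"})
cached = CachingBucket(tokens, max_size=2)
cached.get("tok-123")
cached.get("tok-123")
print(cached.hits, cached.misses, tokens.reads)  # 1 1 1
```

Because the wrapper only depends on `get`/`put`, it works the same over riak, an in-memory bucket, or anything else. A real deployment would probably also want cached entries to expire, since other processes can change the underlying store behind this cache's back.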

Datomic internally does something pretty similar to this, though (as far as I know) they have an even simpler interface that just requires get/put/delete, where values are always bytes (I think) and keys are always strings (they might be uuids, I don't know). They get away with that because they only ever write immutable data under uuid keys. I'd speculate that their hierarchical in-memory cache (caching both serialized and deserialized storage chunks) and their use of memcached just look like a single store to most of the code, and that each level of caching wraps the storage below it. Likewise, datomic's process model requires the ability to do CAS/consistent reads on one piece of storage and the ability to do k/v storage on another, but those two don't have to be the same backend (which is how you can build a consistent database from one piece of eventually consistent storage and one piece of consistent storage).
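The split between a small consistent store and a large eventually consistent one can be sketched as follows. This is speculative and is not how Datomic is actually implemented: all class and method names (`ConsistentStore.compare_and_set`, `BlobStore`, `VersionedStore.commit`) are hypothetical, and both stores are in-memory dicts. Immutable chunks go under fresh uuid keys in the blob store, and only a single root pointer needs compare-and-set.

```python
import uuid
from typing import Any, Optional


class ConsistentStore:
    """Hypothetical stand-in for storage offering compare-and-set (e.g. zookeeper)."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    def compare_and_set(self, key: str, expected: Any, new: Any) -> bool:
        if self._data.get(key) != expected:
            return False
        self._data[key] = new
        return True


class BlobStore:
    """Hypothetical stand-in for eventually consistent key/value storage."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value


class VersionedStore:
    """Immutable chunks live in the blob store; only the root pointer needs CAS."""

    def __init__(self, pointers: ConsistentStore, blobs: BlobStore) -> None:
        self.pointers = pointers
        self.blobs = blobs

    def commit(self, expected_root: Optional[str], chunk: bytes) -> Optional[str]:
        # A fresh uuid key is never overwritten, so eventual consistency can
        # only delay visibility of the chunk, never serve a different value.
        chunk_id = str(uuid.uuid4())
        self.blobs.put(chunk_id, chunk)
        if self.pointers.compare_and_set("root", expected_root, chunk_id):
            return chunk_id
        return None  # root moved; the orphaned chunk is harmless garbage

    def read_root(self) -> Optional[bytes]:
        root = self.pointers.get("root")
        return None if root is None else self.blobs.get(root)


store = VersionedStore(ConsistentStore(), BlobStore())
first = store.commit(None, b"v1")
print(store.read_root())             # b'v1'
print(store.commit(None, b"stale"))  # None: the root already moved past None
```

Only the tiny root pointer needs the expensive consistency guarantee; everything else can live in cheaper, eventually consistent storage.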

Migration

The obvious advantage of abstracting storage is that you can switch out storages in production. A more advanced option is building migration between two storages in production, by providing a new storage that, for example, writes to your new storage in the background, tries reading from the new storage, and falls back to reading from the old storage if that fails. The rest of your application doesn't have to know that there are two storages; it just keeps using the key/value abstraction it has always used.
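A migrating storage is just another implementation of the same interface. In this minimal Python sketch, `DictBucket` and `MigratingBucket` are hypothetical names and the `get`/`put` interface is an assumption. Writes go to both storages synchronously for simplicity (the post describes background writes), and the lazy copy-forward on read is an optional extra this sketch adds.

```python
from typing import Any, Optional


class DictBucket:
    """Hypothetical in-memory bucket standing in for a real storage."""

    def __init__(self) -> None:
        self.data: dict = {}

    def get(self, key: str) -> Optional[Any]:
        return self.data.get(key)

    def put(self, key: str, value: Any) -> None:
        self.data[key] = value


class MigratingBucket:
    """Presents two storages as one while data moves from old to new."""

    def __init__(self, old, new) -> None:
        self.old = old
        self.new = new

    def put(self, key: str, value: Any) -> None:
        # The old storage stays a complete fallback until migration finishes.
        # Synchronous here; a real version might write to `new` in the background.
        self.old.put(key, value)
        self.new.put(key, value)

    def get(self, key: str) -> Optional[Any]:
        value = self.new.get(key)
        if value is None:
            value = self.old.get(key)  # fall back to the old storage
            if value is not None:
                self.new.put(key, value)  # copy forward lazily on read
        return value


old, new = DictBucket(), DictBucket()
old.put("user:1", "alice")       # written before the migration began
bucket = MigratingBucket(old, new)
print(bucket.get("user:1"))      # alice (read from old, then copied)
print(new.get("user:1"))         # alice
```

Once every key has been copied (by reads or a separate backfill job), the application can switch to using the new storage directly, again without changing any calling code.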

Yeller will likely lean on this if it ever ships an "appliance" version for companies that need more security than a hosted service can provide. Those companies sure don't want to set up a 5 node riak cluster, so being able to use a more suitable storage in different environments is an obvious win.

Testing

A win for the author of software that uses a key/value abstraction like this is that your tests don't need the storages available (they can run purely in memory). Liza ships with an in-memory bucket (just backed by a clojure (atom {})), which works amazingly well for testing. Liza also says that storages should do their own binary encoding, and since the in-memory storage doesn't do any binary encoding at all, tests are even faster.
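Here is a Python analogue of such an in-memory test bucket: a plain dict guarded by a lock (a rough stand-in for the atomic swaps a Clojure atom gives you), storing values as-is with no binary encoding. `InMemoryBucket`, its `modify` method, and the `record_error` application function are hypothetical names for illustration, not liza's API.

```python
import threading
from typing import Any, Callable, Optional


class InMemoryBucket:
    """Dict-backed bucket for tests; values are stored as-is, never encoded."""

    def __init__(self) -> None:
        self._data: dict = {}
        self._lock = threading.Lock()  # rough analogue of an atom's atomic swap

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    def put(self, key: str, value: Any) -> None:
        with self._lock:
            self._data[key] = value

    def modify(self, key: str, f: Callable[[Optional[Any]], Any]) -> Any:
        # Apply f to the current value and store the result atomically.
        with self._lock:
            new_value = f(self._data.get(key))
            self._data[key] = new_value
            return new_value


def record_error(bucket, project: str, message: str) -> int:
    """Hypothetical application code: it only knows the bucket interface."""
    errors = bucket.modify(project, lambda old: (old or []) + [message])
    return len(errors)


bucket = InMemoryBucket()
record_error(bucket, "demo", "NullPointerException")
print(record_error(bucket, "demo", "TimeoutError"))  # 2
print(bucket.get("demo"))  # ['NullPointerException', 'TimeoutError']
```

In production, the same `record_error` would be handed a bucket backed by real storage; the test never touches the network or a serializer.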

Tradeoffs

As with any engineering decision, there are definitely tradeoffs when abstracting your key value store. The most important one is that your abstraction might limit the set of operations you can use against storage. Redis is a great example: its command list has hundreds of commands, and restricting it to a key/value store for bytes means you aren't really using it very well.
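One concrete cost: redis's `INCR` increments a counter atomically on the server, but a bytes-only get/put interface forces a read-modify-write in the client, and two interleaved clients can lose an update. This Python sketch replays that interleaving by hand against a hypothetical `BytesBucket`; it shows the general failure mode and is not tied to any particular client library.

```python
from typing import Optional


class BytesBucket:
    """Hypothetical common-subset interface: opaque bytes in, opaque bytes out."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value


def read_count(bucket: BytesBucket, key: str) -> int:
    raw = bucket.get(key)
    return 0 if raw is None else int(raw.decode())


bucket = BytesBucket()
# Two clients increment the same counter; the interleaving is replayed by hand.
a = read_count(bucket, "hits")  # client A reads 0
b = read_count(bucket, "hits")  # client B reads 0
bucket.put("hits", str(a + 1).encode())
bucket.put("hits", str(b + 1).encode())
print(read_count(bucket, "hits"))  # 1, not 2: one increment was lost
```

Getting the increment right through the abstraction needs something extra, such as a compare-and-set protocol that only some backends can implement, which is exactly the kind of property the split-apart protocols are meant to make optional.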

Example Backend: Riak

The original motivating use case in Yeller was abstracting out its use of riak. Working with riak can be tricky: you have to resolve conflicts, handle serialization, and so on. Yeller pushes all of that into a storage implementation. Application code provides a merge function, and sometimes custom serialization logic, but most of the application code doesn't have to use those anywhere, or indeed know anything about riak at all.
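A minimal Python sketch of pushing conflict resolution and serialization into the storage layer. `FakeSiblingBackend` only simulates riak's behaviour of keeping several sibling values after concurrent writes; it is not the riak client API, and it ignores vector clocks entirely. `ResolvingBucket` and `merge_sets` are hypothetical names, and JSON stands in for whatever serialization a real implementation would use. The application supplies only the merge function.

```python
import json
from typing import Any, Callable, Dict, List, Optional


class FakeSiblingBackend:
    """Simulates riak siblings: concurrent writes can leave several values."""

    def __init__(self) -> None:
        self._data: Dict[str, List[bytes]] = {}

    def fetch(self, key: str) -> List[bytes]:
        return list(self._data.get(key, []))

    def store(self, key: str, value: bytes) -> None:
        self._data[key] = [value]  # a resolved write replaces all siblings

    def store_concurrent(self, key: str, value: bytes) -> None:
        self._data.setdefault(key, []).append(value)  # simulated conflicting write


class ResolvingBucket:
    """Hides serialization and conflict resolution behind plain get/put."""

    def __init__(self, backend: FakeSiblingBackend,
                 merge: Callable[[Any, Any], Any]) -> None:
        self.backend = backend
        self.merge = merge

    def get(self, key: str) -> Optional[Any]:
        siblings = [json.loads(raw) for raw in self.backend.fetch(key)]
        if not siblings:
            return None
        value = siblings[0]
        for other in siblings[1:]:
            value = self.merge(value, other)
        if len(siblings) > 1:
            self.put(key, value)  # write the resolved value back
        return value

    def put(self, key: str, value: Any) -> None:
        self.backend.store(key, json.dumps(value).encode())


def merge_sets(a: list, b: list) -> list:
    # Commutative and associative, so sibling order doesn't matter.
    return sorted(set(a) | set(b))


backend = FakeSiblingBackend()
backend.store_concurrent("user:1:projects", json.dumps(["api"]).encode())
backend.store_concurrent("user:1:projects", json.dumps(["web"]).encode())
bucket = ResolvingBucket(backend, merge_sets)
print(bucket.get("user:1:projects"))          # ['api', 'web']
print(len(backend.fetch("user:1:projects")))  # 1
```

The merge function should be commutative and associative, since the order in which siblings come back is not something the application controls.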

