Distributed System Building Block: Flake Ids

This is a blog about the development of Yeller, The Exception Tracker with Answers

Read more about Yeller here

Many distributed systems have a requirement to generate time sorted, unique ids of some kind - for distinguishing incoming events, for resolving conflicts, for using as keys in key/value stores, for logging, and a whole bunch more.

In many applications, you can simply use an autoincrementing id in Postgresql or Mysql. But for larger applications, having ids be dependent on the database can get really painful:

  • it introduces contention - your entire app blocks on a single lock inside a single database server.
  • scaling to more than one id generating node can be painful

So what properties do we really want for our unique id generation?

  • id generation should be unique
  • id generation at a node should not require coordination with other nodes
  • ids should be roughly time ordered

Fortunately, distributed systems literature solved this ages ago, and there are a number of well designed open source implementations of the solution: flake ids

Flake Ids

The basic idea behind flake ids is simple: instead of incrementing a counter each time you need an ID, use some of the top bits in an id to represent time, and then some others to represent a “node id”, such that id generation across nodes is unique. The wonderful thing about the node id is that you can just coordinate once - very often just by writing to config files inside your orchestration tool (chef/puppet/ansible/etc).

The tradeoff, of course is that your events are no longer exactly time sorted. Of course, this is a misnomer in distrubted systems anyway - clocks are never perfectly in sync, and so knowing an exact ordering of events requires coordination of all events through a single place. That brings in all the scaling problems detailed above. Another nastier tradeoff is that you’re now dependent on system time being synchronized across nodes, which means you have to monitor clock skew.

However, for many practical purposes, flake ids work very, very well.

Often you’ll include a sequential counter in there (per process), so that you can generate more than one id per timespan. The flake id generator will guarantee that no single process will output the same sequential counter within the same millisecond

a sample scheme might be (stolen from noeq):

  • 41 bits for time in milliseconds since the epoch
  • 10 bits for node id
  • 12 bits for the sequential counter

Note that there’s nothing saying what size the id is in here. The size of flake id you choose has implications on storage space and effeciency. Larger flake ids mean you can generate more in a given timespan (some schemes limit you to e.g. 2^16 ids per millisecond).

Many flake id schemes fit within 64 bit numbers, but quite a few others go out to 128 or even 160 bits. The size you choose depends on what your requirements are - trading off storage space for ids against the number you can generate in a given time interval.

Improvements

One good improvement for implementation is to have the id generation process keep a track on the current system clock and block on issuing ids in case the clock moves backwards (NTP can and will do this).

To improve space usage, a neat trick many flake id schemes use is a custom epoch - instead of counting milliseconds since the standard unix epoch, they pick one that’s significantly later (all you really need to do is pick a time that’s before you started to generate flake ids). Noeqd and Snowflake for example, use March 2006

Fin

Flake ids are dope. They solve a whole pile of distributed problems in a neat and efficient manner, and aren’t hard at all to implement. Most of the time, you can just use one of the open source implementations:

This is a blog about the development of Yeller, the Exception Tracker with Answers.

Read more about Yeller here

Looking for more about running production applications, debugging, Clojure development and distributed systems? Subscribe to our newsletter: