Laboratory: Doing Science In Production

This is a blog about the development of Yeller, The Exception Tracker with Answers

Read more about Yeller here

Feature flags are one of the best tools a distributed systems engineer has. A feature flag is a conditional that says “this user can see this feature” or “this event goes down this codepath”, paired with a way to change, at runtime, who gets through. Flags make rolling out new infrastructure far less frightening, because you can turn the new code off if it misbehaves.
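To make that concrete, here is a minimal feature flag in plain Clojure. It assumes flags live in an in-memory atom keyed by flag name. The names `flags`, `allow-user!`, `disable-flag!`, `flag-enabled?` and `render-dashboard` are hypothetical and do not come from any particular library. A real deployment would keep flags somewhere every server can read and update at runtime.

```clojure
;; Hypothetical in-memory flag store: flag name -> set of user ids let through.
(def flags (atom {}))

(defn allow-user!
  "Let `user-id` through `flag` from now on. Takes effect without a deploy."
  [flag user-id]
  (swap! flags update flag (fnil conj #{}) user-id))

(defn disable-flag!
  "Turn `flag` off for everybody, e.g. when the new code misbehaves."
  [flag]
  (swap! flags dissoc flag))

(defn flag-enabled?
  "The conditional itself: is `user-id` allowed through `flag` right now?"
  [flag user-id]
  (contains? (get @flags flag #{}) user-id))

;; Usage: send a user down the new codepath only while the flag allows it.
(defn render-dashboard [user-id]
  (if (flag-enabled? :new-dashboard user-id)
    :new-dashboard
    :old-dashboard))
```

Because the check runs on every call, flipping the flag changes behaviour immediately, which is what makes turning misbehaving code off cheap.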

But sometimes even that is too much risk. Sometimes your rewritten feature is so important to get right that you don’t want anybody to see its output at all. You just want it to run in the background and compare its results against the existing code.

Imagine a permissions system for a large, heavily used website. These systems are often full of cruft, and suffer from performance problems to boot. So you decide to rewrite it. But a botched rewrite has horrible consequences for your users, and for you. Users might see things they shouldn’t. You might publicly expose your customers’ data and end up facing a lawsuit. Sure, you can test it for days, but tests don’t run against real user data, and they don’t run in your production environment.

Imagine if you could easily run experiments in production without affecting the current codepath. You could rewrite your permissions system without any user ever seeing results from the new system. And you could measure. You could see that, say, 10% of requests to the new system return incorrect answers, examine those answers, and fix them. All without your users noticing a thing.

GitHub had exactly this problem. They were rewriting their permissions system for improved flexibility and performance. But rewriting permissions is hard: one screw-up, and somebody gains access to private data they shouldn’t see.

Yeller had exactly this problem too. I was rewriting its “new exception detection” feature for improved flexibility and performance. But rewriting that feature was hard: one screw-up, and a user gets emailed over and over about an exception they already knew about.

GitHub wrote, and standardized on, a library for running production experiments: Scientist.

I’ve written, and standardized on, a Clojure port of that library: Laboratory.

Enter Laboratory

Laboratory lets you express experiments as simple maps:

(require '[laboratory.experiment :as science])

(def my-experiment
  {:name "widget-permissions"
   :use (fn [widget user] (check-user? widget user))
   :try (fn [widget user] (allowed-to? :read user widget))})

(science/run my-experiment widget user)

:use is the control. No matter what, calling run returns whatever :use returns, and if :use throws, run throws that same exception. :try is the candidate. Its result is never shown to the user, and if :try throws, run won’t rethrow the exception.
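The guarantees above can be illustrated with a small stand-alone runner. This is a sketch of the semantics as this post describes them, not Laboratory’s actual implementation. `run-sketch` and `outcome` are hypothetical names, and the sketch runs the candidate synchronously, right after the control, for simplicity.

```clojure
(defn- outcome
  "Call f with args and capture what happened instead of letting it escape:
  either {:value v} or {:value the-exception, :threw? true}."
  [f args]
  (try
    {:value (apply f args)}
    (catch Exception e
      {:value e :threw? true})))

(defn run-sketch
  "Hypothetical runner illustrating :use/:try semantics (not Laboratory's code).
  Returns what :use returns, or rethrows what :use threw. The :try outcome,
  including any exception, is computed and then ignored by the caller."
  [experiment & args]
  (let [control    (outcome (:use experiment) args)
        ;; The candidate runs on the same arguments. Its exceptions are
        ;; captured by `outcome`, so they never reach the caller.
        _candidate (outcome (:try experiment) args)]
    (if (:threw? control)
      (throw (:value control))
      (:value control))))
```

The asymmetry is the whole point. A broken candidate can only ever cost you time, never a wrong answer or an error in front of a user.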

On its own, though, an experiment with only :use and :try is about as useless as experiments get. The real win comes when you start publishing results:

(require '[laboratory.experiment :as science]
         '[clojure.tools.logging :as log]) ; log/warn below assumes tools.logging

(def my-experiment
  {:name "widget-permissions"
   :use (fn [widget user] (check-user? widget user))
   :try (fn [widget user] (allowed-to? :read user widget))
   :publish (fn [result]
              (record-timing "widget-permissions-control" (-> result :control :duration))
              (record-timing "widget-permissions-candidate" (-> result :candidate :duration))
              (when (not= (-> result :control :value) (-> result :candidate :value))
                (log/warn "widget-permissions-mismatch" result)))})

(science/run my-experiment widget user)

:publish receives a result containing the return value and duration of each side of the experiment. What you do with it is up to you. You can log it, write mismatches into the database, page admins on value mismatches, whatever you like. Note that if either the control or the candidate throws an exception, its :value will be the exception that was thrown, so you might want to log those cases too.
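Because exceptions show up as values, a publisher usually wants to tell “both sides threw the same error” apart from “only the candidate threw”. Below is a minimal sketch of that sorting. It assumes the result has the shape used in the example above, `{:control {:value … :duration …} :candidate {…}}`. `classify` and `slowdown` are hypothetical helpers. Treating two exceptions as equal when their class and message match is our own choice, not something Laboratory does.

```clojure
(defn- threw?
  "A side that threw has the exception itself as its :value."
  [side]
  (instance? Throwable (:value side)))

(defn- same-exception?
  "Treat two exceptions as equal if their class and message match."
  [a b]
  (and (= (class a) (class b))
       (= (.getMessage ^Throwable a) (.getMessage ^Throwable b))))

(defn classify
  "Sort an experiment result into one bucket a publisher can count or log."
  [result]
  (let [control   (:control result)
        candidate (:candidate result)]
    (cond
      (and (threw? control) (threw? candidate))
      (if (same-exception? (:value control) (:value candidate)) :match :mismatch)

      (threw? candidate) :candidate-threw
      (threw? control)   :control-threw

      (= (:value control) (:value candidate)) :match
      :else :mismatch)))

(defn slowdown
  "How much longer the candidate took than the control, in :duration's units."
  [result]
  (- (-> result :candidate :duration) (-> result :control :duration)))
```

A :publish function could then log only when `(classify result)` is anything other than `:match`, and record `slowdown` alongside the raw timings.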

Lastly, what good is an experiment if you can’t turn it off when it’s misbehaving?

(require '[laboratory.experiment :as science]
         '[clojure.tools.logging :as log]) ; log/warn below assumes tools.logging

(def my-experiment
  {:name "widget-permissions"
   :use (fn [widget user] (check-user? widget user))
   :try (fn [widget user] (allowed-to? :read user widget))
   :enabled (fn [_ user] (is-staff? user))
   :publish (fn [result]
              (record-timing "widget-permissions-control" (-> result :control :duration))
              (record-timing "widget-permissions-candidate" (-> result :candidate :duration))
              (when (not= (-> result :control :value) (-> result :candidate :value))
                (log/warn "widget-permissions-mismatch" result)))})

(science/run my-experiment widget user)

:enabled takes the same arguments you pass to science/run after the experiment (here, widget and user) and decides whether the candidate should run this time. This is ripe for a feature flag, which lets you control how often the new permissions check runs, and for which users.
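One way to build such an :enabled function is percentage bucketing. You hash something stable about the user into a bucket from 0 to 99, and run the candidate only when that bucket is below a rollout percentage you can change at runtime. This is a sketch under stated assumptions. `rollout`, `set-rollout!`, `bucket` and `enabled-for` are hypothetical names, and the user is assumed to be a map with an `:id`.

```clojure
;; Hypothetical runtime-adjustable rollout percentages, keyed by experiment name.
(def rollout (atom {}))

(defn set-rollout!
  "Change what percentage (0-100) of users run the candidate for `experiment-name`."
  [experiment-name percent]
  (swap! rollout assoc experiment-name percent))

(defn- bucket
  "Map a user id to a bucket in [0, 100). The same user always lands in the
  same bucket, so raising the percentage only ever adds users."
  [experiment-name user-id]
  (mod (hash [experiment-name user-id]) 100))

(defn enabled-for
  "Build an :enabled function for `experiment-name`. It receives the same
  arguments as the experiment; this sketch assumes the user is the second."
  [experiment-name]
  (fn [_widget user]
    (< (bucket experiment-name (:id user))
       (get @rollout experiment-name 0))))
```

In the experiment map this would slot in as `:enabled (enabled-for "widget-permissions")`. Starting at 0 and raising the percentage gradually gives you a controlled ramp.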

Laboratory is up on GitHub and Clojars.

Fin

That’s it. There are a few more options, but that covers most of the library. It’s been invaluable for shipping infrastructure changes at Yeller, and I hope it’ll help you out as well.

Because experiments are just maps, and each one includes a name, it’s easy to build supporting features around them. Yeller registers all experiments centrally, and each server exposes a simple JSON endpoint showing, per experiment, how many mismatches it has seen, its latency, how widely it is enabled, and more. For the same reason, you can write :publish and :enabled once, generically, base their behaviour on the experiment’s :name, and standardize those as well.
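Here is a sketch of that registry idea, assuming in-process counters held in an atom. It is not Yeller’s actual code. `registry`, `publisher-for`, `register!` and `status` are hypothetical names. `status` returns plain data that an HTTP handler could serialise to JSON. The publisher compares :value with `=`, the same simple check used in the examples above.

```clojure
;; Hypothetical registry: experiment name -> running counters.
(def registry (atom {}))

(def ^:private empty-stats {:runs 0 :mismatches 0 :candidate-duration-total 0})

(defn publisher-for
  "Build a generic :publish function that updates the counters for `experiment-name`."
  [experiment-name]
  (fn [result]
    (let [mismatch? (not= (-> result :control :value) (-> result :candidate :value))
          duration  (or (-> result :candidate :duration) 0)]
      (swap! registry update experiment-name
             (fn [stats]
               (-> (or stats empty-stats)
                   (update :runs inc)
                   (update :mismatches + (if mismatch? 1 0))
                   (update :candidate-duration-total + duration)))))))

(defn register!
  "Record an experiment centrally and return it with the shared publisher attached."
  [experiment]
  (swap! registry assoc (:name experiment) empty-stats)
  (assoc experiment :publish (publisher-for (:name experiment))))

(defn status
  "Per-experiment snapshot, e.g. for a JSON status endpoint to serialise."
  []
  (into {}
        (for [[experiment-name {:keys [runs mismatches candidate-duration-total]}] @registry]
          [experiment-name
           {:runs                    runs
            :mismatches              mismatches
            :mismatch-rate           (if (pos? runs) (double (/ mismatches runs)) 0.0)
            :mean-candidate-duration (if (pos? runs) (quot candidate-duration-total runs) 0)}])))
```

Keying everything on :name means adding a new experiment needs no new plumbing. Registering it is enough for it to appear in the status output.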

The first two attempts at Yeller’s new exception detection system were a bust. Both had much worse latency than the existing system, and both returned incorrect results compared to it. But the third approach worked great, and has been in production for 3 days now. Laboratory helped me iterate on it and ship it without users ever noticing that anything had changed.

References

Everything in this post was directly inspired by @jesseplusplus’ awesome presentation: Easy Rewrites with Ruby and Science
