Laboratory: Doing Science In Production

This is a blog about the development of Yeller, The Exception Tracker with Answers

Enter Laboratory

Laboratory lets you express experiments as simple maps:

(require '[laboratory.experiment :as science])

(def my-experiment
  {:name "widget-permissions"
   :use (fn [widget user] (check-user? widget user))
   :try (fn [widget user] (allowed-to? :read user widget))})

(science/run my-experiment widget user)

:use is the control: no matter what, calling run returns whatever :use returns, and if :use throws, run throws that exception. :try is the candidate: its result is never shown to the user, and if :try throws, run does not rethrow the exception.
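To make those rules concrete, here is a rough sketch of the behaviour described above. It is not Laboratory's actual implementation: run-sketch is a made-up name, it skips timing and publishing entirely, and running the candidate before the control is an arbitrary choice.

```clojure
;; A simplified stand-in for science/run (hypothetical, for illustration only).
(defn run-sketch
  [experiment & args]
  (let [control   (:use experiment)
        candidate (:try experiment)]
    ;; Run the candidate and swallow anything it throws,
    ;; so a broken candidate can never affect the caller.
    (try
      (apply candidate args)
      (catch Throwable _ nil))
    ;; The control's return value (or its exception) is what the caller sees.
    (apply control args)))
```

A throwing candidate is invisible to the caller, while a throwing control propagates exactly as it would without the experiment.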

That’s the most basic, most useless experiment ever though. The important win comes in when you start publishing results:

(require '[laboratory.experiment :as science])

(def my-experiment
  {:name "widget-permissions"
   :use (fn [widget user] (check-user? widget user))
   :try (fn [widget user] (allowed-to? :read user widget))
   :publish (fn [result]
              (record-timing "widget-permissions-control" (-> result :control :duration))
              (record-timing "widget-permissions-candidate" (-> result :candidate :duration))
              (when (not= (-> result :control :value) (-> result :candidate :value))
                (log/warn "widget-permissions-mismatch" result)))})

(science/run my-experiment widget user)

:publish takes a result, which contains the return value and the duration of each side of the experiment. What you do with that is up to you. You can log it. You can write mismatches into the database. You can page admins on value mismatches. Whatever you like. Note that if either the control or the candidate throws an exception, that side's :value will be the exception that was thrown. You might want to log that.
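One wrinkle with exceptions as values: two separate exception instances are never equal under =, so a plain not= check reports a mismatch even when both sides fail in the same way. Here is a minimal sketch of one way to handle that, assuming the result shape shown above (:control and :candidate maps with a :value key). The names comparable-value and mismatch? are my own, not part of Laboratory.

```clojure
;; Hypothetical helpers for comparing the two sides of a result.
(defn comparable-value
  "Turns one side of a result into something safe to compare with =.
  Thrown exceptions are compared by class, returned values as-is."
  [side]
  (let [v (:value side)]
    (if (instance? Throwable v)
      [:threw (class v)]
      [:returned v])))

(defn mismatch?
  "True when control and candidate disagree, treating two exceptions
  of the same class as agreement."
  [result]
  (not= (comparable-value (:control result))
        (comparable-value (:candidate result))))
```

Comparing by class is a judgment call. If your code distinguishes failures by message or ex-data, you'd want to fold those into the comparison too.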

Lastly, what good is an experiment if you can’t turn it off when it’s misbehaving?

(require '[laboratory.experiment :as science])

(def my-experiment
  {:name "widget-permissions"
   :use (fn [widget user] (check-user? widget user))
   :try (fn [widget user] (allowed-to? :read user widget))
   :enabled (fn [_ user] (is-staff? user))
   :publish (fn [result]
              (record-timing "widget-permissions-control" (-> result :control :duration))
              (record-timing "widget-permissions-candidate" (-> result :candidate :duration))
              (when (not= (-> result :control :value) (-> result :candidate :value))
                (log/warn "widget-permissions-mismatch" result)))})

(science/run my-experiment widget user)

:enabled takes the same arguments that science/run passes on to :use and :try (here, widget and user), and lets you determine if the candidate should run this time. This is ripe for a feature flag, letting you control how often you try out the new widget system, and for which users.
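As an illustration of the feature flag idea, here is a small sketch of a percentage-based :enabled function. Everything in it is an assumption for the example: the rollout atom stands in for whatever flag store you actually use, and enabled-for is a made-up helper, not something Laboratory provides.

```clojure
;; Hypothetical flag store: experiment name -> percentage of calls (0-100)
;; that should run the candidate.
(def rollout (atom {"widget-permissions" 10}))

(defn enabled-for
  "Returns an :enabled function for the named experiment. It ignores the
  experiment's arguments and samples calls at the configured percentage.
  Experiments with no entry in the flag store never run the candidate."
  [experiment-name]
  (fn [& _args]
    (< (rand-int 100) (get @rollout experiment-name 0))))

;; Usage, given an experiment map like the one above:
;; (assoc my-experiment :enabled (enabled-for "widget-permissions"))
```

Setting the percentage to 0 turns a misbehaving candidate off without a deploy, and you can ramp it up gradually once the mismatches stop.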

Laboratory is up on GitHub and Clojars. I hope it’s useful for you.

Fin

That’s it. There are a few more options, but that’s most of the library covered. It’s been invaluable in shipping infrastructure changes at Yeller, and I hope it’ll help you out as well.

Because experiments are just maps, and they include a name, it’s easy to build supporting features around them. Yeller centrally registers all experiments, and then provides a simple JSON endpoint on each server which displays how many mismatches each experiment got, its latency, how enabled it is, and more. Because experiments are just maps, you can also write :publish and :enabled once, generically, base what they do off the experiment :name, and have those standardized as well.
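Here is a rough sketch of what a generic, name-based :publish could look like. This is not Yeller's actual code: the stats atom, publish-for and standardize are hypothetical names, and the sketch closes over the experiment name rather than assuming anything about what else the result map contains.

```clojure
;; Hypothetical in-memory stats: experiment name -> {:runs n :mismatches n}.
(def stats (atom {}))

(defn publish-for
  "Returns a :publish function that counts runs and mismatches
  under the given experiment name."
  [experiment-name]
  (fn [result]
    (let [mismatch? (not= (-> result :control :value)
                          (-> result :candidate :value))]
      (swap! stats update experiment-name
             (fn [s]
               (-> (or s {:runs 0 :mismatches 0})
                   (update :runs inc)
                   (cond-> mismatch? (update :mismatches inc))))))))

(defn standardize
  "Adds the shared :publish to an experiment map, based on its :name."
  [experiment]
  (assoc experiment :publish (publish-for (:name experiment))))
```

Since stats is a plain map keyed by name, serializing it for a status endpoint is straightforward, and the same trick works for a shared :enabled.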

The first two tries of Yeller’s new exception detection system were a bust. They had much worse latency, and had incorrect results compared to the existing system. But the third approach I tried worked great, and has been in production for 3 days now. Laboratory helped me iterate and ship that without users ever noticing anything was changing.

References

Everything in this post was directly inspired by @jesseplusplus’ awesome presentation: Easy Rewrites with Ruby and Science
