Get your objects out of my session

Last week I had the unpleasant job of fixing a feature that broke due to a change in a third-party API. Specifically, Twitter changed part of their authentication API and this broke our ‘post your attendance to Twitter’ feature. After a while spelunking through several layers of HTTP indirection inside the twitter and oauth gems, it became apparent that an upgrade was in order – we implemented this feature so long ago that our twitter gem was lagging four major releases behind the current version.

But this isn’t about Twitter, or OAuth, or even those specific Ruby libraries. It’s about an antipattern I was reminded of while updating our code and reading the OAuth gem documentation. Here is how it suggests you start the authorization process in your Twitter client app:

@callback_url = "http://127.0.0.1:3000/oauth/callback"
@consumer = OAuth::Consumer.new("key", "secret", :site => "https://agree2")
@request_token = @consumer.get_request_token(:oauth_callback => @callback_url)
session[:request_token] = @request_token
redirect_to @request_token.authorize_url(:oauth_callback => @callback_url)

This code contains a bug that’s bitten me so many times it jumped right off the page:

session[:request_token] = @request_token

Here’s the bug: you just stored the Marshal.dump of some random object in the session. One day, you will refactor this object – change its class name, adjust its instance variables – and next time you deploy, no-one will be able to access your site. It doesn’t matter whether the session is stored in the cookie (and therefore on the user’s computer) or on your servers, the problem is that you’ve stored a representation of state that’s tightly coupled to its implementation.

A simple example

Let’s see this in action. Imagine we have a little Sinatra app with two endpoints. One of these endpoints puts an object in the session, and another one retrieves data from the stored object:

require 'sinatra'
set :sessions, true
set :session_secret, 'some very large random value'

class State
  def initialize(params = {})
    @params = params
  end

  def get
    @params.values.first
  end
end

get '/' do
  session[:state] = State.new(:flow => 'sign_up')
  'Hello'
end

get '/state' do
  session[:state].get
end

We boot the app, and see that it works:

$ curl -i localhost:4567/
HTTP/1.1 200 OK
Content-Type: text/html;charset=utf-8
Content-Length: 5
Set-Cookie: rack.session=BAh7CEk...; path=/; HttpOnly

Hello

$ curl localhost:4567/state -H 'Cookie: rack.session=BAh7CEk...'
sign_up

A little change

So, this seems to work, and we leave the site running like this for a while, and people visit the site and create sessions. Then one day we decide we need to refactor the State class, by changing that hash into an array:

class State
  def initialize(params = [])
    @params = params
  end

  def get
    @params.last
  end
end

get '/' do
  session[:state] = State.new(['sign_up'])
  'Hello'
end

Now if we retry our request we find this buried among the stack traces:

$ curl localhost:4567/state -H 'Cookie: rack.session=BAh7CEk...'

NoMethodError at /state
undefined method `last' for {:flow=>"sign_up"}:Hash

A peek at Rack’s guts

To understand why this happens you need to see how Rack represents the session. Basically, it takes the session hash, such as {:state => State.new(:flow => 'sign_up')}, runs it through Marshal.dump and base64-encodes the result. Here’s what Marshal emits:

>> session = {:state => State.new(:flow => 'sign_up')}
=> {:state=>#<State:0x007fdd2bc60db8 @params={:flow=>"sign_up"}>}
>> Marshal.dump session
=> "\x04\b{\x06:\nstateo:\nState\x06:\f@params{\x06:\tflowI\"\fsign_up\x06:\x06ET"

Marshal produces a literal representation of the object – its class, its instance variables and their values. It is a snapshot of the object that can be completely reconstructed later through Marshal.load.
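
If you’re curious, you can unpack a session cookie by hand and see the dump for yourself. Here’s a minimal sketch, assuming a Rack 1.x cookie session, where the cookie value is the URL-encoded base64 of the Marshal dump followed by a ‘--digest’ HMAC suffix (the values below are placeholders, not a real cookie):

require 'base64'
require 'uri'

# Raw value of the rack.session cookie, copied from the Set-Cookie
# header; substitute a real value from your own app.
raw = 'BAh7CEk...--0a1b2c3d'

# Strip the HMAC signature and undo the URL-encoding, then reverse
# Rack's encoding steps: base64 first, then Marshal.
payload = URI.decode_www_form_component(raw).split('--').first
session = Marshal.load(Base64.decode64(payload))
# => {:state=>#<State:... @params={:flow=>"sign_up"}>}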

When you store objects in the session, you are dumping part of your program’s implementation into storage and, if you use cookie-stored sessions, sending that representation to the user for them to give back later. Now, fortunately, cookies are signed by Rack using HMAC-SHA1 so the user should not be able to construct arbitrary Marshal output and inject objects into your program – don’t forget to set :session_secret unless you want people sending forged objects to you! But there is still the problem that your code is effectively injecting objects into processes running in the future, when those objects may no longer be valid.

If you change the name of a class, then Marshal.load will fail, and you’ll get an empty session object. But if all the types referenced in the session dump still exist, it will happily reconstruct all those objects and their state may not reflect what the current process expects.
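
You can demonstrate that failure in a few lines of IRB. This sketch dumps an object, simulates a deploy that renamed the class by removing the constant, and tries to load the dump back:

data = Marshal.dump(State.new(:flow => 'sign_up'))

# Pretend the next deploy renamed State to something else
Object.send(:remove_const, :State)

Marshal.load(data)
# ArgumentError: undefined class/module State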

And as a bonus, once you’ve deployed the session-breaking change, you can’t revert it, because recent visitors will have the new representation in their session. We’ve got various classes in our codebase with multiple names to work around times when we made this mistake.

A better way

In light of the above, you should treat your sessions with a certain degree of paranoia. You should treat them with the same care as a public API, making sure you only put stable representations of state into them. Personally I stick to Ruby’s core data types – strings, numbers, booleans, arrays, hashes. I don’t put user-defined classes (including anything from stdlib or gems) in there. Similarly, you should not assume any given session key exists, since the session may become corrupt, the user may delete their cookies, and so on. Always check for nil values before using any session data, unless you want your site to become unreachable.
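
Applied to the Sinatra example above, that means storing a plain hash of strings and reading it back defensively – a sketch:

get '/' do
  session[:state] = {'flow' => 'sign_up'}
  'Hello'
end

get '/state' do
  state = session[:state]
  halt 400 unless state
  state['flow']
end

Refactoring the handlers no longer breaks old sessions, because the stored representation is just data.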

A future-proof Twitter client

So how should you use the Twitter gem and avoid these problems? Easy – just store the credentials from the request token, and reconstruct the token when Twitter calls you back:

Twitter.configure do |c|
  c.consumer_key    = 'twitter_key'
  c.consumer_secret = 'twitter_secret'
end

def consumer
  OAuth::Consumer.new('twitter_key',
                      'twitter_secret',
                      :site => 'https://www.example.com')
end

def callback_url
  'https://www.example.com/auth/twitter/callback'
end

get '/auth/twitter' do
  request_token = consumer.get_request_token(:oauth_callback => callback_url)
  session[:request_token] = request_token.token
  session[:request_secret] = request_token.secret
  redirect request_token.authorize_url(:oauth_callback => callback_url)
end

get '/auth/twitter/callback' do
  token  = session[:request_token]
  secret = session[:request_secret]

  halt 400 unless token and secret
  session[:request_token] = session[:request_secret] = nil
  
  request_token = OAuth::RequestToken.from_hash(consumer,
                      :oauth_token => token,
                      :oauth_token_secret => secret)
  
  access_token = request_token.get_access_token(:oauth_verifier => params[:oauth_verifier])
  
  client = Twitter::Client.new(
               :oauth_token => access_token.token,
               :oauth_token_secret => access_token.secret)
  
  user_details = client.verify_credentials
  
  store_twitter_tokens(user_details.screen_name,
                       access_token.token,
                       access_token.secret)
  
  redirect '/auth/twitter/success'
end

Note how we only store strings in the session and the database, and we store just enough of the credentials that we can construct an OAuth or Twitter client later, whenever we need one.

This approach only stores stable representations – tokens used in the OAuth protocol – and constructs objects by hand when they are needed rather than relying on Marshal dumps. This makes the application more resilient when the libraries you depend on inevitably need upgrading.

Statistics for fun and profit (and analyzing split tests)

This is a post about split testing. Split testing, sometimes known as A/B testing, is a way of figuring out which of two (or more) versions of a site performs better. The idea is simple: divide visitors to your site into groups at random and present each group with one of the versions under test. You then measure the effectiveness separately for each group and compare results. The big advantage of running things this way rather than, say, showing everyone version A on Monday followed by version B on Tuesday is that it automatically corrects for external confounding factors: what if Monday was a public holiday, for example?

So far, so good. It all sounds pretty simple, and implementation can be as straightforward as setting a cookie and counting entries in the server logs. However, things get a little more complicated when it comes to analyzing the results.

For example, how do you know when to stop collecting data and make a decision? Leaving the test running for too long is a waste of time, which is something that most start-ups don’t exactly have a lot of, but not collecting enough data has more subtle consequences. Each group of users will display a wide variety of behavior for all sorts of reasons that have nothing to do with the change you’re making. Suppose that by pure chance the average age of visitors in group A was much higher than in group B; in this case you could easily imagine that their behavior would differ regardless of the versions of the site they had seen. Put another way, how can you be confident that the difference you observe implies a fundamental difference between the versions, rather than simply being explained by random chance? This topic is known as statistical significance.

There are a few ways to approach this question. One common approach is frequentist hypothesis testing, which I’m not going to discuss here. Instead I’ll focus on an approach based on Bayesian modelling.

As the name would suggest, at the core of this approach is a mathematical model of the data observed during the test. To be a little more precise, by mathematical model I mean a statement about the relationship between various quantities. A non-statistical example of this is Ohm’s law, which is a model of electrical conductivity in idealized materials, and states that three quantities, current (I), voltage (V) and resistance (R) are related by

V = I \times R

Statistical models generalize this by introducing random variables into the mix. A random variable is a variable which, rather than having a single fixed value, is represented by a distribution of possible values with associated probabilities; we may be 90% sure that the number of beers left in the fridge is 10, but we can’t quite remember who drank what last night, so there’s a 10% chance that the number is 9. The exact meaning of probabilities is an interesting philosophical discussion in its own right, but intuitively a probability is a measure of the strength of our belief, represented as a real number between 0 and 1. Values with probability 0 can never happen, values with probability 1 are certain, and everything in between may or may not be true.

Models for split tests

How do we apply it to the results of a split test? Let’s start by modelling the behavior of a single group of users.

As a concrete example, let’s say we want to improve the number of users successfully filling in our sign-up form. In this case, over some period n visitors land on the form, of which k successfully fill it in and hit ‘submit’. A third relevant quantity is p, the conversion rate, i.e. the probability that a randomly chosen individual from the entire population, when presented with the form, will sign up. The emphasis here is important; we want to be able to generalize to future visitors, so calculating a value for p based purely on the participants in the test, while easy, isn’t good enough.

Before we can make any inferences we need to relate these quantities to each other via a statistical model. In this case a binomial model is appropriate. This uses the binomial distribution, which has the following probability mass function (PMF) for the value of k given a certain n and p:

f\left(k; n, p\right) = {n \choose k}\times p^k\times \left(1-p\right)^{n-k}

The PMF allows us to take a value of k and find the probability of that value occurring under the model. Graphically it looks like this:

where the red, green and blue curves are for p=0.5, p=0.9 and p=0.1 respectively (n=100 in all cases).

The binomial distribution is often described using the example of a biased coin. Suppose I have such a coin with a known probability, p, of turning up heads; the binomial distribution represents the probability of seeing k heads if I flip it n times. Note the use of random variables: even if n and p are known with certainty we still can’t do better than assigning a distribution over a range of possible values for k. Hopefully it’s not too much of a stretch to relate this scenario to the sign-up conversion problem.
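
The PMF is also easy to compute for yourself. Here’s a short Ruby sketch of the formula above (the method names are mine, not from any library):

# n choose k, multiplying incrementally so every intermediate
# value stays an exact integer
def choose(n, k)
  (1..k).reduce(1) { |acc, i| acc * (n - i + 1) / i }
end

def binomial_pmf(k, n, p)
  choose(n, k) * p**k * (1 - p)**(n - k)
end

binomial_pmf(45, 100, 0.5) # => roughly 0.048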

Inference and Bayes’ theorem

Let’s write the probability of a particular value of k as P(k | p, n). In this notation the bar (‘|’) represents conditional probability. In other words, this is an expression for the distribution over possible values of k if p and n have known, fixed values, and in this case is exactly the binomial PMF given by f(.) above.

This isn’t quite what we want. Given the results of a test, k is known, but p isn’t, so we want to know a distribution over p given the observed data, or P(p | k, n). Fortunately, Bayes’ theorem tells us how to compute precisely that:

P\left(p\mid k,n\right) = {P\left(k\mid p,n\right) \times P\left(p\mid n\right) \over P\left(k\mid n\right)}

There are a couple of other quantities here: P(p | n), which is known as the prior, and P(k | n). The prior represents our beliefs about p in the absence of data. Given that we know nothing in that case, it’s not unreasonable to model it as a flat distribution (i.e. a constant). P(k | n) is dependent only on fixed, observed quantities, so it can also be treated as a constant for this analysis, hence:

P\left(p\mid k,n\right) = {1 \over Z} P\left(k\mid p,n\right)

Probability distributions must sum to one (i.e. we know that we’ll certainly get one of the possible values), so Z isn’t free to vary arbitrarily.

All of this can easily be done numerically, either with a small script in your language of choice or using a spreadsheet; Excel has a BINOMDIST function which gives P(k | p, n).
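
As a sketch of the script route (using the binomial_pmf method from the earlier sketch, and a flat prior over a grid of 101 candidate values for p):

# Discretize p into 101 points between 0 and 1
GRID = (0..100).map { |i| i / 100.0 }

# P(p | k, n) under a flat prior, as an array of probabilities
# aligned with GRID; dividing by z handles the 1/Z normalization
def posterior(k, n)
  weights = GRID.map { |p| binomial_pmf(k, n, p) }
  z = weights.reduce(:+)
  weights.map { |w| w / z }
end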

Comparing test groups

In a split test we treat each group as a separate population, with separate conversion rates, pa and pb. Each of these can be analysed as above, so we’ll end up with a distribution for each. Numerically, each will be a set of discrete values with a probability assigned to each value, probably represented as two columns if you’re using a spreadsheet.

We’ll treat the two groups as independent. For independent variables ‘and’ queries correspond to multiplying probabilities, so the probability of group A having conversion rate pa and group B having conversion rate pb is

p\left(p_a\mid k_a, n_a\right)\times p\left(p_b\mid k_b, n_b\right)

Finding the probability that A wins is then just a matter of finding all of the pairs (pa, pb) where pa > pb, multiplying the corresponding values for each and then adding them all up. Using mathematical notation, this is the same as saying

\sum_{p_a > p_b} p\left(p_a\mid k_a, n_a\right)\times p\left(p_b\mid k_b, n_b\right)

It turns out that this sort of calculation doesn’t really lend itself to spreadsheets, but it’s pretty straightforward in most programming languages. We’ve actually put some of the scripts we use for this kind of analysis on GitHub: https://github.com/songkick/skab.
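
Here’s a sketch of that double sum (my own toy version, not the scripts in that repository), reusing posterior and GRID from above with some made-up counts:

post_a = posterior(32, 400) # group A: 32 sign-ups from 400 visitors
post_b = posterior(54, 400) # group B: 54 sign-ups from 400 visitors

# Sum the joint probability over every pair where A beats B
prob_a_wins = 0.0
GRID.each_with_index do |pa, i|
  GRID.each_with_index do |pb, j|
    prob_a_wins += post_a[i] * post_b[j] if pa > pb
  end
end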

To make a decision you first need to decide how confident you want to be. If the answer you get from the above is 0.95 and you’re happy with a 5% chance of being wrong, you should choose to roll out version A; if it’s 0.05 you probably want to pick B.

If you get something close to 0.5 you need to work out whether to declare neither A nor B the winner (i.e. they’re as good as each other), or wait a bit longer and gather more data. To help with this you can vary the above sum to consider pairs where pa and pb are within some small distance of each other (say a 1% difference). If the probability mass for these pairs is high it’s very likely that there is little difference between A and B, but if not you just don’t have enough data to draw a conclusion either way.
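
That check is the same double sum with a different condition – for example, the probability mass on pairs within one percentage point of each other:

# Probability that the two rates are within 0.01 of each other
prob_tie = 0.0
GRID.each_with_index do |pa, i|
  GRID.each_with_index do |pb, j|
    prob_tie += post_a[i] * post_b[j] if (pa - pb).abs <= 0.01
  end
end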

A month at Songkick

I love Songkick.

Not in a soppy “no you hang up first” kinda way, but in a “I haven’t missed a great gig in over a year” way. Which is why when I was given the opportunity to work here, I jumped at it.

After working at Songkick for a few weeks now, I thought I’d write about my experiences so far, from the interview process through to day-to-day development.

Here are the six simple steps I took to Songkick happiness.

Step 1 – Network

I’ve been a fan of Songkick’s service for a long time, and after I met some of the team at the Silicon Milkroundabout event in May 2012, I was invited to start the interview process. This was great news (Songkick are awesome[1]).

Initially, I did have a few concerns about my technical compatibility with the company; I’ve spent the last few years in a Windows and .NET environment, and Songkick are a long way from that. I was soon to find that these worries were misplaced.

Step 2 – Code

To kick off the interview process, I received an email from Songkick – “Hey Aaron, You seem pretty rad, fancy taking a technical test?”. At least that’s how I remember it.

The rules:

  • Complete an hour-long programming challenge
  • From home, at a time that suited you
  • In a programming language of your choice

I let them know when I could set an hour aside, and at the agreed time I was emailed a PDF describing the challenge. I can’t give too much away, but the challenge was really interesting, and Songkick-specific.

I hacked away in C#, making use of third-party libraries as required, and after the hour was up, emailed my solution. I didn’t have time to fully complete the challenge, but I had concentrated on getting a clean design, stubbing all core interfaces, classes and methods, and adding comments and pseudo-code where necessary.

After a few days, I received an email informing me that I was through to round two.

Step 3 – More Code

I was invited to have a couple of face-to-face interviews, and sit another coding test. This time I was to complete a 90-minute pair-programming exercise, in Ruby.

The test was a little daunting as I was a complete Ruby novice. However, with it being a pair-programming exercise, I had a friendly developer (Sabrina) sitting with me to help with syntax questions. Any time I was unaware of the syntax in Ruby (quite a lot!), I could scribble on a notepad how I would solve the problem in C#, and Sabrina would show me the equivalent syntax in Ruby.

This was a test-driven development exercise, and I was introduced to the challenge with a brief overview of the task, and a collection of failing Cucumber tests. I wrote code to gradually pass each test, until all passed – and in the nick of time too. I had a couple of minutes to discuss my solution and what I would add to it if I had more time, and the 1.5 hours were up.

Step 4 – Meet and Greet

As a firm believer in The Joel Test, I agree that writing code during the interview process is important, but equally important is the rapport between yourself and your potential colleagues.

During the interview process, I met a large percentage of the company over a number of interviews, including a coffee and chat with the entire development team. It’s pretty intimidating stuff, but it gives both parties the opportunity to make sure each will be a good fit for the other.

After a few more days of waiting, I received the call I was hoping for.

Step 5 – On-boarding

Joining Songkick was a super-smooth operation. We run a tight ship (as I was to find out), and my first few days were as follows.

Day 1

I spent the morning being shown around the office: an open plan environment with everything a professional developer needs to maintain a high level of productivity (ping pong table, foosball table, a fully-stocked kitchen and a proper coffee machine).

I was provided with a mentor for the week – Robin. Having someone to sit with you, explain the development environment and application design really helped me to become productive quickly. In fact, I made my first code commit on day one.

Day 2 & 3

I spent the next two days divided between coding (with Robin) and various presentations from the different departments in Songkick. These ranged from the data science team (who handle making sense of the huge amounts of data we have), to QA and infrastructure.

Day 4

The whole company boarded a vintage Routemaster bus, and we were taken to End of the Road festival for the weekend. Did I mention Songkick are awesome[1]?

Step 6 – Develop

By far the biggest change (and probably worry) in my move to Songkick was the development environment. I’ve been working in a .NET ecosystem for a number of years; the framework is stable and Visual Studio is, in my opinion, a great IDE: feature-rich, with some useful plugins. On the other hand, Songkick’s development environment is entirely Unix-based, making use of (and contributing back to) lots of open-source projects.

I do have experience developing in a Linux environment, but haven’t touched it for a few years, so had a feeling I was going to be rusty. After a few days, I was pleasantly surprised to see how far the tools and frameworks have come. Again, having a mentor to guide me through this transition was crucial; I could ask questions and receive answers immediately.

All in all, joining Songkick has been an amazing experience. I’m surrounded by different teams of people (ranging from developers and testers, through to UX experts and designers), all of whom are the best at what they do (but don’t take my word for it, check out the team page). Having a passion for the product is essential, but if you love live music, Songkick is for you.

[1] How about developing for a platform that has millions of users, and enables fans from across the world to see their favourite artists live? And the perks are pretty amazing too: great office, free food and drink, table tennis and foosball, monthly ticket allowance, annual festival trip for the company, etc. I could go on, but you should probably just apply.