A Second Here, A Second There…

A Mysterious Performance Problem

In production, we were seeing lots of requests to our backend services taking a long time. They would typically be over 1000ms when we would expect them to be much, much faster. It was intermittent too. Sometimes the same endpoints were fast, sometimes they were slow.

When we investigated, we were able to reduce the performance problem to the following mystifying test case, which should happen on your system too.

With this Sinatra app:

require 'sinatra'

post "/foo" do
  status 200
  "There are #{params.length} params"
end

Making this request:

curl -d "a=1" "localhost:3000/foo"

takes almost exactly 0 seconds.

(Note, we use libcurl to talk to services instead of Ruby’s Net::HTTP, so this is very close to a real service request from one of our applications.)

On the other hand, making this request (where $A_KB_OF_DATA holds roughly a kilobyte of POST data):

curl -id "a=$A_KB_OF_DATA" "localhost:8080/foo"

takes almost exactly 1 second.

Now, it shouldn’t take 1 second to make such a simple request just because I’m sending 1K of data.

So what on earth is happening here? If you already know the answer then much respect to you; but this was a head scratcher for me.

After spending a long time in the Sinatra, Rack and Unicorn stack, I was able to trace the long wait to the socket read call in the Unicorn request object.

At which point, Graham (our Head of Tech Ops) suggested that we dump the packets to examine what Curl was actually sending to the server and receiving back. The command we used for this was:

tcpdump -i lo -s 0 port 8080

(Which says, capture all packets on the loopback interface, in their entirety, to and from port 8080.)

Inspecting the contents of these packets led to us (or at least me) learning some new things about HTTP.

I Suck At HTTP

Here’s something I didn’t know about HTTP, from the RFC. An HTTP client can omit the request body and add this header to the request:

Expect: 100-continue

Which is a way of saying “I’m not sure if you want this request body, do you?”

If the server does want the request body, it should respond with a 100 response, and the client can then upload the body and the request/response continues as normal from there.

If it does not want the request body, it can respond with whatever other HTTP status code is appropriate.

Now, this is handy if there are circumstances where you want to reject a request before you go to the trouble of uploading a perhaps massive request body. (Maybe an enormous image file being posted to an endpoint that is not accepting uploads right now.)
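To make that concrete, here is roughly what the exchange looks like on the wire when both sides play along (headers trimmed; the parenthesised pauses are annotations, not part of the protocol):

POST /foo HTTP/1.1
Host: localhost:8080
Content-Length: 1024
Expect: 100-continue

    (client pauses here without sending the body)

HTTP/1.1 100 Continue

    (client now sends the 1024-byte body)

HTTP/1.1 200 OK
...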

It’s entirely up to the client whether to include this header. You can not bother at all, or only bother for very large files, or do it all the time. All valid approaches.

Curl says: if your post data is larger than 1K, I’ll send the Expect header. I say: OK Curl. 1 kilobyte seems pretty small, but whatever, fine.

So far so good.

Here’s the problem: Ruby web servers do not implement 100 Continue automatically.

And they shouldn’t, either. It’s up to the application to make that choice, receive or not receive, so application authors need to implement that themselves in general. And we didn’t.

So what happens? Curl sits there waiting. “Come on, come on, do you want this request body or not?”

Curl makes a Weird Choice

So yes, our app was technically broken. It happens. And we would have fixed it if Curl had timed out waiting for the 100 response, as you might expect, because that would have been an error we would have noticed and fixed.

But Curl doesn’t time out waiting. If it has sent the Expect header, Curl will wait for the 100 response, but only for exactly one second. After one second has passed, it sends the request body anyway.

So from our app’s point of view, everything was fine. It was just a bit slow. No errors raised or reported. Just a slow service. Sometimes.

I guess from the point of view of Curl, it’s best to make every effort to have a successful request, even with performance degradation. From our point of view, failing hard would have been better than months of silent performance regression.

Working Around It

So we now have two options. The first and best is to actually implement the 100 Continue behaviour correctly. The second is to make Curl not behave like this.

Because we control all the clients to our internal services and wanted a quick fix, we decided to go with the second approach, which I’ll describe first.

If you set the Expect header yourself, even to a blank string, that overrides Curl’s behaviour and forces it to send the entire request immediately:

Expect: 
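From the command line that just means passing the empty header explicitly:

curl -H "Expect:" -d "a=$A_KB_OF_DATA" "localhost:8080/foo"

And if, like us, you go through libcurl from Ruby, the same override applies. With the curb gem, for example, it would look roughly like this (a sketch; curb is an assumption here, the point is simply setting a blank Expect header):

require 'curb'

http = Curl::Easy.new("http://localhost:8080/foo")
http.headers["Expect"] = ""       # suppress curl's automatic "Expect: 100-continue"
http.http_post("a=#{'x' * 1024}")
puts http.response_code           # 200, with no one-second stall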

We did this and fixed all the slow requests we were seeing instantly.

Fixing it Properly (With Unicorn)

It’s not entirely obvious how to implement the required behaviour in a Sinatra application. How would you respond with 100 and then continue processing a request in Sinatra?

Fortunately, Unicorn has support for 100 Continue built in. If you respond with 100 from your application, it will return that to the client, and then immediately turn around and call your app a second time with the full data.

So we can make our application detect the Expect header and respond with 100 if appropriate.

We’d like to be able to make this decision from Sinatra, but I believe that Sinatra eagerly reads all of the request, so the delay is triggered before your action block is run.

So instead, include the logic in a Rack middleware, like this. Although in this case there is no real logic: we simply request the body always and immediately whenever we detect that the client is waiting for a 100 response:

class AlwaysRequestBody
  def initialize(app)
    @app = app
  end

  def call(env)
    if env["HTTP_EXPECT"] =~ /100-continue/
      [100, {}, [""]]
    else
      @app.call(env)
    end
  end
end

use AlwaysRequestBody
run Sinatra::Application

With this in place, the previously slow curl request returns instantly, and you can see here Curl logging the intermediate 100 response as well.

$ time curl -id "a=$A_KB_OF_DATA" "localhost:8080/foo"
HTTP/1.1 100 Continue

HTTP/1.1 200 OK
Date: Tue, 27 Nov 2012 17:32:52 GMT
Status: 200 OK
Connection: close
X-Frame-Options: sameorigin
Content-Type: text/html;charset=utf-8
X-XSS-Protection: 1; mode=block
Content-Length: 18
There are 1 params

real    0m0.008s
user    0m0.008s
sys     0m0.008s

Problem solved.

Conclusion

Hence my assertion that your app is probably broken: if a client sends the Expect header, it has every right to expect a correct response.

Streaming modules in Rails 3

Rails 3 has limited built-in support for progressive rendering (or template streaming). It can stream your header, page, and footer when each has completed rendering. This can have a big impact on your pages’ speed. In our case it’s brought our average first response time from 1s to 400ms.

However, this doesn’t go as far as we’d like. In previous versions of songkick.com, we used a plugin that activated module-by-module template streaming. This means that the page streams each module of the page as it is rendered, not just in three big blocks.

This doesn’t seem to be possible in Rails 3 (although it is apparently planned for Rails 4), but with a bit of fiddling we have managed to make it work. It’s currently activated on most pages on www.songkick.com, if you head over there and find a slow page you can watch the page come in a module at a time.

I’m sharing it here both because I thought it might be useful, and because it’s a bit of a hack and I’d like feedback on it.

In the action you want to stream, you can turn on standard Rails 3 streaming with:

def show
  render :stream => true
end

But to enable module by module streaming there are two things to do. First, in your layout:

<% $streaming_output_buffer = @output_buffer %>
<%= yield %>

And then in the page template (here show.html.erb) wrap the entire contents in a block:

<% with_output_buffer($streaming_output_buffer) do %>
  … all of ERB template in here
<% end %>

This will cause this ERB template to write to the streaming buffer rather than its own output buffer each time it renders a portion of the page.

You should only add this block in the top-level page templates. Don’t put it in partials or whatever, otherwise weird things will happen. Also, if your entire page is a render call to just one other template (say the page just contained one big partial) then you won’t see any benefit, because it’s only this template that streams, not recursively rendered ones.

Told you it was a bit of a hack. Hopefully it will tide us over until Rails 4 implements this properly.

The cultural side of continuous deployment

We’ve written multiple posts about how we reduced our build time and optimised our tests. Moving to continuous integration (CI) and continuous deployment (CD) allowed us to remove many of the frustrations we had with our build and deploy process. On its own a fast build was not going to move us to continuous deployment but it was a pretty significant enabler. We knew that we had plenty more that we could be improving; we were still reliant on having the right people around to sign off features before releasing and we still depended on manual testing to supplement our automation.

We wanted to avoid drawing a process diagram and then having to police it, so we focused on a process that was natural to the way we worked while still improving things as much as possible.

Don’t aim for perfection

One of our major hold-ups was our attempts to make every feature and every release perfect. We were spending days perfecting pixels and copy only to find out that the feature didn’t have the anticipated impact. There is a huge benefit in getting feedback from users on what works and what doesn’t before you invest a whole load of time in making it look perfect on multiple browsers. Over time we have moved from releasing features and then tweaking them to planning and running A/B tests to gather the information we need before we start designing the final feature.

QA has a key role to play in working with the Product and Design teams to define exactly how much breakage is acceptable. We were moving from a process where every release was tested and it was expected that almost all bugs would have been spotted and fixed before going to production. Now we were relying on our developers and our automation to keep things in a ‘good enough’ state. When something went wrong we stepped back and looked at what was missing – in most cases it was an up-front conversation about risks and expectations.

Of course this is not an excuse for having a website full of badly designed and half-working features. We accept that bugs will end up on production but we work hard to make sure they get fixed as soon as possible.

Managing how many more bugs went to production was a job for our automated tests. Reviewing all the tests as part of our ‘make all the tests fast’ overhaul started to convince us that we had decent coverage. Deciding that we were going to trust the tests gave us the freedom to say that any green build was a releasable build. If this turned out not to be the case, either because manual testing discovered a bug or because of an issue in production, then we amended the tests. Regular reviews and conversations, particularly between developers and QA, help us to keep the tests maintained and testing the right things.

Avoid red builds

Historically Songkick has had an unnatural tolerance for red builds. They didn’t appear to slow us down that much, so we didn’t take the time to really try to avoid them. Once we started to seriously look at adopting continuous integration we realised that this would have to change. Frequent check-ins will only work if the builds are green. Loud and visible alerts that go to the whole team when a build fails not only mean someone looks into the failure quickly, but also helped us to view red builds as a delay. This, coupled with having a very simple and fast way to run the tests on a dev environment before checking code in, keeps our red builds to a minimum.

Integrate small changes frequently

A key part of CI is integrating frequently. In an ideal world you probably have everyone working off the master branch. We are careful to maintain a releasable master branch but opted for individual freedom around working on individual branches or directly off master. We like CI because it allows developers the freedom to work in a way that suits them whilst still having enough safeguards to keep the site running. Once we had a fast and painless way to integrate and release most developers naturally started integrating small changes on a more frequent basis.

Have a shared understanding of your goals

Make sure you, and everyone in the team understands what you’re trying to achieve at each stage of the build pipeline. At Songkick we expect to be able to build and test features on a local dev environment. If we discover something that forces us to test on a real test environment, such as missing data or missing services, then work gets prioritised to change that for next time.

Green builds have been tested on the CI server so we assume that a green build has the minimum required functionality to be releasable.

We use the test environment to test that the build can be deployed, and that the website works as we expect it to when running on multiple servers with lifelike data. Acceptance tests running with Selenium check that agreed business-critical functionality has not been broken. We have separated our build and deploy pipeline from feature launches so passing acceptance tests are our green flag to deploy to production.

Manual acceptance testing takes place on the production environment with the aid of feature flippers to control who can see which features. Once a feature has been tested we manually change the flipper to ‘launch’ the feature to the users.

Keep on learning

CI and CD are difficult to implement, and one of the hardest parts is imagining what the process will actually look like. Rather than trying to pin down the final process we introduced changes gradually, focusing on removing the biggest bottlenecks first. Once one bottleneck was removed it was pretty easy to see what the next one was. Speaking up when you feel frustrated along with analysing problems using the 5-Whys method has helped us improve the process to where we are today. It is fine to make a mistake but at least make sure it is an original one.

validates_uniqueness_of :nothing

Warning: this article contains rather a lot of silly decisions.

I’ve recently been working out some bugs in our OAuth implementation, including our OAuth2::Provider library. One of the biggest gotchas I found while diagnosing problems with our client apps was the existence of duplicate Authorization records.

An Authorization is a link between a ResourceOwner (i.e. a Songkick user) and a Client, for example our iPhone application. It represents that the user has granted the client access to their resources on Songkick. There should only be one of these per owner-client pair, and somehow we had a few thousand duplicates in our database. Getting more concrete, the table’s columns include the following:

+---------------------+--------------+
| Field               | Type         |
+---------------------+--------------+
| resource_owner_type | varchar(255) |
| resource_owner_id   | int(11)      |
| client_id           | int(11)      |
+---------------------+--------------+

Each combination of values for these three columns must only appear once in the table.

A series of unfortunate events

Now the Rails Way to make such guarantees is to use validates_uniqueness_of, or use a find_or_create_by_* call to check if something exists before creating it. And that’s basically what I’d done; OAuth2::Provider has a method called Authorization.for(owner, client) that would either find a suitable record or create a new one.

But despite implementing this, we were still getting duplicates. I removed an alternative code path for getting Authorization records, and still the duplicates continued. I figured something in our applications must be creating them, so I made new() and create() private on the Authorization model. No dice.

And then I remembered: concurrency! Trying to enforce uniqueness on the client doesn’t work, unless all the clients subscribe to a distributed decision-making protocol. If two requests are in flight, both can run a SELECT query, find there’s no existing record, and then both decide to create the record. Something like this:

             User 1                 |               User 2
------------------------------------+--------------------------------------
# User 1 checks whether there's     |
# already a comment with the title  |
# 'My Post'. This is not the case.  |
SELECT * FROM comments              |
WHERE title = 'My Post'             |
                                    |
                                    | # User 2 does the same thing and also
                                    | # infers that his title is unique.
                                    | SELECT * FROM comments
                                    | WHERE title = 'My Post'
                                    |
# User 1 inserts his comment.       |
INSERT INTO comments                |
(title, content) VALUES             |
('My Post', 'hi!')                  |
                                    |
                                    | # User 2 does the same thing.
                                    | INSERT INTO comments
                                    | (title, content) VALUES
                                    | ('My Post', 'hello!')
                                    |
                                    | # ^^^^^^
                                    | # Boom! We now have a duplicate
                                    | # title!

This may look familiar to you. In fact, I lifted it straight out of the ActiveRecord source, where it explains why validates_uniqueness_of doesn’t work when you have concurrent requests.

Users do the funniest things

I agree with you – in theory. In theory, communism works. In theory.

— Homer J. Simpson

There can be a tendency among some programmers to dismiss these arguments as things that probably won’t be a problem in practice. Why would two requests arrive at the same time, close enough to cause this race condition in the database, for the same user’s resources? This is the same thinking that tells you timing attacks are impossible over the Internet.

And I subscribed to this belief for a long time. Not that I thought it was impossible, I just thought there were likelier causes – hence all the attempts to shut down record creation code paths. But I was wrong, and here’s why:

People double-click on things on the Web.

Over time, we designers of software systems have instilled some confusing habits in the people who use our products, and one of those habits means that there is a set of people that always double-click links and form buttons on web pages. Looking at the updated_at timestamps on the duplicate records showed that most of them were modified very close together in time, certainly close enough to cause database race conditions. This fact by itself makes client-enforced uniqueness checks a waste of time. Even if you’re not getting a lot of requests, one little user action can defeat your validation.

This is the database’s job

Here’s how this thing should be done, even if you think you’re not at risk:

class AddUniqueIndexToThings < ActiveRecord::Migration
  def self.up
    add_index :oauth_authorizations,
              [:client_id, :resource_owner_type, :resource_owner_id],
              :unique => true
  end
  
  def self.down
    remove_index :oauth_authorizations,
                 [:client_id, :resource_owner_type, :resource_owner_id]
  end
end

Then, when you try to create a record, you should catch the exception that this index will cause to be raised if the new record violates the uniqueness constraint. Rails 3 introduced a new exception called ActiveRecord::RecordNotUnique for its core adapters, but if you’re still supporting older Rails versions you need to catch ActiveRecord::StatementInvalid and check the error message. Here’s how our OAuth library does things.

DUPLICATE_RECORD_ERRORS = [
  /^Mysql::Error:\s+Duplicate\s+entry\b/,
  /^PG::Error:\s+ERROR:\s+duplicate\s+key\b/,
  /\bConstraintException\b/
]

def self.duplicate_record_error?(error)
  error.class.name == 'ActiveRecord::RecordNotUnique' or
  DUPLICATE_RECORD_ERRORS.any? { |re| re =~ error.message }
end

In the Authorization.for(owner, client) method, there’s a rescue clause that uses duplicate_record_error? to check the exception raised. If it’s a duplicate record error, we retry the method call since the second time it should find the new record that was inserted since the first call started.
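In outline, the pattern looks something like this (a sketch rather than the library’s actual code; find_existing is a hypothetical finder standing in for the real lookup):

def self.for(owner, client)
  find_existing(owner, client) || create!(
    :resource_owner => owner,
    :client         => client
  )
rescue => error
  raise unless duplicate_record_error?(error)
  # Another request inserted the same row between our check and our INSERT.
  # The record definitely exists now, so retrying will find it.
  # (In practice you would also guard against retrying forever.)
  retry
end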

Get your objects out of my session

Last week I had the pleasant job of fixing a feature that broke due to a change in a third-party API. Specifically, Twitter changed part of their authentication API and this broke our ‘post your attendance to Twitter’ feature. After a while spelunking through several layers of HTTP indirection inside the twitter and oauth gems, it became apparent that an upgrade was in order – we implemented this feature so long ago that our twitter gem was lagging four major releases behind the current version.

But this isn’t about Twitter, or OAuth, or even those specific Ruby libraries. It’s about an antipattern I was reminded of while updating our code and reading the OAuth gem documentation. Here is how it suggests you start the authorization process in your Twitter client app:

@callback_url = "http://127.0.0.1:3000/oauth/callback"
@consumer = OAuth::Consumer.new("key", "secret", :site => "https://agree2")
@request_token = @consumer.get_request_token(:oauth_callback => @callback_url)
session[:request_token] = @request_token
redirect_to @request_token.authorize_url(:oauth_callback => @callback_url)

This code contains a bug that’s bitten me so many times it jumped right off the page:

session[:request_token] = @request_token

Here’s the bug: you just stored the Marshal.dump of some random object in the session. One day, you will refactor this object – change its class name, adjust its instance variables – and next time you deploy, no-one will be able to access your site. It doesn’t matter whether the session is stored in the cookie (and therefore on the user’s computer) or on your servers, the problem is that you’ve stored a representation of state that’s tightly coupled to its implementation.

A simple example

Let’s see this in action. Imagine we have a little Sinatra app with two endpoints. One of these endpoints puts an object in the session, and another one retrieves data from the stored object:

require 'sinatra'
set :sessions, true
set :session_secret, 'some very large random value'

class State
  def initialize(params = {})
    @params = params
  end

  def get
    @params.values.first
  end
end

get '/' do
  session[:state] = State.new(:flow => 'sign_up')
  'Hello'
end

get '/state' do
  session[:state].get
end

We boot the app, and see that it works:

$ curl -i localhost:4567/
HTTP/1.1 200 OK
Content-Type: text/html;charset=utf-8
Content-Length: 5
Set-Cookie: rack.session=BAh7CEk...; path=/; HttpOnly

Hello

$ curl localhost:4567/state -H 'Cookie: rack.session=BAh7CEk...'
sign_up

A little change

So, this seems to work, and we leave the site running like this for a while, and people visit the site and create sessions. Then one day we decide we need to refactor the State class, by changing that hash into an array:

class State
  def initialize(params = [])
    @params = params
  end

  def get
    @params.last
  end
end

get '/' do
  session[:state] = State.new(['sign_up'])
  'Hello'
end

Now if we retry our request we find this buried among the stack traces:

$ curl localhost:4567/state -H 'Cookie: rack.session=BAh7CEk...'

NoMethodError at /state
undefined method `last' for {:flow=>"sign_up"}:Hash

A peek at Rack’s guts

To understand why this happens you need to see how Rack represents the session. Basically, it takes the session hash, such as {:state => State.new(:flow => 'sign_up')}, runs it through Marshal.dump and base64-encodes the result. Here’s what Marshal emits:

>> session = {:state => State.new(:flow => 'sign_up')}
=> {:state=>#<State:0x... @params={:flow=>"sign_up"}>}
>> Marshal.dump session
=> "\x04\b{\x06:\nstateo:\nState\x06:\f@params{\x06:\tflowI\"\fsign_up\x06:\x06ET"

Marshal produces a literal representation of the object – its class, its instance variables and their values. It is a snapshot of the object that can be completely reconstructed later through Marshal.load.

When you store objects in the session, you are dumping part of your program’s implementation into storage and, if you use cookie-stored sessions, sending that representation to the user for them to give back later. Now, fortunately, cookies are signed by Rack using HMAC-SHA1 so the user should not be able to construct arbitrary Marshal output and inject objects into your program – don’t forget to set :session_secret unless you want people sending forged objects to you! But there is still the problem that your code is effectively injecting objects into processes running in the future, when those objects may no longer be valid.

If you change the name of a class, then Marshal.load will fail, and you’ll get an empty session object. But if all the types referenced in the session dump still exist, it will happily reconstruct all those objects and their state may not reflect what the current process expects.

And as a bonus, once you’ve deployed the session-breaking change, you can’t revert it, because recent visitors will have the new representation in their session. We’ve got various classes in our codebase with multiple names to work around times when we made this mistake.

A better way

In light of the above, you should treat your sessions with a certain degree of paranoia. You should treat them with the same care as a public API, making sure you only put stable representations of state into them. Personally I stick to Ruby’s core data types – strings, numbers, booleans, arrays, hashes. I don’t put user-defined classes (including anything from stdlib or gems) in there. Similarly, you should not assume any given session key exists, since the session may become corrupt, the user may delete their cookies, and so on. Always check for nil values before using any session data, unless you want your site to become unreachable.

A future-proof Twitter client

So how should you use the Twitter gem and avoid these problems? Easy – just store the credentials from the request token, and reconstruct the token when Twitter calls you back:

Twitter.configure do |c|
  c.consumer_key    = 'twitter_key'
  c.consumer_secret = 'twitter_secret'
end

def consumer
  OAuth::Consumer.new('twitter_key',
                      'twitter_secret',
                      :site => 'https://www.example.com')
end

def callback_url
  'https://www.example.com/auth/twitter/callback'
end

get '/auth/twitter' do
  request_token = consumer.get_request_token(:oauth_callback => callback_url)
  session[:request_token] = request_token.token
  session[:request_secret] = request_token.secret
  redirect request_token.authorize_url(:oauth_callback => callback_url)
end

get '/auth/twitter/callback' do
  token  = session[:request_token]
  secret = session[:request_secret]

  halt 400 unless token and secret
  session[:request_token] = session[:request_secret] = nil
  
  request_token = OAuth::RequestToken.from_hash(consumer,
                      :oauth_token => token,
                      :oauth_token_secret => secret)
  
  access_token = request_token.get_access_token(:oauth_verifier => params[:oauth_verifier])
  
  client = Twitter::Client.new(
               :oauth_token => access_token.token,
               :oauth_token_secret => access_token.secret)
  
  user_details = client.verify_credentials
  
  store_twitter_tokens(user_details.screen_name,
                       access_token.token,
                       access_token.secret)
  
  redirect '/auth/twitter/success'
end

Note how we only store strings in the session and the database, and we store just enough of the credentials that we can construct an OAuth or Twitter client later, whenever we need one.

This approach only stores stable representations – tokens used in the OAuth protocol – and constructs objects by hand when they are needed rather than relying on Marshal dumps. This makes the application more resilient when the libraries you depend on inevitably need upgrading.

Statistics for fun and profit (and analyzing split tests)

This is a post about split testing. Split testing, sometimes known as A/B testing, is a way of figuring out which of two (or more) versions of a site performs better. The idea is simple: divide visitors to your site into groups at random and present each group with one of the versions under test. You then measure the effectiveness separately for each group and compare results. The big advantage of running things this way rather than, say, showing everyone version A on Monday followed by version B on Tuesday is that it automatically corrects for external confounding factors; what if Monday was a public holiday, for example.

So far, so good. It all sounds pretty simple, and implementation can be as straightforward as setting a cookie and counting entries in the server logs. However, things get a little more complicated when it comes to analyzing the results.

For example, how do you know when to stop collecting data and make a decision? Leaving the test running for too long is a waste of time, which is something that most start-ups don’t exactly have a lot of, but not collecting enough data has more subtle consequences. Each group of users will display a wide variety of behavior for all sorts of reasons that have nothing to do with the change you’re making. Suppose that by pure chance the average age of visitors in group A was much higher than in group B; in this case you could easily imagine that their behavior would differ regardless of the versions of the site they had seen. Put another way, how can you be confident that the difference you observe implies a fundamental difference between the versions rather than simply being explained by random chance? This topic is known as statistical significance.

There are a few ways to approach this question. One common approach is frequentist hypothesis testing, which I’m not going to discuss here. Instead I’ll focus on an approach based on Bayesian modelling.

As the name would suggest, at the core of this approach is a mathematical model of the data observed during the test. To be a little more precise, by mathematical model I mean a statement about the relationship between various quantities. A non-statistical example of this is Ohm’s law, which is a model of electrical conductivity in idealized materials, and states that three quantities, current (I), voltage (V) and resistance (R) are related by

V = I \times R

Statistical models generalize this by introducing random variables into the mix. A random variable is a variable which, rather than having a single fixed value, is represented by a distribution of possible values with associated probabilities; we may be 90% sure that the number of beers left in the fridge is 10, but we can’t quite remember who drank what last night, so there’s a 10% chance that the number is 9. The exact meaning of probabilities is an interesting philosophical discussion in its own right, but intuitively it’s a measure of the strength of our belief represented as a real number between 0 and 1. Values with probability 0 can never happen, values with probability 1 are certain, and everything in between may or may not be true.

Models for split tests

How do we apply it to the results of a split test? Let’s start by modelling the behavior of a single group of users.

As a concrete example, let’s say we want to improve the number of users successfully filling in our sign-up form. In this case, over some period n visitors land on the form, of which k successfully fill it in and hit ‘submit’. A third relevant quantity is p, which is the conversion rate, i.e. the probability that a randomly chosen individual from the entire population, when presented with the form, will sign up. The emphasis here is important; we want to be able to generalize to future visitors, so calculating a value for p based purely on the participants in the test, while easy, isn’t good enough.

Before we can make any inferences we need to relate these quantities to each other via a statistical model. In this case a binomial model is appropriate. This uses a Binomial distribution, which has the following probability mass function (PMF) for the value of k given a certain n and p:

f\left(k; n, p\right) = {n \choose k}\times p^k\times \left(1-p\right)^{n-k}

The PMF allows us to take a value of k and find the probability of that value occurring under the model. Plotted against k it is a peaked curve: with n=100, the curves for p=0.1, p=0.5 and p=0.9 concentrate their probability mass around k=10, 50 and 90 respectively.

The binomial distribution is often described using the example of a biased coin. Suppose I have such a coin with a known probability, p, of turning up heads; the binomial distribution represents the probability of seeing k heads if I flip it n times. Note the use of random variables: even if n and p are known with certainty we still can’t do better than assigning a distribution over a range of possible values for k. Hopefully it’s not too much of a stretch to relate this scenario to the sign-up conversion problem.

Inference and Bayes’ theorem

Let’s write the probability of a particular value of k as P(k | p, n). In this notation the bar (‘|’) represents conditional probability. In other words, this is an expression for the distribution over possible values of k if p and n have known, fixed values, and in this case is exactly the binomial PMF given by f(.) above.

This isn’t quite what we want. Given the results of a test, k is known, but p isn’t, so we want to know a distribution over p given the observed data, or P(p | k, n). Fortunately, Bayes’ theorem tells us how to compute precisely that:

P\left(p\mid k,n\right) = {P\left(k\mid p,n\right) \times P\left(p\mid n\right) \over P\left(k\mid n\right)}

There are a couple of other quantities here, P(p | n), which is known as the prior, and P(k | n). The prior represents our beliefs about p in the absence of data. Given we know nothing in that case it’s not unreasonable to model it as a flat distribution (i.e. a constant). P(k | n) is dependent only on fixed, observed quantities, so can also be treated as a constant for this analysis, hence:

P\left(p\mid k,n\right) = {1 \over Z} P\left(k\mid p,n\right)

Probability distributions must sum to one (i.e. we know that we’ll certainly get one of the possible values), so Z isn’t free to vary arbitrarily: it is simply whatever constant makes the distribution over p sum to one.

All of this can easily be done numerically, either with a small script in your language of choice or in a spreadsheet. Excel has a function BINOMDIST which gives P(k | p, n) (for example, =BINOMDIST(k, n, p, FALSE) is the probability of exactly k successes), so you can tabulate that over a grid of p values and then scale the column so it sums to one.
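The same recipe as a small Ruby script might look something like this (a sketch: a flat prior evaluated over a uniform grid of p values, with function names of our own choosing):

# Binomial likelihood of k successes in n trials with conversion rate p.
def binomial_pmf(k, n, p)
  # n-choose-k built up iteratively to avoid huge factorials
  choose = (1..k).inject(1.0) { |acc, i| acc * (n - k + i) / i }
  choose * p**k * (1 - p)**(n - k)
end

# Posterior over p on a uniform grid, assuming a flat prior, so the
# posterior is just the normalized likelihood.
def posterior(k, n, grid_size = 1000)
  grid    = (0..grid_size).map { |i| i.to_f / grid_size }
  weights = grid.map { |p| binomial_pmf(k, n, p) }
  total   = weights.inject(:+)
  grid.zip(weights.map { |w| w / total })   # => [[p, P(p | k, n)], ...]
end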

Comparing test groups

In a split test we treat each group as a separate population, with separate conversion rates, pa and pb. Each of these can be analysed as above, so we’ll end up with a distribution for each group: numerically, a set of discrete values with a probability assigned to each, probably represented as two columns if you’re using a spreadsheet.

We’ll treat the two groups as independent. For independent variables ‘and’ queries correspond to multiplying probabilities, so the probability of group A having conversion rate pa and group B having conversion rate pb is

p\left(p_a\mid k_a, n_a\right)\times p\left(p_b\mid k_b, n_b\right)

Finding the probability that A wins is then just a matter of finding all of the pairs (pa, pb) where pa > pb, multiplying the corresponding values for each and then adding them all up. Using mathematical notation, this is the same as saying

\sum_{p_a > p_b} p\left(p_a\mid k_a, n_a\right)\times p\left(p_b\mid k_b, n_b\right)

It turns out that this sort of calculation doesn’t really lend itself to spreadsheets, but it’s pretty straightforward in most programming languages. We’ve actually put some of the scripts we use for this kind of analysis on GitHub: https://github.com/songkick/skab.
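To make the double sum concrete, here is how it might look in Ruby, continuing the posterior sketch from earlier (again, an illustration rather than the actual skab code):

# Probability that group A's conversion rate beats group B's, treating
# the two posteriors as independent and summing the weight of all
# (pa, pb) pairs with pa > pb.
def probability_a_beats_b(ka, na, kb, nb)
  post_a = posterior(ka, na)
  post_b = posterior(kb, nb)
  post_a.inject(0.0) do |sum, (pa, wa)|
    sum + post_b.inject(0.0) { |inner, (pb, wb)| pa > pb ? inner + wa * wb : inner }
  end
end

# e.g. 120 sign-ups from 1,000 visitors vs 150 from 1,000:
#   probability_a_beats_b(120, 1000, 150, 1000)
#   # => a small probability (a few percent), so B is very likely the better variant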

To make a decision you first need to decide how confident you want to be. If the answer you get from the above is 0.95 and you’re happy with a 5% margin of error you should choose to roll out version A, and if it’s 0.05 you probably want to pick B.

If you get something close to 0.5 you need to work out whether to declare neither A nor B the winner (i.e. they’re as good as each other), or wait a bit longer and gather more data. To help with this you can vary the above sum to consider pairs where pa and pb are within some small distance of each other (say a 1% difference). If the probability mass for these pairs is high it’s very likely that there is little difference between A and B, but if not you just don’t have enough data to draw a conclusion either way.

A month at Songkick

I love Songkick.

Not in a soppy “no you hang up first” kinda way, but in a “I haven’t missed a great gig in over a year” way. Which is why when I was given the opportunity to work here, I jumped at it.

After working at Songkick for a few weeks now, I thought I’d write about my experiences so far, from the interview process through to day-to-day development.

Here are the six simple steps I took to Songkick happiness.

Step 1 – Network

I’ve been a fan of Songkick’s service for a long time, and after I met some of the team at the Silicon Milkroundabout event in May 2012, I was invited to start the interview process. This was great news (Songkick are awesome[1]).

Initially, I did have a few concerns about my technical compatibility with the company; I’ve spent the last few years in a Windows and .NET environment, and Songkick are a long way from that. I was soon to find that these worries were misplaced.

Step 2 – Code

To kick off the interview process, I received an email from Songkick – “Hey Aaron, You seem pretty rad, fancy taking a technical test?”. At least that’s how I remember it.

The rules:

  • Complete an hour long programming challenge
  • From home, at a time that suited you
  • In a programming language of your choice

I let them know when I could set an hour aside, and at the agreed time I was emailed a PDF describing the challenge. I can’t give too much away, but the challenge was really interesting, and Songkick-specific.

I hacked away in C#, making use of third-party libraries as required, and after the hour was up, emailed my solution. I didn’t have time to fully complete the challenge, but I had concentrated on getting a clean design, stubbing all core interfaces, classes and methods, and adding comments and pseudo-code where necessary.

After a few days, I received an email informing me that I was through to round two.

Step 3 – More Code

I was invited to have a couple of face-to-face interviews, and sit another coding test. This time I was to complete a 90-minute pair-programming exercise, in Ruby.

The test was a little daunting as I was a complete Ruby novice. However, with it being a pair-programming exercise, I had a friendly developer (Sabrina) sitting with me to help with syntax questions. Any time I was unaware of the syntax in Ruby (quite a lot!), I could scribble on a notepad how I would solve the problem in C#, and Sabrina would show me the equivalent syntax in Ruby.

This was a test-driven development exercise, and I was introduced to the challenge with a brief overview of the task, and a collection of failing Cucumber tests. I wrote code to gradually pass each test, until all passed – and in the nick of time too. I had a couple of minutes to discuss my solution and what I would add to it if I had more time, and the 1.5 hours were up.

Step 4 – Meet and Greet

As a firm believer in The Joel Test, I agree that writing code during the interview process is important, but equally important is the rapport between yourself and your potential colleagues.

During the interview process, I met a large percentage of the company over a number of interviews, including a coffee and chat with the entire development team. It’s pretty intimidating stuff, but it gives both parties the opportunity to make sure each will be a good fit for the other.

After a few more days of waiting, I received the call I was hoping for.

Step 5 – On-boarding

Joining Songkick was a super-smooth operation. We run a tight ship (as I was to find out), and my first few days were as follows.

Day 1

I spent the morning being shown around the office: an open plan environment with everything a professional developer needs to maintain a high level of productivity (ping pong table, foosball table, a fully-stocked kitchen and a proper coffee machine).

I was provided with a mentor for the week – Robin. Having someone to sit with you, explain the development environment and application design really helped me to become productive quickly. In fact, I made my first code commit on day one.

Day 2 & 3

I spent the next two days divided between coding (with Robin) and various presentations from the different departments in Songkick. These ranged from the data science team (who handle making sense of the huge amounts of data we have), to QA and infrastructure.

Day 4

The whole company boarded a vintage Routemaster bus, and we were taken to End of the Road festival for the weekend. Did I mention Songkick are awesome[1]?

Step 6 – Develop

By far the biggest change (and probably worry) in my move to Songkick was the development environment. I’ve been working in a .NET ecosystem for a number of years; the framework is stable and Visual Studio is, in my opinion, a great IDE; it’s feature-rich and has some useful plugins. On the other hand, Songkick’s development environment is entirely Unix-based, making use of (and contributing back to) lots of open-source projects.

I do have experience developing in a Linux environment, but haven’t touched it for a few years, so had a feeling I was going to be rusty. After a few days, I was pleasantly surprised to see how far the tools and frameworks have come. Again, having a mentor to guide me through this transition was crucial; I could ask questions and receive answers immediately.

All in all, joining Songkick has been an amazing experience. I’m surrounded by different teams of people (ranging from developers and testers, through to UX experts and designers), all of whom are the best at what they do (but don’t take my word for it, check out the team page). Having a passion for the product is essential, but if you love live music, Songkick is for you.

[1] How about developing for a platform that has millions of users, and enables fans from across the world to see their favourite artists live? And the perks are pretty amazing too; great office, free food and drink, table tennis and foosball, monthly ticket allowance, annual festival trip for the company, etc. I could go on, but you should probably just apply.

Songkick’s first engineering open house

Here at Songkick HQ, we’ve been working on some pretty exciting projects over the last year. With over 6 million monthly uniques, the most comprehensive live music dataset on the planet, and successful apps on Spotify, iPhone, Android and Facebook, we help the world’s music fans go to more concerts.

Come and find out more about the technology behind Songkick. Meet the engineering team, ask questions and – most importantly – enjoy free beer and pizza.

We have four short presentations for you, with plenty of time for you to talk with the team.

When: Wednesday October 10th, 6pm – 8pm
Where: Songkick HQ, Hoxton Street

Speakers:

Dan Lucraft
Hyperadmin and our Service-Oriented Architecture
How SOA let us build self-documenting APIs

Sabrina Leandro
Data ingestion
How we handle concert data from multiple sources

Phil Cowans
Data Science
Analyzing Songkick’s mountains of data

Amy Phillips
Testing and Continuous Deployment
The heart of Songkick’s Agile process

If you’d like to come along, register below. Spaces are strictly limited, so sign up now. If we have space for you, we’ll send you a confirmation email. If you don’t get in this time, don’t worry, we’ll notify you of future events.

Registration is now closed


Run the right tests at the right time

Way back in June, Dan Crow posted about some of the key principles that we at Songkick believe in. One that I spend some time thinking about every day is ‘ship early, ship often’. We firmly believe that code should be shipped as soon as it’s ready. From a development point of view this just makes sense. From a user’s point of view this just makes sense. From a testing point of view this proves to be a bit of a challenge.

Shipping fast doesn’t mean shipping untested code and hoping for the best. Every single thing that we release has been tested extensively. Obviously the only way we manage to ship often is by keeping the build/test/release cycle as short as possible. All builds are managed in Jenkins. Pushing code will automatically trigger our unit and integration test suites. If all the tests pass we end up with a green build, which can be manually deployed to our test environment. Finally, a suite of acceptance tests runs through the browser using Capybara and Selenium WebDriver to confirm we haven’t broken any of our critical user journeys. These tests are pretty slow, taking roughly 4 minutes to run a handful of scenarios, but they are the first check that a user will actually be able to interact with the website.
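As a flavour of what one of those browser-level checks looks like, here is a hypothetical journey in the Capybara style (the host, field names and content are made up for illustration):

require 'capybara'
require 'capybara/dsl'

Capybara.default_driver = :selenium
Capybara.run_server     = false
Capybara.app_host       = "http://our-test-environment.example.com"

include Capybara::DSL

# A critical user journey: searching for an artist from the home page.
visit "/"
fill_in "search", :with => "Radiohead"
click_button "Search"
raise "Critical journey broken: artist search" unless page.has_content?("Radiohead")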

Only after all these tests have passed will we deploy code to Production. This applies to all new features, bug fixes and even changes to the tests themselves.

The problem

Despite our best intentions we were still struggling to ship changes as soon as they were ready:

In June 2011 we made 7 releases.

In the best case it took 3 hours to build, test and ship code. In reality we were spending around 2 days preparing each release. Something had to change.

Dan Lucraft wrote an excellent post about how we reduced the time it takes to run our tests. It feels pretty obvious to say you can increase release speed if you make your tests run faster but this was only part of the solution. Keeping the test suites fast requires constant diligence. Aiming for 100% test coverage is a distraction. Not only will you never achieve it but if you even came close then your builds would likely be taking far longer than needed to run.

Run the right tests

We took the step of identifying which features we wouldn’t want to break and plotting them against the overhead of running tests. In the case of unit tests you can pretty much add as many tests as you like without too much overhead. Integration tests need to be things that you actually care about. If you discovered a feature was broken during manual testing but wouldn’t hold a release to fix it then you shouldn’t have an automated test for that feature in your build (well, unless it was a super quick unit test).

An example of this is our automatic tweets when authenticated users mark their attendance to an event. It is a valid and highly used service that we wouldn’t want to be without but it is not business critical. If we were to have an automated test for this we would need a test which set up a user who appears authenticated with Twitter. The test user would then mark their attendance to an event and the test would need to check whether the tweet was fired for the correct event.

Not only is that a fair bit of work to write and maintain, but the resulting test would be pretty slow to execute. The alternative, to push to production and monitor errors in the logs whilst also keeping an eye on the Songkick Twitter feed (something we’re already monitoring), means we have one fewer test to run and maintain. The feedback comes later (post-release rather than pre-release), but since we wouldn’t hold a release even if we knew that we had broken this feature, the actual time to fix is roughly the same.

At the right time

To allow the team to ship fast we need to keep the release channel clear. Builds run through the test suites as cleanly and as quickly as possible to free up the channel for the next release. Part of our process involves establishing up-front how we will test a code change. Usually this will mean adding or modifying automated tests to cover the new functionality. However some of our changes need more than just an automated build run against them so we needed to come up with a way to separate testing from the actual releases.

Our solution was to use what we call Flippers, additional code which lets admins control whether a feature is visible to users. We can then turn features on and off on the live site without needing to make additional releases. As well as giving us a fast way to turn off problem features this has the benefit of allowing us to turn features on for a particular type of user. High risk or extensively changed features are released to production behind a flipper that makes them visible to admin users only. This means we can run the code on the live servers, using live data but test them as if we were working on a test environment.
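Conceptually a flipper is nothing more than a conditional around the feature, keyed on a toggle that admins can change at runtime. A minimal sketch of the idea (names and storage are illustrative, not our actual implementation, which keeps the toggles somewhere admins can edit without a deploy):

class Flipper
  # In reality this mapping lives in the database so admins can flip it live.
  FEATURES = { "new_event_page" => :admins_only }

  def self.visible_to?(feature, user)
    case FEATURES[feature]
    when :everyone    then true
    when :admins_only then user && user.admin?
    else false
    end
  end
end

# In a view or controller:
#   if Flipper.visible_to?("new_event_page", current_user)
#     # render the new version
#   else
#     # render the old one
#   end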

Fix bugs fast

One problem with testing code on Production is that the bugs you find are also on Production. Obviously many of these bugs aren’t visible to users thanks to the flippers, but there will always be some bugs in live code. Our approach is a cultural one: yes, we move fast and accept that things might break, but we don’t leave them like that. We fix bugs as fast as possible.

Sounds interesting but does it work?

We spent 12 months looking at our tests, our process and probably ourselves. Changes were made and in June 2012 we made 113 releases. 14 of those were on the same day. In fact we released on every single working day that month (and there were a few sneaky weekend releases too!).

Our object-based Rails frontend

Part of the rewrite of Songkick’s website was a re-architecture of the main client application, affectionately known as skweb (pronounced /skwɛb/, not /ɛskeɪwɛb/). Skweb, as has been mentioned in other posts, had grown into a monster, not just in size but also in complexity. I was asked to suggest an improved structure for the new simplified application. Based on my observations working on our application and the one I’d worked on at the Guardian, I noticed that a lot of complexity was introduced to make rendering web pages easier. It was as if, since we were so focused on modelling the business logic of the company, we had neglected to model a core function of a web site: presenting HTML pages to the user.

With this in mind I proposed splitting out the modelling of webpages into ‘page models’ that would sit alongside the application models and focus on taking Songkick’s data and turning it in to web pages. Each type of page on the website would have a ‘page model’ responsible for rendering the page. This separation would eventually lead naturally to suggesting that we use services to drive skweb, since the page models were built to be agnostic about where their data came from so we could migrate away from our single database more easily.

These days, all the business logic that drives Songkick is contained within internal web services, and skweb’s main job is creating web pages from that information. Certainly there are pages about artists and concerts with tickets and venues so all that vocabulary remains, but it is not the business model of Songkick we are modelling. What we are concerned with is presenting that information in web pages.

Pages, Components, Elements

Once we settled on having page models, it became straightforward to break the page up into its constituent parts. A page has a collection of components, and the components consist of elements. The component is given any data it needs by its enclosing page. Any sufficiently complex components can have their own models that the page model invokes when needed.

The default behaviour for a component which has no data to render is to render nothing. For example if the service that provides data to the component is down, the component should contain the error and emit no output. There should be no stray markup hanging around on the page, and if components need to display something when empty it is up to the page to allow this.

What makes a component?

A component is a discrete module of functionality on the page, that can function independently of other components. Typically you can easily draw a box around a component and it will probably contain a heading and some supporting information. I decided (somewhat arbitrarily) that components are not nestable: you cannot have components inside components. While this constraint is not a technical one, I imposed it to try and reduce complexity in the design. Since components aren’t nestable, if we do need to break them into parts or share code between components then we use elements instead. Components that appear on more than one type of page are called shared components.

An element is something smaller and usually less complex than a component, and may appear in more than one component (if this happens it is called a shared element). An example of this is the attendance buttons that appear all over our site and appear both in the event listings like those found on an artist page and on the individual event pages.

We arrange the view code around pages and components with each page having its own stylesheet, and each component having its own stylesheet, JavaScript and images. We use the same name for each page model and its associated assets, so it’s easy to understand which static assets the component depends on. An advantage of this approach is when a component is removed or refactored there is no ambiguity about which images, CSS files, and JavaScript must be removed or updated.

So how does all this work in practice?

Let’s examine how this works, by following one component through its rendering process. I’m going to use the Map component on the Venue page.

Skweb is still a Rails app and still has the familiar layout, but we’ve added some conventions of our own. First, all pages have a type – ‘venue’, for example – that also provides the name for the CSS file for the page to link to. The page provides methods that expose its components, and it constructs each component by passing in whatever data that component needs: the component has no access to databases, services or the HTTP request, everything they need is given to them via the page model and controller. By convention the name of the component is also the name of the template in the views folder, in fact it is the use of common names that makes understanding component dependencies easier.

A small fragment of our app might look like this:

skweb/
    app/
        controllers/
            venues_controller.rb
        models/
            page_models/
                venue.rb
            skweb/
                models/
                    venue.rb
        views/
            shared/
                components/
                    _calendar_summary.html.erb
                elements/
                    _attendance_buttons_element.html.erb
                    _event_listings.html.erb
            venues/
                _brief.html.erb
                _map.html.erb
                show.html.erb
    public/
        javascripts/
            songkick/
                component/
                    tickets.js
        stylesheets/
            components/
                venue-brief.css
                venue-map.css
            shared/
                elements/
                    pagination.css
                components/
                    brief.css
            venue.css

When a user visits a Venue page, the controller creates a new page object:

class VenuesController < ApplicationController
  def show
    @page = PageModels::Venue.new(venue, logged_in_user)
  end
end

The page model for the Venue includes something to this effect:

module PageModels
  class Venue < PageModels::Base
    def initialize(venue, logged_in_user)
      @venue = venue
      @logged_in_user = logged_in_user
    end

    def brief
      Brief.new(@venue, upcoming_events.total_entries, @logged_in_user)
    end
  end
end

The Brief component is responsible for displaying the venue’s address, map, image and so on, but the Ruby objects only expose data. Markup is confined to the view templates, and rendering is performed by glueing a page model and a view template together.

module PageModels
  class Venue
    class Brief
      def geolocation
        @venue.geolocation
      end
    end
  end
end

Moving to the view, the ‘show’ page for a venue might look like this:

<div class="primary col">
  <%= component('brief', @page.brief) %>
  <%= component('map', @page.brief.geolocation) %>
  <%= shared_component('calendar_summary',   @page.calendar_summary) %>
  <%= shared_component('media_summary',      @page.media_summary) %>
  <%= shared_component('media_links',        @page.media_links) %>
  <%= shared_component('gigography_summary', @page.gigography_summary) %>
</div>

component() and shared_component() are defined in ApplicationHelper and look like this:

def component(component_name, object)
  return '' if object.nil?
  render :partial => component_name, :object => object
end

def shared_component(component_name, object)
  component("shared/components/#{component_name}", object)
end

As you can see, this is really just a thin wrapper around partials, but it also enforces that we do not render if there is no data to give to the component.

The content of the component is pretty standard ERB:

<div class="component venue-map">
  <a href="<%= google_maps_url(map, :zoom => 15) %>" target="_blank">
    <img src="<%= static_google_maps_image_url(map, :width => 640, :height => 220, :zoom => 15) %>">
  </a>
</div>

As a convenience, the object passed in to the component by its page will have the same name as the component. That is where map comes from in the above code. This is also useful in shared components, as they don’t need to know anything about the context in which they are being used or what instance variables it might define.

The Venue page will link to its venue.css file, which looks like:

@import 'shared/components/brief.css';
@import 'components/venue-brief.css';
@import 'components/venue-map.css';
@import 'shared/components/media-summary.css';
@import 'shared/components/event-listings.css';

And the venue-map.css file is short and sweet:

.venue-map
{
  padding: 0;
  position: relative;
  z-index: 5;
  -webkit-box-shadow: 0 4px 2px -2px rgba(0, 0, 0, 0.2);
     -moz-box-shadow: 0 4px 2px -2px rgba(0, 0, 0, 0.2);
          box-shadow: 0 4px 2px -2px rgba(0, 0, 0, 0.2);
}

.venue-map img
{
  vertical-align: bottom;
}

.venue-map
{
  margin-bottom: 26px;
}

@media only screen and (max-width: 767px)
{
  .mobile-enabled .venue-map img
  {
    width: 100%
  }

  .mobile-enabled .venue-map
  {
    padding-left: 0;
    padding-right: 0;
  }
}

The CSS file contains only the CSS that this component needs and includes any CSS for the small screen rendering of that component.

What is that called?

Another aspect of the design was to use pervasive language. The idea is that everyone at Songkick – product managers, designers, and developers – uses the same name for pages and components on the website. The advantage of having a shared language across the company comes through when talking about the site. If someone says, ‘the ticket component is broken,’ I know exactly what they mean. It will correspond to a file called tickets.html.erb in the views; the page model for the component will be called Tickets; its CSS will live in stylesheets/components/tickets.css; the HTML class name on the component is tickets; and any JavaScript needed for the component lives in javascript/songkick/component/tickets.js. The strong naming convention makes navigating around the project easy and makes finding dependencies very straightforward.

What does this give us?

The page/component/element structure makes deciding where to put code easier by having very strong conventions. The page models made migrating skweb onto services simpler as it provided a separation between the rendering stack and the source of the data it uses. We were able to behave like we were building on top of services when in some cases the services didn’t exist yet.

We have now also used this architecture on a new application, and again the clear demarcation of responsibilities makes deciding where to put code and how to structure it easier and more predictable. That’s not to say that there aren’t costs to this approach: certainly some find the sheer number of files, especially for CSS, difficult to navigate. Others find the insistence on rigidly mapping names across types of files excessive. While this is somewhat down to personal taste, in our experience having a predictable structure of small files with focussed responsibilities has made it easier to maintain our codebase.