Introducing Aspec: A black box API testing DSL

Caltrak is the service that stores Songkick users’ tracked artists and cities. It has no other service dependencies. You put data into the Caltrak box, then you get it back out.

For instance, you might make two POST requests to store artist trackings, and then want to retrieve them, which would look like this:

# create and retrieve artist trackings
POST /users/7/artists/1    204
POST /users/7/artists/2    204
 GET /users/7/artists      200    application/json   [1, 2]

Did you understand basically what that was saying? I hope so, because that’s an executable spec from the Caltrak tests.

It’s pretty simple. Every line is both a request and an assertion. Every line says “If I make this request then I expect to get this back”.

This works because the behaviour of this service can be entirely described through the REST API. There are no “side affects” that are not visible through the API itself.

Here is a longer portion from the aspec file.

# no users have pending notifications
   GET /users/with-pending-notifications                200  application/json  []

# users with events on their calendar have pending notifications
  POST /users/764/metro-areas/999                       204
  POST /users/764/artists/123                           204
  POST /events/5?artist_ids=123&metro_area_id=999       204
  POST /events/5/enqueue-notifications                  204
   GET /users/with-pending-notifications                200  application/json  [[764, "ep"]]

# users are unique in the response
  POST /users/764/artists/123                           204
  POST /users/764/artists/456                           204
  POST /users/764/metro-areas/999                       204
  POST /events/5?artist_ids=123,456&metro_area_id=999   204
  POST /events/5/enqueue-notifications                  204
   GET /users/with-pending-notifications                200  application/json  [[764, "ep"]]

Some aspects:

  • Each line has the format Verb, Url (with Params), Status, Content Type, Body separated by whitespace. These are the only things that can be asserted about the service responses.
  • Each “paragraph” is a separate test. The database is cleared in-between.
  • Lines beginning with # are comments.
  • Aspec stubs time, so that the first line of the test occurs precisely on the epoch and each subsequent line occurs 2s after that. This allows us to test responses with creation timestamps in them.

Motivation

When we began developing Caltrak, I wasn’t happy with the process of writing tests for this service.

I wanted the test framework to expose the simple nature of the API. You could make something almost as simple in RSpec or Cucumber with judicious use of helpers and so on, but you would end up with a DSL that obscured the underlying REST API.

In an Aspec file, there is no syntax that does not express real data either sent or received from the service. You’re basically writing down the actual HTTP requests and responses with lots of parts omitted. It is technical, but it is very readable. I think it is better documentation than most service tests.

Also, there is no context that is not immediately visible, as there might be with nested RSpec contexts, for example, where in a large test file the setup may be very distant from the test and assertion.

Implementation

NB This project is very immature. Use at your own risk.

Aspec assumes your project uses Rack, and uses Rack/Test to talk to it. The code is published on GitHub and there is a tiny example API project.

It is very similar to rspec in operation. You write a .aspec file, and put an aspec_helper.rb next to it.

Then run

aspec aspec/my_service.aspec

I’d be interested in hearing your thoughts on this testing style.

Run the right tests at the right time

Way back in June, Dan Crow posted about some of the key principles that we at Songkick believe in. One that I spend some time thinking about every day is, ‘ship early, ship often’. We firmly believe that code should be shipped as soon as it’s ready. From a development point view this just makes sense. From a user’s point of view this just makes sense. From a testing point of view this proves to be a bit of a challenge.

Shipping fast doesn’t mean shipping untested code and hoping for the best. Every single thing that we release has been tested extensively. Obviously the only way we manage to ship often is by keeping the build/test/release cycle as short as possible. All builds are managed in Jenkins. Pushing code will automatically trigger our unit and integration test suites. If all the tests pass we end up with a green build which can be manually deployed to our test environment. Finally a suite of Acceptance tests run through the browser using Capybara and the Selenium Web Driver to confirm we haven’t broken any of our critical user journeys. These tests are pretty slow, taking roughly 4 minutes to run a handful of scenarios but this is the first check that the user will actually be able to interact with the website.

Only after all these tests have passed will we deploy code to Production. This applies to all new features, bug fixes and even changes to the tests themselves.

The problem

Despite our best intentions we were still struggling to ship changes as soon as they were ready:

In June 2011 we made 7 releases.

In the best case it took 3 hours to build, test and ship code. In reality we were spending around 2 days preparing each release. Something had to change.

Dan Lucraft wrote an excellent post about how we reduced the time it takes to run our tests. It feels pretty obvious to say you can increase release speed if you make your tests run faster but this was only part of the solution. Keeping the test suites fast requires constant diligence. Aiming for 100% test coverage is a distraction. Not only will you never achieve it but if you even came close then your builds would likely be taking far longer than needed to run.

Run the right tests

We took the step of identifying which features we wouldn’t want to break and plotting them against the overhead of running tests. In the case of unit tests you can pretty much add as many tests as you like without too much overhead. Integration tests need to be things that you actually care about. If you discovered a feature was broken during manual testing but wouldn’t hold a release to fix it then you shouldn’t have an automated test for that feature in your build (well, unless it was a super quick unit test).

An example of this is our automatic tweets when authenticated users mark their attendance to an event. It is a valid and highly used service that we wouldn’t want to be without but it is not business critical. If we were to have an automated test for this we would need a test which set up a user who appears authenticated with Twitter. The test user would then mark their attendance to an event and the test would need to check whether the tweet was fired for the correct event.

Not only is that a fair bit of work to write and maintain but the resulting test would be pretty slow to execute. The alternative, to push to production and monitor errors in the logs whilst also keeping an eye on the Songkick twitter feed (something we’re already monitoring) means we have one fewer test to run and maintain. The feedback comes later (post release rather than pre) but since we wouldn’t hold a release even if we knew that we had broken this feature then actual time to fix is roughly the same.

At the right time

To allow the team to ship fast we need to keep the release channel clear. Builds run through the test suites as cleanly and as quickly as possible to free up the channel for the next release. Part of our process involves establishing up-front how we will test a code change. Usually this will mean adding or modifying automated tests to cover the new functionality. However some of our changes need more than just an automated build run against them so we needed to come up with a way to separate testing from the actual releases.

Our solution was to use what we call Flippers, additional code which lets admins control whether a feature is visible to users. We can then turn features on and off on the live site without needing to make additional releases. As well as giving us a fast way to turn off problem features this has the benefit of allowing us to turn features on for a particular type of user. High risk or extensively changed features are released to production behind a flipper that makes them visible to admin users only. This means we can run the code on the live servers, using live data but test them as if we were working on a test environment.

Fix bugs fast

One problem with testing code on Production is that the bugs you find are also on Production. Obviously many of these bugs aren’t visible to users thanks to to the flippers but there will always be some bugs in live code. Our approach is a cultural one: yes, we move fast and accept that things might break, but we don’t leave them like that. We fix bugs as fast as possible.

Sounds interesting but does it work?

We spent 12 months looking at our tests, our process and probably ourselves. Changes were made and in June 2012 we made 113 releases. 14 of those were on the same day. In fact we released on every single working day that month (and there were a few sneaky weekend releases too!).

The client side of SOA

This article is part of a series on Songkick’s migration to a service-oriented architecture. The full series:

Following on from my previous article on what our backend services look like, it’s time to talk about the client side. How do our user-facing applications use the services, and how is it different from using ActiveRecord?

The nice thing about Rails is it doesn’t force you into using ActiveRecord. If you do, then a lot of conveniences are made available to you, but you’re really free to do whatever you want in your Rails controllers. So, instead of speaking to ActiveRecord models, our applications make HTTP calls to several backend services.

HTTP, do you speak it?

The first bit of the problem is, how do we make HTTP calls? We want this to be extremely convenient for people writing application code, which means avoiding as much boilerplate as possible. We don’t want application code cluttered with stuff like this:

uri = URI.parse("http://accounts-service/users/#{name}")
http = Net::HTTP.new(uri.host, uri.port)
response = http.request_get(uri.path)
if response.code == '200'
  JSON.parse(response.body)
else
  raise NotFound
end

when we could just write:

http_client.get("/users/#{name}").data

And that’s the simple case. When making HTTP calls, you have to deal with a lot of complexity: serializing parameters, query strings vs entity bodies, multipart uploads, content types, service hostname lookups, keep-alive or not, response parsing and several classes of error detection: DNS failure, refused connections, timeouts, HTTP failure responses, user input validation errors, malformed or interrupted output formats… and good luck changing all that if you want to change which HTTP library you want to use.

So, the first thing we did is create an abstract HTTP API with several implementations, and released it as open-source. Songkick::Transport gives us a terse HTTP interface with backends based on Curb, HTTParty and Rack::Test, all with the same high-level feature set. This lets us switch HTTP library easily, and we’ve used this to tweak the performance of our internal code.

You use it by making a connection to a host, and issuing requests. It assumes anything but a 200, 201, 204 or 409 is a software error and raises an exception, otherwise it parses the response for you and returns it:

http = Songkick::Transport::Curb.new('http://accounts-service')
user = http.get('/users/jcoglan').data
# => {'id' => 18787, 'username' => 'jcoglan'}

Songkick::Transport also has some useful reporting facilities built into it, for example it makes it easy to record all the backend service requests made during a single call to our user-facing Rails app, and log the total time spent calling services, much like Rails does for DB calls. More details in the README.

Who needs FakeWeb?

The nice thing about having a simple flat API for doing HTTP means it’s really easy to test clients built on top of Songkick::Transport, as opposed to something like FakeWeb that fakes the whole complicated Net::HTTP interface. In each application, we have clients built on top of Songkick::Transport that take an HTTP client as a constructor argument. When they make an HTTP call, they wrap the response data in a model object, which allows the application to shield itself from potential changes to the API wire format.

module Services
  class AccountsClient
    def initialize(http_client)
      @http = http_client
    end
    
    def find_user(username)
      data = @http.get("/users/#{username}").data
      Models::User.new(data)
    end
  end
end

module Models
  class User
    def initialize(data)
      @data = data
    end

    def username
      @data['username']
    end
  end
end

This approach makes it really easy to stub out the response of a backend service for a test:

before do
  @http   = mock('Transport')
  @client = Services::AccountsClient.new(@http)
end

it "returns a User" do
  response = mock('Response', :data => {'username' => 'jcoglan'})
  @http.stub(:get).with('/users/jcoglan').and_return(response)
  @client.find_user('jcoglan').username.should == 'jcoglan'
end

It also makes mock-based testing really easy:

it "tells the service to delete a User" do
  @http.should_receive(:delete).with('/users/jcoglan')
  @client.delete_user('jcoglan')
end

Being able to stub HTTP calls like this is very powerful, especially when query strings or entity bodies are involved. Your backend probably treats foo=bar&something=else and something=else&foo=bar the same, and it’s much easier to mock/stub on such parameter sets when they’re expressed as a hash, as in

http.get '/', :foo => 'bar', :something => 'else'

rather than as an order-sensitive string:

http.get '/?foo=bar&something=else'

It’s also worth noting that the models are basically inert data objects, and in many cases they are immutable values. They don’t know anything about the services, or any other I/O device, they just accept and expose data. This means you can use real data objects in other tests, rather than hard-to-maintain fakes, and still your tests run fast.

Convenience vs flexibility

Nice as it is to be able to choose which HTTP implementation you use, most of the time the application developer does not want to write

http   = Songkick::Transport::Curb.new('http://accounts-service')
client = Services::AccountsClient.new(http)
user   = client.find_user(params[:username])

every time they need to look up a record. The flexibility helps with testing and deployment concerns, but it’s not convenient. So, we put a layer of sugar over these flexible building blocks that means most of the things an application needs to do are one-liners. We have a Services module that provides canonical instances of all the service clients; it deals with knowing which hostnames to connect to, which HTTP library to use, and which client object to construct for each service.

module Services
  def self.accounts
    @accounts ||= begin
      http = Songkick::Transport::Curb.new('http://accounts-service')
      AccountsClient.new(http)
    end
  end
end

With this layer of sugar, getting a user account is one line:

user = Services.accounts.find_user(params[:username])

In our Cucumber tests, we tend to stub out methods on these canonical instances, or make a Services method return an entirely fake instance. The cukes are not complete full-stack tests; they are integration tests of the current project, rather than of the entire stack, and the lack of backend I/O keeps them very fast. The stability of the underlying service APIs means we aren’t taking a big risk with these fakes, and we have a few acceptance tests that run against our staging and production sites to make sure we don’t break anything really important.

What about error handling?

We want it to be as easy as possible to deal with errors, since messy error handling can hamper the maintainability of a project and introduce mistakes that make things harder for end users. For this reason, we made anything but 200, 201, 204 or 409 from a backend raise an exception, for example if the accounts service returns a 404 for this call, an exception is raised:

Services.accounts.find_user('santa_claus')

The exception raised by Songkick::Transport contains information about the request and response. This means you can put a catch-all error handler in your Rails or Sinatra app to catch Songkick::Transport::HttpError, and forward the 404 from the backend out to the user. The removes a lot of error handling code from the application.

In some cases though, you don’t want this behaviour. For example, say we’re rendering an artist’s page and we have a sidebar module showing related artists. If the main artist gives a 404, then the whole page response should be a 404. But if we can’t get the related artists, or their profile images, then we don’t want the whole page to fail, just that sidebar module. Such cases tend to be the minority in our applications, and it’s easy enough to catch the service exception and render nothing if the services backing a non-core component fail. Using an object model of our user interface helps to isolate these failures, and we hope to cover that in a future post.

Repeat after me: sometimes, you should repeat yourself

One open question when we moved to this model was: should we maintain client libraries for each service, or just make whatever calls we need in each application? The DRY principle suggests the former is obviously the best, but it’s worth asking this question if you do a project like this.

We went with the latter, for several reasons. First, since the services and Songkick::Transport encapsulate a lot of business and wire logic, the client and model classes in each application end up being pretty thin wrappers, and it isn’t hard to build just what you need in each project. Second, we got burned by having too many things depending on in-process Ruby APIs, where any change to a shared library would require us to re-test and re-start all downstream applications. This coupling tended to slow us down, and we found that sharing in-process code isn’t worth it unless it’s encapsulating substantial complexity.

Each application is free to tweak how it interacts with the service APIs, without affecting any other application, and this is a big win for us. It means no change to one application can have side effects or block work on another application, and we have’t actually found ourselves reinventing substantial pieces of logic since that’s all hidden behind the HTTP APIs.

And finally, having per-application service clients gives you a really accessible picture of what data each application actually relies on. Having one catch-all domain library made this sort of reasoning really difficult, and made it hard to assess the cost of changing anything.

Wrapping up

So that’s our architecture these days. If you decide to go down this route, remember there’s no ‘one right way’ to do things. You have to make trade-offs all the time, and the textbook engineering answer doesn’t always give your team the greatest velocity. Examine why you’re making each change, focus on long-term productivity, and you won’t go far wrong.

From 15 hours to 15 seconds: reducing a crushing build time

Over the past year we have reduced our website test suite build time by over 99.9%.

  • Build time a year ago: 15 hours.
    Across 15 EC2 build slaves it took “only” 1 hour of real time.

  • Build time today: 15 seconds
    On my laptop.

Having a build that took over an hour to run crippled the productivity of our team.

So, how did we make such a drastic improvement? There were no quick fixes, though Lord knows we tried to find them. Instead we have had to completely change the way we test.

Rather than any brilliant new techniques, there were instead three big mistakes that we had made that created such a monster build time. We went down a wrong path, and it took a lot of time and effort to fix it later.

Bad Practice #1: We favoured integration tests over unit tests

We used to be extremely thorough in our integration tests. We used them to test everything, usually instead of unit tests, which were comparatively thin on the ground. Since integration tests are far, far slower than unit tests, this caused a lot of unnecessary work.

To fix this we looked at each integration test in turn and either:

  • ditched it (i.e. we increased our tolerance for broken things in exchange for having a faster build)
  • rewrote it as a unit test on a specific class
  • kept it, as we still needed a few integration tests for each component

Bad Practice #2: We had many, many features that were relatively unimportant

Many of the less used or less strategic features on songkick.com have gone. This was an extremely painful decision to make, and we made it for bigger reasons than just improving our build time. But it certainly improved the build time a lot.

Fixing this and the previous point have turned a library of 1642 Cucumber scenarios into just 200.

Bad Practice #3: Our integration tests were actually acceptance tests

This test suite used to integrate over our website, domain library, database, message bus and background job workers. Each was spun up as a separate process in the test environment. We basically ran all our website tests against a working version of our entire system. Remember I said we tested virtually every code path? This added up to a lot of time.

Nowadays, our website integration tests are really only integration tests. They integrate over code inside a single project. Every interface to another project is stubbed.

All our database access code is isolated in a domain library behind a thin service layer and is stubbed in the website project.

Instead of over a thousand acceptance tests, we now have fewer than 10. They run against our staging and production environments, after deployment, instead of slowly booting up a test environment during the build.

Six months later

Productivity is up! Morale is up! It’s amazing just how much a faster build has improved our working experience.

Remember that the suite described above was only one of our builds. We had multiple projects with builds that took more than 30 minutes to run. Now none of our test builds take longer than 5 minutes, which is now considered “slow”.

These mistakes are far clearer to us in hindsight than they were at the time, so I’d recommend looking carefully to make sure you are not infected by any of these bad practices. If you are, take the time to fix the problem. If you’re not, congratulations on avoiding the pitfalls we fell in to!

PS: Bonus quick fixes for reading this far

Ok, so there were one or two things we did that probably count as Quick Fixes:

  • tweak our Ruby interpreter’s GC settings by trial and error (we use REE).
  • run the build entirely on the tmpfs in-memory file system

Both of these gave surprisingly significant improvements, and accounted for perhaps 10% of the speedup.