Songkick from a Tester’s point of view

Earlier this year we wrote about how we move fast but still test the code.

This was recently followed by another post about Developer happiness at Songkick which also focuses on the processes we have in place, as they provide a means to a productive working environment.

How does this all look from a tester’s point of view?

I have been asked a few times what a typical day looks like for a tester at Songkick. This post is about the processes that enable us to move fast, from a tester’s point of view, and how testing is integrated into our development lifecycle.

Organising our work

Teams at Songkick are organised around products and the process we follow is agile. Guided by the product manager and our team goals, we organise our sprints on a weekly basis with a prioritisation meeting. This allows us to update each other on the work in progress and determine the work that may get picked up during that week.

Prioritisation meetings also take into consideration things such as holidays and time spent doing other things (meetings, fire fighting, pairing).

On top of that we check our bug tracker, to see if any new bugs were raised that we need to act on.

Everyone in the company can raise bugs, enabling us to constantly make decisions on how to improve, not only our user facing products, but also our internal tools.

We also have daily stand ups at the beginning of each day, where we provide information on how we are getting on, and any blockers or other significant events that may impact our work positively or negatively.

Every 2 weeks we also have a retrospective to assess how we are doing and what improvements we can make.


The kick-off

Sabina gave a great definition of the kick-off document here. Each feature or piece of work has a kick-off document. We try to always have a developer, product manager and tester in the conversation. More often than not we also include other developers, or experts, such as a member from tech ops or a frontline team. Frontline teams can be anyone using internal tools directly, members from our customer support team, or someone from the sales team.

Depending on the type of task (a purely technical task or a brand new feature), we use a slightly different template. The reasoning behind this is that a technical, non-user-facing change requires a different conversation than a user-facing change.

But at the end of the day this is our source of truth, documenting, most importantly, the problem we are trying to solve, how we think we will do it, and any changes that we make to our initial plan along the way.

The kick-off conversation is where the tester can ask a tonne of questions. These range from questions about the technical implementation and potential performance issues, to what the risks are and what our testing strategy should be. Do we need to add a specific acceptance test for this feature, or are unit and integration tests enough?

A nice extra section in the document is the “Recurring bugs” section.

The recurring bugs consist of questions to make sure we are not implementing something we may have already solved and also bugs we see time and time again. These can range from field lengths and timezones, to nudges about considering how we order lists. What it doesn’t include is every bug we have ever seen. It is also not static and the section can evolve, removing certain questions or notes and adding others.

Having a recurring bugs section in a kick-off document is also great for on-boarding as you start to understand what previously has been an issue and you can ask why and what we do now to avoid it.

What’s next?

After the kick-off meeting, I personally tend to familiarise myself with where we are making the change.

For example, say we are adding a new address form to our check-out flow when you purchase tickets. I will perform a short exploratory test of this in our staging environment or on production. Any time we do exploratory testing, we tend to record it as a time-boxed test session in a lightweight format. This provides a nice record of the testing that was performed and may also lead to more questions for the kick-off document.

Once the developer(s) working on the feature have had a day or so, we do a test modelling session together.

Test Modelling

Similar to the kick-off this is an opportunity for the team to explore the new feature and how it may affect the rest of the system.

It consists of a short collaboration session, with at least a developer, tester and if applicable the design lead and/or other expert, where we mind map through test ideas, test data and scenarios.

We do this as it enables the developer to be testing early before releasing the product to a test/production environment, which in turn means we can deliver quality software and value sooner.

It is also a great way to share knowledge. Everyone who comes along brings different experiences and knowledge.

Test Model for one of our internal admin pages

The collaborators work together to discuss what needs checking and what risks need exploring further.

We might also uncover questions about the feature we’re building. Sharing this before we build the feature can help us build the right feature, and save time.

For example, we recently improved one of our admin tools. During the test modelling session, we discovered a handful of questions, including some around date formats, and also default settings. By clearing these questions up early, we not only ensure that we build the right thing, but also that we build it in the most valuable way for the end user.

In this particular example, it transpired that following a certain logic for setting defaults would not only save a lot of time, but also greatly reduce the likelihood of mistakes.

The team (mainly the developer) will use the resulting mind map for testing.

It becomes a record of test scenarios and cases we identified and covered as part of this bit of work.

As we mainly work in continuous deployment or delivery (depending on project and risk of the feature), testers often test in production using real data, to not block the deployment pipeline.

This has the advantage that the data is realistic (it is production data after all), there are no discrepancies in infrastructure, and performance can be adequately assessed.

Downsides can be that if we want to test purchases, we have to make actual purchases, which creates an overhead on the support team, as they will need to process refunds.

Testers and Bugs

Any issues we find during our testing on production or a staging environment (if we are doing continuous delivery), will be logged in our bug tracker and prioritised.

Some issues will be fixed straight away and others may be addressed at a later date.

As mentioned above, anyone at Songkick can raise issues.

If an issue relates to one of the products your team is working on, you (as the tester on that team) will be notified. It is usually worth verifying the issue, asking for more information, and assessing as soon as possible whether it is blocking the person who reported it, or whether it is an issue at all.

We do have guidelines to skip logging blockers and instead come to the team directly, but this is not always possible, so as testers we always keep an eye on the bugs that are raised.

Want to know more?

In this post I described some of the common things testers at Songkick do.

Depending on the team and product there may also be other things, such as being involved in weekly performance tests, hands on mobile app testing, talking through A/B tests and coaching and educating the technology team and wider company on what testing is.

If any of that sounds interesting, we are always looking for testers. Just get in touch.

Testing iOS apps

We recently released an update to our iPhone app. The app was originally developed by a third-party, so releasing an update required bringing the app development and testing in-house. We develop our projects in a continuous build environment, with automated builds, unit and acceptance tests. It allows us to develop fast and release often, and we wanted the iPhone project to work in the same way.

This article covers some of the tools we used to handle building and testing the app.

Build Automation

We use Git for our version control system, and Jenkins for our continuous integration server. Automating the project build (i.e. building the project to check for compilation errors) seemed like a basic step and a good place to start.

A prerequisite to this was to create a Mac Jenkins Build Slave, which is outside of the scope of this blog post (but if you’re interested, I followed the “master launches slave agent via SSH” instructions of the Jenkins site).

A quick search of the Jenkins plugins page revealed an Xcode plugin which allows for building Objective-C applications. Setting up the plugin was a snap – search for and install the “XCode integration” plugin from the Jenkins server plugin page, point the plugin to your project directory on the build slave, enable keychain access, and save.

Now for every commit I made to the project, this task would automatically run, and send me a rude email if project compilation failed. In practice I found that this was an excellent way of reminding me of any files I had forgotten to check in to Git; the project would compile on my laptop but fail on the CI server due to missing classes, images, etc.

Unit testing

I looked briefly into the unit testing framework Apple provides, which ships with Xcode. I added a unit test project to the Songkick app, and looked into creating mocks using OCMock, an Objective-C implementation of mock objects.

We already have fairly extensive API tests to test for specific iPhone-related user-flows (such as signing up, tracking an artist, etc), and due to time constraints we opted to concentrate on building acceptance tests, and revisit unit tests if we had time.

Acceptance Testing

There are a bunch of acceptance testing applications available for iOS apps. Here are a few of the tools I looked into in detail:

Frank

Frank is an iOS acceptance testing application which supports a Cucumber-style test syntax. I was interested in Frank as we already make use of Cucumber to test our Ruby projects, so the familiarity of the domain-specific language would have been a benefit.

I downloaded the project and got a sample test up-and-running fairly quickly. Frank ships with some useful tools, including a web inspector (“Symbiote”) which allows for inspecting app UI elements using the browser, and a “Frank console” for running ad-hoc commands against an iPhone simulator from the command line.

Frank seems to be a pretty feature-rich application. The drawbacks for me were that Frank could not be run on real hardware (as of March 2013, this appears to now be possible), and Frank also requires recompiling your application to make a special “Frankified” version to work with the testing framework.

Instruments

Apple provides an application called Instruments to handle testing, profiling and analysis of applications written with Xcode. Instruments allows for recording and editing UIAutomation scripts – runnable JavaScript test files for use against a simulated iOS app or a real hardware install.


Being able to launch your app with Instruments, perform some actions from within the app, and have those actions automatically converted into a runnable test script was a really quick and easy way of defining tests. Instruments also supports running scripts via the command line.

The drawback of test scripts created with Instruments is that they can be particularly verbose, and Instruments does not provide a convenient way of formatting and defining individual test files (outside of a single UIAutomation script per unique action).

Tuneup_js

Designed to be used as an accompaniment to UIAutomation scripts created using Instruments, Tuneup_js is a JavaScript library that helps to ease the pain of working with the long-winded UIAutomation syntax.

It provides a basic test structure for organising test steps, and a bunch of user-friendly assertions built on top of the standard ones supported by Instruments.


I found that recording tests in Instruments, and then converting them into the Tuneup_js test syntax was a really quick way of building acceptance tests for iOS apps. These tests could then be run using a script provided with the Tuneup_js package.


I settled on using Instruments and Tuneup_js to handle acceptance testing. Instruments because of the ability to quickly record acceptance test steps, and Tuneup_js because it could be used to wrap recorded test steps into repeatable tests and allowed for a nicer test syntax than offered out-of-the-box with UIAutomation. What was missing with these applications was a way to handle running the test files in an easily repeatable fashion, and against the iOS simulator as well as hardware devices.

I couldn’t find an existing application to do this, so I wrote Scenarios (Scenar-iOS, see what I did there?) to handle this task. Scenarios is a simple console Ruby app that performs the following steps:

  • Cleans any previous app installs from the target test device
  • Builds the latest version of the app
  • Installs the app on the target test device
  • Runs Tuneup_js-formatted tests against the installed app
  • Reports the test results

Scenarios accepts command-line parameters, such as the option to target the simulator or a hardware device (with the option of auto-detecting the hardware, or supplying a device ID). Scenarios also adds a couple of extra functions on top of the UIAutomation library:

  • withTimout – Can be used for potentially long-running calls (e.g. a button click to login, where the API call may be slow).
  • slowTap – Allows for slowing down the speed at which taps are executed. Instruments can run test steps very fast, and sometimes it helps to slow down tests to see what they are doing, and to create a more realistic simulated user experience.

Scenarios ships with a sample project (app and tests) that can be run using the simulator or hardware.

Jenkins Pipeline

Now I had build and acceptance tests in place, it was time to hook the tests up to Jenkins. I created the following Jenkins projects:

  • “ios-app” – runs the build automation
  • “ios-app-acceptance-tests-simulator” – runs the app (via Scenarios) on a simulator
  • “ios-app-acceptance-tests-iPhone3GS” – runs the app (via Scenarios) on an iPhone 3GS


Committing a code change to the iOS app Git repo caused the projects in the Jenkins pipeline to build the app, run the acceptance tests against the simulator, and finally run the acceptance tests on an iPhone 3GS. If any stage of the pipeline failed, I received an email informing me I had broken something.


Manual testing with TestFlight

As well as an automated setup, we also made use of the excellent TestFlight service, which enables over-the-air distribution of apps to testers. We had 12 users and 16 devices set up in TestFlight, and I was releasing builds (often daily) over-the-air. It enabled us to get some real-user feedback on the app, something that build and acceptance tests cannot replace.

Jenkins also has a TestFlight plugin, which enables you to automatically deploy a build to TestFlight as part of the pipeline. Very cool, but as we were committing code changes often throughout the day (and only wanted to release to TestFlight once a day), we decided to skip this step for the time being.

Overall, I think that the tools (both open-source and proprietary) available today for automated testing of iOS apps are feature rich (even if some are still in their infancy), and I’m pretty happy with our development setup at Songkick.

Introducing Aspec: A black box API testing DSL

Caltrak is the service that stores Songkick users’ tracked artists and cities. It has no other service dependencies. You put data into the Caltrak box, then you get it back out.

For instance, you might make two POST requests to store artist trackings, and then want to retrieve them, which would look like this:

# create and retrieve artist trackings
POST /users/7/artists/1    204
POST /users/7/artists/2    204
 GET /users/7/artists      200    application/json   [1, 2]

Did you understand basically what that was saying? I hope so, because that’s an executable spec from the Caltrak tests.

It’s pretty simple. Every line is both a request and an assertion. Every line says “If I make this request then I expect to get this back”.

This works because the behaviour of this service can be entirely described through the REST API. There are no “side effects” that are not visible through the API itself.

Here is a longer portion from the aspec file.

# no users have pending notifications
   GET /users/with-pending-notifications                200  application/json  []

# users with events on their calendar have pending notifications
  POST /users/764/metro-areas/999                       204
  POST /users/764/artists/123                           204
  POST /events/5?artist_ids=123&metro_area_id=999       204
  POST /events/5/enqueue-notifications                  204
   GET /users/with-pending-notifications                200  application/json  [[764, "ep"]]

# users are unique in the response
  POST /users/764/artists/123                           204
  POST /users/764/artists/456                           204
  POST /users/764/metro-areas/999                       204
  POST /events/5?artist_ids=123,456&metro_area_id=999   204
  POST /events/5/enqueue-notifications                  204
   GET /users/with-pending-notifications                200  application/json  [[764, "ep"]]

Some aspects:

  • Each line has the format Verb, Url (with Params), Status, Content Type, Body separated by whitespace. These are the only things that can be asserted about the service responses.
  • Each “paragraph” is a separate test. The database is cleared in-between.
  • Lines beginning with # are comments.
  • Aspec stubs time, so that the first line of the test occurs precisely on the epoch and each subsequent line occurs 2s after that. This allows us to test responses with creation timestamps in them, as sketched below.
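
For instance, a spec for a hypothetical endpoint that echoes a tracking’s creation timestamp would rely on that stubbed clock (this endpoint and response shape are invented purely to illustrate the point, they are not part of Caltrak):

# creation timestamps are predictable because time is stubbed
  POST /users/7/artists/1            204
   GET /users/7/artists/1/tracking   200  application/json  {"artist_id": 1, "created_at": "1970-01-01T00:00:00Z"}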


When we began developing Caltrak, I wasn’t happy with the process of writing tests for this service.

I wanted the test framework to expose the simple nature of the API. You could make something almost as simple in RSpec or Cucumber with judicious use of helpers and so on, but you would end up with a DSL that obscured the underlying REST API.

In an Aspec file, there is no syntax that does not express real data either sent or received from the service. You’re basically writing down the actual HTTP requests and responses with lots of parts omitted. It is technical, but it is very readable. I think it is better documentation than most service tests.

Also, there is no context that is not immediately visible, as there might be with nested RSpec contexts, for example, where in a large test file the setup may be very distant from the test and assertion.


NB This project is very immature. Use at your own risk.

Aspec assumes your project uses Rack, and uses Rack::Test to talk to it. The code is published on GitHub and there is a tiny example API project.

It is very similar to RSpec in operation. You write a .aspec file, and put an aspec_helper.rb next to it.

Then run

aspec aspec/my_service.aspec

I’d be interested in hearing your thoughts on this testing style.

Run the right tests at the right time

Way back in June, Dan Crow posted about some of the key principles that we at Songkick believe in. One that I spend some time thinking about every day is, ‘ship early, ship often’. We firmly believe that code should be shipped as soon as it’s ready. From a development point of view this just makes sense. From a user’s point of view this just makes sense. From a testing point of view this proves to be a bit of a challenge.

Shipping fast doesn’t mean shipping untested code and hoping for the best. Every single thing that we release has been tested extensively. Obviously the only way we manage to ship often is by keeping the build/test/release cycle as short as possible. All builds are managed in Jenkins. Pushing code will automatically trigger our unit and integration test suites. If all the tests pass, we end up with a green build which can be manually deployed to our test environment. Finally, a suite of acceptance tests runs through the browser using Capybara and the Selenium WebDriver to confirm we haven’t broken any of our critical user journeys. These tests are pretty slow, taking roughly 4 minutes to run a handful of scenarios, but this is the first check that the user will actually be able to interact with the website.
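
To give a flavour of what those browser-level checks look like, here is a minimal sketch using Capybara’s RSpec DSL (the URL, field name and page content are invented for illustration, this is not one of our real scenarios):

require 'capybara/rspec'

Capybara.default_driver = :selenium
Capybara.run_server     = false
Capybara.app_host       = ''   # hypothetical test environment

feature 'Critical user journeys' do
  scenario 'a visitor can search for an artist' do
    visit '/'
    fill_in 'search', :with => 'Radiohead'   # assumes the search field is named "search"
    click_button 'Search'
    page.should have_content('Radiohead')
  end
end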

Only after all these tests have passed will we deploy code to Production. This applies to all new features, bug fixes and even changes to the tests themselves.

The problem

Despite our best intentions we were still struggling to ship changes as soon as they were ready:

In June 2011 we made 7 releases.

In the best case it took 3 hours to build, test and ship code. In reality we were spending around 2 days preparing each release. Something had to change.

Dan Lucraft wrote an excellent post about how we reduced the time it takes to run our tests. It feels pretty obvious to say you can increase release speed if you make your tests run faster but this was only part of the solution. Keeping the test suites fast requires constant diligence. Aiming for 100% test coverage is a distraction. Not only will you never achieve it but if you even came close then your builds would likely be taking far longer than needed to run.

Run the right tests

We took the step of identifying which features we wouldn’t want to break and plotting them against the overhead of running tests. In the case of unit tests you can pretty much add as many tests as you like without too much overhead. Integration tests need to be things that you actually care about. If you discovered a feature was broken during manual testing but wouldn’t hold a release to fix it then you shouldn’t have an automated test for that feature in your build (well, unless it was a super quick unit test).

An example of this is our automatic tweets when authenticated users mark their attendance to an event. It is a valid and highly used service that we wouldn’t want to be without but it is not business critical. If we were to have an automated test for this we would need a test which set up a user who appears authenticated with Twitter. The test user would then mark their attendance to an event and the test would need to check whether the tweet was fired for the correct event.

Not only is that a fair bit of work to write and maintain but the resulting test would be pretty slow to execute. The alternative, to push to production and monitor errors in the logs whilst also keeping an eye on the Songkick twitter feed (something we’re already monitoring) means we have one fewer test to run and maintain. The feedback comes later (post release rather than pre) but since we wouldn’t hold a release even if we knew that we had broken this feature then actual time to fix is roughly the same.

At the right time

To allow the team to ship fast we need to keep the release channel clear. Builds run through the test suites as cleanly and as quickly as possible to free up the channel for the next release. Part of our process involves establishing up-front how we will test a code change. Usually this will mean adding or modifying automated tests to cover the new functionality. However some of our changes need more than just an automated build run against them so we needed to come up with a way to separate testing from the actual releases.

Our solution was to use what we call Flippers, additional code which lets admins control whether a feature is visible to users. We can then turn features on and off on the live site without needing to make additional releases. As well as giving us a fast way to turn off problem features this has the benefit of allowing us to turn features on for a particular type of user. High risk or extensively changed features are released to production behind a flipper that makes them visible to admin users only. This means we can run the code on the live servers, using live data but test them as if we were working on a test environment.
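
As a rough illustration of the idea (the class, model and method names below are invented for this sketch, not our actual implementation):

class Flipper
  # Feature states live somewhere admins can change at runtime, e.g. a
  # database table, so no release is needed to turn a feature on or off.
  def self.enabled_for?(feature, user)
    setting = FeatureSetting.find_by_name(feature.to_s)   # hypothetical model
    return false if setting.nil? || setting.off?
    return true  if setting.on_for_everyone?
    setting.admins_only? && user.admin?
  end
end

# In a view or controller:
if Flipper.enabled_for?(:new_checkout_flow, current_user)
  render :partial => 'checkout/new_address_form'
end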

Fix bugs fast

One problem with testing code on Production is that the bugs you find are also on Production. Obviously many of these bugs aren’t visible to users thanks to the flippers but there will always be some bugs in live code. Our approach is a cultural one: yes, we move fast and accept that things might break, but we don’t leave them like that. We fix bugs as fast as possible.

Sounds interesting but does it work?

We spent 12 months looking at our tests, our process and probably ourselves. Changes were made and in June 2012 we made 113 releases. 14 of those were on the same day. In fact we released on every single working day that month (and there were a few sneaky weekend releases too!).

The client side of SOA

This article is part of a series on Songkick’s migration to a service-oriented architecture.

Following on from my previous article on what our backend services look like, it’s time to talk about the client side. How do our user-facing applications use the services, and how is it different from using ActiveRecord?

The nice thing about Rails is it doesn’t force you into using ActiveRecord. If you do, then a lot of conveniences are made available to you, but you’re really free to do whatever you want in your Rails controllers. So, instead of speaking to ActiveRecord models, our applications make HTTP calls to several backend services.

HTTP, do you speak it?

The first bit of the problem is, how do we make HTTP calls? We want this to be extremely convenient for people writing application code, which means avoiding as much boilerplate as possible. We don’t want application code cluttered with stuff like this:

uri = URI.parse("http://accounts-service/users/#{name}")
http =, uri.port)
response = http.request_get(uri.path)
if response.code == '200'
  user = JSON.parse(response.body)
else
  raise NotFound
end

when we could just write something like:

user = Services.accounts.find_user(name)

And that’s the simple case. When making HTTP calls, you have to deal with a lot of complexity: serializing parameters, query strings vs entity bodies, multipart uploads, content types, service hostname lookups, keep-alive or not, response parsing and several classes of error detection: DNS failure, refused connections, timeouts, HTTP failure responses, user input validation errors, malformed or interrupted output formats… and good luck changing all that if you want to change which HTTP library you want to use.

So, the first thing we did is create an abstract HTTP API with several implementations, and released it as open-source. Songkick::Transport gives us a terse HTTP interface with backends based on Curb, HTTParty and Rack::Test, all with the same high-level feature set. This lets us switch HTTP library easily, and we’ve used this to tweak the performance of our internal code.

You use it by making a connection to a host, and issuing requests. It assumes anything but a 200, 201, 204 or 409 is a software error and raises an exception, otherwise it parses the response for you and returns it:

http ='http://accounts-service')
user = http.get('/users/jcoglan').data
# => {'id' => 18787, 'username' => 'jcoglan'}

Songkick::Transport also has some useful reporting facilities built into it, for example it makes it easy to record all the backend service requests made during a single call to our user-facing Rails app, and log the total time spent calling services, much like Rails does for DB calls. More details in the README.

Who needs FakeWeb?

The nice thing about having a simple flat API for doing HTTP means it’s really easy to test clients built on top of Songkick::Transport, as opposed to something like FakeWeb that fakes the whole complicated Net::HTTP interface. In each application, we have clients built on top of Songkick::Transport that take an HTTP client as a constructor argument. When they make an HTTP call, they wrap the response data in a model object, which allows the application to shield itself from potential changes to the API wire format.

module Services
  class AccountsClient
    def initialize(http_client)
      @http = http_client
    end

    def find_user(username)
      data = @http.get("/users/#{username}").data
    end
  end
end

module Models
  class User
    def initialize(data)
      @data = data
    end

    def username
      @data['username']
    end
  end
end
This approach makes it really easy to stub out the response of a backend service for a test:

before do
  @http   = mock('Transport')
  @client =
end

it "returns a User" do
  response = mock('Response', :data => {'username' => 'jcoglan'})
  @http.stub(:get).with('/users/jcoglan').and_return(response)
  @client.find_user('jcoglan').username.should == 'jcoglan'
end

It also makes mock-based testing really easy:

it "tells the service to delete a User" do

Being able to stub HTTP calls like this is very powerful, especially when query strings or entity bodies are involved. Your backend probably treats foo=bar&something=else and something=else&foo=bar the same, and it’s much easier to mock/stub on such parameter sets when they’re expressed as a hash, as in

http.get '/', :foo => 'bar', :something => 'else'

rather than as an order-sensitive string:

http.get '/?foo=bar&something=else'

It’s also worth noting that the models are basically inert data objects, and in many cases they are immutable values. They don’t know anything about the services, or any other I/O device, they just accept and expose data. This means you can use real data objects in other tests, rather than hard-to-maintain fakes, and still your tests run fast.
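
For example (a trivial sketch rather than a real spec from our codebase), a test elsewhere can build a genuine Models::User from a plain hash instead of maintaining a fake:

it "exposes the username from the wire format" do
  user ='username' => 'jcoglan')
  user.username.should == 'jcoglan'
end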

Convenience vs flexibility

Nice as it is to be able to choose which HTTP implementation you use, most of the time the application developer does not want to write

http   ='http://accounts-service')
client =
user   = client.find_user(params[:username])

every time they need to look up a record. The flexibility helps with testing and deployment concerns, but it’s not convenient. So, we put a layer of sugar over these flexible building blocks that means most of the things an application needs to do are one-liners. We have a Services module that provides canonical instances of all the service clients; it deals with knowing which hostnames to connect to, which HTTP library to use, and which client object to construct for each service.

module Services
  def self.accounts
    @accounts ||= begin
      http ='http://accounts-service')
    end
  end
end
With this layer of sugar, getting a user account is one line:

user = Services.accounts.find_user(params[:username])

In our Cucumber tests, we tend to stub out methods on these canonical instances, or make a Services method return an entirely fake instance. The cukes are not complete full-stack tests; they are integration tests of the current project, rather than of the entire stack, and the lack of backend I/O keeps them very fast. The stability of the underlying service APIs means we aren’t taking a big risk with these fakes, and we have a few acceptance tests that run against our staging and production sites to make sure we don’t break anything really important.
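
For example, a step definition might swap in a canned user along these lines (a rough sketch; the step wording and stubbing syntax are illustrative, not lifted from our code):

Given /^a user called "(.*)" exists$/ do |username|
  user ='username' => username)
  Services.accounts.stub(:find_user).with(username).and_return(user)
end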

What about error handling?

We want it to be as easy as possible to deal with errors, since messy error handling can hamper the maintainability of a project and introduce mistakes that make things harder for end users. For this reason, we made anything but 200, 201, 204 or 409 from a backend raise an exception. For example, if the accounts service returns a 404 for a call like this, an exception is raised:

user = Services.accounts.find_user(params[:username])

The exception raised by Songkick::Transport contains information about the request and response. This means you can put a catch-all error handler in your Rails or Sinatra app to catch Songkick::Transport::HttpError, and forward the 404 from the backend out to the user. This removes a lot of error handling code from the application.
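
A minimal sketch of what such a catch-all handler could look like in a Rails controller (the error object’s attribute name here is an assumption for illustration, not taken from the Songkick::Transport documentation):

class ApplicationController < ActionController::Base
  # Catch any backend failure that bubbled up from a service call and
  # forward the upstream status (e.g. a 404) straight to the user.
  rescue_from Songkick::Transport::HttpError do |error|
    head error.status   # assumes the exception exposes the response status
  end
end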

In some cases though, you don’t want this behaviour. For example, say we’re rendering an artist’s page and we have a sidebar module showing related artists. If the main artist gives a 404, then the whole page response should be a 404. But if we can’t get the related artists, or their profile images, then we don’t want the whole page to fail, just that sidebar module. Such cases tend to be the minority in our applications, and it’s easy enough to catch the service exception and render nothing if the services backing a non-core component fail. Using an object model of our user interface helps to isolate these failures, and we hope to cover that in a future post.

Repeat after me: sometimes, you should repeat yourself

One open question when we moved to this model was: should we maintain client libraries for each service, or just make whatever calls we need in each application? The DRY principle suggests the former is obviously the best, but it’s worth asking this question if you do a project like this.

We went with the latter, for several reasons. First, since the services and Songkick::Transport encapsulate a lot of business and wire logic, the client and model classes in each application end up being pretty thin wrappers, and it isn’t hard to build just what you need in each project. Second, we got burned by having too many things depending on in-process Ruby APIs, where any change to a shared library would require us to re-test and re-start all downstream applications. This coupling tended to slow us down, and we found that sharing in-process code isn’t worth it unless it’s encapsulating substantial complexity.

Each application is free to tweak how it interacts with the service APIs, without affecting any other application, and this is a big win for us. It means no change to one application can have side effects or block work on another application, and we haven’t actually found ourselves reinventing substantial pieces of logic since that’s all hidden behind the HTTP APIs.

And finally, having per-application service clients gives you a really accessible picture of what data each application actually relies on. Having one catch-all domain library made this sort of reasoning really difficult, and made it hard to assess the cost of changing anything.

Wrapping up

So that’s our architecture these days. If you decide to go down this route, remember there’s no ‘one right way’ to do things. You have to make trade-offs all the time, and the textbook engineering answer doesn’t always give your team the greatest velocity. Examine why you’re making each change, focus on long-term productivity, and you won’t go far wrong.

From 15 hours to 15 seconds: reducing a crushing build time

Over the past year we have reduced our website test suite build time by over 99.9%.

  • Build time a year ago: 15 hours.
    Across 15 EC2 build slaves it took “only” 1 hour of real time.

  • Build time today: 15 seconds
    On my laptop.

Having a build that took over an hour to run crippled the productivity of our team.

So, how did we make such a drastic improvement? There were no quick fixes, though Lord knows we tried to find them. Instead we have had to completely change the way we test.

Rather than any brilliant new techniques, there were instead three big mistakes that we had made that created such a monster build time. We went down a wrong path, and it took a lot of time and effort to fix it later.

Bad Practice #1: We favoured integration tests over unit tests

We used to be extremely thorough in our integration tests. We used them to test everything, usually instead of unit tests, which were comparatively thin on the ground. Since integration tests are far, far slower than unit tests, this caused a lot of unnecessary work.

To fix this we looked at each integration test in turn and either:

  • ditched it (i.e. we increased our tolerance for broken things in exchange for having a faster build)
  • rewrote it as a unit test on a specific class
  • kept it, as we still needed a few integration tests for each component

Bad Practice #2: We had many, many features that were relatively unimportant

Many of the less used or less strategic features on have gone. This was an extremely painful decision to make, and we made it for bigger reasons than just improving our build time. But it certainly improved the build time a lot.

Fixing this and the previous point have turned a library of 1642 Cucumber scenarios into just 200.

Bad Practice #3: Our integration tests were actually acceptance tests

This test suite used to integrate over our website, domain library, database, message bus and background job workers. Each was spun up as a separate process in the test environment. We basically ran all our website tests against a working version of our entire system. Remember I said we tested virtually every code path? This added up to a lot of time.

Nowadays, our website integration tests are really only integration tests. They integrate over code inside a single project. Every interface to another project is stubbed.

All our database access code is isolated in a domain library behind a thin service layer and is stubbed in the website project.

Instead of over a thousand acceptance tests, we now have fewer than 10. They run against our staging and production environments, after deployment, instead of slowly booting up a test environment during the build.

Six months later

Productivity is up! Morale is up! It’s amazing just how much a faster build has improved our working experience.

Remember that the suite described above was only one of our builds. We had multiple projects with builds that took more than 30 minutes to run. Now none of our test builds take longer than 5 minutes, which is now considered “slow”.

These mistakes are far clearer to us in hindsight than they were at the time, so I’d recommend looking carefully to make sure you are not infected by any of these bad practices. If you are, take the time to fix the problem. If you’re not, congratulations on avoiding the pitfalls we fell in to!

PS: Bonus quick fixes for reading this far

Ok, so there were one or two things we did that probably count as Quick Fixes:

  • tweak our Ruby interpreter’s GC settings by trial and error (we use REE).
  • run the build entirely on the tmpfs in-memory file system

Both of these gave surprisingly significant improvements, and accounted for perhaps 10% of the speedup.