From 15 hours to 15 seconds: reducing a crushing build time

Over the past year we have reduced our website test suite build time by over 99.9%.

  • Build time a year ago: 15 hours.
    Across 15 EC2 build slaves it took “only” 1 hour of real time.

  • Build time today: 15 seconds.
    On my laptop.

Having a build that took over an hour to run crippled the productivity of our team.

So, how did we make such a drastic improvement? There were no quick fixes, though Lord knows we tried to find them. Instead we have had to completely change the way we test.

Rather than any brilliant new techniques, there were three big mistakes we had made that created such a monster build time. We went down a wrong path, and it took a lot of time and effort to fix it later.

Bad Practice #1: We favoured integration tests over unit tests

We used to be extremely thorough in our integration tests. We used them to test everything, usually instead of unit tests, which were comparatively thin on the ground. Since integration tests are far, far slower than unit tests, this caused a lot of unnecessary work.

To fix this we looked at each integration test in turn and either:

  • ditched it (i.e. we increased our tolerance for broken things in exchange for having a faster build)
  • rewrote it as a unit test on a specific class (there’s a small sketch of this after the list)
  • kept it, as we still needed a few integration tests for each component
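
To make the second option concrete, here is a minimal, hypothetical sketch of what such a rewrite looks like; the class, method, and data names are invented for illustration and are not from our codebase. The behaviour is exercised directly on one class, with its collaborator stubbed, rather than through the whole running site.

    # ticket_availability_spec.rb -- hypothetical example, run with: rspec ticket_availability_spec.rb

    # The class under test: decides whether a "Buy tickets" link should appear.
    class TicketAvailability
      def initialize(ticket_service)
        @ticket_service = ticket_service
      end

      def on_sale?(concert_id)
        @ticket_service.tickets_for(concert_id).any? { |t| t[:status] == "on_sale" }
      end
    end

    describe TicketAvailability do
      it "reports tickets as on sale when the ticket service returns an on-sale ticket" do
        # The collaborator is a stub: no web server, database or message bus is involved.
        ticket_service = double("ticket_service", tickets_for: [{ status: "on_sale" }])

        expect(TicketAvailability.new(ticket_service).on_sale?(123)).to be(true)
      end
    end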

Bad Practice #2: We had many, many features that were relatively unimportant

Many of the less used or less strategic features on songkick.com have gone. This was an extremely painful decision to make, and we made it for bigger reasons than just improving our build time. But it certainly improved the build time a lot.

Fixing this and the previous point has turned a library of 1642 Cucumber scenarios into just 200.

Bad Practice #3: Our integration tests were actually acceptance tests

This test suite used to integrate over our website, domain library, database, message bus and background job workers. Each was spun up as a separate process in the test environment. We basically ran all our website tests against a working version of our entire system. Remember I said we tested virtually every code path? This added up to a lot of time.

Nowadays, our website integration tests are really only integration tests. They integrate over code inside a single project. Every interface to another project is stubbed.

All our database access code is isolated in a domain library behind a thin service layer and is stubbed in the website project.
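
As a rough illustration of what that stubbing can look like (the names here are hypothetical, not our actual client or presenter classes), the website code depends on a small service interface, and the tests hand it an in-memory fake instead of the real client:

    # concert_page_presenter_spec.rb -- hypothetical sketch, run with: rspec

    # In-memory stand-in for the domain library's service client, so website
    # tests never touch a database, message bus or background worker.
    class FakeEventService
      def initialize(events = {})
        @events = events   # { id => attributes hash }
      end

      def find_event(id)
        @events.fetch(id) { raise "no such event: #{id}" }
      end
    end

    # A small piece of website code that depends only on the service interface.
    class ConcertPagePresenter
      def initialize(event_service)
        @event_service = event_service
      end

      def title(event_id)
        @event_service.find_event(event_id)[:name]
      end
    end

    describe ConcertPagePresenter do
      it "builds the page title from the domain service response" do
        service = FakeEventService.new(42 => { name: "Radiohead at the O2" })

        expect(ConcertPagePresenter.new(service).title(42)).to eq("Radiohead at the O2")
      end
    end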

Instead of over a thousand acceptance tests, we now have fewer than 10. They run against our staging and production environments, after deployment, instead of slowly booting up a test environment during the build.
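
By way of illustration only, a post-deploy check of this kind might look something like the sketch below; the environment variable, URL and assertions are placeholders rather than our actual checks.

    # homepage_smoke_spec.rb -- hypothetical post-deploy smoke test, run with: rspec
    require "net/http"
    require "uri"

    describe "the deployed site" do
      it "serves the homepage" do
        # TARGET_URL would be supplied by the deploy pipeline, e.g. the staging hostname.
        uri = URI(ENV.fetch("TARGET_URL", "https://staging.example.com/"))
        response = Net::HTTP.get_response(uri)

        expect(response.code).to eq("200")
        expect(response.body).to include("<title>")
      end
    end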

Six months later

Productivity is up! Morale is up! It’s amazing just how much a faster build has improved our working experience.

Remember that the suite described above was only one of our builds. We had multiple projects with builds that took more than 30 minutes to run. Now none of our test builds takes longer than 5 minutes, and even that is considered “slow”.

These mistakes are far clearer to us in hindsight than they were at the time, so I’d recommend looking carefully to make sure you are not infected by any of these bad practices. If you are, take the time to fix the problem. If you’re not, congratulations on avoiding the pitfalls we fell into!

PS: Bonus quick fixes for reading this far

Ok, so there were one or two things we did that probably count as Quick Fixes:

  • tweak our Ruby interpreter’s GC settings by trial and error (we use REE)
  • run the build entirely on the tmpfs in-memory file system

Both of these gave surprisingly significant improvements, and accounted for perhaps 10% of the speedup.
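
For anyone curious what that looks like in practice, here is an illustrative build wrapper, not our actual script: the environment variable names are REE’s standard GC tuning knobs, but the values and paths are examples only.

    #!/usr/bin/env ruby
    # run_build.rb -- illustrative sketch only, not our real build script.

    # REE exposes its GC tuning through environment variables; these are the
    # standard knobs, with example values (we arrived at ours by trial and error).
    ENV["RUBY_HEAP_MIN_SLOTS"]  ||= "600000"
    ENV["RUBY_GC_MALLOC_LIMIT"] ||= "59000000"
    ENV["RUBY_HEAP_FREE_MIN"]   ||= "100000"

    # Assumes the build workspace sits on a tmpfs mount created beforehand,
    # for example with:  mount -t tmpfs -o size=2g tmpfs /mnt/build
    workspace = "/mnt/build/workspace"

    Dir.chdir(workspace) do
      # The child Ruby process running the tests inherits the GC settings above.
      system("bundle exec rake") or abort("build failed")
    end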

15 thoughts on “From 15 hours to 15 seconds: reducing a crushing build time”

  1. If you are limiting acceptance tests, how are you then defining your specifications? Are you not using Cucumber and high level BDD?

  2. I’m currently reading “Specification by Example”, where you’re cited as an example of a company which has benefited from using a BDD-style executable specification approach. Do “bad practices” 1 & 2 reflect a change in opinion on the matter? (Not that I’m disagreeing with the idea that unit tests should do the bulk of the work).

    Would you say that having so many acceptance tests was a bad thing, or that running them on every build was a bad thing?

    I’d be fascinated to hear your opinions.

  3. Aidy Lewis: We do not always use Cucumber; it depends on what we are working on. Not using Cucumber does not imply we do not do high-level BDD. Cucumber is just a tool. I think what has happened is that we have matured as a dev team and have become more confident in deciding what to do per situation. For example, we have some of our internal dev services documented in Cucumber because the team working on them wanted to.

    Julian Haeger: I don’t think it discounts what BDD did to get us to where we are. But, like anything, alongside the good stuff we did some bad things and made mistakes. I don’t think those mistakes tarnish BDD.

    Having so many acceptance tests was a bad thing because they covered too much of a very large, complex application. Covering a very large system with lots of acceptance tests was a bad thing; breaking that system up and having more focused acceptance tests was a very good thing.

  4. Very interesting post. Thanks for sharing!

    1. Could you share the number of cucumber features before and after?

    2. Are you thinking of using another tool for your “BDD Spec by Example”, turnip maybe?

    3. Would you mind sharing a sample from your spec_helper(s)? Your unit tests must be very fast and I guess you are not loading Rails.

    Are you using NullDB to stub access to the DBs?

    Thanks

  5. Hi Jean-Michel

    1. We had 1642 cucumber scenarios before and 200 after. Some of those will have been Scenario Outlines, each containing maybe 5 Scenarios, so the numbers are higher. I don’t know how many Features there were.

    2. No, we’re sticking with Cucumber.

    3. We have 916 unit tests, in rspec. They take 13s to run from start to finish, of which 4s is Rails booting time. Our spec_helper has nothing interesting in it, as we are loading Rails as normal.

    We aren’t using NullDB, we have moved all our database access to behind a set of REST services, so our website project does not connect to any database any more.

    best
    Dan

  6. Hang on, you’ve deleted 87% of your tests and a tonne of code, and your build time has decreased… wow

  7. Pingback: 078 RR Hexagonal Rails with Matt Wynne and Kevin Rutherford

  8. Alex:

    “Reality looks much more obvious in hindsight than in foresight. People who experience hindsight bias misapply current hindsight to past foresight. They perceive events that occurred to have been more predictable before the fact than was actually the case.”

    — Hersh Shefrin, Finance and the Psychology of Investing

    “Learn from the mistakes of others. You can’t live long enough to make them all yourself.”

    — Eleanor Roosevelt

    Or you could just mock them, I guess.

  9. Very interesting results.

    One of the problems I have observed with the Specification by Example approach is that, over time, you end up with so many executable specifications that the cost in time and maintenance becomes overkill for the team, and it’s very common for lots of executable specifications to touch the same parts of the system. So, looking at your results, it’s an interesting idea to review those tests after some time and maybe break them down into unit tests, or just disable a bunch of them. Don’t you feel like you’re losing your living documentation, though?

    Congratulations to the team.

  10. Pingback: A Smattering of Selenium #133 « Official Selenium Blog

  11. Pingback: Tea-Driven Development :: Optimising a slow build? You’re solving the wrong problem

  12. Pingback: Coding Is Like Cooking » Blog Archive » Joseph Wilk on Acceptance Testing in a Startup

  13. Well,

    I like to have an acceptance test to guide the development of each line of production code (ATDD, BDD, Specification by Example, etc.). And I also like to test as much as possible. So I would not use a strategy like yours. I will give an example from one project here.

    – (1) unit (fast) tests: execution time: 3 seconds.
    – (2) acceptance tests using the layer behind the UI and stubbing the database: execution time: 10 seconds.
    – (3) acceptance tests using the layer behind the UI and the real database: execution time: 10 minutes.
    – (4) acceptance tests using the real application: execution time: 2 hours.

    There are about 1260 acceptance tests. Developers use the tests in (1) and (2) all the time. Sometimes during the day they use (3). The tests in (4) are only executed at the end of the day. In CI, all the tests from (1) to (3) are executed on every commit.

    It seems like a lot of work to develop three kinds of acceptance tests, but in practice it is not, because they share the same step definitions.

    Best regards,
    Josué
    @josuesantos