We’ve written multiple posts about how we reduced our build time and optimised our tests. Moving to continuous integration (CI) and continuous deployment (CD) allowed us to remove many of the frustrations we had with our build and deploy process. On its own a fast build was not going to move us to continuous deployment but it was a pretty significant enabler. We knew that we had plenty more that we could be improving; we were still reliant on having the right people around to sign off features before releasing and we still depended on manual testing to supplement our automation.
We wanted to avoid drawing a process diagram and then having to police it so we focused on a process which was natural to the way we worked but that improved our process as much as possible.
Don’t aim for perfection
One of our major hold-ups was our attempts to make every feature and every release perfect. We were spending days perfecting pixels and copy only to find out that the feature didn’t have the anticipated impact. There is a huge benefit in getting feedback from users on what works and what doesn’t before you invest a whole load of time in making it look perfect on multiple browsers. Over time we have moved from releasing features and then tweaking them to planning and running A/B tests to gather the information we need before we start designing the final feature.
QA has a key role to play in working with the Product and Design teams to define exactly how much breakage is acceptable. We were moving from a process where every release was tested and it was expected that almost all bugs would have been spotted and fixed before going to production. Now we were relying on our developers and our automation to keep things in a ‘good enough’ state. When something went wrong we stepped back and looked at what was missing – in most cases it was an up-front conversation about risks and expectations.
Of course this is not an excuse for having a website full of badly designed and half-working features. We accept that bugs will end up on production but we work hard to make sure they get fixed as soon as possible.
Managing how many more bugs went to production was a job for our automated tests. Reviewing all the tests as part of our ‘make all the tests fast’ overhaul started to convince us that we had decent coverage. Deciding that we were going to trust the tests gave us the freedom to say that any green build was a releasable build. If this turned out not to the be the case, either because manual testing discovered a bug or because of an issue in production then we amended the tests. Regular reviews and conversations, particularly between developers and QA, help us to keep the tests maintained and testing the right things.
Avoid red builds
Historically Songkick has had an unnatural tolerance for red builds. They didn’t appear to slow us down that much so we didn’t take the time to really try to avoid them. Once we started to seriously look at adopting continuous integration we realised that this would have to change. Frequent check-ins will only work if the builds are green. Loud and visible alerts that go to the whole team when a build fails not only means someone looks into the failure quickly but also helped us to view red builds as a delay. This coupled with having a very simple, and fast, way to run the tests on a dev environment before checking code in keeps our red builds to a minimum.
Integrate small changes frequently
A key part of CI is integrating frequently. In an ideal world you probably have everyone working off the master branch. We are careful to maintain a releasable master branch but opted for individual freedom around working on individual branches or directly off master. We like CI because it allows developers the freedom to work in a way that suits them whilst still having enough safeguards to keep the site running. Once we had a fast and painless way to integrate and release most developers naturally started integrating small changes on a more frequent basis.
Have a shared understanding of your goals
Make sure you, and everyone in the team understands what you’re trying to achieve at each stage of the build pipeline. At Songkick we expect to be able to build and test features on a local dev environment. If we discover something that forces us to test on a real test environment, such as missing data or missing services, then work gets prioritised to change that for next time.
Green builds have been tested on the CI server so we assume that a green build has the minimum required functionality to be releasable.
We use the test environment to test that the build can be deployed, and that the website works as we expect it to when running on multiple servers with lifelike data. Acceptance tests running with Selenium check that agreed business-critical functionality has not been broken. We have separated our build and deploy pipeline from feature launches so passing acceptance tests are our green flag to deploy to production.
Manual acceptance testing takes place on the production environment with the aid of feature flippers to control who can see which features. Once a feature has been tested we manually change the flipper to ‘launch’ the feature to the users.
Keep on learning
CI and CD are difficult to implement, and one of the hardest parts is imagining what the process will actually look like. Rather than trying to pin down the final process we introduced changes gradually, focusing on removing the biggest bottlenecks first. Once one bottleneck was removed it was pretty easy to see what the next one was. Speaking up when you feel frustrated along with analysing problems using the 5-Whys method has helped us improve the process to where we are today. It is fine to make a mistake but at least make sure it is an original one.