Songkick from a Tester’s point of view

Earlier this year we wrote about how we move fast but still test the code.

This was recently followed by another post about Developer happiness at Songkick which also focuses on the processes we have in place, as they provide a means to a productive working environment.

How does this all look from a tester’s point of view?

I have been asked a few times what a typical day looks like for a tester at Songkick. This post is about the processes that enable us to move fast, seen from a tester’s point of view, and how testing is integrated into our development lifecycle.

Organising our work

Teams at Songkick are organised around products and the process we follow is agile. Guided by the product manager and our team goals, we organise our sprints on a weekly basis with a prioritisation meeting. This allows us to update each other on the work in progress and determine the work that may get picked up during that week.

Prioritisation meetings also take into consideration things such as holidays and time spent doing other things (meetings, fire fighting, pairing).

On top of that we check our bug tracker, to see if any new bugs were raised that we need to act on.

Everyone in the company can raise bugs, enabling us to constantly make decisions on how to improve, not only our user facing products, but also our internal tools.

We also have daily stand ups at the beginning of each day, where we provide information on how we are getting on, and any blockers or other significant events that may impact our work positively or negatively.

Every 2 weeks we also hold a retrospective to assess how we are doing and what improvements we can make.


The kick-off

Sabina gave a great definition of the kick-off document here. Each feature or piece of work has a kick-off document. We try to always have a developer, product manager and tester in the conversation. More often than not we also include other developers, or experts, such as a member from tech ops or a frontline team. Frontline teams can be anyone using internal tools directly, members from our customer support team, or someone from the sales team.

Depending on the type of task (is it a technical task or a brand new feature?), we use a slightly different template. The reasoning behind this is that a technical, non-user-facing change will require a different conversation than a user-facing change.

But at the end of the day this is our source of truth, documenting, most importantly, the problem we are trying to solve, how we think we will do it, and any changes that we make to our initial plan along the way.

The kick-off conversation is where the tester can ask a tonne of questions. These range from the technical implementation and potential performance issues to the risks and what our testing strategy should be. Do we need to add a specific acceptance test for this feature, or are unit and integration tests enough?

A nice extra section in the document is the “Recurring bugs” section.

The recurring bugs consist of questions to make sure we are not implementing something we may have already solved and also bugs we see time and time again. These can range from field lengths and timezones, to nudges about considering how we order lists. What it doesn’t include is every bug we have ever seen. It is also not static and the section can evolve, removing certain questions or notes and adding others.

Having a recurring bugs section in a kick-off document is also great for on-boarding as you start to understand what previously has been an issue and you can ask why and what we do now to avoid it.

What’s next?

After the kick-off meeting, I personally tend to familiarise myself with where we are making the change.

For example, say we are adding a new address form to our check-out flow when you purchase tickets. I will perform a short exploratory test of this in our staging environment or on production. Anytime we do exploratory testing, we tend to record these as time-boxed test sessions in a lightweight format. This provides a nice record of the testing that was performed and also may lead to more questions for the kick-off document.

Once the developer(s) working on the feature have had a day or so, we do a test modelling session together.

Test Modelling

Similar to the kick-off this is an opportunity for the team to explore the new feature and how it may affect the rest of the system.

It consists of a short collaboration session, with at least a developer, tester and if applicable the design lead and/or other expert, where we mind map through test ideas, test data and scenarios.

We do this as it enables the developer to start testing early, before releasing the product to a test/production environment, which in turn means we can deliver quality software and value sooner.

It is also a great way to share knowledge. Everyone who comes along brings different experiences and knowledge.

Test Model for one of our internal admin pages

The collaborators work together to discuss what needs checking and what risks need exploring further.

We might also uncover questions about the feature we’re building. Sharing this before we build the feature can help us build the right feature, and save time.

For example, we recently improved one of our admin tools. During the test modelling session, we discovered a handful of questions, including some around date formats, and also default settings. By clearing these questions up early, we not only ensure that we build the right thing, but also that we build it in the most valuable way for the end user.

In this particular example, it transpired that following a certain logic for setting defaults would not only save a lot of time, but also greatly reduce the likelihood of mistakes.

The team (mainly the developer) will use the resulting mind map for testing.

It becomes a record of test scenarios and cases we identified and covered as part of this bit of work.

As we mainly work in continuous deployment or delivery (depending on project and risk of the feature), testers often test in production using real data, to not block the deployment pipeline.

This has the advantage that the data is realistic (it is production data after all), there are no discrepancies in infrastructure, and performance can be adequately assessed.

Downsides can be that if we want to test purchases, we have to make actual purchases, which creates an overhead on the support team, as they will need to process refunds.

Testers and Bugs

Any issues we find during our testing on production or a staging environment (if we are doing continuous delivery), will be logged in our bug tracker and prioritised.

Some issues will be fixed straight away and others may be addressed at a later date.

As mentioned above, anyone at Songkick can raise issues.

If the issue relates to one of the products that your teams are working on, you (as the tester on the team(s)) will be notified. It is often good to verify the issue as soon as possible, ask for more information, and assess whether it is blocking the person who reported it, or whether it is even an issue at all.

We do have guidelines to not even bother logging blockers but to come to the team directly, but this may not always be possible, so as testers we always have an eye on the bugs that are raised.

Want to know more?

In this post I described some of the common things testers at Songkick do.

Depending on the team and product there may also be other things, such as being involved in weekly performance tests, hands on mobile app testing, talking through A/B tests and coaching and educating the technology team and wider company on what testing is.

If any of that sounds interesting, we are always looking for testers. Just get in touch.

How Docker is changing the way we develop, test & ship apps at Songkick

We’re really excited to have shipped our first app that uses Docker throughout our entire release cycle; from development, through to running tests on our CI server, and finally to our production environment. This article explains a bit about why we came to choose Docker, how we’re using it, and the benefits it brings.

Since Songkick and Crowdsurge merged last year we’ve had a mix of infrastructures, and in a long-term quest to consolidate platforms we’ve been looking at how to create a great development experience that would work cross-platform. We started by asking what a great development environment looks like, and came up with the following requirements:

  • Isolate dependencies (trying to run two different versions of a language or database on the same machine isn’t fun!)
  • Match production accurately
  • Fast to set up, and fast to work with day-to-day
  • Simple to use (think make run)
  • Easy for developers to change

We’ve aspired to create a development environment that gets out of the way and allows developers to focus on building great products. We believe that if you want a happy, productive development team it’s essential to get this right, and with the right decisions and a bit of work Docker is a great tool to achieve that.

We’ve broken down some advice and examples of how we’re using Docker for one of our new internal apps.

Install the Docker Toolbox

The Docker Toolbox provides you with all the right tools to work with Docker on Mac or Windows.

A few of us have also been playing with Docker for Mac, which provides a more native experience. It’s still in beta but it’s a fantastic step forwards compared to the Docker Toolbox and docker-machine.

Use VMWare Fusion instead of Virtualbox

Although Docker Toolbox comes with Virtualbox included, we chose to use VMWare Fusion instead. File change notifications are significantly better using VMWare Fusion, allowing features like Rails auto-reloading to work properly.

Creating a different Docker machine is simple:

Use existing services where possible

In development we connect directly to our staging database, removing a set of dependencies (running a local database, seeding structure and data) and giving us a useful, rich dataset to develop against.

Having a production-like set of data to develop and test against is really important, helping us catch bugs, edge-cases and data-related UX problems early.

Test in isolation

For testing we use docker-compose to run the tests against an ephemeral local database, making our tests fast and reliable.

Because you may not want to run your entire test suite each time, we also have a test shell ideal for running specific sets of tests:

Proper development tooling

As well as running the Ruby web server through Docker, we also provide a development shell container, aliased for convenience. This is great for trying out commands in the Rails console or installing new gems without needing Ruby or other dependencies on your Mac.

Use separate Dockerfiles for development and production

We build our development and production images slightly differently. They both declare the same system dependencies but differ in how they install gems and handle assets. Let’s run through each one and see how they work:

Here we deliberately copy the Gemfile, corresponding lock file and the vendor/cache directory first, then run bundle install.

When steps in the Dockerfile change, Docker only re-runs that step and steps after. This means we only run bundle install when there’s a change to the Gemfile or the cached gems, but when other files in the app change we can skip this step, significantly speeding up build time.

We deliberately chose to cache the gems rather than install afresh each time for three reasons. First, it removes a deployment dependency: when you’re deploying several times a day it’s not great having to rely on more external services than necessary. Second, it means we don’t have to authenticate for installing private or Git-based gems from inside containers. Finally, it’s also much faster installing gems from the filesystem, using the --local flag to avoid hitting Rubygems altogether.

For production we install our gems differently, skipping test and development groups and precompiling assets into the image.


To release this image we tag it as the latest version, as well as the git SHA. This is then pushed to our private ECR.

We deliberately deploy that specific version of the image, meaning rolling back is as simple as re-deploying a previous version from Jenkins.

Running in production

For running containers in production, we’re doing the simplest possible thing–using Docker to solve a dependency management problem only.

We’re running one container per node, using host networking and managing the process using upstart. When deploying we simply tell the upstart service to restart, which pulls the relevant image from the registry, stops the existing container and starts the new one.

This isn’t the most scalable or resource-efficient way of running containers but for a low-traffic internal app it’s a great balance of simplicity and effectiveness.

Next steps

One thing we’re still missing on production is downtime-less deploys. Amazon’s ECS handles this automatically (by spinning up a new pool of containers before automatically swapping them out in the load balancer) so we’re looking to move towards using that instead.

We’re still learning a lot about using Docker but so far it’s been a powerful, reliable and enjoyable tool to use for both developers and ops.

Recent talks on Songkick Engineering

Since I joined Songkick a little over four years ago, our development team has done some amazing things. Our technology, process and culture have improved an enormous amount.

We’ve always been eager to share our progress on this blog and elsewhere, and we often talk about what we’ve learned and where we are still trying to improve.

Here are some recent talks given by members of our team discussing various aspects of how we work.

Testing iOS apps

We recently released an update to our iPhone app. The app was originally developed by a third-party, so releasing an update required bringing the app development and testing in-house. We develop our projects in a continuous build environment, with automated builds, unit and acceptance tests. It allows us to develop fast and release often, and we wanted the iPhone project to work in the same way.

This article covers some of the tools we used to handle building and testing the app.

Build Automation

We use Git for our version control system, and Jenkins for our continuous integration server. Automating the project build (i.e. building the project to check for compilation errors) seemed like a basic step and a good place to start.

A prerequisite to this was to create a Mac Jenkins Build Slave, which is outside of the scope of this blog post (but if you’re interested, I followed the “master launches slave agent via SSH” instructions on the Jenkins site).

A quick search of the Jenkins plugins page revealed an Xcode plugin which allows for building Objective-C applications. Setting up the plugin was a snap – search and install the “XCode integration” plugin from the Jenkins server plugin page, point the plugin to your project directory on the build slave, enable keychain access, and save.

Now for every commit I made to the project, this task would automatically run, and send me a rude email if project compilation failed. In practice I found that this was an excellent way of reminding me of any files I had forgotten to check in to Git; the project would compile on my laptop but fail on the CI server due to missing classes, images, etc.

Unit testing

I looked briefly into the unit testing framework Apple provides, which ships with Xcode. I added a unit test project to the Songkick app, and looked into creating mocks using OCMock, an Objective-C implementation of mock objects.

We already have fairly extensive API tests to test for specific iPhone-related user-flows (such as signing up, tracking an artist, etc), and due to time constraints we opted to concentrate on building acceptance tests, and revisit unit tests if we had time.

Acceptance Testing

There are a bunch of acceptance testing applications available for iOS apps. Here are a few of the tools I looked into in detail:


Frank is an iOS acceptance testing application which supports a Cucumber-style test syntax. I was interested in Frank as we already make use of Cucumber to test our Ruby projects, so the familiarity of the domain-specific language would have been a benefit.

I downloaded the project and got a sample test up-and-running fairly quickly. Frank ships with some useful tools, including a web inspector (“Symbiote”) which allows for inspecting app UI elements using the browser, and a “Frank console” for running ad-hoc commands against an iPhone simulator from the command line.

Frank seems to be a pretty feature rich application. The drawbacks for me were that Frank could not be run on real hardware (as of March 2013, this appears to now be possible), and Frank also requires recompiling your application to make a special “Frankified” version to work with the testing framework.


Apple provides an application called Instruments to handle testing, profiling and analysis of applications written with Xcode. Instruments allows for recording and editing UIAutomation scripts – runnable JavaScript test files for use against a simulated iOS app or a real hardware install.


Being able to launch your app with Instruments, perform some actions from within the app, and have those actions automatically converted into a runnable test script was a really quick and easy way of defining tests. Instruments also supports running scripts via the command line.

The drawback of test scripts created with Instruments is that they can be particularly verbose, and Instruments does not provide a convenient way of formatting and defining individual test files (outside of a single UIAutomation script per unique action).


Designed to be used as an accompaniment to UIAutomation scripts created using Instruments, Tuneup_js is a JavaScript library that helps to ease the pain of working with the long-winded UIAutomation syntax.

It provides a basic test structure for organising test steps, and a bunch of user-friendly assertions built on top of the standard ones supported by Instruments.


I found that recording tests in Instruments, and then converting them into the Tuneup_js test syntax was a really quick way of building acceptance tests for iOS apps. These tests could then be run using a script provided with the Tuneup_js package.


I settled on using Instruments and Tuneup_js to handle acceptance testing. Instruments because of the ability to quickly record acceptance test steps, and Tuneup_js because it could be used to wrap recorded test steps into repeatable tests and allowed for a nicer test syntax than offered out-of-the-box with UIAutomation. What was missing with these applications was a way to handle running the test files in an easily repeatable fashion, and against the iOS simulator as well as hardware devices.

I couldn’t find an existing application to do this, so I wrote Scenarios (Scenar-iOS, see what I did there?) to handle this task. Scenarios is a simple console Ruby app that performs the following steps:

  • Cleans any previous app installs from the target test device
  • Builds the latest version of the app
  • Installs the app on the target test device
  • Runs Tuneup_js-formatted tests against the installed app
  • Reports the test results

Scenarios accepts command-line parameters, such as the option to target the simulator or a hardware device (with the option of auto-detecting the hardware, or supplying a device ID). Scenarios also adds a couple of extra functions on top of the UIAutomation library:

  • withTimout – Can be used for potentially long-running calls (e.g. a button click to login, where the API call may be slow):
  • slowTap – Allows for slowing-down the speed at which taps are executed. Instruments can run test steps very fast, and sometimes it helps to slow down tests to see what they are doing, and help create a more realistic simulated user experience:

Scenarios ships with a sample project (app and tests) that can be run using the simulator or hardware.

Jenkins Pipeline

Now that I had build and acceptance tests in place, it was time to hook the tests up to Jenkins. I created the following Jenkins projects:

  • “ios-app” – runs the build automation
  • “ios-app-acceptance-tests-simulator” – runs the app (via Scenarios) on a simulator
  • “ios-app-acceptance-tests-iPhone3GS” – runs the app (via Scenarios) on an iPhone 3GS


Committing a code change to the iOS app Git repo caused the projects in the Jenkins pipeline to build the app, run the acceptance tests against the simulator, and finally run the acceptance tests on an iPhone 3GS. If any stage of the pipeline failed, I received an email informing me I had broken something.


Manual testing with TestFlight

As well as an automated setup, we also made use of the excellent TestFlight service, which enables over-the-air distribution of apps to testers. We had 12 users and 16 devices set up in TestFlight, and I was releasing builds (often daily) over-the-air. It enabled us to get some real-user feedback on the app, something that build and acceptance tests cannot replace.

Jenkins also has a TestFlight plugin, which enables you to automatically deploy a build to TestFlight as part of the pipeline. Very cool, but as we were committing code changes often throughout the day (and only wanted to release to TestFlight once a day), we decided to skip this step for the time being.

Overall, I think that the tools (both open-source and proprietary) available today for automated testing of iOS apps are feature rich (even if some are still in their infancy), and I’m pretty happy with our development setup at Songkick.

Testing your database backups: the test environment database refresh pattern

When did you last try restoring your database backups? A month ago, a week ago? A year ago? Never? When was the last time you refreshed the data in your test environments? When I joined Songkick, one of the first things I asked was when we last tested a restore of our database backups. The answer, pleasingly, was at 03:00 UK time that morning and not coincidentally, that’s when we last refreshed the data in our test environments.

Here’s how we get the warm and fuzzy feeling of knowing that our backups contain data that can be restored and makes sense.

  1. Every morning, our database servers run their scheduled backups, copying the resulting images to a backup server in the data centre.
  2. Overnight those backups get copied to the office, giving us an offsite copy.
  3. In the small hours, when most of us are asleep, each of the database servers in our staging environment retrieves the backups, erases its local data files and then restores the production backups over the top of them.
  4. We perform sanitisation on the data, to make it suitable for use in a testing environment.
  5. And finally, and most importantly, we use the databases in our testing.

By doing this, we identified one case when our backups seemed to work, produced plausible looking backups, but MySQL failed to apply InnoDB log records during recovery. It was inconvenient to discover this problem in our staging environment, but far less inconvenient than discovering it when we needed the backups to put our production system back into operation.

Here are some practical tips based on our experience implementing and managing this system at Songkick:

Back all databases up at the same time

If your system is composed of services backed by independent databases on different machines, it’s possible that there’s some implicit consistency between them. For example, a common situation at Songkick is to have an accounts service responsible for storing user accounts and another service that stores user data keyed against a user. In that case there’s an expectation that those databases have some degree of consistency.

If you back them up at different times, you’ll find inconsistencies: a service might have a reference to a user that doesn’t yet exist. If the ID of the user is exposed to other services and that ID can be reused, you may find that newly created users in your test environment have existing data associated with them and this can cause significant problems in testing.
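One way to make this kind of inconsistency visible after a restore is to compare the user IDs a service references with the IDs the accounts data actually contains. The sketch below assumes both sets of IDs have already been loaded into plain arrays; the method name and sample data are hypothetical and not taken from our tooling.

```ruby
require 'set'

# Returns the user IDs that a dependent service references but that are
# missing from the restored accounts data.
def dangling_user_ids(account_ids, referenced_ids)
  known = account_ids.to_set
  referenced_ids.uniq.reject { |id| known.include?(id) }.sort
end

# Hypothetical sample data: the accounts backup was taken before user 104
# was created, the other service's backup was taken afterwards.
account_ids  = [101, 102, 103]
tracking_ids = [101, 103, 104, 104]

puts dangling_user_ids(account_ids, tracking_ids).inspect # => [104]
```

A non-empty result after a refresh is a hint that the backups were not taken close enough together.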

It’s worth noting that, in the case of a production restore, these issues would need to be diagnosed and solved in the heat of the moment. By finding them in your test environment, you’re giving yourself the space to solve them earlier, under less pressure.

Design the backups to be regularly exercised

Some types of backups are more amenable to being restored regularly in test environments. For example, our initial MongoDB database backups performed snapshots of our MongoDB database path. These proved difficult to restore, because they included local databases which contained information on replica set membership. This means that on startup, our staging MongoDB server would forget its existing replica set membership and try to talk to the production servers instead.

We switched to using mongodump to take a logical export of the database, simply so that we could restore it on the primary member of our existing staging replica set and update the entire replica set.

Sanitisation tips

After we’ve restored the databases, there are certain things we do to make them safe and usable in our testing environments.

  • Remove or obfuscate email addresses. We’re not fond of accidentally emailing people with test events we’ve created in staging, so we change people’s email addresses to be unusable, so that can’t happen. We leave people’s email addresses alone if they work at Songkick, so we can test email features by emailing ourselves.
  • Remove or obfuscate payment tokens. If it’s uncool to accidentally email people, accidentally charging them is positively hostile. Anything that’s used for payment needs to be removed.
  • Fix or replace information about the environment. It’s best to avoid keeping references to your technical environment in the same database as your application data, but sometimes it’s tricky to work around. For example, our MogileFS installation needs to be kept in sync with our production one, to avoid problems with missing media. This means that we need to manually update the database to substitute the hostnames of the MogileFS servers.
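As an illustration of the first tip, here is a minimal sketch of an email obfuscation step. It is not our actual sanitisation script: the method name, the internal domain and the replacement address format are all assumptions made for the example.

```ruby
require 'digest'

# Assumption for this sketch: staff addresses all use this domain.
INTERNAL_DOMAIN = 'songkick.com'

# Replaces an address with a stable, undeliverable one unless it is internal.
# The .invalid top-level domain is reserved (RFC 2606), so it never resolves.
def sanitise_email(email)
  domain = email.split('@').last.to_s.downcase
  return email if domain == INTERNAL_DOMAIN

  digest = Digest::SHA256.hexdigest(email.downcase)[0, 12]
  "user-#{digest}@example.invalid"
end

puts sanitise_email('tester@songkick.com') # left untouched
puts sanitise_email('fan@example.org')     # rewritten to user-<digest>@example.invalid
```

Deriving the replacement from a digest keeps addresses unique and stable between refreshes, which helps if a column has a uniqueness constraint.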

Write code that can withstand the database going away

Unless you’ve put some work in, almost no database driver will gracefully handle the disappearance of a database server and then its re-appearance some time later. If the restore in your test environment is the first time you’ve tried this, you may find that you need to manually restart services, even after the database re-appears on the network.

The solution will vary depending on the database client being used, but often it’s a case of catching an exception, or changing some options when you establish the connection.

By making your applications reconnect to the database with no manual input, you are again fixing a problem that will eventually occur in production – a much more stressful time for it to be diagnosed and fixed.
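The "catch an exception and reconnect" approach can be sketched roughly as follows. ConnectionLost is a hypothetical stand-in for whatever error your database driver raises when the server disappears, and with_reconnect is an invented helper name, not part of any library.

```ruby
# Hypothetical error class standing in for the driver's real
# "connection lost" exception.
class ConnectionLost < StandardError; end

# Runs the block, reconnecting and retrying when the connection drops.
# `reconnect` is any callable that re-establishes the connection.
def with_reconnect(reconnect, attempts: 5, base_delay: 0.5)
  tries = 0
  begin
    yield
  rescue ConnectionLost
    tries += 1
    raise if tries >= attempts            # give up and re-raise the error
    sleep(base_delay * (2**(tries - 1)))  # exponential backoff between attempts
    reconnect.call
    retry
  end
end

# Simulate a database that is unavailable for the first two calls.
failures_left = 2
reconnects = 0
result = with_reconnect(-> { reconnects += 1 }, base_delay: 0) do
  if failures_left > 0
    failures_left -= 1
    raise ConnectionLost, 'server went away'
  end
  :ok
end

puts "#{result} after #{reconnects} reconnects" # prints "ok after 2 reconnects"
```

Many drivers offer a reconnect option that does something similar internally, so it is worth checking the client's documentation before writing your own wrapper.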


Testing your database backups by restoring them automatically and regularly in your test environments is a great way to battle-harden your backups and applications and to make sure that your test environment looks like the real production environment.

If you’ve liked what you’ve read, why not head over to our jobs page? We’re looking for a Systems Engineer to add more touches like these to our infrastructure.

Safely dealing with magical text

Boy, what a week it’s been. A remote-code-execution bug was discovered in Ruby on Rails, and we’ve all been scrambling to patch our servers (please patch your apps before reading any further, there is an automated exploit out there that gives people a shell on your boxes otherwise).

What the Ruby community, and those of other dynamic languages, must realize from recent Rails security blunders is that very similar problems can easily exist in any non-trivial web application. Indeed, I found a remote-execution bug in my own open-source project Faye yesterday, 3.5 years into the life of the project (again: patch before reading on).

There are a lot of lessons to be had from recent Rails security blunders, since they involve so many co-operating factors: excessive trust of user input, insufficient input validation and output encoding, the behavioural capabilities of Ruby objects and certain Rails classes, ignorance of cryptography and the computational complexity of data transport formats. In this post I’d like to focus on one in particular: safely encoding data for output and execution.

Ugh, do I have to?

I know, I know, booooooring, but so many people are still getting this really badly wrong and it continues to punish end users by exposing their data to malicious manipulation.

Robert Hansen and Meredith Patterson have a really good slide deck on stopping injection attacks with computational theory. One core message in that paper is that injection exploits (including SQL injection and cross-site scripting) involve crafting input such that it creates new and unexpected syntactic elements in code executed by the software, essentially introducing new instructions for the software to execute. Let’s look at a simple example.

Learn you a query string

I found the code that prompted me to write this post while updating some Google Maps URLs on our site this afternoon. Some of this code was constructing URLs by doing something like this:

You can see the intent here: whoever wrote this code assumes the URL is going to end up being embedded in HTML, and so they have encoded the query string delimiters as &amp; entities. But this doesn’t fix the problem entities are designed to solve, namely: safely representing characters that usually have special meaning in HTML. What is telling is that the comma in the query string should really also be encoded as %2C, but isn’t.

So although the ampersands are being encoded, the actual query data is not, and that means anyone calling this function can use it to inject HTML, for example:

By abusing the maps_url() function, I have managed to inject characters with special meaning — <, >, etc. — into the output and thereby added new HTML elements to the output that shouldn’t be there. By passing unexpected input I’ve created a lovely little cross-site scripting exploit and stolen all your users’ sessions!
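A minimal sketch of the pattern described above. The maps_url() signature, its parameters and the URL are a hypothetical reconstruction for illustration; the real function may well have looked different.

```ruby
# Hypothetical reconstruction: the delimiters are HTML-encoded as &amp;,
# but the values themselves are interpolated without any encoding.
def maps_url(lat, lng, zoom)
  "http://maps.google.com/?ll=#{lat},#{lng}&amp;z=#{zoom}"
end

# Intended use: the URL is dropped straight into an href attribute.
link = '<a href="' + maps_url(51.5, -0.1, 12) + '">Map</a>'
puts link

# Abuse: a value containing "> closes the attribute and the tag early,
# so the rest of the value is parsed as new HTML elements.
payload = '"><script>alert(1)</script>'
evil = '<a href="' + maps_url(51.5, -0.1, payload) + '">Map</a>'
puts evil
```

In the second link the script element sits outside the href attribute, which is exactly the injection described above.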

Note that you cannot cleanly fix this by using an HTML-escaping function like ERB::Util.h() on the output of maps_url(), because this would serve to re-encode the ampersands, leaving strings like &amp;amp; in the href attribute.
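The double-encoding effect is easy to reproduce with Ruby's standard ERB library. The input string here is just an assumed example of what a helper like maps_url() might return.

```ruby
require 'erb'

# A string whose delimiters have already been HTML-encoded.
already_encoded = 'q=london&amp;z=12'

# Escaping it again encodes the & at the start of each entity.
puts ERB::Util.html_escape(already_encoded) # => q=london&amp;amp;z=12
```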

Stacks of languages

Meredith Patterson of the above-linked paper gave another presentation at 28C3 called The Science of Insecurity. I’ve been telling absolutely everyone to watch it recently, so here it is.

This talk describes how we should think of data transfer formats, network protocols and the like as languages, because in fact that’s what they are. It covers the different levels of language power – regular languages, context-free languages and Turing-complete languages – and how use of each affects the security of our systems. It also explains why, if your application relies on Turing-complete protocols, it will take an infinite amount of time to secure it.

When you build HTML pages, you are using a handful of languages that all run together in the same document. There’s HTML itself, and embedded URLs, and CSS, and JavaScript, and JavaScript embedded in CSS, and CSS selectors embedded in CSS and JavaScript, and base64 encoded images, and … well this list is long. All of these are languages and have formal definitions about how to parse them, and your browser needs to know which type of data it’s dealing with whenever it’s parsing your code.

Every character of output you generate is an instruction that tells the browser what to do next. If it’s parsing an HTML attribute and sees the " character, it truncates the attribute at that point. If it thinks it’s reading a text node and sees a <, it starts parsing the input as an HTML tag.

Instead of thinking of your pages as data, you should think of them as executable language.

Back to reality

Let’s apply this idea to our URL:

Outside of an HTML document, the meaning of this list of characters changes: those &amp; blobs only have meaning when interpreting HTML, and if we treat this query string verbatim we get these parameters out:

(This assumes your URL parser doesn’t treat ; as a value delimiter, or complain that the comma is not encoded.)

We’ve seen what happens when we embed HTML-related characters in the URL: inserting the characters "> chops the <a> tag short and allows injection of new HTML elements. But that behaviour comes from HTML, not from anything about URLs; when the browser is parsing an href attribute, it just reads until it hits the closing quote symbol and then HTML-decodes whatever it read up to that point to get the attribute value. It could be a URL, or any other text value, the browser does not care. At that level of parsing, it only matters that the text is HTML-encoded.

In fact, you could have a query string like foo=true&bar="> and parsing it with a URL parser will give you the data {'foo' => 'true', 'bar' => '">'}. The characters "> mean something in the HTML language, but not in the query string language.
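To see the two languages disagree about the same characters, here is a small sketch using only Ruby’s standard library. The variable names are illustrative and not taken from the original code.

```ruby
require 'erb'
require 'uri'

query = 'foo=true&bar=">'

# Read as query-string language: & and = are delimiters, "> is plain data.
params = URI.decode_www_form(query).to_h

# Read as HTML: &, " and > are all special, so each must become an entity
# before the string can sit safely inside an attribute.
escaped = ERB::Util.html_escape(query)

puts params.inspect
puts escaped
```

The URL parser hands back the "> characters untouched as a value, while the HTML encoder treats every one of them as something to neutralise.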

So, we have a stack of languages, each nested inside the other. Symbols with no special meaning at one level can gain meaning at the next. What to do?

Stacks of encodings

What we’re really doing here is taking a value and putting it into a query string inside a URL, then putting that URL inside an HTML document.

At each layer, the template views the value being injected in as an opaque string: it doesn’t care what it is, it just needs to make sure it’s encoded properly. The problem with our original example is that it pre-emptively applies HTML encoding to data because it anticipates that the value will be used in HTML, but does not apply encodings relevant to the task at hand, namely URL construction. This is precisely backwards: considering the problem as above we see that we should instead:

  1. Decide what type of string we’re creating — is it a URL, an HTML doc, etc.
  2. Apply all encoding relevant to the type of string being made
  3. Do not apply encodings for languages further up the stack

In other words, we should make a URL-constructing function apply URL-related encoding to its inputs, and an HTML-constructing function should apply HTML encoding. This means each layer’s functions can be recombined with others and still work correctly, because their outputs don’t make assumptions about where they will be used. So we would rewrite our code as:
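A minimal sketch of that layering, using only Ruby’s standard library. The host name, the parameter names (q, z) and the link_to helper are assumptions made for illustration, and the maps_url signature here is a guess rather than the original one.

```ruby
require 'erb'
require 'uri'

# URL layer: applies URL encoding only, and knows nothing about HTML.
# (Hypothetical signature and host.)
def maps_url(lat, lng, zoom)
  query = URI.encode_www_form('q' => "#{lat},#{lng}", 'z' => zoom)
  "http://maps.example.com/?#{query}"
end

# HTML layer: treats the URL as an opaque string and applies HTML encoding.
# (Hypothetical helper.)
def link_to(url, text)
  href = ERB::Util.html_escape(url)
  label = ERB::Util.html_escape(text)
  %(<a href="#{href}">#{label}</a>)
end

url = maps_url(51.5, -0.12, 12)
html = link_to(url, 'Map')

puts url
puts html
```

Neither function hand-writes an entity or a percent-escape; each delegates to a library call for its own layer.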

Now we see that we get two valid pieces of data: url is a valid URL with all its query parameters correctly encoded but no HTML entities present, and html is a valid HTML fragment with its attributes correctly entity-encoded.

Also, note how we have treated all incoming data as literal (i.e. not already encoded for the task at hand), and we have not hand-written any encoding ourselves (e.g. hand-writing entities like &amp;). You should deal with data assuming it contains the literal information it represents and use library functions to encode it correctly. There’s a very good chance you don’t know all the text transformations required by each layer.

Thinking in types

At this point you’re probably thinking that I’ve made something quite simple seem very complicated. But thinking in terms of types of strings, treating your output as a language stack and following the bullet list above is a good discipline to follow if you want to make sure you handle data safely.

There are some systems that do this for you, for example Rails 3 automatically HTML-escapes any value you insert into an ERB template by default. I’m working on a more general version of this idea: Coping is a templating language that checks your templates conform to the language you’re producing, and doesn’t let input introduce new syntactic elements.

If you’re feeling very brave, I recommend taking the Coursera Compilers course. Although it doesn’t seem immediately relevant to web devs, many concepts from parser theory, type checking and code generation can be applied to security and are well worth learning.

Above all, learn from other people’s security failures and consider where you may have made similar mistakes.

validates_uniqueness_of :nothing

Warning: this article contains rather a lot of silly decisions.

I’ve recently been working out some bugs in our OAuth implementation, including our OAuth2::Provider library. One of the biggest gotchas I found while diagnosing problems with our client apps was the existence of duplicate Authorization records.

An Authorization is a link between a ResourceOwner (i.e. a Songkick user) and a Client, for example our iPhone application. It represents that the user has granted the client access to their resources on Songkick. There should only be one of these per owner-client pair, and somehow we had a few thousand duplicates in our database. Getting more concrete, the table’s columns include the following:

Each combination of values for these three columns must only appear once in the table.

A series of unfortunate events

Now the Rails Way to make such guarantees is to use validates_uniqueness_of, or use a find_or_create_by_* call to check if something exists before creating it. And that’s basically what I’d done; OAuth2::Provider has a method called Authorization.for(owner, client) that would either find a suitable record or create a new one.

But despite implementing this, we were still getting duplicates. I removed an alternative code path for getting Authorization records, and still the duplicates continued. I figured something in our applications must be creating them, so I made new() and create() private on the Authorization model. No dice.

And then I remembered: concurrency! Trying to enforce uniqueness on the client doesn’t work, unless all the clients subscribe to a distributed decision-making protocol. If two requests are in flight, both can run a SELECT query, find there’s no existing record, and then both decide to create the record. Something like this:

This may look familiar to you. In fact, I lifted it straight out of the ActiveRecord source where it explains why validates_uniqueness_of doesn’t work when you have concurrent requests.
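The same interleaving can be replayed deterministically in a few lines of plain Ruby. This is only a simulation: the array stands in for a table with no uniqueness constraint, and the owner and client ids are made up.

```ruby
# Hypothetical table of [owner_id, client_id] rows, with no unique index.
table = []
wanted = [1, 2]

# Both requests run their SELECT before either has inserted anything...
request_a_found = table.include?(wanted)
request_b_found = table.include?(wanted)

# ...so both conclude the record is missing, and both create it.
table << wanted unless request_a_found
table << wanted unless request_b_found

duplicates = table.count(wanted)
puts duplicates
```

Each request behaved correctly on its own; the duplicate comes purely from the gap between checking and writing.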

Users do the funniest things

I agree with you – in theory. In theory, communism works. In theory.

— Homer J. Simpson

There can be a tendency among some programmers to dismiss these arguments as things that probably won’t be a problem in practice. Why would two requests arrive at the same time, close enough to cause this race condition in the database, for the same user’s resources? This is the same thinking that tells you timing attacks are impossible over the Internet.

And I subscribed to this belief for a long time. Not that I thought it was impossible, I just thought there were likelier causes – hence all the attempts to shut down record creation code paths. But I was wrong, and here’s why:

People double-click on things on the Web.

Over time, we designers of software systems have instilled some confusing habits in the people who use our products, and one of those habits means that there is a set of people that always double-click links and form buttons on web pages. Looking at the updated_at timestamps on the duplicate records showed that most of them were modified very close together in time, certainly close enough to cause database race conditions. This fact by itself makes client-enforced uniqueness checks a waste of time. Even if you’re not getting a lot of requests, one little user action can blow your validation.

This is the database’s job

Here’s how this thing should be done, even if you think you’re not at risk:

Then, when you try to create a record, you should catch the potential exception that this index will throw if the new record violates the uniqueness constraint. Rails 3 introduced a new exception called ActiveRecord::RecordNotUnique for its core adapters, but if you’re still supporting older Rails versions you need to catch ActiveRecord::StatementInvalid and check the error message. Here’s how our OAuth library does things.

In the Authorization.for(owner, client) method, there’s a rescue clause that uses duplicate_record_error? to check the exception raised. If it’s a duplicate record error, we retry the method call since the second time it should find the new record that was inserted since the first call started.
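The shape of that rescue-and-retry logic can be sketched without a database. Everything below is a stand-in, not the OAuth2::Provider code: RecordNotUnique mimics ActiveRecord::RecordNotUnique, AuthorizationTable mimics a table with a unique index, and the stale_read flag simulates a SELECT that ran just before a concurrent insert.

```ruby
# Hypothetical stand-in for ActiveRecord::RecordNotUnique.
class RecordNotUnique < StandardError; end

# Hypothetical in-memory table whose insert enforces uniqueness,
# playing the role of a unique database index.
class AuthorizationTable
  def initialize
    @rows = {}
  end

  def find(owner_id, client_id)
    @rows[[owner_id, client_id]]
  end

  def insert(owner_id, client_id)
    key = [owner_id, client_id]
    raise RecordNotUnique, "duplicate entry #{key.inspect}" if @rows.key?(key)
    @rows[key] = { :owner_id => owner_id, :client_id => client_id }
  end

  def count
    @rows.size
  end
end

# Find-or-create that relies on the index rather than on the SELECT.
def authorization_for(table, owner_id, client_id, stale_read = false)
  existing = stale_read ? nil : table.find(owner_id, client_id)
  existing || table.insert(owner_id, client_id)
rescue RecordNotUnique
  # Someone else inserted the row first, so a second attempt will find it.
  authorization_for(table, owner_id, client_id)
end

table = AuthorizationTable.new
first = authorization_for(table, 1, 2)
second = authorization_for(table, 1, 2, true)
puts table.count
```

The losing request never sees an error: it hits the constraint, retries, and comes back with the row the winner created.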

Get your objects out of my session

Last week I had the pleasant job of fixing a feature that broke due to a change in a third-party API. Specifically, Twitter changed part of their authentication API and this broke our ‘post your attendance to Twitter’ feature. After a while spelunking through several layers of HTTP indirection inside the twitter and oauth gems, it became apparent that an upgrade was in order – we implemented this feature so long ago that our twitter gem was lagging four major releases behind the current version.

But this isn’t about Twitter, or OAuth, or even those specific Ruby libraries. It’s about an antipattern I was reminded of while updating our code and reading the OAuth gem documentation. Here is how it suggests you start the authorization process in your Twitter client app:

This code contains a bug that’s bitten me so many times it jumped right off the page:

Here’s the bug: you just stored the Marshal.dump of some random object in the session. One day, you will refactor this object – change its class name, adjust its instance variables – and next time you deploy, no-one will be able to access your site. It doesn’t matter whether the session is stored in the cookie (and therefore on the user’s computer) or on your servers, the problem is that you’ve stored a representation of state that’s tightly coupled to its implementation.

A simple example

Let’s see this in action. Imagine we have a little Sinatra app with two endpoints. One of these endpoints puts an object in the session, and another one retrieves data from the stored object:

We boot the app, and see that it works:

A little change

So, this seems to work, and we leave the site running like this for a while, and people visit the site and create sessions. Then one day we decide we need to refactor the State class, by changing that hash into an array:

Now if we retry our request we find this buried among the stack traces:

A peek at Rack’s guts

To understand why this happens you need to see how Rack represents the session. Basically, it takes the session hash, in this case one holding a State object under the :state key, runs it through Marshal.dump and base64-encodes the result. Here’s what Marshal emits:

Marshal produces a literal representation of the object – its class, its instance variables and their values. It is a snapshot of the object that can be completely reconstructed later through Marshal.load.

When you store objects in the session, you are dumping part of your program’s implementation into storage and, if you use cookie-stored sessions, sending that representation to the user for them to give back later. Now, fortunately, cookies are signed by Rack using HMAC-SHA1 so the user should not be able to construct arbitrary Marshal output and inject objects into your program – don’t forget to set :session_secret unless you want people sending forged objects to you! But there is still the problem that your code is effectively injecting objects into processes running in the future, when those objects may no longer be valid.

If you change the name of a class, then Marshal.load will fail, and you’ll get an empty session object. But if all the types referenced in the session dump still exist, it will happily reconstruct all those objects and their state may not reflect what the current process expects.
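The renamed-class failure is easy to reproduce outside Rack. A minimal sketch, assuming a State class with a single instance variable; the :flow key and its value are illustrative.

```ruby
# Hypothetical class whose instances end up in the session.
class State
  def initialize(params)
    @params = params
  end
end

# What gets written to the session store: a snapshot naming the class.
dump = Marshal.dump(:state => State.new(:flow => 'sign_up'))

# Simulate a later deploy in which the class has been renamed or removed.
Object.send(:remove_const, :State)

error = begin
  Marshal.load(dump)
  nil
rescue ArgumentError => e
  e
end

puts error.message
```

Marshal.load itself raises here; a session layer that swallows the error is what turns this into an apparently empty session.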

And as a bonus, once you’ve deployed the session-breaking change, you can’t revert it, because recent visitors will have the new representation in their session. We’ve got various classes in our codebase with multiple names to work around times when we made this mistake.

A better way

In light of the above, you should treat your sessions with a certain degree of paranoia. You should treat them with the same care as a public API, making sure you only put stable representations of state into them. Personally I stick to Ruby’s core data types – strings, numbers, booleans, arrays, hashes. I don’t put user-defined classes (including anything from stdlib or gems) in there. Similarly, you should not assume any given session key exists, since the session may become corrupt, the user may delete their cookies, and so on. Always check for nil values before using any session data, unless you want your site to become unreachable.

A future-proof Twitter client

So how should you use the Twitter gem and avoid these problems? Easy – just store the credentials from the request token, and reconstruct the token when Twitter calls you back:

Note how we only store strings in the session and the database, and we store just enough of the credentials that we can construct an OAuth or Twitter client later, whenever we need one.

This approach only stores stable representations – tokens used in the OAuth protocol – and constructs objects by hand when they are needed rather than relying on Marshal dumps. This makes the application more resilient when the libraries you depend on inevitably need upgrading.
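The general pattern can be sketched without any gem. In this sketch the session is a plain hash, and the key names, token values and Credentials struct are all hypothetical; in a real app the rebuilt object would be whatever client your OAuth library provides.

```ruby
# Hypothetical value object, rebuilt on demand and never stored in the session.
Credentials = Struct.new(:token, :secret)

# Store only strings taken from the request token.
def remember_credentials(session, token, secret)
  session['oauth_token'] = token
  session['oauth_secret'] = secret
end

# Rebuild the object by hand, and tolerate a missing or corrupted session.
def credentials_from(session)
  token = session['oauth_token']
  secret = session['oauth_secret']
  return nil if token.nil? || secret.nil?
  Credentials.new(token, secret)
end

session = {}
remember_credentials(session, 'token-123', 'secret-456')
restored = credentials_from(session)
missing = credentials_from({})
puts restored.token
```

Because the session only ever holds strings, renaming or restructuring Credentials later cannot invalidate anything already stored.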

A month at Songkick

I love Songkick.

Not in a soppy “no you hang up first” kinda way, but in a “I haven’t missed a great gig in over a year” way. Which is why when I was given the opportunity to work here, I jumped at it.

After working at Songkick for a few weeks now, I thought I’d write about my experiences so far, from the interview process through to day-to-day development.

Here are the six simple steps I took to Songkick happiness.

Step 1 – Network

I’ve been a fan of Songkick’s service for a long time, and after I met some of the team at the Silicon Milkroundabout event in May 2012, I was invited to start the interview process. This was great news (Songkick are awesome[1]).

Initially, I did have a few concerns about my technical compatibility with the company; I’ve spent the last few years in a Windows and .NET environment, and Songkick are a long way from that. I was soon to find that these worries were misplaced.

Step 2 – Code

To kick off the interview process, I received an email from Songkick – “Hey Aaron, You seem pretty rad, fancy taking a technical test?”. At least that’s how I remember it.

The rules:

  • Complete an hour long programming challenge
  • From home, at a time that suited you
  • In a programming language of your choice

I let them know when I could set an hour aside, and at the agreed time I was emailed a PDF describing the challenge. I can’t give too much away, but the challenge was really interesting, and Songkick-specific.

I hacked away in C#, making use of third-party libraries as required, and after the hour was up, emailed my solution. I didn’t have time to fully complete the challenge, but I had concentrated on getting a clean design, stubbing all core interfaces, classes and methods, and adding comments and pseudo-code where necessary.

After a few days, I received an email informing me that I was through to round two.

Step 3 – More Code

I was invited to have a couple of face-to-face interviews, and sit another coding test. This time I was to complete a 90-minute pair-programming exercise, in Ruby.

The test was a little daunting as I was a complete Ruby novice. However, with it being a pair-programming exercise, I had a friendly developer (Sabrina) sitting with me to help with syntax questions. Any time I was unaware of the syntax in Ruby (quite a lot!), I could scribble on a notepad how I would solve the problem in C#, and Sabrina would show me the equivalent syntax in Ruby.

This was a test-driven development exercise, and I was introduced to the challenge with a brief overview of the task, and a collection of failing Cucumber tests. I wrote code to gradually pass each test, until all passed – and in the nick of time too. I had a couple of minutes to discuss my solution and what I would add to it if I had more time, and the 1.5 hours were up.

Step 4 – Meet and Greet

As a firm believer in The Joel Test, I agree that writing code during the interview process is important, but equally important is the rapport between yourself and your potential colleagues.

During the interview process, I met a large percentage of the company over a number of interviews, including a coffee and chat with the entire development team. It’s pretty intimidating stuff, but it gives both parties the opportunity to make sure each will be a good fit for the other.

After a few more days of waiting, I received the call I was hoping for.

Step 5 – On-boarding

Joining Songkick was a super-smooth operation. We run a tight ship (as I was to find out), and my first few days were as follows.

Day 1

I spent the morning being shown around the office: an open plan environment with everything a professional developer needs to maintain a high level of productivity (ping pong table, foosball table, a fully-stocked kitchen and a proper coffee machine).

I was provided with a mentor for the week – Robin. Having someone to sit with you, explain the development environment and application design really helped me to become productive quickly. In fact, I made my first code commit on day one.

Day 2 & 3

I spent the next two days divided between coding (with Robin) and various presentations from the different departments in Songkick. These ranged from the data science team (who handle making sense of the huge amounts of data we have), to QA and infrastructure.

Day 4

The whole company boarded a vintage Routemaster bus, and we were taken to End of the Road festival for the weekend. Did I mention Songkick are awesome[1]?

Step 6 – Develop

By far the biggest change (and probably worry) in my move to Songkick was the development environment. I’ve been working in a .NET ecosystem for a number of years; the framework is stable and Visual Studio is, in my opinion, a great IDE: it’s feature-rich and has some useful plugins. On the other hand, Songkick’s development environment is entirely Unix-based, making use of (and contributing back to) lots of open-source projects.

I do have experience developing in a Linux environment, but haven’t touched it for a few years, so had a feeling I was going to be rusty. After a few days, I was pleasantly surprised to see how far the tools and frameworks have come. Again, having a mentor to guide me through this transition was crucial; I could ask questions and receive answers immediately.

All in all, joining Songkick has been an amazing experience. I’m surrounded by different teams of people (ranging from developers and testers, through to UX experts and designers), all of whom are the best at what they do (but don’t take my word for it, check out the team page). Having a passion for the product is essential, but if you love live music, Songkick is for you.

[1] How about developing for a platform that has millions of users, and enables fans from across the world to see their favourite artists live? And the perks are pretty amazing too: great office, free food and drink, table tennis and foosball, monthly ticket allowance, annual festival trip for the company, etc. I could go on, but you should probably just apply.

Run the right tests at the right time

Way back in June, Dan Crow posted about some of the key principles that we at Songkick believe in. One that I spend some time thinking about every day is, ‘ship early, ship often’. We firmly believe that code should be shipped as soon as it’s ready. From a development point of view this just makes sense. From a user’s point of view this just makes sense. From a testing point of view this proves to be a bit of a challenge.

Shipping fast doesn’t mean shipping untested code and hoping for the best. Every single thing that we release has been tested extensively. Obviously the only way we manage to ship often is by keeping the build/test/release cycle as short as possible. All builds are managed in Jenkins. Pushing code will automatically trigger our unit and integration test suites. If all the tests pass we end up with a green build which can be manually deployed to our test environment. Finally a suite of acceptance tests runs through the browser using Capybara and Selenium WebDriver to confirm we haven’t broken any of our critical user journeys. These tests are pretty slow, taking roughly 4 minutes to run a handful of scenarios, but this is the first check that the user will actually be able to interact with the website.

Only after all these tests have passed will we deploy code to Production. This applies to all new features, bug fixes and even changes to the tests themselves.

The problem

Despite our best intentions we were still struggling to ship changes as soon as they were ready:

In June 2011 we made 7 releases.

In the best case it took 3 hours to build, test and ship code. In reality we were spending around 2 days preparing each release. Something had to change.

Dan Lucraft wrote an excellent post about how we reduced the time it takes to run our tests. It feels pretty obvious to say you can increase release speed if you make your tests run faster but this was only part of the solution. Keeping the test suites fast requires constant diligence. Aiming for 100% test coverage is a distraction. Not only will you never achieve it but if you even came close then your builds would likely be taking far longer than needed to run.

Run the right tests

We took the step of identifying which features we wouldn’t want to break and plotting them against the overhead of running tests. In the case of unit tests you can pretty much add as many tests as you like without too much overhead. Integration tests need to be things that you actually care about. If you discovered a feature was broken during manual testing but wouldn’t hold a release to fix it then you shouldn’t have an automated test for that feature in your build (well, unless it was a super quick unit test).

An example of this is our automatic tweets when authenticated users mark their attendance to an event. It is a valid and highly used service that we wouldn’t want to be without but it is not business critical. If we were to have an automated test for this we would need a test which set up a user who appears authenticated with Twitter. The test user would then mark their attendance to an event and the test would need to check whether the tweet was fired for the correct event.

Not only is that a fair bit of work to write and maintain but the resulting test would be pretty slow to execute. The alternative, to push to production and monitor errors in the logs whilst also keeping an eye on the Songkick twitter feed (something we’re already monitoring) means we have one fewer test to run and maintain. The feedback comes later (post release rather than pre) but since we wouldn’t hold a release even if we knew that we had broken this feature then actual time to fix is roughly the same.

At the right time

To allow the team to ship fast we need to keep the release channel clear. Builds run through the test suites as cleanly and as quickly as possible to free up the channel for the next release. Part of our process involves establishing up-front how we will test a code change. Usually this will mean adding or modifying automated tests to cover the new functionality. However some of our changes need more than just an automated build run against them so we needed to come up with a way to separate testing from the actual releases.

Our solution was to use what we call Flippers, additional code which lets admins control whether a feature is visible to users. We can then turn features on and off on the live site without needing to make additional releases. As well as giving us a fast way to turn off problem features this has the benefit of allowing us to turn features on for a particular type of user. High risk or extensively changed features are released to production behind a flipper that makes them visible to admin users only. This means we can run the code on the live servers, using live data but test them as if we were working on a test environment.
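As a rough illustration of the idea, and not Songkick’s actual implementation, a flipper can be as small as a lookup from feature name to audience. The class, the audience names and the :new_calendar feature below are all hypothetical.

```ruby
# Hypothetical user record; only the admin flag matters here.
User = Struct.new(:name, :admin)

# Hypothetical flipper registry: each feature is visible to
# :everyone, :admins or :nobody, and unknown features stay hidden.
class Flippers
  def initialize
    @audiences = Hash.new(:nobody)
  end

  # Change who can see a feature, without shipping a release.
  def flip(feature, audience)
    @audiences[feature] = audience
  end

  def visible?(feature, user)
    case @audiences[feature]
    when :everyone then true
    when :admins then user.admin == true
    else false
    end
  end
end

flippers = Flippers.new
flippers.flip(:new_calendar, :admins)

admin = User.new('tester', true)
fan = User.new('fan', false)

puts flippers.visible?(:new_calendar, admin)
puts flippers.visible?(:new_calendar, fan)
```

Defaulting unknown features to hidden means a feature that ships before anyone has configured its flipper stays invisible rather than leaking to users.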

Fix bugs fast

One problem with testing code on Production is that the bugs you find are also on Production. Obviously many of these bugs aren’t visible to users thanks to the flippers but there will always be some bugs in live code. Our approach is a cultural one: yes, we move fast and accept that things might break, but we don’t leave them like that. We fix bugs as fast as possible.

Sounds interesting but does it work?

We spent 12 months looking at our tests, our process and probably ourselves. Changes were made and in June 2012 we made 113 releases. 14 of those were on the same day. In fact we released on every single working day that month (and there were a few sneaky weekend releases too!).