Songkick’s first engineering open house

Here at Songkick HQ, we’ve been working on some pretty exciting projects over the last year. With over 6 million monthly unique visitors, the most comprehensive live music dataset on the planet, and successful apps on Spotify, iPhone, Android and Facebook, we help the world’s music fans go to more concerts.

Come and find out more about the technology behind Songkick. Meet the engineering team, ask questions and – most importantly – enjoy free beer and pizza.

We have four short presentations for you, with plenty of time for you to talk with the team.

When: Wednesday October 10th, 6pm – 8pm
Where: Songkick HQ, Hoxton Street

Speakers:

Dan Lucraft
Hyperadmin and our Service-Oriented Architecture
How SOA let us build self-documenting APIs

Sabrina Leandro
Data ingestion
How we handle concert data from multiple sources

Phil Cowans
Data Science
Analyzing Songkick’s mountains of data

Amy Phillips
Testing and Continuous Deployment
The heart of Songkick’s Agile process

If you’d like to come along, register below. Spaces are strictly limited, so sign up now. If we have space for you, we’ll send you a confirmation email. If you don’t get in this time, don’t worry, we’ll notify you of future events.

Registration is now closed

Follow us on Twitter: @songkicktech
Read our devblog
We’re hiring: Songkick jobs

Run the right tests at the right time

Way back in June, Dan Crow posted about some of the key principles that we at Songkick believe in. One that I spend some time thinking about every day is ‘ship early, ship often’. We firmly believe that code should be shipped as soon as it’s ready. From a development point of view this just makes sense. From a user’s point of view this just makes sense. From a testing point of view this proves to be a bit of a challenge.

Shipping fast doesn’t mean shipping untested code and hoping for the best. Every single thing that we release has been tested extensively. Obviously the only way we manage to ship often is by keeping the build/test/release cycle as short as possible. All builds are managed in Jenkins. Pushing code automatically triggers our unit and integration test suites. If all the tests pass, we end up with a green build, which can be manually deployed to our test environment. Finally, a suite of acceptance tests runs through the browser using Capybara and Selenium WebDriver to confirm we haven’t broken any of our critical user journeys. These tests are pretty slow, taking roughly 4 minutes to run a handful of scenarios, but they are the first check that the user will actually be able to interact with the website.

Only after all these tests have passed will we deploy code to Production. This applies to all new features, bug fixes and even changes to the tests themselves.
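The order of these gates matters: each stage only runs if the previous one passed. As a rough sketch (the `Stage` struct, the stage names and `run_pipeline` are hypothetical, and the checks are stubbed with lambdas), the gating looks like this in Ruby:

```ruby
# Hypothetical model of the gates described above. In practice Jenkins,
# not a Ruby script, orchestrates these stages.
Stage = Struct.new(:name, :check)

# Runs stages in order and stops at the first failure, so later and slower
# stages (such as browser-based acceptance tests) never run on a broken build.
def run_pipeline(stages)
  ran = []
  stages.each do |stage|
    ran << stage.name
    return { status: :failed, failed_stage: stage.name, ran: ran } unless stage.check.call
  end
  { status: :ready_for_production, failed_stage: nil, ran: ran }
end

stages = [
  Stage.new(:unit_tests,        -> { true }),
  Stage.new(:integration_tests, -> { true }),
  Stage.new(:deploy_to_test,    -> { true }),  # a manual step in the process above
  Stage.new(:acceptance_tests,  -> { false })  # e.g. Capybara + Selenium
]
run_pipeline(stages)[:failed_stage] # => :acceptance_tests
```

Only a build that comes back with `:ready_for_production` would go on to be deployed.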

The problem

Despite our best intentions we were still struggling to ship changes as soon as they were ready:

In June 2011 we made 7 releases.

In the best case it took 3 hours to build, test and ship code. In reality we were spending around 2 days preparing each release. Something had to change.

Dan Lucraft wrote an excellent post about how we reduced the time it takes to run our tests. It feels pretty obvious to say you can increase release speed by making your tests run faster, but this was only part of the solution. Keeping the test suites fast requires constant diligence. Aiming for 100% test coverage is a distraction: not only will you never achieve it, but if you even came close, your builds would likely take far longer to run than they need to.

Run the right tests

We took the step of identifying which features we wouldn’t want to break and plotting them against the overhead of running their tests. In the case of unit tests, you can pretty much add as many as you like without too much overhead. Integration tests need to cover things that you actually care about. If you discovered during manual testing that a feature was broken, but you wouldn’t hold a release to fix it, then you shouldn’t have an automated test for that feature in your build (well, unless it was a super quick unit test).

An example of this is our automatic tweets when authenticated users mark their attendance at an event. It is a valid and heavily used feature that we wouldn’t want to be without, but it is not business critical. If we were to have an automated test for this, we would need a test that set up a user who appears to be authenticated with Twitter. The test user would then mark their attendance at an event, and the test would need to check whether the tweet was fired for the correct event.

Not only is that a fair bit of work to write and maintain, but the resulting test would be pretty slow to execute. The alternative, pushing to production and monitoring errors in the logs whilst also keeping an eye on the Songkick Twitter feed (something we’re already monitoring), means we have one fewer test to run and maintain. The feedback comes later (post-release rather than pre-release), but since we wouldn’t hold a release even if we knew we had broken this feature, the actual time to fix it is roughly the same.
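The rule of thumb from this section can be written as a tiny decision function. This is only an illustration: the `Feature` struct, its fields and `automate_test?` are hypothetical, and treating marking attendance as release-blocking is an assumption made for the example.

```ruby
# Hypothetical description of a feature we might cover with an automated test.
#   release_blocking: would we hold a release if this feature were broken?
#   test_kind:        :unit (cheap) or :integration (slower, costlier)
Feature = Struct.new(:name, :release_blocking, :test_kind)

# Automate a test if a breakage would block the release, or if the test is a
# cheap unit test that adds almost no overhead to the build.
def automate_test?(feature)
  feature.release_blocking || feature.test_kind == :unit
end

auto_tweet = Feature.new('tweet on attendance', false, :integration)
attendance = Feature.new('mark attendance',     true,  :integration) # assumed critical
automate_test?(auto_tweet) # => false: monitor logs and the Twitter feed instead
automate_test?(attendance) # => true
```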

At the right time

To allow the team to ship fast, we need to keep the release channel clear. Builds run through the test suites as cleanly and as quickly as possible to free up the channel for the next release. Part of our process involves establishing up front how we will test a code change. Usually this means adding or modifying automated tests to cover the new functionality. However, some of our changes need more than just an automated build run against them, so we needed to come up with a way to separate testing from the actual releases.

Our solution was to use what we call Flippers: additional code that lets admins control whether a feature is visible to users. We can then turn features on and off on the live site without needing to make additional releases. As well as giving us a fast way to turn off problem features, this lets us turn features on for a particular type of user. High-risk or extensively changed features are released to production behind a flipper that makes them visible to admin users only. This means we can run the code on the live servers, using live data, but test it as if we were working in a test environment.
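Our Flipper code isn’t shown here, so what follows is only a minimal sketch of the idea, assuming three states per feature (off, admins only, on). `FeatureFlippers`, `User` and `admin?` are hypothetical names, this is not the API of the open-source flipper gem, and a real app would keep the states somewhere shared (such as the database) rather than in an in-memory hash.

```ruby
# Hypothetical user model: only the admin flag matters here.
User = Struct.new(:name, :admin) do
  def admin?
    admin
  end
end

# Per-feature switches that admins can change at runtime, without a release.
class FeatureFlippers
  STATES = [:off, :admins_only, :on].freeze

  def initialize
    @states = Hash.new(:off) # unknown features default to hidden
  end

  def set(feature, state)
    raise ArgumentError, "unknown state #{state.inspect}" unless STATES.include?(state)
    @states[feature] = state
  end

  # Is this feature visible to this user (nil for logged-out visitors)?
  def visible?(feature, user)
    case @states[feature]
    when :on          then true
    when :admins_only then !user.nil? && user.admin?
    else false
    end
  end
end

flippers = FeatureFlippers.new
flippers.set(:new_venue_page, :admins_only) # live on production, admins only
flippers.visible?(:new_venue_page, User.new('admin_user', true)) # => true
flippers.visible?(:new_venue_page, User.new('fan', false))       # => false
```

A view would then wrap the new markup in a check such as `flippers.visible?(:new_venue_page, current_user)`, and turning the feature on for everyone is just a change of state.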

Fix bugs fast

One problem with testing code on Production is that the bugs you find are also on Production. Obviously many of these bugs aren’t visible to users thanks to the flippers, but there will always be some bugs in live code. Our approach is a cultural one: yes, we move fast and accept that things might break, but we don’t leave them like that. We fix bugs as fast as possible.

Sounds interesting but does it work?

We spent 12 months looking at our tests, our process and probably ourselves. Changes were made, and in June 2012 we made 113 releases, 14 of them on the same day. In fact, we released on every single working day that month (and there were a few sneaky weekend releases too!).

Our object-based Rails frontend

Part of the rewrite of Songkick’s website was a re-architecture of the main client application, affectionately known as skweb (pronounced /skwɛb/, not /ɛskeɪwɛb/). Skweb, as has been mentioned in other posts, had grown into a monster, not just in size but also in complexity. I was asked to suggest an improved structure for the new simplified application. Based on my observations working on our application and the one I’d worked on at the Guardian, I noticed that a lot of complexity was introduced to make rendering web pages easier. It was as if, since we were so focused on modelling the business logic of the company, we had neglected to model a core function of a web site: presenting HTML pages to the user.

With this in mind, I proposed splitting out the modelling of web pages into ‘page models’ that would sit alongside the application models and focus on taking Songkick’s data and turning it into web pages. Each type of page on the website would have a ‘page model’ responsible for rendering the page. This separation eventually led naturally to using services to drive skweb: because the page models were built to be agnostic about where their data came from, we could migrate away from our single database more easily.

These days, all the business logic that drives Songkick is contained within internal web services, and skweb’s main job is creating web pages from that information. Certainly there are pages about artists and concerts with tickets and venues so all that vocabulary remains, but it is not the business model of Songkick we are modelling. What we are concerned with is presenting that information in web pages.

Pages, Components, Elements

Once we settled on having page models, it became straightforward to break the page up into its constituent parts. A page has a collection of components, and the components consist of elements. The component is given any data it needs by its enclosing page. Sufficiently complex components can have their own models, which the page model invokes when needed.

The default behaviour for a component which has no data to render is to render nothing. For example if the service that provides data to the component is down, the component should contain the error and emit no output. There should be no stray markup hanging around on the page, and if components need to display something when empty it is up to the page to allow this.
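In a page model, that containment might look like the sketch below. `ServiceUnavailable`, `PageModels::Artist`, `MediaSummary`, the `media_service` object and its `media_for` method are all hypothetical names; the point is that the page returns `nil` instead of raising, and the `component` helper in ApplicationHelper renders nothing for `nil`.

```ruby
# Hypothetical error raised by a service client when the service is down.
class ServiceUnavailable < StandardError; end

module PageModels
  # Hypothetical component model holding only the data it needs.
  MediaSummary = Struct.new(:videos)

  class Artist
    def initialize(artist_id, media_service)
      @artist_id = artist_id
      @media_service = media_service
    end

    # Returns a component model, or nil when there is nothing to render.
    # A service failure is contained here, so the rest of the page still
    # renders and this component simply produces no markup.
    def media_summary
      videos = @media_service.media_for(@artist_id)
      return nil if videos.nil? || videos.empty?
      MediaSummary.new(videos)
    rescue ServiceUnavailable
      nil # in practice you would probably also log the failure here
    end
  end
end
```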

What makes a component?

A component is a discrete module of functionality on the page that can function independently of other components. Typically you can easily draw a box around a component, and it will probably contain a heading and some supporting information. I decided (somewhat arbitrarily) that components are not nestable: you cannot have components inside components. This constraint is not a technical one; I imposed it to try to reduce complexity in the design. Since components aren’t nestable, if we do need to break them into parts or share code between components, we use elements instead. Components that appear on more than one type of page are called shared components.

An element is something smaller and usually less complex than a component, and may appear in more than one component (in which case it is called a shared element). An example is the attendance buttons, which appear all over our site: both in event listings, like those found on an artist page, and on individual event pages.

We arrange the view code around pages and components, with each page having its own stylesheet, and each component having its own stylesheet, JavaScript and images. We use the same name for each component and its associated assets, so it’s easy to understand which static assets the component depends on. An advantage of this approach is that when a component is removed or refactored, there is no ambiguity about which images, CSS files and JavaScript must be removed or updated.

So how does all this work in practice?

Let’s examine how this works by following one component through its rendering process. I’m going to use the Map component on the Venue page.

Skweb is still a Rails app and still has the familiar layout, but we’ve added some conventions of our own. First, all pages have a type (‘venue’, for example) that also provides the name of the CSS file the page links to. The page provides methods that expose its components, and it constructs each component by passing in whatever data that component needs. The component has no access to databases, services or the HTTP request; everything it needs is given to it via the page model and controller. By convention, the name of the component is also the name of its template in the views folder; in fact, it is this use of common names that makes component dependencies easier to understand.

A small fragment of our app might look like this:

skweb/
    app/
        controllers/
            venues_controller.rb
        models/
            page_models/
                venue.rb
            skweb/
                models/
                    venue.rb
        views/
            shared/
                components/
                    _calendar_summary.html.erb
                elements/
                    _attendance_buttons_element.html.erb
                    _event_listings.html.erb
            venues/
                _brief.html.erb
                _map.html.erb
                show.html.erb
    public/
        javascripts/
            songkick/
                component/
                    tickets.js
        stylesheets/
            components/
                venue-brief.css
                venue-map.css
            shared/
                elements/
                    pagination.css
                components/
                    brief.css
            venue.css

When a user visits a Venue page, the controller creates a new page object:

class VenuesController < ApplicationController
  def show
    # venue and logged_in_user are controller helper methods (not shown)
    @page = PageModels::Venue.new(venue, logged_in_user)
  end
end

The page model for the Venue includes something to this effect:

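module PageModels
  class Venue < PageModels::Base
    def initialize(venue, logged_in_user)
      @venue = venue
      @logged_in_user = logged_in_user
    end

    # Builds the Brief component from data the page already holds
    # (upcoming_events is defined elsewhere in the page model)
    def brief
      Brief.new(@venue, upcoming_events.total_entries, @logged_in_user)
    end
  end
end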

The Brief component is responsible for displaying the venue’s address, map, image and so on, but the Ruby objects only expose data. Markup is confined to the view templates, and rendering is performed by gluing a page model and a view template together.

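module PageModels
  class Venue
    class Brief
      # Receives everything it needs from the Venue page model
      def initialize(venue, upcoming_events_count, logged_in_user)
        @venue = venue
        @upcoming_events_count = upcoming_events_count
        @logged_in_user = logged_in_user
      end

      # Exposes data only; the map markup lives in the view template
      def geolocation
        @venue.geolocation
      end
    end
  end
end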

Moving to the view, the ‘show’ page for a venue might look like this:

<div class="primary col">
  <%= component('brief', @page.brief) %>
  <%= component('map', @page.brief.geolocation) %>
  <%= shared_component('calendar_summary',   @page.calendar_summary) %>
  <%= shared_component('media_summary',      @page.media_summary) %>
  <%= shared_component('media_links',        @page.media_links) %>
  <%= shared_component('gigography_summary', @page.gigography_summary) %>
</div>

component() and shared_component() are defined in ApplicationHelper and look like this:

def component(component_name, object)
  return '' if object.nil?
  render :partial => component_name, :object => object
end

def shared_component(component_name, object)
  component("shared/components/#{component_name}", object)
end

As you can see, these are really just thin wrappers around partials, but they also ensure that nothing is rendered when there is no data to give to the component.

The content of the component is pretty standard ERB:

<div class="component venue-map">
  <a href="<%= google_maps_url(map, :zoom => 15) %>" target="_blank">
    <img src="<%= static_google_maps_image_url(map, :width => 640, :height => 220, :zoom => 15) %>">
  </a>
</div>

As a convenience, the object passed in to the component by its page has the same name as the component. That is where map comes from in the code above. This is also useful in shared components, as they don’t need to know anything about the context in which they are being used or what instance variables it might be using.
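To see both conventions together (nothing rendered for `nil`, and the object exposed under the partial’s own name), here is a plain-Ruby stand-in for the helpers that runs outside Rails. `FakeView` and its `render` stub are hypothetical; in the real app, Rails’ own `render` derives the local variable name from the partial’s file name in a similar way.

```ruby
# Stand-in for a Rails view context, so the helpers can run without Rails.
class FakeView
  # Same bodies as the ApplicationHelper methods above.
  def component(component_name, object)
    return '' if object.nil?
    render :partial => component_name, :object => object
  end

  def shared_component(component_name, object)
    component("shared/components/#{component_name}", object)
  end

  # Stub: reports which partial would render and the local variable name the
  # object would be exposed as (the partial's base name).
  def render(options)
    local_name = File.basename(options[:partial])
    "#{options[:partial]} with #{local_name}=#{options[:object].inspect}"
  end
end

view = FakeView.new
view.component('map', [51.5, -0.08])           # => "map with map=[51.5, -0.08]"
view.shared_component('calendar_summary', nil) # => ""
```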

The Venue page will link to its venue.css file, which looks like:

@import 'shared/components/brief.css';
@import 'components/venue-brief.css';
@import 'components/venue-map.css';
@import 'shared/components/media-summary.css';
@import 'shared/components/event-listings.css';

And the venue-map.css file is short and sweet:

.venue-map
{
  padding: 0;
  margin-bottom: 26px;
  position: relative;
  z-index: 5;
  -webkit-box-shadow: 0 4px 2px -2px rgba(0, 0, 0, 0.2);
     -moz-box-shadow: 0 4px 2px -2px rgba(0, 0, 0, 0.2);
          box-shadow: 0 4px 2px -2px rgba(0, 0, 0, 0.2);
}

.venue-map img
{
  vertical-align: bottom;
}

@media only screen and (max-width: 767px)
{
  .mobile-enabled .venue-map img
  {
    width: 100%;
  }

  .mobile-enabled .venue-map
  {
    padding-left: 0;
    padding-right: 0;
  }
}

The CSS file contains only the CSS that this component needs and includes any CSS for the small screen rendering of that component.

What is that called?

Another aspect of the design was to use a pervasive language. The idea is that everyone at Songkick (product managers, designers and developers) uses the same names for pages and components on the website. The advantage of having a shared language across the company comes through when talking about the site. If someone says, ‘the tickets component is broken,’ I know exactly what they mean. It will correspond to a partial called _tickets.html.erb in the views; the page model for the component will be called Tickets; its CSS will live in stylesheets/components/tickets.css; the HTML class name on the component is tickets; and any JavaScript needed for the component lives in javascripts/songkick/component/tickets.js. The strong naming convention makes navigating around the project easy and makes finding dependencies very straightforward.
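Because every name follows from the component name, the convention can be captured in a single function. The sketch below is hypothetical (no such helper is described here); it follows the paths listed above and assumes that multi-word names use underscores in Ruby and ERB but hyphens in CSS, as `media-summary.css` suggests.

```ruby
# Hypothetical helper: given a component name, return the files and names
# that the naming convention says belong to that component.
def component_files(name)
  css_name = name.tr('_', '-') # assumed: CSS uses hyphens for multi-word names
  {
    view_partial: "_#{name}.html.erb",
    page_model:   name.split('_').map(&:capitalize).join,
    stylesheet:   "public/stylesheets/components/#{css_name}.css",
    css_class:    css_name,
    # assumed: JavaScript file names keep the component name unchanged
    javascript:   "public/javascripts/songkick/component/#{name}.js"
  }
end

component_files('tickets')[:page_model]       # => "Tickets"
component_files('media_summary')[:page_model] # => "MediaSummary"
```

A helper like this could, for example, list every file to delete when a component is retired.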

What does this give us?

The page/component/element structure makes deciding where to put code easier by having very strong conventions. The page models made migrating skweb onto services simpler, as they provided a separation between the rendering stack and the source of the data it uses. We were able to behave as if we were building on top of services, even when in some cases the services didn’t exist yet.

We have now also used this architecture on a new application, and again the clear demarcation of responsibilities makes deciding where to put code and how to structure it easier and more predictable. That’s not to say that there aren’t costs to this approach: some find the sheer number of files, especially for CSS, difficult to navigate. Others find the insistence on rigidly mapping names across types of files excessive. While this is somewhat down to personal taste, in our experience having a predictable structure of small files with focussed responsibilities has made it easier to maintain our codebase.

The path to SOA

So far, James has explained what Songkick’s current Service Oriented Architecture looks like. I want to step back and talk about one of the hardest things we had to do: once we decided to undertake such a big change, how did we take the first step?

In our case, it made sense to start where it hurt the most: rewriting our biggest project, the songkick.com Rails app, to be a simpler web app without direct access to the ActiveRecord domain models. This would also give us the opportunity to understand the types of resources and API endpoints needed, so the services could later be built based on how they were used by clients. Another benefit of starting with the Rails app itself, instead of the services, was that we would have the immediate benefits of a simpler, decoupled web app.

The plan was for an “inside-out rewrite”; that is, we didn’t start a new project from scratch. Instead, we went through Songkick’s website template by template and rewrote each one end to end, from the models and controller to the views, CSS and JavaScript. This way, our code was continuously integrated, which meant the benefits and flaws of our design were seen as soon as a template was done, instead of emerging with a completely new project months later. The drawback of this approach is that it takes a lot of effort to work with evolving code. However, I think that this is an important skill for us to learn as developers.

We started crossing the SOA chasm by creating application-specific “client model” classes that wrapped ActiveRecord models, and “service” classes that would call the respective methods on those models, decoupling the domain model from the presentation layer.

For example, if this is how an event was loaded on an event page:

class EventsController < ApplicationController
  def show
    @event = Event.find(params[:id])
  end
end

class Event < ActiveRecord::Base
end

This was rewritten to be:

class EventsController < ApplicationController
  def show
    @event = Services::EventListings.event_from_id(params[:id])
  end
end

module Services
  class EventListings
    def self.event_from_id(event_id)
      # Service classes are the only code allowed to touch ActiveRecord
      active_record_event = Event.find(event_id)
      ClientModels::Event.new(active_record_event.to_hash)
    end
  end
end

module ClientModels
  class Event
    # event_info has the same shape the internal API will eventually return
    def initialize(event_info)
      @id   = event_info['id']
      @date = Date.parse(event_info['date'])
      # etc.
    end
  end
end

class Event < ActiveRecord::Base
  def to_hash
    {
      'id'   => id, 
      'date' => date.to_s, 
      # etc.
    }
  end
end

Instead of accessing an ActiveRecord instance directly, all code in our Rails app would access it via the “service” classes. Those were the only classes allowed to talk to ActiveRecord models. Any response returned by those classes must be a client model instance that is initialized with the same information we would eventually return from our internal APIs.

Starting out like this meant we could easily change the data returned by the “to_hash” method to suit our needs, and still have the benefits of encapsulating what would eventually be the service client code.

When the time came and the services were ready, we simply changed the client service classes over to use HTTP:

module Services
  class EventListings
    def self.event_from_id(event_id)
      # http is a client for the internal events service (setup not shown)
      event_hash = JSON.parse(http.get("/events/#{event_id}").body)
      ClientModels::Event.new(event_hash)
    end
  end
end

And that’s it! All the application code talking to the service and client model classes remains completely unchanged.
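The swap is invisible to callers because both versions of `event_from_id` build the same client model from the same hash. A self-contained sketch of that property follows; `FakeEventRecord` stands in for the ActiveRecord model, a canned JSON string stands in for the HTTP response, and the two class methods `from_database` and `from_http_body` are hypothetical names for the before and after bodies of `event_from_id`.

```ruby
require 'date'
require 'json'

module ClientModels
  class Event
    attr_reader :id, :date

    def initialize(event_info)
      @id   = event_info['id']
      @date = Date.parse(event_info['date'])
    end

    # Value equality, so the two data sources can be compared directly.
    def ==(other)
      other.is_a?(Event) && id == other.id && date == other.date
    end
  end
end

# Stand-in for the ActiveRecord model, exposing the same to_hash contract.
class FakeEventRecord
  attr_reader :id, :date

  def initialize(id, date)
    @id = id
    @date = date
  end

  def to_hash
    { 'id' => id, 'date' => date.to_s }
  end
end

module Services
  class EventListings
    # Before: data comes from the local database (here, a fake record).
    def self.from_database(record)
      ClientModels::Event.new(record.to_hash)
    end

    # After: data comes from the service as JSON (here, a canned body).
    def self.from_http_body(body)
      ClientModels::Event.new(JSON.parse(body))
    end
  end
end

record = FakeEventRecord.new(42, Date.new(2012, 10, 10))
body   = '{"id": 42, "date": "2012-10-10"}'
Services::EventListings.from_database(record) ==
  Services::EventListings.from_http_body(body) # => true
```

As long as `to_hash` and the service’s JSON agree on keys and formats, the client model, and everything built on it, cannot tell the difference.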

Understanding your product and the domain you are modelling is crucial to being successful on an effort like this. Songkick’s product and design team were essential parts of this project. We were simplifying our technical architecture, but also simplifying and focusing Songkick’s proposition.

Once we had a plan, it took us around 10 weeks to rewrite our Rails app so that every single controller and view was using the new client models. During this period, we also rewrote our front-end code to have an architecture that more closely mirrors the pages and visual components used on the website. Stay tuned for more details!