Our object-based Rails frontend

Part of the rewrite of Songkick’s website was a re-architecture of the main client application, affectionately known as skweb (pronounced /skwɛb/, not /ɛskeɪwɛb/). Skweb, as has been mentioned in other posts, had grown into a monster, not just in size but also in complexity. I was asked to suggest an improved structure for the new simplified application. Based on my observations working on our application and the one I’d worked on at the Guardian, I noticed that a lot of complexity was introduced to make rendering web pages easier. It was as if, since we were so focused on modelling the business logic of the company, we had neglected to model a core function of a web site: presenting HTML pages to the user.

With this in mind I proposed splitting out the modelling of webpages into ‘page models’ that would sit alongside the application models and focus on taking Songkick’s data and turning it into web pages. Each type of page on the website would have a ‘page model’ responsible for rendering the page. This separation would eventually lead naturally to our use of services to drive skweb, since the page models were built to be agnostic about where their data came from, so we could migrate away from our single database more easily.

These days, all the business logic that drives Songkick is contained within internal web services, and skweb’s main job is creating web pages from that information. Certainly there are pages about artists and concerts with tickets and venues so all that vocabulary remains, but it is not the business model of Songkick we are modelling. What we are concerned with is presenting that information in web pages.

Pages, Components, Elements

Once we settled on having page models, it became straightforward to break the page up into its constituent parts. A page has a collection of components, and the components consist of elements. The component is given any data it needs by its enclosing page. Any sufficiently complex components can have their own models that the page model invokes when needed.

The default behaviour for a component with no data to render is to render nothing. For example, if the service that provides the component’s data is down, the component should contain the error and emit no output. There should be no stray markup left hanging around on the page; if a component needs to display something when empty, it is up to the page to allow this.
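
For example, a page model can swallow the failure and expose nil, so the page simply omits the component. A minimal sketch, with hypothetical service and error names:

def tickets
  # If the tickets service is down, expose nil rather than broken data;
  # the component helper (shown later in this post) renders nothing for nil.
  Tickets.new(Services.tickets.for_event(@event_id))
rescue Services::Error
  nil
end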

What makes a component?

A component is a discrete module of functionality on the page that can function independently of other components. Typically you can easily draw a box around a component, and it will probably contain a heading and some supporting information. I decided (somewhat arbitrarily) that components are not nestable: you cannot have components inside components. While this constraint is not a technical one, I imposed it to try to reduce complexity in the design. Since components aren’t nestable, if we need to break them into parts or share code between components, we use elements instead. Components that appear on more than one type of page are called shared components.

An element is something smaller and usually less complex than a component, and may appear in more than one component (in which case it is called a shared element). An example is the attendance buttons that appear all over our site, both in event listings like those on an artist page and on individual event pages.

We arrange the view code around pages and components, with each page having its own stylesheet and each component having its own stylesheet, JavaScript and images. We use the same name for each page model and its associated assets, so it’s easy to see which static assets a component depends on. An advantage of this approach is that when a component is removed or refactored, there is no ambiguity about which images, CSS files and JavaScript must be removed or updated.

So how does all this work in practice?

Let’s examine how this works by following one component through its rendering process. I’m going to use the Map component on the Venue page.

Skweb is still a Rails app and still has the familiar layout, but we’ve added some conventions of our own. First, all pages have a type – ‘venue’, for example – which also provides the name of the CSS file the page links to. The page provides methods that expose its components, and it constructs each component by passing in whatever data that component needs: components have no access to databases, services or the HTTP request; everything they need is given to them via the page model and controller. By convention, the name of a component is also the name of its template in the views folder; in fact, it is this use of common names that makes understanding component dependencies easy.

A small fragment of our app might look like this:

skweb/
    app/
        controllers/
            venues_controller.rb
        models/
            page_models/
                venue.rb
            skweb/
                models/
                    venue.rb
        views/
            shared/
                components/
                    _calendar_summary.html.erb
                elements/
                    _attendance_buttons_element.html.erb
                    _event_listings.html.erb
            venues/
                _brief.html.erb
                _map.html.erb
                show.html.erb
    public/
        javascripts/
            songkick/
                component/
                    tickets.js
        stylesheets/
            components/
                venue-brief.css
                venue-map.css
            shared/
                elements/
                    pagination.css
                components/
                    brief.css
            venue.css

When a user visits a Venue page, the controller creates a new page object:

class VenuesController < ApplicationController
  def show
    @page = PageModels::Venue.new(venue, logged_in_user)
  end
end

The page model for the Venue includes something to this effect:

module PageModels
  class Venue < PageModels::Base
    def initialize(venue, logged_in_user)
      @venue = venue
      @logged_in_user = logged_in_user
    end

    def brief
      Brief.new(@venue, upcoming_events.total_entries, @logged_in_user)
    end
  end
end

The Brief component is responsible for displaying the venue’s address, map, image and so on, but the Ruby objects only expose data. Markup is confined to the view templates, and rendering is performed by gluing a page model and a view template together.

module PageModels
  class Venue
    class Brief
      # Constructed by the page model above:
      # Brief.new(@venue, upcoming_events.total_entries, @logged_in_user)
      def initialize(venue, upcoming_events_count, logged_in_user)
        @venue = venue
        @upcoming_events_count = upcoming_events_count
        @logged_in_user = logged_in_user
      end

      def geolocation
        @venue.geolocation
      end
    end
  end
end

Moving to the view, the ‘show’ page for a venue might look like this:

<div class="primary col">
  <%= component('brief', @page.brief) %>
  <%= component('map', @page.brief.geolocation) %>
  <%= shared_component('calendar_summary',   @page.calendar_summary) %>
  <%= shared_component('media_summary',      @page.media_summary) %>
  <%= shared_component('media_links',        @page.media_links) %>
  <%= shared_component('gigography_summary', @page.gigography_summary) %>
</div>

component() and shared_component() are defined in ApplicationHelper and look like this:

def component(component_name, object)
  return '' if object.nil?
  render :partial => component_name, :object => object
end

def shared_component(component_name, object)
  component("shared/components/#{component_name}", object)
end

As you can see, these are really just thin wrappers around partials, but they also enforce that we render nothing when there is no data to give to the component.

The content of the component is pretty standard ERB:

<div class="component venue-map">
  <a href="<%= google_maps_url(map, :zoom => 15) %>" target="_blank">
    <img src="<%= static_google_maps_image_url(map, :width => 640, :height => 220, :zoom => 15) %>">
  </a>
</div>

As a convenience, the object passed in to the component by its page has the same name as the component; that is where map comes from in the code above. This is also useful in shared components, since they don’t need to know anything about the page using them or which instance variables it might define.

The Venue page will link to its venue.css file, which looks like:

@import 'shared/components/brief.css';
@import 'components/venue-brief.css';
@import 'components/venue-map.css';
@import 'shared/components/media-summary.css';
@import 'shared/components/event-listings.css';

And the venue-map.css file is short and sweet:

.venue-map
{
  margin-bottom: 26px;
  padding: 0;
  position: relative;
  z-index: 5;
  -webkit-box-shadow: 0 4px 2px -2px rgba(0, 0, 0, 0.2);
     -moz-box-shadow: 0 4px 2px -2px rgba(0, 0, 0, 0.2);
          box-shadow: 0 4px 2px -2px rgba(0, 0, 0, 0.2);
}

.venue-map img
{
  vertical-align: bottom;
}

@media only screen and (max-width: 767px)
{
  .mobile-enabled .venue-map img
  {
    width: 100%;
  }

  .mobile-enabled .venue-map
  {
    padding-left: 0;
    padding-right: 0;
  }
}

The CSS file contains only the CSS that this component needs and includes any CSS for the small screen rendering of that component.

What is that called?

Another aspect of the design was to use pervasive language. The idea is that everyone at Songkick – product managers, designers, and developers – uses the same name for pages and components on the website. The advantage of having a shared language across the company comes through when talking about the site. If someone says, ‘the tickets component is broken,’ I know exactly what they mean. It corresponds to a file called tickets.html.erb in the views; the page model for the component is called Tickets; its CSS lives in stylesheets/components/tickets.css; the HTML class name on the component is tickets; and any JavaScript it needs lives in javascripts/songkick/component/tickets.js. The strong naming convention makes navigating the project easy and makes finding dependencies very straightforward.

What does this give us?

The page/component/element structure makes deciding where to put code easier by providing very strong conventions. The page models made migrating skweb onto services simpler, as they provided a separation between the rendering stack and the source of the data it uses. We were able to behave as if we were building on top of services even when, in some cases, the services didn’t exist yet.

We have now also used this architecture on a new application, and again the clear demarcation of responsibilities makes deciding where to put code and how to structure it easier and more predictable. That’s not to say that there aren’t costs to this approach: certainly some find the sheer number of files, especially for CSS, difficult to navigate. Others find the insistence on rigidly mapping names across types of files excessive. While this is somewhat down to personal taste, in our experience having a predictable structure of small files with focussed responsibilities has made it easier to maintain our codebase.

The path to SOA

So far, James has explained what Songkick’s current Service Oriented Architecture looks like. I want to step back and talk about one of the hardest things we had to do: once we decided to undertake such a big change, how did we take the first step?

In our case, it made sense to start where it hurt the most: rewriting our biggest project, the songkick.com Rails app, to be a simpler web app without direct access to the ActiveRecord domain models. This would also give us the opportunity to understand the types of resources and API endpoints needed, so the services could later be built based on how they were used by clients. Another benefit of starting with the Rails app itself, instead of the services, was that we would have the immediate benefits of a simpler, decoupled web app.

The plan was for an “inside-out rewrite”; that is, we didn’t start a new project from scratch. Instead, we went template by template through Songkick’s website and rewrote each one end to end, from the models and controllers to the views, CSS and JavaScript. This way, our code was continuously integrated, which meant the benefits and flaws of our design were seen as soon as a template was done, instead of emerging with a completely new project months later. The drawback of this approach is that it takes a lot of effort to work with evolving code. However, I think that this is an important skill for us to learn as developers.

We started crossing the SOA chasm by creating application-specific “client model” classes that wrapped ActiveRecord models, and “service” classes that would call the respective methods on those models, decoupling the domain model from the presentation layer.

For example, if this is how an event was loaded on an event page:

class EventsController < ApplicationController
  def show
    @event = Event.find(params[:id])
  end
end

class Event < ActiveRecord::Base
end

This was rewritten to be:

class EventsController < ApplicationController
  def show
    @event = Services::EventListings.event_from_id(params[:id])
  end
end

module Services
  class EventListings
    def self.event_from_id(event_id)
      active_record_event = Event.find(event_id)
      ClientModels::Event.new(active_record_event.to_hash)
    end
  end
end

module ClientModels
  class Event
    def initialize(event_info)
      @id   = event_info['id']
      @date = Date.parse(event_info['date'])
      # etc.
    end
  end
end

class Event < ActiveRecord::Base
  def to_hash
    {
      'id'   => id, 
      'date' => date.to_s, 
      # etc.
    }
  end
end

Instead of accessing ActiveRecord instances directly, all code in our Rails app would access them via the “service” classes, which were the only classes allowed to talk to ActiveRecord models. Any response returned by those classes had to be a client model instance, initialized with the same information we would eventually return from our internal APIs.

Starting out like this meant we could easily change the data returned by the “to_hash” method to suit our needs, and still have the benefits of encapsulating what would eventually be the service client code.

When the time came and the services were ready, we simply changed the client service classes over to use HTTP:

module Services
  class EventListings
    def self.event_from_id(event_id)
      event_hash = JSON.parse(http.get("/events/#{event_id}").body)
      ClientModels::Event.new(event_hash)
    end
  end
end

And that’s it! All the application code talking to the service and client model classes remains completely unchanged.

Understanding your product and the domain you are modelling is crucial to succeeding in an effort like this. Songkick’s product and design team were an essential part of this project. We were simplifying our technical architecture, but also simplifying and focusing Songkick’s proposition.

Once we had a plan, it took us around 10 weeks to rewrite our Rails app so that every single controller and view was using the new client models. During this period, we also rewrote our front end code to have an architecture that mirrors more closely the pages and visual components used on the website. Stay tuned for more details!

The client side of SOA

This article is part of a series on Songkick’s migration to a service-oriented architecture.

Following on from my previous article on what our backend services look like, it’s time to talk about the client side. How do our user-facing applications use the services, and how is it different from using ActiveRecord?

The nice thing about Rails is that it doesn’t force you into using ActiveRecord. If you do use it, then a lot of conveniences are made available to you, but you’re really free to do whatever you want in your Rails controllers. So, instead of speaking to ActiveRecord models, our applications make HTTP calls to several backend services.

HTTP, do you speak it?

The first bit of the problem is, how do we make HTTP calls? We want this to be extremely convenient for people writing application code, which means avoiding as much boilerplate as possible. We don’t want application code cluttered with stuff like this:

uri = URI.parse("http://accounts-service/users/#{name}")
http = Net::HTTP.new(uri.host, uri.port)
response = http.request_get(uri.path)
if response.code == '200'
  JSON.parse(response.body)
else
  raise NotFound
end

when we could just write:

http_client.get("/users/#{name}").data

And that’s the simple case. When making HTTP calls, you have to deal with a lot of complexity: serializing parameters, query strings vs entity bodies, multipart uploads, content types, service hostname lookups, keep-alive or not, response parsing, and several classes of error detection: DNS failure, refused connections, timeouts, HTTP failure responses, user input validation errors, malformed or interrupted output formats… and good luck changing all that if you ever want to switch HTTP libraries.

So, the first thing we did is create an abstract HTTP API with several implementations, and released it as open-source. Songkick::Transport gives us a terse HTTP interface with backends based on Curb, HTTParty and Rack::Test, all with the same high-level feature set. This lets us switch HTTP library easily, and we’ve used this to tweak the performance of our internal code.

You use it by making a connection to a host and issuing requests. It assumes anything but a 200, 201, 204 or 409 is a software error and raises an exception; otherwise it parses the response for you and returns it:

http = Songkick::Transport::Curb.new('http://accounts-service')
user = http.get('/users/jcoglan').data
# => {'id' => 18787, 'username' => 'jcoglan'}

Songkick::Transport also has some useful reporting facilities built in: for example, it makes it easy to record all the backend service requests made during a single call to our user-facing Rails app, and to log the total time spent calling services, much like Rails does for DB calls. More details in the README.

Who needs FakeWeb?

The nice thing about having a simple, flat API for doing HTTP is that it’s really easy to test clients built on top of Songkick::Transport, as opposed to something like FakeWeb, which fakes the whole complicated Net::HTTP interface. In each application, we have clients built on top of Songkick::Transport that take an HTTP client as a constructor argument. When they make an HTTP call, they wrap the response data in a model object, which allows the application to shield itself from potential changes to the API wire format.

module Services
  class AccountsClient
    def initialize(http_client)
      @http = http_client
    end
    
    def find_user(username)
      data = @http.get("/users/#{username}").data
      Models::User.new(data)
    end
  end
end

module Models
  class User
    def initialize(data)
      @data = data
    end

    def username
      @data['username']
    end
  end
end

This approach makes it really easy to stub out the response of a backend service for a test:

before do
  @http   = mock('Transport')
  @client = Services::AccountsClient.new(@http)
end

it "returns a User" do
  response = mock('Response', :data => {'username' => 'jcoglan'})
  @http.stub(:get).with('/users/jcoglan').and_return(response)
  @client.find_user('jcoglan').username.should == 'jcoglan'
end

It also makes mock-based testing really easy:

it "tells the service to delete a User" do
  @http.should_receive(:delete).with('/users/jcoglan')
  @client.delete_user('jcoglan')
end
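
For that spec to pass, the client needs a delete_user method; a one-line sketch in the same style as find_user (it isn’t shown in the client above):

module Services
  class AccountsClient
    def delete_user(username)
      # DELETE has no interesting response body to wrap in a model.
      @http.delete("/users/#{username}")
    end
  end
end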

Being able to stub HTTP calls like this is very powerful, especially when query strings or entity bodies are involved. Your backend probably treats foo=bar&something=else and something=else&foo=bar the same, and it’s much easier to mock/stub on such parameter sets when they’re expressed as a hash, as in

http.get '/', :foo => 'bar', :something => 'else'

rather than as an order-sensitive string:

http.get '/?foo=bar&something=else'

It’s also worth noting that the models are basically inert data objects, and in many cases they are immutable values. They don’t know anything about the services, or any other I/O device; they just accept and expose data. This means you can use real data objects in other tests, rather than hard-to-maintain fakes, and your tests still run fast.
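
For instance, because Models::User is just an inert wrapper around a hash, any test can build a real one from literal data instead of maintaining a fake:

# A real value object: no I/O, so it is safe and fast in any test.
user = Models::User.new('id' => 18787, 'username' => 'jcoglan')
user.username # => 'jcoglan'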

Convenience vs flexibility

Nice as it is to be able to choose which HTTP implementation you use, most of the time the application developer does not want to write

http   = Songkick::Transport::Curb.new('http://accounts-service')
client = Services::AccountsClient.new(http)
user   = client.find_user(params[:username])

every time they need to look up a record. The flexibility helps with testing and deployment concerns, but it’s not convenient. So, we put a layer of sugar over these flexible building blocks that makes most of the things an application needs to do one-liners. We have a Services module that provides canonical instances of all the service clients; it deals with knowing which hostnames to connect to, which HTTP library to use, and which client object to construct for each service.

module Services
  def self.accounts
    @accounts ||= begin
      http = Songkick::Transport::Curb.new('http://accounts-service')
      AccountsClient.new(http)
    end
  end
end

With this layer of sugar, getting a user account is one line:

user = Services.accounts.find_user(params[:username])

In our Cucumber tests, we tend to stub out methods on these canonical instances, or make a Services method return an entirely fake instance. The cukes are not complete full-stack tests; they are integration tests of the current project, rather than of the entire stack, and the lack of backend I/O keeps them very fast. The stability of the underlying service APIs means we aren’t taking a big risk with these fakes, and we have a few acceptance tests that run against our staging and production sites to make sure we don’t break anything really important.
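
A sketch of the first approach, stubbing a method on the canonical client from a Cucumber hook (the data is made up, and this assumes RSpec’s stubbing is available in the Cucumber world):

Before do
  # Replace the real service call with a canned, inert value object.
  user = Models::User.new('id' => 18787, 'username' => 'jcoglan')
  Services.accounts.stub(:find_user).and_return(user)
end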

What about error handling?

We want it to be as easy as possible to deal with errors, since messy error handling can hamper the maintainability of a project and introduce mistakes that make things harder for end users. For this reason, we made anything but a 200, 201, 204 or 409 from a backend raise an exception. For example, if the accounts service returns a 404 for this call, an exception is raised:

Services.accounts.find_user('santa_claus')

The exception raised by Songkick::Transport contains information about the request and response. This means you can put a catch-all error handler in your Rails or Sinatra app to catch Songkick::Transport::HttpError and forward the 404 from the backend out to the user. This removes a lot of error handling code from the application.
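
In a Rails app, that handler might look something like the sketch below; the exact attributes on the error object are an assumption here, so check the Songkick::Transport README:

class ApplicationController < ActionController::Base
  # Catch-all: forward the backend's failure status to the end user rather
  # than rescuing around every service call in every action.
  rescue_from Songkick::Transport::HttpError do |error|
    # Assumption: the error exposes the backend response code.
    head error.status
  end
end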

In some cases though, you don’t want this behaviour. For example, say we’re rendering an artist’s page and we have a sidebar module showing related artists. If the main artist gives a 404, then the whole page response should be a 404. But if we can’t get the related artists, or their profile images, then we don’t want the whole page to fail, just that sidebar module. Such cases tend to be the minority in our applications, and it’s easy enough to catch the service exception and render nothing if the services backing a non-core component fail. Using an object model of our user interface helps to isolate these failures, and we hope to cover that in a future post.
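
Here is a hedged sketch of containing such a failure in a page model, with hypothetical service and component names:

def related_artists
  # Losing this data should only lose the sidebar module, not the page.
  RelatedArtists.new(Services.recommendations.similar_to(@artist_id))
rescue Songkick::Transport::HttpError
  nil # the component helper renders nothing for nil
end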

Repeat after me: sometimes, you should repeat yourself

One open question when we moved to this model was: should we maintain client libraries for each service, or just make whatever calls we need in each application? The DRY principle suggests the former is obviously the best, but it’s worth asking this question if you do a project like this.

We went with the latter, for several reasons. First, since the services and Songkick::Transport encapsulate a lot of business and wire logic, the client and model classes in each application end up being pretty thin wrappers, and it isn’t hard to build just what you need in each project. Second, we got burned by having too many things depending on in-process Ruby APIs, where any change to a shared library would require us to re-test and re-start all downstream applications. This coupling tended to slow us down, and we found that sharing in-process code isn’t worth it unless it’s encapsulating substantial complexity.

Each application is free to tweak how it interacts with the service APIs, without affecting any other application, and this is a big win for us. It means no change to one application can have side effects on another or block work on it, and we haven’t actually found ourselves reinventing substantial pieces of logic, since that’s all hidden behind the HTTP APIs.

And finally, having per-application service clients gives you a really accessible picture of what data each application actually relies on. Having one catch-all domain library made this sort of reasoning really difficult, and made it hard to assess the cost of changing anything.

Wrapping up

So that’s our architecture these days. If you decide to go down this route, remember there’s no ‘one right way’ to do things. You have to make trade-offs all the time, and the textbook engineering answer doesn’t always give your team the greatest velocity. Examine why you’re making each change, focus on long-term productivity, and you won’t go far wrong.

SOA: what our services look like

This article is part of a series on Songkick’s migration to a service-oriented architecture.

Since I began mentioning to people that Songkick is migrating its user-facing Rails app, and many supporting components besides, to a service-oriented architecture, I’ve been asked many times to explain how we’re doing it. Truth is, it took us a while to figure this out. Any departure from the Rails Way suddenly requires all sorts of dangerous things like Decisions and Creativity and Disagreement. What we have today is the mostly-stable result of rounds of iteration, trial-and-error and debate.

What do you mean by services, exactly?

When we say SOA, we mean we’re replacing all the ActiveRecord-based data access and business logic in our applications with a number of orthogonal web services:

  • event-listings handles data relating to concerts, artists, venues, and so on
  • accounts handles users’ account data and authentication
  • taste-imports processes sets of artists uploaded by various sources of user taste data
  • caltrak handles users’ taste data and calendar generation
  • attendance stores concerts users have said they are going to
  • media handles and stores file uploads – photos, videos and the like
  • recommendations determines sets of similar artists

These are all just Sinatra apps that, for the most part, return JSON. They encapsulate all the business logic previously held in our ActiveRecord models; indeed, they are still based on those models at present. But they don’t simply mirror the ActiveRecord APIs: they reflect how data is used rather than how it’s stored.
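
As a very rough sketch (the route is hypothetical, and the real endpoints are richer, as the example further down shows), an endpoint in one of these services looks something like this:

require 'sinatra'
require 'json'

# Still backed by the existing ActiveRecord models for now.
get '/events/:id' do
  event = Event.find(params[:id])
  content_type :json
  event.to_hash.to_json
end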

ActiveRecord models tend to reflect the design of normalized databases, which reflect the static properties of the entities involved. Let’s take an example. Say I ask you to design an ActiveRecord schema for modelling concerts. Most people would come up with something close to our actual model, which is:

  • A polymorphic type Event with two subtypes Concert and Festival
  • The Event class has a date property and optionally an end_date
  • The Event belongs to a Venue, which belongs to a City, which belongs to a MetroArea, which belongs to a Country, and all of these entities have a name
  • The Event has many Performances
  • Each Performance belongs to an Artist and has a billing, either headline or support
  • All Artists have a name, and other metadata like popularity

This makes sense as a database design, but doesn’t reflect how the data is used. Usually, when dealing with an event, you want all of the above information, which means accessing about seven tables. Hope you didn’t miss a JOIN somewhere!
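
To make that concrete, here is roughly the Rails 2.x eager-loading needed to fetch one event with all of the above, following the associations just listed (a sketch, not our actual code):

# One logical fetch, roughly seven tables behind the scenes:
event = Event.find(event_id, :include => [
  {:venue => {:city => {:metro_area => :country}}},
  {:performances => :artist}
])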

So, we could have exposed all these as distinct resources in our services, with links from each resource to those related to it, but that would be a giant waste of HTTP requests when you always want all this information all at once. It also makes it harder to write client code for the common case – you’d need to write code to follow all those links in every app you build on top of such a service. That’s what I mean when I say the services should reflect how data is used rather than how it is stored. Here’s a request I just made to find out all about Grandaddy’s upcoming show at the Shepherds Bush Empire in September.

$ curl appserver:9101/events/12511498

{
    "id":           12511498,
    "type":         "Concert",
    "status":       "ok",
    "path":         "/concerts/12511498-grandaddy-at-o2-shepherds-bush-empire",
    "date":         "2012-09-04",
    "startTime":    "2012-09-04T19:00:00+0000",
    "endDate":      null,
    "upcoming":     true,
    "profileImage": {"id": 523306, "type": "Image"},
    "series": null,
    "performances": [{
        "artist": {
            "id":           63366,
            "name":         "Grandaddy",
            "path":         "/artists/63366-grandaddy",
            "popularity":   0.044921,
            "active":       true,
            "profileImage": {"id": 523306, "type": "Image"},
            "upcomingEventsCount": 11
        },
        "id":       24380668,
        "billing":  "headline"
    }],
    "venue": {
        "id":         38320,
        "internalId": 38320,
        "name":       "O2 Shepherd's Bush Empire",
        "path":       "/venues/38320-o2-shepherds-bush-empire",
        "smallCityLongName": "London, UK",
        "unknown":    false
    }
}

Everything an app wants to know about an event, in one HTTP call. I’m reminded of this quote, which always springs to mind when I’m putting a boundary between my business logic and a user interface:

Remember that the job of your model layer is not to represent objects but to answer questions. Provide an API that answers the questions your application has, as simply and efficiently as possible. Sometimes these answers will be painfully specific, in a way that seems “wrong” to even a seasoned OO developer.

That quote is from ‘ORM is an anti-pattern’.

As well as encapsulating common queries, the services encapsulate operations we often need to perform, such as recording that someone likes At The Drive-In, or creating a new concert. The services are not a thin skin over the database; they encapsulate all our domain logic so it does not get replicated in various applications. The amount of code you need in an app in order to access this logic is fairly minimal, and I’ll explain what it looks like in a future post.
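
For instance, an operation like ‘record that a user is attending a concert’ becomes a single endpoint in the attendance service. A hypothetical sketch, with made-up model and route names:

# In the attendance service: one endpoint encapsulates validation and
# any side effects, instead of each app writing rows itself.
post '/users/:user_id/attendances' do
  attendance = Attendance.create!(
    :user_id  => params[:user_id],
    :event_id => params[:event_id]
  )
  status 201
  content_type :json
  {'id' => attendance.id}.to_json
end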

What is this buying us?

The core maintainability problem with a large monolithic application, like our songkick-domain library, is that internal coupling tends to creep in over time, rendering it hard to change one thing without affecting a lot of unexpected components. Every time you commit a change to the monolithic core, all the apps depending on it need re-testing and re-starting.

Monolithic database abstractions in particular are problematic because they’re coupled to, well, a monolithic database. If you have everything in one big MySQL DB, chances are parts of that DB are under much heavier load than others. It’s hard to add more capacity in this situation without replicating the whole database; you’d rather have your data split into chunks that can be horizontally scaled independently. This both makes scaling easier and reduces cost, since you’re not wasting DB machine capacity on lots of data that probably doesn’t need replicating (yet).

Creating a set of decoupled services gives us a way to deal with that: by creating an explicit boundary layer that’s designed to be kept stable, we can change the internals of the services without breaking the apps downstream, and do it faster than if the apps were still coupled to the Ruby implementation of this logic. As our applications are moved off of ActiveRecord and onto these service APIs, the volume of code coupled to our models is going down by orders of magnitude, so we can more easily chip away at these models and begin to split them up, assigning them just to the services that need them.

I mentioned in my previous post that, because of the amount of coupled Ruby code living in one process, we’ve been stuck on old versions of Ruby, Rails and other libraries for some time. Splitting our code up like this greatly reduces the amount of code living in the same process, and makes it easier for us to upgrade our dependencies, or totally change what language or hosting platform a service runs on.

The boundary creates an awareness among the team that this is a deliberate, stable API, and makes the abstraction boundary more obvious than it is with a bag of Ruby APIs that all live in the same process. But we can only do this because we understand the problem domain sufficiently. We’ve been working on Songkick for five years, and so we have a much better understanding of how to divide the domain up than when we started. Of course, when you start a project, you have no idea about half the stuff that’s going to end up in it, so this migration should be seen as refactoring, rather than cookie-cutter architecture to adopt from day one.

Service-oriented Songkick

This article is part of a series on Songkick’s migration to a service-oriented architecture.

For a few months now, we’ve been transitioning Songkick to a service-oriented architecture (SOA). This is the first in what will hopefully be a series of articles on what that means, how we’re doing it, and what benefits it’s bringing us. But first, some history.

In the beginning

Songkick has, for its five-year history, been a Rails app. (Well, there was a prototype in PHP but you didn’t hear that from me, right.) It was still a Rails app by the time I joined, two years into the project in 2009. And I mean it was a Rails app and nothing else. Although the system consisted of a database, a website, message queues, background processing, file storage, daily tasks like email notifications, and so on, it was all one big project.

Oh sure, we told ourselves all the non-web components were separate projects, but they were included in the Rails app via git submodules. They all shared code from app/models and lib. They all changed together. If you changed our background job processor you had to bump a submodule in the Rails app, run all the tests and deploy the entire thing to all the machines.

Oh and did I mention the build took two hours spread across a handful of Jenkins (then Hudson) build slaves? It’s a wonder we ever shipped anything.

Time for some house-cleaning

If you’ve worked on any early-stage, rapidly growing product you probably recognize this scenario. You’ve been adding features and tests all over the place, you’re not sure which ones have value but you keep all of them anyway, and you focus on releasing as fast as possible. We went through two major versions of the product like this, and it’s fine when your team and the codebase are relatively small. Everyone knows where everything is, it’s not that hard to maintain.

But in the medium and long term, this doesn’t scale. The Big Ball of Mud makes it increasingly hard to experiment, to bring new hires up to speed, or to deal with sudden scaling issues. We needed to do something.

Step 1: get organized

We began this process in mid-2010 by extracting the shared part of our codebase into a couple of libraries, songkick-core and songkick-domain. Core mostly contains fairly generic infrastructure stuff: APIs for locating and connecting to databases and message queues, managing per-environment config, logging/diagnostics support etc. Domain contains all the shared business logic, which given our history means all our ActiveRecord models and observers, and anything else we needed to share between the website and the background processes: the web routes, third-party API clients, file management, factory_girl definitions and shared cucumber steps, etc.

This was a really useful step forward since it let us take all our background process code out of the Rails app and deploy it separately. Each project just included Core and Domain as submodules and through them got access to all our infrastructure and business logic. Happy days. It was a great feeling not to have all that stuff gunking up our website’s codebase, and it meant it didn’t need re-testing and re-deploying quite so often.

Step 2: encourage small projects

One great way to keep development sustainable is to favour small components: components and libraries with focused responsibilities that you can easily reuse. Encouraging this style of development means you need to make it easy to develop, integrate, and deploy many small components rather than one big ball. The easier this is, the more likely people are to create such libraries.

Unfortunately, despite our restructured codebase this was nowhere near easy enough. Using git submodules meant that any time Core or Domain was changed, one had to bump those submodules in all downstream projects, re-test and re-deploy them. We needed something more dynamic that would ease this workload.

The first thing we tried was Rubygems. We started packaging Core as a gem and a Debian package, which is how we distribute all the libraries we rely on. We thought that by using semantic versioning we could force ourselves to pay better attention to our API design. This turned out to be wishful thinking: this is a core component on which everything depends, and has to change fairly frequently. It’s the sort of thing that should be deployed from git by Capistrano, not through formal versioning and apt-get. The fact that it was now a globally installed library also made it really hard to test and do incremental roll-out. Long story short, we ended up at version 0.3.27 before giving up on this system.

(I can already hear everyone saying we should have used Bundler. Another consequence of the time we started the project is that we run Rails 2.2 on Ruby 1.8.7 and Rubygems 1.3.x, and making Bundler work has proved more trouble than it’s worth. Upgrading Rails and Ruby is, let’s say, Somewhat Difficult, especially with the volume of code and libraries we have, and at a startup there’s always something more urgent to do. These days we have a bunch of apps and services running on modern Ruby stacks, but it’s still not pervasive. Part of this process is about decoupling things so we can change their runtimes more easily.)

Step 3: tools, tools, tools

So we needed a migration path to get to a more sustainable model. In 2011 we built a dependency tracker called Janda (don’t ask) to make it easier to manage and encourage lots of small projects. It was based on a few key ideas borrowed from Bundler and elsewhere:

  • Every project declares which others it depends on
  • Circular dependencies are not allowed
  • Dependencies can be loaded from a global location or vendored inside the project
  • A project cannot load code from anything not in its dependency graph
  • Versioning is done with git
  • Builds are run by checking dependencies out into the project itself and the system tracks which versions of components have been tested together
  • We only deploy one version of each project to production at any time
  • The deployment system makes sure the set of versions we deploy are mutually compatible, based on build results

This gave us several important things: a system for dynamically locating and loading dependencies, which let us stop using submodules and manually updating them; a dependency-aware build and deployment system that made it easy to check what needed testing as a result of every change; and a framework imposing some light restrictions on how code could be structured.

Building this tool exposed dozens of places in our code where we had implicit and circular dependencies we weren’t aware of. To make our software work with this system, we had to get it into better shape through refactoring. This process itself led to several new libraries being extracted so they could be safely shared and tracked. It was a big step forward, and helped us ship code faster and with more confidence.

Step 4: break the dependencies

That probably sounds like a weird thing to say after spending all that effort on a dependency tracker. But in truth it was always going to be an interim measure; we want to be using the same Ruby toolchain everyone else is, it’s just easier that way. Plus, we have mounting pressure in other areas. Domain is still a big project, full of dozens of classes that know too much about each other. Every ActiveRecord model we have is free to interact with the others. It’s hard to change it without breaking anything downstream, and it’s making it harder for us to split our monolithic database into chunks that can scale independently. All familiar scaling woes.

So, since late last year we’ve been working on the current stage of growing our codebase: replacing all our couplings to ActiveRecord, and the Domain project as a whole, with web services. We have a handful of services that expose JSON representations of various facets of our domain. One service handles concert data, one handles user accounts, one deals with uploaded media, and so on. Long-term, the aim is to get to a stage where we can change the internals of these services – both their code and their datastores – independently of each other, and independently of the apps that use them.

These services put an explicit stable boundary layer into our stack that makes it easier to work on components on either side of the line independently. They reduce coupling, because apps are now making HTTP calls to stable, language-agnostic APIs rather than loading giant globs of Ruby code, and they simplify deployment – if you change a service, you don’t need to restart all its clients, since there’s no code they need to reload.

Enough pontificating, show us the code!

We’re going to get into the details of how we’re implementing this in later articles. There’s a lot we can talk about, so if you have any questions you should drop us a line on Twitter.

From 15 hours to 15 seconds: reducing a crushing build time

Over the past year we have reduced our website test suite build time by over 99.9%.

  • Build time a year ago: 15 hours.
    Across 15 EC2 build slaves it took “only” 1 hour of real time.

  • Build time today: 15 seconds.
    On my laptop.

Having a build that took over an hour to run crippled the productivity of our team.

So, how did we make such a drastic improvement? There were no quick fixes, though Lord knows we tried to find them. Instead we have had to completely change the way we test.

There were no brilliant new techniques. Instead, there were three big mistakes we had made that created such a monster build time. We had gone down a wrong path, and it took a lot of time and effort to fix it later.

Bad Practice #1: We favoured integration tests over unit tests

We used to be extremely thorough in our integration tests. We used them to test everything, usually instead of unit tests, which were comparatively thin on the ground. Since integration tests are far, far slower than unit tests, this caused a lot of unnecessary work.

To fix this we looked at each integration test in turn and either:

  • ditched it (i.e. we increased our tolerance for broken things in exchange for having a faster build)
  • rewrote it as a unit test on a specific class
  • kept it, as we still needed a few integration tests for each component

Bad Practice #2: We had many, many features that were relatively unimportant

Many of the less used or less strategic features on songkick.com have gone. This was an extremely painful decision to make, and we made it for bigger reasons than just improving our build time. But it certainly improved the build time a lot.

Fixing this and the previous point has turned a library of 1642 Cucumber scenarios into just 200.

Bad Practice #3: Our integration tests were actually acceptance tests

This test suite used to integrate over our website, domain library, database, message bus and background job workers. Each was spun up as a separate process in the test environment. We basically ran all our website tests against a working version of our entire system. Remember I said we tested virtually every code path? This added up to a lot of time.

Nowadays, our website integration tests are really only integration tests. They integrate over code inside a single project. Every interface to another project is stubbed.

All our database access code is isolated in a domain library behind a thin service layer and is stubbed in the website project.
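
A sketch of what that stubbing looks like in a website test, reusing the service-layer names from earlier in this series (the data is made up):

before do
  # No database, message bus or background workers are booted; the
  # service layer is replaced with a canned client model.
  event = ClientModels::Event.new('id' => 12511498, 'date' => '2012-09-04')
  Services::EventListings.stub(:event_from_id).and_return(event)
end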

Instead of over a thousand acceptance tests, we now have fewer than 10. They run against our staging and production environments, after deployment, instead of slowly booting up a test environment during the build.

Six months later

Productivity is up! Morale is up! It’s amazing just how much a faster build has improved our working experience.

Remember that the suite described above was only one of our builds. We had multiple projects with builds that took more than 30 minutes to run. Now none of our test builds takes longer than 5 minutes, and even that is considered “slow”.

These mistakes are far clearer to us in hindsight than they were at the time, so I’d recommend looking carefully to make sure you are not infected by any of these bad practices. If you are, take the time to fix the problem. If you’re not, congratulations on avoiding the pitfalls we fell into!

PS: Bonus quick fixes for reading this far

Ok, so there were one or two things we did that probably count as Quick Fixes:

  • tweak our Ruby interpreter’s GC settings by trial and error (we use REE).
  • run the build entirely on the tmpfs in-memory file system

Both of these gave surprisingly significant improvements, and accounted for perhaps 10% of the speedup.
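
For reference, REE’s garbage collector is tuned through environment variables. The variable names below are real, but the values are purely illustrative of the kind of settings found by trial and error, and the tmpfs mount point is a made-up example:

# GC tuning for the build user (example values, not our exact settings):
export RUBY_HEAP_MIN_SLOTS=1000000
export RUBY_GC_MALLOC_LIMIT=60000000
export RUBY_HEAP_FREE_MIN=100000

# Run the build from an in-memory file system:
mount -t tmpfs -o size=2g tmpfs /path/to/build/workspace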