SOA: what our services look like

This article is part of a series on Songkick’s migration to a service-oriented architecture. The full series:

Since I began mentioning to people that Songkick is migrating its user-facing Rails app and lots more supporting components to service-oriented architecture, I’ve been asked many times to explain how we’re doing it. Truth is, it took us a while to figure this out. Any departure from the Rails Way suddenly requires all sorts of dangerous things like Decisions and Creativity and Disagreement. What we have today is the mostly-stable result of rounds of iteration, trial-and-error and debate.

What do you mean by services, exactly?

When we say SOA, we mean we’re replacing all the ActiveRecord-based data access and business logic in our applications with a number of orthogonal web services:

  • event-listings handles data relating to concerts, artists, venues, and so on
  • accounts handles users’ account data and authentication
  • taste-imports processes sets of artists uploaded by various sources of user taste data
  • caltrak handles users’ taste data and calendar generation
  • attendance stores concerts users have said they are going to
  • media handles and stores file uploads – photos, videos and the like
  • recommendations determines sets of similar artists

These are all just Sinatra apps that return JSON for the most part. They encapsulate all the business logic previously held in our ActiveRecord models, indeed they are still based on these models at present. But they don’t simply mirror the ActiveRecord APIs: they reflect how data is used rather than how it’s stored.

ActiveRecord models tend to reflect the design of normalized databases, which reflect the static properties of entities involved. Let’s take an example. Say I ask you to design an ActiveRecord schema for modelling concerts. Most people would come up with something close to our actual model, which is:

  • A polymorphic type Event with two subtypes Concert and Festival
  • The Event class has a date property and optionally an end_date
  • The Event belongs to a Venue, which belongs to a City, which belongs to a MetroArea, which belongs to a Country, and all of these entities have a name
  • The Event has many Performances
  • Each Performance belongs to an Artist and has a billing, either headline or support
  • All Artists have a name, and other metadata like popularity

This makes sense as a database design, but doesn’t reflect how the data is used. Usually, when dealing with an event, you want all of the above information, which means accessing about seven tables. Hope you didn’t miss a JOIN somewhere!

So, we could have exposed all these as distinct resources in our services, with links from each resource to those related to it, but that would be a giant waste of HTTP requests when you always want all this information all at once. It also makes it harder to write client code for the common case – you’d need to write code to follow all those links in every app you build on top of such a service. That’s what I mean when I say the services should reflect how data is used rather than how it is stored. Here’s a request I just made to find out all about Grandaddy’s upcoming show at the Shepherds Bush Empire in September.

$ curl appserver:9101/events/12511498

{
    "id":           12511498,
    "type":         "Concert",
    "status":       "ok",
    "path":         "/concerts/12511498-grandaddy-at-o2-shepherds-bush-empire",
    "date":         "2012-09-04",
    "startTime":    "2012-09-04T19:00:00+0000",
    "endDate":      null,
    "upcoming":     true,
    "profileImage": {"id": 523306, "type": "Image"},
    "series": null,
    "performances": [{
        "artist": {
            "id":           63366,
            "name":         "Grandaddy",
            "path":         "/artists/63366-grandaddy",
            "popularity":   0.044921,
            "active":       true,
            "profileImage": {"id": 523306, "type": "Image"},
            "upcomingEventsCount": 11
        },
        "id":       24380668,
        "billing":  "headline"
    }],
    "venue": {
        "id":         38320,
        "internalId": 38320,
        "name":       "O2 Shepherd's Bush Empire",
        "path":       "/venues/38320-o2-shepherds-bush-empire",
        "smallCityLongName": "London, UK",
        "unknown":    false
    }
}

Everything an app wants to know about an event, in one HTTP call. I’m reminded of this quote, which always springs to mind when I’m putting a boundary between my business logic and a user interface:

Remember that the job of your model layer is not to represent objects but to answer questions. Provide an API that answers the questions your application has, as simply and efficiently as possible. Sometimes these answers will be painfully specific, in a way that seems “wrong” to even a seasoned OO developer.

ORM is an anti-pattern

As well as encapsulating common queries, the services encapsulate operations we often need to perform, such as recording that someone likes At The Drive-In, or creating a new concert. The services are not a thin skin over the database, they encapsulate all our domain logic so it does not get replicated in various applications. The amount of code you need in an app in order to access this logic is fairly minimal, and I’ll explain what it looks like in a future post.

What is this buying us?

The core maintainability problem with a large monolithic application, like our songkick-domain library, is that internal coupling tends to creep in over time, rendering it hard to change one thing without affecting a lot of unexpected components. Every time you commit a change to the monolithic core, all the apps depending on it need re-testing and re-starting.

Monolithic database abstractions in particular are problematic because they’re coupled to, well, a monolithic database. If you have everything in one big MySQL DB, chances are parts of that DB are under much heavier load than others. It’s hard to add more capacity in this situation without replicating the whole database; you’d rather have your data split into chunks that can be horizontally scaled independently. This both makes scaling easier and reduces cost, since you’re not wasting DB machine capacity on lots of data that probably doesn’t need replicating (yet).

Creating a set of decoupled services gives us a way to deal with that: by creating an explicit boundary layer that’s designed to be kept stable, we can change the internals of the services without breaking the apps downstream, and do it faster than if the apps were still coupled to the Ruby implementation of this logic. As our applications are moved off of ActiveRecord and onto these service APIs, the volume of code coupled to our models is going down by orders of magnitude, so we can more easily chip away at these models and begin to split them up, assigning them just to the services that need them.

I mentioned in my previous post that, because of the amount of coupled Ruby code living in one process, we’ve been stuck on old versions of Ruby, Rails and other libraries for some time. Splitting our code up like this greatly reduces the amount of code living in the same process, and makes it easier for us to upgrade our dependencies, or totally change what language or hosting platform a service runs on.

The boundary creates an awareness among the team that this is a deliberate stable API, and makes the abstraction boundary more obvious than it is with bag of Ruby APIs that all live in the same process. But we can only do this because we understand the problem domain sufficiently. We’ve been working on Songkick for five years, and so we have a much better understanding of how to divide the domain up than when we started. Of course, when you start a project, you have no idea about half the stuff that’s going to end up in it, so this migration should be seen as refactoring, rather than cookie-cutter architecture to adopt from day one.

3 thoughts on “SOA: what our services look like

  1. This is a fantastic format, I could see it application becoming more accessible with MVC js formats like backbone.js etc, loading in component form similar to fb. How have you seen the improvement with speed with this change?

    There is going to be a higher ratio of units/boxes to power this, smaller in each, or does this compensate against having fewer high-powered units in forms of cost? in a sense “an once of gold could be beat out to 1mm thick, and cut with ease, but try breaking the lump! “

  2. Brendan, we have realized an increase in speed, but this is largely because of other ways in which we refactored the front end code — the controllers and view classes, rather than how we changed the data access layer. We went through a process of simplification that exposed inefficient code and made it easier to clean up; hopefully we’ll get to this in future articles.

  3. Sorry in the delayed reply, great stuff, I was wondering in the process to get the information that you kept a controllers and views of Rails, or one Sinatra driving a cluster of smaller ones =)