Safely dealing with magical text

Boy, what a week it’s been. A remote-code-execution bug was discovered in Ruby on Rails, and we’ve all been scrambling to patch our servers (please patch your apps before reading any further, there is an automated exploit out there that gives people a shell on your boxes otherwise).

What the Ruby community, and those of other dynamic languages, must realize from recent Rails security blunders is that very similar problems can easily exist in any non-trivial web application. Indeed, I found a remote-execution bug in my own open-source project Faye yesterday, 3.5 years into the life of the project (again: patch before reading on).

There are a lot of lessons to be had from recent Rails security blunders, since they involve so many co-operating factors: excessive trust of user input, insufficient input validation and output encoding, the behavioural capabilities of Ruby objects and certain Rails classes, ignorance of cryptography and the computational complexity of data transport formats. In this post I’d like to focus on one in particular: safely encoding data for output and execution.

Ugh, do I have to?

I know, I know, booooooring, but so many people are still getting this really badly wrong and it continues to punish end users by exposing their data to malicious manipulation.

Robert Hansen and Meredith Patterson have a really good slide deck on stopping injection attacks with computational theory. One core message in that paper is that injection exploits (including SQL injection and cross-site scripting) involve crafting input such that it creates new and unexpected syntactic elements in code executed by the software, essentially introducing new instructions for the software to execute. Let’s look at a simple example.

Learn you a query string

I found the code that prompted me to write this post while updating some Google Maps URLs on our site this afternoon. Some of this code was constructing URLs by doing something like this:

def maps_url(lat, lng, zoom, width, height)
  params = [ "center=#{lat},#{lng}",
             "zoom=#{zoom}",
             "size=#{width}x#{height}" ]

  "http://maps.google.com/?" + params.join("&")
end

maps_url(51.4651204, -0.1148897, 15, 640, 220)

# => "http://maps.google.com/?center=51.4651204,-0.1148897& ...
#                             zoom=15& ...
#                             size=640x220"

You can see the intent here: whoever wrote this code assumes the URL is going to end up being embedded in HTML, and so they have encoded the query string delimiters as &amp; entities. But this doesn’t fix the problem entities are designed to solve, namely: safely representing characters that usually have special meaning in HTML. What is telling is that the comma in the query string should really also be encoded as %2C, but isn’t.
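For reference, Ruby’s CGI.escape shows what a properly URL-encoded centre value would look like (a quick irb check; dots and hyphens are safe characters, the comma is not):

require 'cgi'

CGI.escape("51.4651204,-0.1148897")
# => "51.4651204%2C-0.1148897"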

So although the ampersands are being encoded, the actual query data is not, and that means anyone calling this function can use it to inject HTML, for example:

link = '<a href="' + maps_url(0, 0, 1, 0, '"><script>alert("Hello!")</script>') +
           '">Link text</a>'

# => '<a href="http://maps.google.com/?center=0,0&amp; ... 
#     zoom=1&amp; ... 
#     size=0x"> ...
#     <script>alert("Hello!")</script> ...
#     ">Link text</a>'

By abusing the maps_url() function, I have managed to inject characters with special meaning — <, >, etc. — and thereby added new HTML elements to the output that shouldn’t be there. By passing unexpected input I’ve created a lovely little cross-site scripting exploit and stolen all your users’ sessions!

Note that you cannot cleanly fix this by using an HTML-escaping function like ERB::Util.h() on the output of maps_url(), because this would serve to re-encode the ampersands, leaving strings like &amp;amp; in the href attribute.
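You can see the double encoding in a quick sketch using the original maps_url() above (ERB::Util.h() escapes the & of each &amp;, producing &amp;amp;):

require 'erb'

url = maps_url(51.4651204, -0.1148897, 15, 640, 220)

ERB::Util.h(url)
# => "http://maps.google.com/?center=51.4651204,-0.1148897&amp;amp;zoom=15&amp;amp;size=640x220"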

Stacks of languages

Meredith Patterson of the above-linked paper gave another presentation at 28C3 called The Science of Insecurity. I’ve been telling absolutely everyone to watch it recently, and you should watch it too.

This talk describes how we should think of data transfer formats, network protocols and the like as languages, because in fact that’s what they are. It covers the different levels of language power – regular languages, context-free languages and Turing-complete languages – and how use of each affects the security of our systems. It also explains why, if your application relies on Turing-complete protocols, it will take an infinite amount of time to secure it.

When you build HTML pages, you are using a handful of languages that all run together in the same document. There’s HTML itself, and embedded URLs, and CSS, and JavaScript, and JavaScript embedded in CSS, and CSS selectors embedded in CSS and JavaScript, and base64 encoded images, and … well this list is long. All of these are languages and have formal definitions about how to parse them, and your browser needs to know which type of data it’s dealing with whenever it’s parsing your code.

Every character of output you generate is an instruction that tells the browser what to do next. If it’s parsing an HTML attribute and sees the " character, it truncates the attribute at that point. If it thinks it’s reading a text node and sees a <, it starts parsing the input as an HTML tag.

Instead of thinking of your pages as data, you should think of them as executable language.

Back to reality

Let’s apply this idea to our URL:

http://maps.google.com/?center=51.4651204,-0.1148897&amp;zoom=15&amp;size=640x220

Outside of an HTML document, the meaning of this list of characters changes: those &amp; blobs only have meaning when interpreting HTML, and if we treat this query string verbatim we get these parameters out:

{
  'center'   => '51.4651204,-0.1148897',
  'amp;zoom' => '15',
  'amp;size' => '640x220'
}

(This assumes your URL parser doesn’t treat ; as a value delimiter, or complain that the comma is not encoded.)

We’ve seen what happens when we embed HTML-related characters in the URL: inserting the characters "> chops the <a> tag short and allows injection of new HTML elements. But that behaviour comes from HTML, not from anything about URLs; when the browser is parsing an href attribute, it just reads until it hits the closing quote symbol and then HTML-decodes whatever it read up to that point to get the attribute value. It could be a URL, or any other text value, the browser does not care. At that level of parsing, it only matters that the text is HTML-encoded.

In fact, you could have a query string like foo=true&bar="> and parsing it with a URL parser will give you the data {'foo' => 'true', 'bar' => '">'}. The characters "> mean something in the HTML language, but not in the query string language.
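You can check this with Ruby’s standard library (a rough sketch; CGI.parse wraps each value in an array, and depending on your Ruby version it may also treat ; as a delimiter, which makes no difference for this input):

require 'cgi'

CGI.parse('foo=true&bar=">')
# => {"foo"=>["true"], "bar"=>["\">"]}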

So, we have a stack of languages, each nested inside the other. Symbols with no special meaning at one level can gain meaning at the next. What to do?

Stacks of encodings

What we’re really doing here is taking a value and putting it into a query string inside a URL, then putting that URL inside an HTML document.

                                +-------------------------+
                                | "51.4651204,-0.1148897" |
                                +------------+------------+
                                             |
    +----------------------------------------|--------+
    |                                +-------V------+ |
    | http://maps.google.com/?center=| centre_value | |
    |                                +--------------+ |
    +------------------------+------------------------+
                             |
                       +-----V-----+
              <a href="| url_value |">Link</a>
                       +-----------+

At each layer, the template views the value being injected in as an opaque string — it doesn’t care what it is, it just needs to make sure it’s encoded properly. The problem with our original example is that it pre-emptively applies HTML encoding to data because it anticipates that the value will be used in HTML, but does not apply encodings relevant to the task at hand, namely URL construction. This is precisely backwards: considering the problem as above we see that we should instead:

  1. Decide what type of string we’re creating — is it a URL, an HTML doc, etc.
  2. Apply all encoding relevant to the type of string being made
  3. Do not apply encodings for languages further up the stack

In other words, we should make a URL-constructing function apply URL-related encoding to its inputs, and an HTML-constructing function should apply HTML encoding. This means each layer’s functions can be recombined with others and still work correctly, because their outputs don’t make assumptions about where they will be used. So we would rewrite our code as:

require 'cgi'
require 'erb'

def maps_url(lat, lng, zoom, width, height)
  params = { "center" => "#{lat},#{lng}",
             "zoom"   => zoom,
             "size"   => "#{width}x#{height}" }

  query = params.map do |key, value|
    "#{CGI.escape key.to_s}=#{CGI.escape value.to_s}"
  end
  "http://maps.google.com/?" + query.join("&")
end

url = maps_url(51.4651204, -0.1148897, 15, 640, 220)

# => "http://maps.google.com/?center=51.4651204%2C-0.1148897& ...
#                             zoom=15& ...
#                             size=640x220"

html = '<a href="' + ERB::Util.h(url) + '">Link</a>'

# => '<a href="http://maps.google.com/?center=51.4651204%2C-0.1148897&amp; ...
#     zoom=15&amp; ...
#     size=640x220">Link</a>'

Now we see that we get two valid pieces of data: url is a valid URL with all its query parameters correctly encoded but no HTML entities present, and html is a valid HTML fragment with its attributes correctly entity-encoded.

Also, note how we have treated all incoming data as literal (i.e. not already encoded for the task at hand), and we have not hand-written any encoding ourselves (e.g. hand-writing entities like &amp;). You should deal with data assuming it contains the literal information it represents and use library functions to encode it correctly. There’s a very good chance you don’t know all the text transformations required by each layer.

Thinking in types

At this point you’re probably thinking that I’ve made something quite simple seem very complicated. But thinking in terms of types of strings, treating your output as a language stack and following the bullet list above is a good discipline to follow if you want to make sure you handle data safely.

There are some systems that do this for you, for example Rails 3 automatically HTML-escapes any value you insert into an ERB template by default. I’m working on a more general version of this idea: Coping is a templating language that checks your templates conform to the language you’re producing, and doesn’t let input introduce new syntactic elements.

If you’re feeling very brave, I recommend taking the Coursera Compilers course. Although it doesn’t seem immediately relevant to web devs, many concepts from parser theory, type checking and code generation can be applied to security and are well worth learning.

Above all, learn from other people’s security failures and consider where you may have made similar mistakes.

validates_uniqueness_of :nothing

Warning: this article contains rather a lot of silly decisions.

I’ve recently been working out some bugs in our OAuth implementation, including our OAuth2::Provider library. One of the biggest gotchas I found while diagnosing problems with our client apps was the existence of duplicate Authorization records.

An Authorization is a link between a ResourceOwner (i.e. a Songkick user) and a Client, for example our iPhone application. It represents that the user has granted the client access to their resources on Songkick. There should only be one of these per owner-client pair, and somehow we had a few thousand duplicates in our database. Getting more concrete, the table’s columns include the following:

+---------------------+--------------+
| Field               | Type         |
+---------------------+--------------+
| resource_owner_type | varchar(255) |
| resource_owner_id   | int(11)      |
| client_id           | int(11)      |
+---------------------+--------------+

Each combination of values for these three columns must only appear once in the table.

A series of unfortunate events

Now the Rails Way to make such guarantees is to use validates_uniqueness_of, or use a find_or_create_by_* call to check if something exists before creating it. And that’s basically what I’d done; OAuth2::Provider has a method called Authorization.for(owner, client) that would either find a suitable record or create a new one.
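The shape of that logic is roughly this (an illustrative sketch, not OAuth2::Provider’s actual code; the association names here are assumed):

class Authorization < ActiveRecord::Base
  belongs_to :client
  belongs_to :resource_owner, :polymorphic => true

  validates_uniqueness_of :client_id,
                          :scope => [:resource_owner_type, :resource_owner_id]

  def self.for(owner, client)
    # SELECT first, then INSERT only if nothing was found
    where(:resource_owner_type => owner.class.name,
          :resource_owner_id   => owner.id,
          :client_id           => client.id).first ||
    create(:resource_owner => owner, :client => client)
  end
end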

But despite implementing this, we were still getting duplicates. I removed an alternative code path for getting Authorization records, and still the duplicates continued. I figured something in our applications must be creating them, so I made new() and create() private on the Authorization model. No dice.

And then I remembered: concurrency! Trying to enforce uniqueness on the client doesn’t work, unless all the clients subscribe to a distributed decision-making protocol. If two requests are in flight, both can run a SELECT query, find there’s no existing record, and then both decide to create the record. Something like this:

             User 1                 |               User 2
------------------------------------+--------------------------------------
# User 1 checks whether there's     |
# already a comment with the title  |
# 'My Post'. This is not the case.  |
SELECT * FROM comments              |
WHERE title = 'My Post'             |
                                    |
                                    | # User 2 does the same thing and also
                                    | # infers that his title is unique.
                                    | SELECT * FROM comments
                                    | WHERE title = 'My Post'
                                    |
# User 1 inserts his comment.       |
INSERT INTO comments                |
(title, content) VALUES             |
('My Post', 'hi!')                  |
                                    |
                                    | # User 2 does the same thing.
                                    | INSERT INTO comments
                                    | (title, content) VALUES
                                    | ('My Post', 'hello!')
                                    |
                                    | # ^^^^^^
                                    | # Boom! We now have a duplicate
                                    | # title!

This may look familiar to you. In fact, I lifted it straight out of the ActiveRecord source, where it explains why validates_uniqueness_of doesn’t work when you have concurrent requests.

Users do the funniest things

I agree with you – in theory. In theory, communism works. In theory.

— Homer J. Simpson

There can be a tendency among some programmers to dismiss these arguments as things that probably won’t be a problem in practice. Why would two requests arrive at the same time, close enough to cause this race condition in the database, for the same user’s resources? This is the same thinking that tells you timing attacks are impossible over the Internet.

And I subscribed to this belief for a long time. Not that I thought it was impossible, I just thought there were likelier causes – hence all the attempts to shut down record creation code paths. But I was wrong, and here’s why:

People double-click on things on the Web.

Over time, we designers of software systems have instilled some confusing habits in the people who use our products, and one of those habits means that there is a set of people that always double-click links and form buttons on web pages. Looking at the updated_at timestamps on the duplicate records showed that most of them were modified very close together in time, certainly close enough to cause database race conditions. This fact by itself makes client-enforced uniqueness checks a waste of time. Even if you’re not getting a lot of requests, one little user action can blow your validation.

This is the database’s job

Here’s how this thing should be done, even if you think you’re not at risk:

class AddUniqueIndexToThings < ActiveRecord::Migration
  def self.up
    add_index :oauth_authorizations,
              [:client_id, :resource_owner_type, :resource_owner_id],
              :unique => true
  end
  
  def self.down
    remove_index :oauth_authorizations,
                 [:client_id, :resource_owner_type, :resource_owner_id]
  end
end

Then, when you try to create a record, you should catch the potential exception that this index will throw if the new record violates the uniqueness constraint. Rails 3 introduced a new exception called ActiveRecord::RecordNotUnique for its core adapters, but if you’re still supporting older Rails versions you need to catch ActiveRecord::StatementInvalid and check the error message. Here’s how our OAuth library does things.

DUPLICATE_RECORD_ERRORS = [
  /^Mysql::Error:\s+Duplicate\s+entry\b/,
  /^PG::Error:\s+ERROR:\s+duplicate\s+key\b/,
  /\bConstraintException\b/
]

def self.duplicate_record_error?(error)
  error.class.name == 'ActiveRecord::RecordNotUnique' or
  DUPLICATE_RECORD_ERRORS.any? { |re| re =~ error.message }
end

In the Authorization.for(owner, client) method, there’s a rescue clause that uses duplicate_record_error? to check the exception raised. If it’s a duplicate record error, we retry the method call since the second time it should find the new record that was inserted since the first call started.
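In outline the pattern looks like this (a simplified sketch rather than the library’s exact code; find_authorization and create_authorization stand in for the real finder and creator methods):

def self.for(owner, client)
  find_authorization(owner, client) || create_authorization(owner, client)
rescue => error
  raise unless duplicate_record_error?(error)
  # Another request inserted the record between our SELECT and INSERT;
  # retrying the whole method will now find that record.
  retry
end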

Get your objects out of my session

Last week I had the pleasant job of fixing a feature that broke due to a change in a third-party API. Specifically, Twitter changed part of their authentication API and this broke our ‘post your attendance to Twitter’ feature. After a while spelunking through several layers of HTTP indirection inside the twitter and oauth gems, it became apparent that an upgrade was in order – we implemented this feature so long ago that our twitter gem was lagging four major releases behind the current version.

But this isn’t about Twitter, or OAuth, or even those specific Ruby libraries. It’s about an antipattern I was reminded of while updating our code and reading the OAuth gem documentation. Here is how it suggests you start the authorization process in your Twitter client app:

@callback_url = "http://127.0.0.1:3000/oauth/callback"
@consumer = OAuth::Consumer.new("key", "secret", :site => "https://agree2")
@request_token = @consumer.get_request_token(:oauth_callback => @callback_url)
session[:request_token] = @request_token
redirect_to @request_token.authorize_url(:oauth_callback => @callback_url)

This code contains a bug that’s bitten me so many times it jumped right off the page:

session[:request_token] = @request_token

Here’s the bug: you just stored the Marshal.dump of some random object in the session. One day, you will refactor this object – change its class name, adjust its instance variables – and next time you deploy, no-one will be able to access your site. It doesn’t matter whether the session is stored in the cookie (and therefore on the user’s computer) or on your servers, the problem is that you’ve stored a representation of state that’s tightly coupled to its implementation.

A simple example

Let’s see this in action. Imagine we have a little Sinatra app with two endpoints. One of these endpoints puts an object in the session, and another one retrieves data from the stored object:

require 'sinatra'
set :sessions, true
set :session_secret, 'some very large random value'

class State
  def initialize(params = {})
    @params = params
  end

  def get
    @params.values.first
  end
end

get '/' do
  session[:state] = State.new(:flow => 'sign_up')
  'Hello'
end

get '/state' do
  session[:state].get
end

We boot the app, and see that it works:

$ curl -i localhost:4567/
HTTP/1.1 200 OK
Content-Type: text/html;charset=utf-8
Content-Length: 5
Set-Cookie: rack.session=BAh7CEk...; path=/; HttpOnly

Hello

$ curl localhost:4567/state -H 'Cookie: rack.session=BAh7CEk...'
sign_up

A little change

So, this seems to work, and we leave the site running like this for a while, and people visit the site and create sessions. Then one day we decide we need to refactor the State class, by changing that hash into an array:

class State
  def initialize(params = [])
    @params = params
  end

  def get
    @params.last
  end
end

get '/' do
  session[:state] = State.new(['sign_up'])
  'Hello'
end

Now if we retry our request we find this buried among the stack traces:

$ curl localhost:4567/state -H 'Cookie: rack.session=BAh7CEk...'

NoMethodError at /state
undefined method `last' for {:flow=>"sign_up"}:Hash

A peek at Rack’s guts

To understand why this happens you need to see how Rack represents the session. Basically, it takes the session hash, such as {:state => State.new(:flow => 'sign_up')}, runs it through Marshal.dump and base64-encodes the result. Here’s what Marshal emits:

>> session = {:state => State.new(:flow => 'sign_up')}
=> {:state=>#<State:0x... @params={:flow=>"sign_up"}>}
>> Marshal.dump session
=> "\x04\b{\x06:\nstateo:\nState\x06:\f@params{\x06:\tflowI\"\fsign_up\x06:\x06ET"

Marshal produces a literal representation of the object – its class, its instance variables and their values. It is a snapshot of the object that can be completely reconstructed later through Marshal.load.

When you store objects in the session, you are dumping part of your program’s implementation into storage and, if you use cookie-stored sessions, sending that representation to the user for them to give back later. Now, fortunately, cookies are signed by Rack using HMAC-SHA1 so the user should not be able to construct arbitrary Marshal output and inject objects into your program – don’t forget to set :session_secret unless you want people sending forged objects to you! But there is still the problem that your code is effectively injecting objects into processes running in the future, when those objects may no longer be valid.
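Under the hood, the signed cookie value is built roughly like this (a simplified sketch of what Rack::Session::Cookie does when given a secret, not its exact code; session_secret stands for the value passed to set :session_secret):

require 'base64'
require 'openssl'

session_hash = {:state => State.new(:flow => 'sign_up')}

data = Base64.encode64(Marshal.dump(session_hash)).strip
sha1 = OpenSSL::Digest::Digest.new('sha1')
hmac = OpenSSL::HMAC.hexdigest(sha1, session_secret, data)

cookie = "#{data}--#{hmac}"
# roughly the "rack.session=BAh7CEk..." value seen in the curl output above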

If you change the name of a class, then Marshal.load will fail, and you’ll get an empty session object. But if all the types referenced in the session dump still exist, it will happily reconstruct all those objects and their state may not reflect what the current process expects.
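Here’s a quick irb sketch of that first failure mode, simulating a rename by removing the State constant before loading the dump (illustrative only):

>> data = Marshal.dump(State.new(:flow => 'sign_up'))
>> Object.send(:remove_const, :State)
>> Marshal.load(data)
ArgumentError: undefined class/module State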

And as a bonus, once you’ve deployed the session-breaking change, you can’t revert it, because recent visitors will have the new representation in their session. We’ve got various classes in our codebase with multiple names to work around times when we made this mistake.

A better way

In light of the above, you should treat your sessions with a certain degree of paranoia. You should treat them with the same care as a public API, making sure you only put stable representations of state into them. Personally I stick to Ruby’s core data types – strings, numbers, booleans, arrays, hashes. I don’t put user-defined classes (including anything from stdlib or gems) in there. Similarly, you should not assume any given session key exists, since the session may become corrupt, the user may delete their cookies, and so on. Always check for nil values before using any session data, unless you want your site to become unreachable.

A future-proof Twitter client

So how should you use the Twitter gem and avoid these problems? Easy – just store the credentials from the request token, and reconstruct the token when Twitter calls you back:

Twitter.configure do |c|
  c.consumer_key    = 'twitter_key'
  c.consumer_secret = 'twitter_secret'
end

def consumer
  OAuth::Consumer.new('twitter_key',
                      'twitter_secret',
                      :site => 'https://www.example.com')
end

def callback_url
  'https://www.example.com/auth/twitter/callback'
end

get '/auth/twitter' do
  request_token = consumer.get_request_token(:oauth_callback => callback_url)
  session[:request_token] = request_token.token
  session[:request_secret] = request_token.secret
  redirect request_token.authorize_url(:oauth_callback => callback_url)
end

get '/auth/twitter/callback' do
  token  = session[:request_token]
  secret = session[:request_secret]

  halt 400 unless token and secret
  session[:request_token] = session[:request_secret] = nil
  
  request_token = OAuth::RequestToken.from_hash(consumer,
                      :oauth_token => token,
                      :oauth_token_secret => secret)
  
  access_token = request_token.get_access_token(:oauth_verifier => params[:oauth_verifier])
  
  client = Twitter::Client.new(
               :oauth_token => access_token.token,
               :oauth_token_secret => access_token.secret)
  
  user_details = client.verify_credentials
  
  store_twitter_tokens(user_details.screen_name,
                       access_token.token,
                       access_token.secret)
  
  redirect '/auth/twitter/success'
end

Note how we only store strings in the session and the database, and we store just enough of the credentials that we can construct an OAuth or Twitter client later, whenever we need one.

This approach only stores stable representations – tokens used in the OAuth protocol – and constructs objects by hand when they are needed rather than relying on Marshal dumps. This makes the application more resilient when the libraries you depend on inevitably need upgrading.

Fun with turtles: how Songkick uses OAuth for just about everything

Nearly a year ago, Songkick waded into the weird and wonderful world of native apps. We released an iPhone app, and more recently a Spotify app. Before we launched either of these projects, we’d been letting people use Facebook to log into our website. All of which throws up the rather hairy problem of how to implement authentication.

It turns out that you can bend OAuth to allow quite a lot of authentication and authorization flows, and we decided to start using it when we developed our iPhone app. We wrote an open-source library called OAuth2::Provider to help us add OAuth 2.0 provision to Ruby web apps, and it’s served us very well when adding new applications and use cases.

What is OAuth?

If you’re not familiar with OAuth, you’ve probably used it if you’ve signed into another site using Facebook, or used various mobile applications. The basic model is quite simple: you have a ‘provider’ app, say www.songkick.com, and a ‘client’ app that’s registered with the provider. The provider issues the client with an ID and a secret, and the client tells the provider its redirect URI. When the client wants to authenticate a user using the provider, it redirects the user’s browser to the provider’s site, for example one of our apps whose redirect URI is https://www.example.com/oauth/callback might redirect to:

https://www.songkick.com/oauth/login?client_id=2sb8nskp5ijmallhnbtvj2p2u&redirect_uri=https%3A%2F%2Fwww.example.com%2Foauth%2Fcallback&response_type=code

The provider then lets the user log in via whatever process it likes, and checks that the user wants to grant the app at www.example.com access to their resources. After this process is complete, the provider redirects the browser back to the client’s redirect URI with a ‘code’:

https://www.example.com/oauth/callback?code=e6lma1389ksglysuek9okwdof

The client application then takes this code, which represents the authenticated user’s access grant, and on the server side it makes a call to the provider. It supplies the code, and also its client ID and secret, to prove it’s really the app the user has granted access to:

curl -X POST https://www.songkick.com/oauth/exchange \
     -d 'client_id=2sb8nskp5ijmallhnbtvj2p2u' \
     -d 'client_secret=d6khk8prcdpeh0zcq97pftaqz' \
     -d 'redirect_uri=https%3A%2F%2Fwww.example.com%2Foauth%2Fcallback' \
     -d 'grant_type=authorization_code' \
     -d 'code=e6lma1389ksglysuek9okwdof'

{"access_token":"4x02z6xy2c40lrn7lmt9ejnx5"}

The provider returns JSON containing an access token, which the client can then use to access a user’s data stored on the provider site. This token represents an authorization relationship between the user and client; every user-client pair gets a unique access token. This simple mechanism provides accountability of who accessed the user’s data, and it gives the user the power to revoke access for individual applications without changing their password.

How does this work in practice?

For client-side applications OAuth provides a usability and security win by relieving the app of the need to store passwords on the user’s device. This makes sure that the security credentials stored on the device only allow access to very specific things rather than the user’s entire digital life (supposing the user, as many do, uses the same password for everything). It also means the user can change their password without being logged out of all their mobile applications.

The apps we’ve released for iPhone and Spotify are not web apps, but you can still use the same scheme. For the iPhone we use https://0.0.0.0/ as the redirect URI, and embed a web view containing the Songkick OAuth login page. The iPhone app monitors this embedded view to spot it redirecting to https://0.0.0.0/ so it can extract the code before making the exchange request. In Spotify we can genuinely redirect back to the app using Spotify’s routing system; the redirect URI in this case is spotify:app:songkickconcerts:action:callback.

The problem with client-side applications is that they cannot keep their client secret, well, a secret. Anyone can crack open the download and find the data in there. This is where the redirect URI comes in: imagine someone cracks our iPhone app and extracts the client_id, client_secret and redirect_uri. They could put up a malicious site at www.evil.com and ask users to log in via Songkick, trying to get users to expose their data to www.evil.com. But the provider only processes the request if the redirect_uri is the one registered for that client_id, so the attacker is forced to use the real redirect_uri when redirecting to www.songkick.com if they want the user to log in. But then Songkick will redirect the user’s browser back to the real application, and www.evil.com will never get the code it so desperately wants.

What about Facebook?

As I mentioned, Songkick lets people log in using Facebook, and we needed to continue to let them do that in our native apps so they could access the same account everywhere. OAuth has two features that help with implementing this and we’ve tried both of them.

Assertions

The first of these features is assertions. Instead of sending the user to the provider’s site to make them log in, the client can just pass some authentication token it already has from somewhere else. If that token is unguessable and can be used to identify the user, it can take the place of the code in the exchange request. Say the client app has somehow acquired a Facebook access token for the user (Facebook also implement OAuth 2.0 for their Open Graph system), it can then make one request to authenticate the user with Songkick, using the grant_type=assertion exchange:

curl -X POST https://www.songkick.com/oauth/exchange \
     -d 'client_id=2sb8nskp5ijmallhnbtvj2p2u' \
     -d 'client_secret=d6khk8prcdpeh0zcq97pftaqz' \
     -d 'redirect_uri=https%3A%2F%2Fwww.example.com%2Foauth%2Fcallback' \
     -d 'grant_type=assertion' \
     -d 'assertion_type=https%3A%2F%2Fgraph.facebook.com%2Fme' \
     -d 'assertion=the_facebook_access_token'

{"access_token":"4x02z6xy2c40lrn7lmt9ejnx5"}

The assertion_type is some arbitrary URI the provider uses to identify the type of credential being used in the assertion. When Songkick receives this, it makes a call to the Facebook Open Graph:

curl https://graph.facebook.com/me?oauth_token=the_facebook_access_token

This gives us the user’s details and we map those to a Songkick account; we then return our own access token for that account and the iPhone app can then interact with the user’s data. To get the user’s Facebook token it interacts with the Facebook application on the device before communicating with Songkick’s servers.

We also use assertions to provide a useful experience on the iPhone before the user creates an account. We use the phone’s UDID as an assertion and exchange it for a Songkick access token. This way we can personalize the app without the user needing to sign in first.

The state parameter

When we started on the Spotify app, we knew we didn’t want to reimplement Facebook login for that application. We didn’t want to redeploy all our native apps every time we needed to fix a bug from interacting with a third party, or when we wanted to add new login mechanisms. We decided the app should only know how to talk to www.songkick.com, and our website would provide all the login mechanisms we need.

Remember that the provider site is free to implement authentication in whatever way it likes, as long as it eventually redirects to the client with a code. This means we can offer the choice of username/password or Facebook authentication on our website without the client having any idea this choice exists. Recall that the client will redirect to us with a request like:

https://www.songkick.com/oauth/login?client_id=2sb8nskp5ijmallhnbtvj2p2u&redirect_uri=https%3A%2F%2Fwww.example.com%2Foauth%2Fcallback&response_type=code

If the user chooses to log in with Facebook, we’ll make a similar redirect to Facebook, passing our own OAuth client credentials for their service. In other words, our OAuth login page contains a link to

https://www.facebook.com/dialog/oauth?client_id=...&scope=...&redirect_uri=...&state=...

If the user clicks the link, they’ll go through an OAuth transaction on facebook.com and be redirected back to us. The important thing here is the state parameter: an OAuth provider is required to echo this value back unmodified when they redirect back to the client; this is so the client can figure out what it was doing before sending the user off to authenticate and can resume its work.

In our case, this means picking up the conversation with the OAuth client talking to us: the Spotify app. Once we’ve received a code from Facebook and converted it into a Songkick account, we just need to know where to redirect back to with a code for this Songkick account. What we do is, we take the params the client called us with, that is:

params = {
  "client_id"     => "2sb8nskp5ijmallhnbtvj2p2u",
  "redirect_uri"  => "https://www.example.com/oauth/callback",
  "response_type" => "code"
}

Then we convert them into a string for the state parameter we send through Facebook. We JSON-encode them, and Base-64-encode that JSON document, and we also compute an HMAC-SHA1 tag so we can make sure the value is not modified on its way back to us:

string = Base64.encode64(JSON.dump(params)).strip
sha1   = OpenSSL::Digest::Digest.new('sha1')
tag    = OpenSSL::HMAC.hexdigest(sha1, SECRET, string)
state  = string + ':' + tag

When Facebook redirects back to us with a code and state, we use the code to get the user’s Facebook data and map this to a Songkick account, then we unpack the state and use it to reconstruct the original OAuth request to us. We generate a code for the Songkick account and send it back to the client, and the client has no idea how the user actually logged in.
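Unpacking is the same thing in reverse when Facebook redirects back to us (a sketch under the same assumptions as the code above; state here is the value Facebook echoed back, and in practice a constant-time comparison of the tags would be preferable):

string, tag = state.split(':')
sha1        = OpenSSL::Digest::Digest.new('sha1')

raise 'state has been tampered with' unless tag == OpenSSL::HMAC.hexdigest(sha1, SECRET, string)

original_params = JSON.parse(Base64.decode64(string))
# => {"client_id"     => "2sb8nskp5ijmallhnbtvj2p2u",
#     "redirect_uri"  => "https://www.example.com/oauth/callback",
#     "response_type" => "code"}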

There’s one obvious way to do it

All of this is a little confusing at first, and trying to discuss recursive OAuth transactions certainly eats up a lot of whiteboard space, but the benefit of having one standardized protocol for authentication and authorization is huge. Authentication code can be one of the messiest parts of an application, and getting it wrong can have dire consequences. It’s also something you don’t want to reinvent in every application, and using a protocol as adaptable and extensible as OAuth really helps on this front. We’ve even started using it internally, so that instead of having one ‘traditional’ login page and one OAuth page, we’re beginning to route website logins through our OAuth endpoint and using an internal client to turn a normal login request into an OAuth one, so we can route it through the same logic as everything else. It’s a big maintenance win, and using an open standard should mean it’s friendlier to newcomers trying to maintain it.