Get your objects out of my session

Last week I had the pleasant job of fixing a feature that broke due to a change in a third-party API. Specifically, Twitter changed part of their authentication API and this broke our ‘post your attendance to Twitter’ feature. After a while spelunking through several layers of HTTP indirection inside the twitter and oauth gems, it became apparent that an upgrade was in order – we implemented this feature so long ago that our twitter gem was lagging four major releases behind the current version.

But this isn’t about Twitter, or OAuth, or even those specific Ruby libraries. It’s about an antipattern I was reminded of while updating our code and reading the OAuth gem documentation. Here is how it suggests you start the authorization process in your Twitter client app:

@callback_url = "http://127.0.0.1:3000/oauth/callback"
@consumer = OAuth::Consumer.new("key", "secret", :site => "https://agree2")
@request_token = @consumer.get_request_token(:oauth_callback => @callback_url)
session[:request_token] = @request_token
redirect_to @request_token.authorize_url(:oauth_callback => @callback_url)

This code contains a bug that’s bitten me so many times it jumped right off the page:

session[:request_token] = @request_token

Here’s the bug: you just stored the Marshal.dump of some random object in the session. One day, you will refactor this object – change its class name, adjust its instance variables – and next time you deploy, no-one will be able to access your site. It doesn’t matter whether the session is stored in the cookie (and therefore on the user’s computer) or on your servers, the problem is that you’ve stored a representation of state that’s tightly coupled to its implementation.

A simple example

Let’s see this in action. Imagine we have a little Sinatra app with two endpoints. One of these endpoints puts an object in the session, and another one retrieves data from the stored object:

require 'sinatra'
set :sessions, true
set :session_secret, 'some very large random value'

class State
  def initialize(params = {})
    @params = params
  end

  def get
    @params.values.first
  end
end

get '/' do
  session[:state] = State.new(:flow => 'sign_up')
  'Hello'
end

get '/state' do
  session[:state].get
end

We boot the app, and see that it works:

$ curl -i localhost:4567/
HTTP/1.1 200 OK
Content-Type: text/html;charset=utf-8
Content-Length: 5
Set-Cookie: rack.session=BAh7CEk...; path=/; HttpOnly

Hello

$ curl localhost:4567/state -H 'Cookie: rack.session=BAh7CEk...'
sign_up

A little change

So, this seems to work, and we leave the site running like this for a while, and people visit the site and create sessions. Then one day we decide we need to refactor the State class, by changing that hash into an array:

class State
  def initialize(params = [])
    @params = params
  end

  def get
    @params.last
  end
end

get '/' do
  session[:state] = State.new(['sign_up'])
  'Hello'
end

Now if we retry our request we find this buried among the stack traces:

$ curl localhost:4567/state -H 'Cookie: rack.session=BAh7CEk...'

NoMethodError at /state
undefined method `last' for {:flow=>"sign_up"}:Hash

A peek at Rack’s guts

To understand why this happens you need to see how Rack represents the session. Basically, it takes the session hash, such as {:state => State.new(:flow => 'sign_up')}, runs it through Marshal.dump and base64-encodes the result. Here’s what Marshal emits:

>> session = {:state => State.new(:flow => 'sign_up')}
=> {:state=>#"sign_up"}>}
>> Marshal.dump session
=> "\x04\b{\x06:\nstateo:\nState\x06:\f@params{\x06:\tflowI\"\fsign_up\x06:\x06ET"

Marshal produces a literal representation of the object – its class, its instance variables and their values. It is a snapshot of the object that can be completely reconstructed later through Marshal.load.

When you store objects in the session, you are dumping part of your program’s implementation into storage and, if you use cookie-stored sessions, sending that representation to the user for them to give back later. Now, fortunately, cookies are signed by Rack using HMAC-SHA1 so the user should not be able to construct arbitrary Marshal output and inject objects into your program – don’t forget to set :session_secret unless you want people sending forged objects to you! But there is still the problem that your code is effectively injecting objects into processes running in the future, when those objects may no longer be valid.

If you change the name of a class, then Marshal.load will fail, and you’ll get an empty session object. But if all the types referenced in the session dump still exist, it will happily reconstruct all those objects and their state may not reflect what the current process expects.

And as a bonus, once you’ve deployed the session-breaking change, you can’t revert it, because recent visitors will have the new representation in their session. We’ve got various classes in our codebase with multiple names to work around times when we made this mistake.

A better way

In light of the above, you should treat your sessions with a certain degree of paranoia. You should treat them with the same care as a public API, making sure you only put stable representations of state into them. Personally I stick to Ruby’s core data types – strings, numbers, booleans, arrays, hashes. I don’t put user-defined classes (including anything from stdlib or gems) in there. Similarly, you should not assume any given session key exists, since the session may become corrupt, the user may delete their cookies, and so on. Always check for nil values before using any session data, unless you want your site to become unreachable.

A future-proof Twitter client

So how should you use the Twitter gem and avoid these problems? Easy – just store the credentials from the request token, and reconstruct the token when Twitter calls you back:

Twitter.configure do |c|
  c.consumer_key    = 'twitter_key'
  c.consumer_secret = 'twitter_secret'
end

def consumer
  OAuth::Consumer.new('twitter_key',
                      'twitter_secret',
                      :site => 'https://www.example.com')
end

def callback_url
  'https://www.example.com/auth/twitter/callback'
end

get '/auth/twitter' do
  request_token = consumer.get_request_token(:oauth_callback => callback_url)
  session[:request_token] = @request_token.token
  session[:request_secret] = @request_token.secret
  redirect request_token.authorize_url(:oauth_callback => callback_url)
end

get '/auth/twitter/callback' do
  token  = session[:request_token]
  secret = session[:request_secret]

  halt 400 unless token and secret
  session[:request_token] = session[:request_secret] = nil
  
  request_token = OAuth::RequestToken.from_hash(consumer,
                      :oauth_token => token,
                      :oauth_token_secret => secret)
  
  access_token = request_token.get_access_token(:oauth_verifier => params[:oauth_verifier])
  
  client = Twitter::Client.new(
               :oauth_token => access_token.token,
               :oauth_token_secret => access_token.secret)
  
  user_details = client.verify_credentials
  
  store_twitter_tokens(user_details.screen_name,
                       access_token.token,
                       access_token.secret)
  
  redirect '/auth/twitter/success'
end

Note how we only store strings in the session and the database, and we store just enough of the credentials that we can construct an OAuth or Twitter client later, whenever we need one.

This approach only stores stable representations – tokens used in the OAuth protocol – and constructs objects by hand when they are needed rather than relying on Marshal dumps. This makes the application more resilient when the libraries you depend on inevitably need upgrading.

5 thoughts on “Get your objects out of my session

  1. The complications didn’t even occur to me. I’m glad I came across your post when i did because I’m currently saving objects in my session willy nilly.

  2. What about the store_twitter_tokens method? Do you always store the token an secret values in the database? Don’t they expire?

  3. Nicholas: the store_twitter_tokens method referred to here is an abstraction — this code does not specify how the tokens are stored. They could be in the database, the session, or somewhere else depending on the needs of your application.

  4. And yes, OAuth tokens can expire or be revoked at any time. When making OAuth-based requests you need to take this into account and fail gracefully.