Apple tvOS Tech Talks, London 2016

by Michael May

[Image: opening-slide]

As part of Apple’s plan to get more apps onto the Apple TV platform, they instigated one of their irregular Tech Talks World Tours. It came to London on January 11th 2016 and I got a golden ticket to attend the one-day event.

The agenda for the day was:

Apple TV Tech Talks Kickoff
Designing for Apple TV
Focus Driven Interfaces with UIKit
Break
Siri Remote & Game Controllers
On-Demand Resources & Data Storage
Lunch
Media Playback
Leveraging TVML for Media Apps
Best Practices for Designing tvOS Apps
Break
Tuning Your tvOS App
Making the Most Out of the Top Shelf
App Store Distribution
Reception

All sample code was in Swift, as you might expect, but they made a point of saying that you can develop tvOS apps in Objective-C, C++, and C too. I think this is especially important for the gaming community, where frameworks such as Unity are so widely used (despite Metal and SpriteKit).

I won’t go through each session, as I don’t think that really serves any useful purpose (the videos will be released, so I am told). Instead I’ll expand on some of my notes from the day, as they were the points I thought were interesting.

The day started with a brief intro session that included a preamble about how TV is so entrenched in our lives and yet so behind the times. This led into a slide that simply said…

[Image: future-of-tv]

“The Future of TV is Apps”

That’s probably the most bullish statement of intent that I’ve heard from Apple, so far, about their shiny new little black box. I think that if we can change user behaviour in the coming months and years then I might agree (see my piece at the end).

Then they pointed out that, as this is the very first iteration of this product, there are no permutations to worry about – the baseline for your iOS app might be an iPhone 4S running iOS 8 but for tvOS it’s just the latest and greatest – one box, one OS.

This is a device for which you can assume:

  • It is always connected (most of the time)
  • It has a high speed connection (most of the time)
  • It has a fast dual-core processor
  • It has a decent amount of memory
  • It has a decent amount of storage (and mechanisms for maintaining that)

They then went on to explain that the principles for a television app are somewhat different from a phone app. Apple specifically called out three principles that you should consider when designing your app.

  • Connected
    Your users must feel connected to the content of your app. As your app is likely some distance from the user, with no direct contact between finger and content, this is a different experience from touching the glass of an iPhone UI.
  • Clear
    Your app should be legible and the user should never get lost in the user interface. If the user leaves the room for a moment then comes back, can they pick up where they left off?
  • Immersive
    Just like watching a movie or TV series, your app should be wholly immersive whilst on-screen.

If you had said these things to me casually, I would probably have said, “well, yeah, obviously”, but when you have it spelled out to you, it gives you pause for thought:

“If I did port my app, how would I make an experience that works with the new remote and also makes sense on everything from a small flat-screen in a studio flat to an insanely big projector in a penthouse?”

Add to that the fact that the TV is a shared experience – from watching content together to just allowing different users to use your app at different times – it’s not the intimate experience we have learned to facilitate on iOS. It should still be personal, but it’s not personal to the same person all the time. Think of Netflix with their user picker at startup, or the tvOS AirBnB app with its avatar picker at the bottom of the screen.

Next was the Siri Remote and interactions via it. This is one complex device packed in a deceptively small form factor – from the microphone to the trackpad, gyroscope and accelerometer, this is not your usual television remote. We can now touch, swipe, swing, shake, click and talk to our media centre. The exciting thing for us as app developers is that almost all of this is open for us to use, either out of the box (for apps) or as custom interactions from raw event streams (particularly useful for games).

As you might expect from Apple, they were keen to stress that there are expectations for certain buttons that you should respect. Specifically, the menu and play/pause buttons. I like that they are encouraging conformity – it’s very much what people expect from Apple – but I found it a bit silly when they demonstrated how one might use the remote in landscape as a controller for a racing game. This, to me, felt a bit like dogma. If you want this to become a great gaming device, accept the natural limitations of the remote and push game controllers as the right choice here. Instead they kept going on about the remote and controllers being first-class citizens in all circumstances.

Speaking to an indie game developer friend about the potential of the device, he said that he would really like at least three things from Apple before hopping on board:

  • Stats on Apple TV sales to evaluate the size of the market
  • A games pack style version that comes with two controllers to put the device on a par with the consoles
  • Removal of the requirement to support the remote as an option in games. Trying to design a game that must also work with the remote is just too limiting and hopefully Apple will realise this as they talk to more games companies.

A key component of the new way of interacting with tvOS (versus iOS) is the inability to set the focus for the user. Instead you guide the “focus engine” as it changes the focus for the user, in response to their gestures. This gives uniformity, again, and also means that apps cannot become bad citizens and switch the focus under the user. One could imagine the temptation to do this being hard to resist for some kinds of apps – breaking news or the latest posts in a social stream, perhaps.

Instead you use invisible focus guides between views, and focusable properties on views, to help the engine know what the right thing to do is. At one point in the presentations the speaker said:

“Some people think they need a cursor on the Apple TV…they are wrong”

It seems clear to me that the focus engine is designed specifically to overcome this kind of hack and is a much better solution. If you’ve ever tried to use the cursor remote on some “Smart” TVs then you’ll know how that feels. If not, imagine a mouse with a low battery after one too many happy hour cocktails.

With the expansive, but still limited, resources of the Apple TV hardware, there will be times when there simply is not enough storage for everything that the user wants to install. The same, in fact, already holds true for iOS. Putting aside my rant about how cheap memory and storage are and how much Apple cashes in on both by making them premium features, their solution is On-Demand Resources (ODR).

With ODR you can mark resources as being one of three types, which determine when, and if, they are downloaded, and how they may be purged under low-resource conditions. Apple want you to bundle up your resources (images, videos, data, etc., but not code) into resource packs and to tag them. You tag them as one of:

  • Install
  • Prefetch
  • Download only on demand

Install resources come bundled with the app itself (splash screen, on-boarding, first levels, etc.). Prefetch resources are downloaded automatically, but only after the app has launched, and on-demand resources are, as you might expect, downloaded on demand by the app. On-demand resources can be purged, using various heuristics as to how likely a purge is to affect the user/app – things like last-accessed date and priority flags.

Although not talked about that much, as far as I can tell, TVML is to me one of the big stories of tvOS. Apple have realised that writing a full-blown native app is both expensive and overkill for some. If you’re all about content then you probably need little more than a grid of content to navigate, a single content drill-down view and some play/pause controls for that streaming content. TVML gives you an XML markup language, powered by a JavaScript engine, that vends native components in a native app. It can interact with your custom app code too, through bridges between the JavaScript DOM and the native wrapper. This makes a lot of sense if you are Netflix, Amazon Prime Video, Mubi, Spotify or, as they pointed out, Apple Music and the tvOS App Store.

It’s highly specific, but it’s specific to exactly the type of content provider Apple so desperately need to woo, and who are likely wondering if they can afford to commit time and effort to an untested platform. As we’ve seen with watchOS 2, developers are feeling somewhat wary of investing a lot of time in new platforms when they also have to maintain their existing ones, start moving to Swift, adopt the latest iOS 9 features, and so on.

I think this is a big deal because what Apple are providing is what so many third parties have been offering for years, to differing degrees of success. This is their Cordova, their PhoneGap or, perhaps most closely, their React Native. This is a fully Apple approved, and Apple supported, hybrid app development solution that your tranche of web developers are going to be able to use. If this ever comes to iOS it could open up apps to developers and businesses that just cannot afford a native app team, or the services of an app agency (assuming your business is all about vending content and you can live with a template look and feel). I think this could be really big in the future and in typical Apple fashion they are keeping it very low key for now.

They kept teasing that we were all there to find out how to get featured (certainly people were taking more photos there than anywhere else) but before that they spoke about tuning your apps for the TV. This ranged from useful tips and tricks for the well-documented frustrations of trying to enter text with the tvOS remote (make sure to mark email fields as such – Apple will offer a list of recently used email addresses if you do) to examples of using built-in technologies to share data instead of asking the user to do the work.

To the delight of my friends who work there, they demonstrated the Not On The High Street app and its use of Bonjour to discover the user’s iPhone/iPad and push the product being viewed into the basket of the app on that device. From there the user can complete their purchase very quickly – something that would be fiddly to do on the TV (slow keyboard, no Apple Pay, no credit card scanner).

Next came another feature that I think could hint at new directions for iOS in the future – the top shelf. If the user chooses to put your app in the top row of apps then, when it’s selected, that app gets to run a top shelf extension that populates the shelf with static or dynamic image content. This is the closest thing to a Windows Phone live tile experience that we’ve seen so far and, as I say, I think it could signpost a future “live” experience for iOS too. A blend of a Today Widget and a Top Shelf Widget could be very interesting.

Finally came the session they had been promising: App Store Distribution. The key takeaways for me were:

  • Don’t forget other markets (after the US the biggest app stores are Japan, China, UK, Australia, Canada and Germany)
  • Keep your app title short (typing is hard on tvOS)
  • Spend time getting your keywords right (and avoid wasting space with things like plurals)
  • Let Apple know 3-4 weeks before a major release of your app (appstorepromotion@apple.com)
  • Make your app the very best it can be, and be mindful of the tvOS platform

[Image: top-ios-markets]

Then it was on to a reception with some delicious canapés and a selection of drinks. This wasn’t what made it great though. What made it great were all the Apple people in the room, giving time to everyone who wanted it. This was not the Apple of old and it was all the better for it. The more of this kind of interaction they can facilitate, the stronger their platform will be for us.

The Future of TV is Apps?

I think the future of consumer electronics is a multi-screen ecosystem where the user interface and, of course, the form factor itself follow the function they serve.

Clearly, the television could become a critical screen in this future. I believe that, even as we get new immersive entertainment and story-telling options (virtual reality, 3D, and who knows what else), the passive television experience will persist. Sometimes all you want to do is just sit back and be entertained with nothing more taxing than the pause button.

A TV with apps allows this but also, perhaps, makes this more complex. When all I want to do is binge on Archer, a system with apps might not be what I want to navigate. That being said, if all I want to do is binge on Archer, and this can be done with a simple “Hey Siri, play Archer from my last unplayed episode”, then it’s a step ahead of my passive TV of old. It had better know I use Netflix and it had better not log me out of Netflix every few weeks like the Fire TV Stick does.

If I then get a notification (that hunts for my attention from watch to phone to television to who knows what else) that reminds me I have to be in town in an hour and that there are problems on the Northern Line so I should leave extra time, I might hit pause, grab my stuff and head out. As I sit on the tube with 20 minutes to kill, I might then say “Hey Siri, continue playing Archer”.

Just as I get to my appointment, a push notification tells me that my home has noticed a lack of people and gone into low-power mode. If I want, I can quickly reply with the time I expect to arrive home, so that it can put the heating on in time and also be on high alert for anyone else in my house during that period.

I suspect most of these transactions are being powered by apps, not the OS itself, but I do not expect to interact with the apps in most cases anymore. Apps will become simply the containers for the means of serving me these micro-interactions as I need/want them.

One only has to look at the Media Player shelf, Notification Actions, Today Widgets, Watch Apps, Glances, Complications, 3D Touch Quick Actions, and now the tvOS Top Shelf to see that this is already happening and will only increase as time goes on. Your app will power multiple screen experiences and be tailored for each, with multiple view types and multiple interactions. Sometimes these will be immersive and last for minutes or hours (games, movie watching, book reading, etc.) but other times these will be micro-interactions of seconds at most (reply to a tweet, check the weather, plan a journey, start a music stream, buy a ticket, complete a checkout). Apps must evolve or die.

That situation is probably a few years off yet, but in the more immediate term, if we want the future of TV to be apps (beyond simply streaming content) then users will need to be persuaded that their TV can be a portal to a connected world.

From playing games to checking the weather to getting a travel report, these are all things for which an app-powered TV could be very useful. It’s frequently on, always connected, and has a nice big screen on which to view what you want to know. Whether users find this easier than going to pick up their iPhone or iPad remains to be seen.

I think Apple see the Apple TV as a Trojan horse. Many years ago, Steve Jobs introduced the iMac as the centre of your digital world; a hub into which you plugged things. I think the Apple TV is the new incarnation of that idea – except the cables have now gone (replaced with the likes of HomeKit, AirPlay and Bonjour), the storage is iCloud and the customisation is through small, focused apps, and not the fully fledged applications of old.

It’s early days and if the iPhone has taught us anything it’s that the early model will rapidly change and improve. Where it actually goes is hard to say, but where it could go is starting to become clear.

Is the future of the TV apps? Probably so, but probably not in the way we think of apps right now. The app is dying, long live the app.

[Image: tour-pass]

Recent talks on Songkick Engineering

Since I joined Songkick a little over four years ago, our development team has done some amazing things. Our technology, process and culture have improved an enormous amount.

We’ve always been eager to share our progress on this blog and elsewhere, and we often talk about what we’ve learned and where we are still trying to improve.

Here are some recent talks given by members of our team discussing various aspects of how we work.

How we do product discovery

A few weeks ago, I gave a talk at the Future of Web Apps conference on how we do product discovery at Songkick. I had such an overwhelming response to it that I thought it might be useful to share it with the rest of the world.

Apologies for the whack formatting, but SlideShare doesn’t support Keynote files and I didn’t have time to redo the slides in PowerPoint to include the notes in a better way.

I’d love to hear how you guys go about product discovery and any tips / tricks on how to do it better.

Testing iOS apps

We recently released an update to our iPhone app. The app was originally developed by a third-party, so releasing an update required bringing the app development and testing in-house. We develop our projects in a continuous build environment, with automated builds, unit and acceptance tests. It allows us to develop fast and release often, and we wanted the iPhone project to work in the same way.

This article covers some of the tools we used to handle building and testing the app.

Build Automation

We use Git for our version control system, and Jenkins for our continuous integration server. Automating the project build (i.e. building the project to check for compilation errors) seemed like a basic step and a good place to start.

A prerequisite to this was to create a Mac Jenkins Build Slave, which is outside of the scope of this blog post (but if you’re interested, I followed the “master launches slave agent via SSH” instructions on the Jenkins site).

A quick search of the Jenkins plugins page revealed an Xcode plugin which allows for building Objective-C applications. Setting up the plugin was a snap – search for and install the “XCode integration” plugin from the Jenkins server plugin page, point the plugin to your project directory on the build slave, enable keychain access, and save.

Now, for every commit I made to the project, this task would automatically run and send me a rude email if project compilation failed. In practice I found that this was an excellent way of reminding me of any files I had forgotten to check in to Git; the project would compile on my laptop but fail on the CI server due to missing classes, images, etc.

Unit testing

I looked briefly into the unit testing framework Apple provides, which ships with Xcode. I added a unit test project to the Songkick app, and looked into creating mocks using OCMock, an Objective-C implementation of mock objects.

We already have fairly extensive API tests to test for specific iPhone-related user-flows (such as signing up, tracking an artist, etc), and due to time constraints we opted to concentrate on building acceptance tests, and revisit unit tests if we had time.

Acceptance Testing

There are a bunch of acceptance testing applications available for iOS apps. Here are a few of the tools I looked into in detail:

Frank

Frank is an iOS acceptance testing application which supports a Cucumber-style test syntax. I was interested in Frank as we already make use of Cucumber to test our Ruby projects, so the familiarity of the domain-specific language would have been a benefit.

I downloaded the project and got a sample test up-and-running fairly quickly. Frank ships with some useful tools, including a web inspector (“Symbiote”) which allows for inspecting app UI elements using the browser, and a “Frank console” for running ad-hoc commands against an iPhone simulator from the command line.

Frank seems to be a pretty feature rich application. The drawbacks for me were that Frank could not be run on real hardware (as of March 2013, this appears to now be possible), and Frank also requires recompiling your application to make a special “Frankified” version to work with the testing framework.

Instruments

Apple provides an application called Instruments to handle testing, profiling and analysis of applications written with Xcode. Instruments allows for recording and editing UIAutomation scripts – runnable JavaScript test files for use against a simulated iOS app or a real hardware install.

[Image: InstrumentsRecordingTest]

Being able to launch your app with Instruments, perform some actions from within the app, and have those actions automatically converted into a runnable test script was a really quick and easy way of defining tests. Instruments also supports running scripts via the command line.

The drawback of test scripts created with Instruments is that they can be particularly verbose, and Instruments does not provide a convenient way of formatting and defining individual test files (outside of a single UIAutomation script per unique action).

Tuneup_js

Designed to be used as an accompaniment to UIAutomation scripts created using Instruments, Tuneup_js is a JavaScript library that helps to ease the pain of working with the long-winded UIAutomation syntax.

It provides a basic test structure for organising test steps, and a bunch of user-friendly assertions built on top of the standard ones supported by Instruments.

[Image: tuneup]

I found that recording tests in Instruments, and then converting them into the Tuneup_js test syntax was a really quick way of building acceptance tests for iOS apps. These tests could then be run using a script provided with the Tuneup_js package.

Scenarios

I settled on using Instruments and Tuneup_js to handle acceptance testing. Instruments because of the ability to quickly record acceptance test steps, and Tuneup_js because it could be used to wrap recorded test steps into repeatable tests and allowed for a nicer test syntax than offered out-of-the-box with UIAutomation. What was missing with these applications was a way to handle running the test files in an easily repeatable fashion, and against the iOS simulator as well as hardware devices.

I couldn’t find an existing application to do this, so I wrote Scenarios (Scenar-iOS, see what I did there?) to handle this task. Scenarios is a simple console Ruby app that performs the following steps:

  • Cleans any previous app installs from the target test device
  • Builds the latest version of the app
  • Installs the app on the target test device
  • Runs Tuneup_js-formatted tests against the installed app
  • Reports the test results

Scenarios accepts command-line parameters, such as the option to target the simulator or a hardware device (with the option of auto-detecting the hardware, or supplying a device ID). Scenarios also adds a couple of extra functions on top of the UIAutomation library:

  • withTimeout – Can be used for potentially long-running calls (e.g. a button click to login, where the API call may be slow):
    withTimeout(function(){
      app.mainWindow().buttons()["Login"].tap();
    });
  • slowTap – Allows for slowing-down the speed at which taps are executed. Instruments can run test steps very fast, and sometimes it helps to slow down tests to see what they are doing, and help create a more realistic simulated user experience:
    app.toolbar().buttons()["Delete"].slowTap();

Scenarios ships with a sample project (app and tests) that can be run using the simulator or hardware. Here’s a video of the sample running on a simulator:

Jenkins Pipeline

Now that I had build and acceptance tests in place, it was time to hook the tests up to Jenkins. I created the following Jenkins projects:

  • “ios-app” – runs the build automation
  • “ios-app-acceptance-tests-simulator” – runs the app (via Scenarios) on a simulator
  • “ios-app-acceptance-tests-iPhone3GS” – runs the app (via Scenarios) on an iPhone 3GS

[Image: jenkins-pipeline]

Committing a code change to the iOS app Git repo caused the projects in the Jenkins pipeline to build the app, run the acceptance tests against the simulator, and finally run the acceptance tests on an iPhone 3GS. If any stage of the pipeline failed, I received an email informing me I had broken something.

[Image: test-iphone]

Manual testing with TestFlight

As well as an automated setup, we also made use of the excellent TestFlight service, which enables over-the-air distribution of apps to testers. We had 12 users and 16 devices set up in TestFlight, and I was releasing builds (often daily) over-the-air. It enabled us to get some real-user feedback on the app, something that build and acceptance tests cannot replace.

Jenkins also has a TestFlight plugin, which enables you to automatically deploy a build to TestFlight as part of the pipeline. Very cool, but as we were committing code changes often throughout the day (and only wanted to release to TestFlight once a day), we decided to skip this step for the time being.

Overall, I think that the tools (both open-source and proprietary) available today for automated testing of iOS apps are feature rich (even if some are still in their infancy), and I’m pretty happy with our development setup at Songkick.

Migrating to a new Puppet certification authority

At Songkick all our servers are managed using Puppet, an open source configuration management tool. We use it in client-server mode and recently had the need to replace the certification authority certificates on all our nodes. I couldn’t find much information on how to do this without logging onto every machine, so I’ve documented my method.

What is this Puppet CA anyway?

If you’re using puppet in its typical client-server or agent-master setup, then when the puppet master is first started it will create a certification authority (CA); all clients that connect to it must trust that CA and be trusted by it. This usually happens transparently, so often people aren’t aware that this certification authority exists.

The CA exists to establish trust between the agents and the master, so that an attacker cannot set up a malicious puppet master and tell puppet agents to do his or her bidding, and so that malicious clients cannot see configuration data for other clients. Agents should only connect to masters that have certificates signed by their CA, and masters should only send configuration information to clients that have certificates signed by the same CA.

There’s a more comprehensive explanation of Puppet SSL written by Brice Figureau which goes into far more detail than we have space for. The main thing to understand is that the CA is an important part of maintaining security and that you can only have one CA across a set of machines that access the same secured resources.

Why would I want to migrate to a new CA?

  • Your current CA certificate is about to expire. By default, CA certificates have a validity period of 5 years, so fairly early adopters of puppet will need to replace them.
  • You’ve had multiple CAs in use and need to consolidate on one.
  • You believe that your certificates and private keys are in the hands of people who could cause mischief with them.
  • You have fallen foul of bugs relating to the fact that you use a CA created in an older version of puppet.

It was in fact the second of these reasons that applied to Songkick; we’d previously been using multiple puppet masters, each with their own CA. We wanted to start using exported resources, stored in the same PuppetDB instance for all nodes. This meant that each master needed to be trusted by the same CA that signed the PuppetDB instance; hence we needed to consolidate on one CA.

How do I do it?

Set up new puppet master(s)

Set up at least one new puppet master server, with a new CA certificate.

If you have a lot of existing hosts managed by puppet, then it’s worth considering enabling the autosign option, even if only temporarily, as you’ll have a number of certificate requests to approve manually otherwise.

Configure agents to connect to the new master(s)

We’re assuming here that you’re managing the puppet agent configuration through puppet, and that changes to the puppet configuration cause an automatic restart of the puppet agent.

Change the configuration of your puppet agents, to connect to the new master(s) and use a different ssldir:

[main]
server = <new server hostname> 
ssldir = /var/lib/puppet/ssl2

Be careful not to apply this change to your newly created puppet master.

Your clients should reconfigure themselves, restart and, when they start up, connect to your new puppet master, forgetting their old SSL configuration, including the CA certificates.

If you have autodiscovery records for puppet in DNS, e.g. an A record for ‘puppet’ or the SRV records, then you should leave them in place for now. Agents that have not been migrated to the new CA may need it.

It is a good idea to test this on a handful of nodes and check that it works in a completely automated fashion before applying to every node.

Tidying up (part 1)

Once every node has checked in with the new master and been issued with a new certificate, it’s time to start the process of tidying up. It’s a good idea to revert back to using the default ssldir, so that when agents bootstrap themselves with the default config, they do not then switch to the new ssldir and thus forget their old certificates. This will cause the master to refuse to talk to them, as this looks like a spoofing attempt.

On each client, we mirror the new ssldir to the old one:

file { '/var/lib/puppet/ssl': 
  source => 'file:///var/lib/puppet/ssl2',
  recurse => true, 
  purge => true, 
  force => true, 
}

Be careful not to apply this change to your newly created puppet master.

Tidy up (part 2)

Once that’s shipped everywhere, we remove the ssldir configuration, fall back on the default ssldir and remove the above resource definition that copies the ssldir.

Tidy up (part 3)

You can now update your autodiscovery DNS entries, to point to the new servers and remove the autosign configuration, if desired.

Finally, we ship a change to the clients that removes the temporary /var/lib/puppet/ssl2 directory.

And that’s it, everything has been migrated to the new CA, with no need to do anything outside of puppet.

Testing your database backups: the test environment database refresh pattern

When did you last try restoring your database backups? A month ago, a week ago? A year ago? Never? When was the last time you refreshed the data in your test environments? When I joined Songkick, one of the first things I asked was when we had last tested a restore of our database backups. The answer, pleasingly, was at 03:00 UK time that morning and, not coincidentally, that’s also when we last refreshed the data in our test environments.

Here’s how we get the warm and fuzzy feeling of knowing that our backups contain data that can be restored and makes sense.

  1. Every morning, our database servers run their scheduled backups, copying the resulting images to a backup server in the data centre.
  2. Overnight those backups get copied to the office, giving us an offsite copy.
  3. In the small hours, when most of us are asleep, each of the database servers in our staging environment retrieves the backups, erases its local data files and then restores the production backups over the top of them.
  4. We perform sanitisation on the data, to make it suitable for use in a testing environment.
  5. And finally, and most importantly, we use the databases in our testing.

By doing this, we identified one case where our backup process seemed to work and produced plausible-looking backups, but MySQL failed to apply InnoDB log records during recovery. It was inconvenient to discover this problem in our staging environment, but far less inconvenient than discovering it when we needed the backups to put our production system back into operation.

Here are some practical tips based on our experience implementing and managing this system at Songkick:

Back all databases up at the same time

If your system is composed of services backed by independent databases on different machines, it’s possible that there’s some implicit consistency between them. For example, a common situation at Songkick is to have an accounts service responsible for storing user accounts and another service that stores user data keyed against a user; there’s an expectation that those databases have some degree of consistency.

If you back them up at different times, you’ll find inconsistencies, such as a service holding a reference to a user that doesn’t yet exist. If the ID of the user is exposed to other services and that ID can be reused, you may find that newly created users in your test environment have existing data associated with them, and this can cause significant problems in testing.

It’s worth noting that, in the case of a production restore, these issues would need to be diagnosed and solved in the heat of the moment. By finding them in your test environment, you’re giving yourself the space to solve them earlier, under less pressure.

Design the backups to be regularly exercised

Some types of backups are more amenable to being restored regularly in test environments. For example, our initial MongoDB backups were snapshots of our MongoDB database path. These proved difficult to restore, because they included local databases which contained information on replica set membership. This meant that on startup, our staging MongoDB server would forget its existing replica set membership and try to talk to the production servers instead.

We switched to using mongodump to take a logical export of the database, simply so that we could restore it on the primary member of our existing staging replica set and update the entire replica set.

Sanitisation tips

After we’ve restored the databases, there are certain things we do to make them safe and usable in our testing environments.

  • Remove or obfuscate email addresses. We’re not fond of accidentally emailing people with test events we’ve created in staging, so we change people’s email addresses to be unusable, so that can’t happen. We leave people’s email addresses alone if they work at Songkick, so we can test email features by emailing ourselves (a sketch of this step follows the list).
  • Remove or obfuscate payment tokens. If it’s uncool to accidentally email people, accidentally charging them is positively hostile. Anything that’s used for payment needs to be removed.
  • Fix or replace information about the environment. It’s best to avoid keeping references to your technical environment in the same database as your application data, but sometimes that’s tricky to work around. For example, our MogileFS installation needs to be kept in sync with our production one, to avoid problems with missing media. This means that we need to manually update the database to substitute the hostnames of the MogileFS servers.
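
As an illustration of that first email step, here is roughly what such a sanitisation pass could look like. This is a hypothetical ActiveRecord sketch, not Songkick’s actual script, and it assumes a User model with an email column:

# Hypothetical sanitisation sketch: neutralise every non-Songkick address
# so that staging can never email a real user.
User.where("email NOT LIKE ?", "%@songkick.com").find_each do |user|
  user.update_column(:email, "user-#{user.id}@example.com")
end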

Write code that can withstand the database going away

Unless you’ve put some work in, almost no database driver will gracefully handle the disappearance of a database server and then its re-appearance some time later. If the restore in your test environment is the first time you’ve tried this, you may find that you need to manually restart services, even after the database re-appears on the network.

The solution will vary depending on the database client being used, but often it’s a case of catching an exception, or changing some options when you establish the connection.
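
For example, with ActiveRecord on top of the mysql2 adapter, the fix can be as small as a reconnect-and-retry wrapper around queries. The snippet below is only an illustrative sketch of that idea (the Event model is a stand-in), not the code we actually run:

# Retry a block once if the MySQL connection has gone away,
# assuming ActiveRecord with the mysql2 adapter.
def with_reconnect
  attempts = 0
  begin
    yield
  rescue ActiveRecord::StatementInvalid => e
    raise unless e.message =~ /gone away|Lost connection/i
    raise if (attempts += 1) > 1
    ActiveRecord::Base.connection.reconnect!
    retry
  end
end

with_reconnect { Event.count }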

By making your applications reconnect to the database with no manual input, you are again fixing a problem that will eventually occur in production – a much more stressful time for it to be diagnosed and fixed.

Summary

Testing your database backups by restoring them automatically and regularly in your test environments is a great way to battle-harden your backups and applications and to make sure that your test environment looks like the real production environment.


If you’ve liked what you’ve read, why not head over to our jobs page? We’re looking for a Systems Engineer to add more touches like these to our infrastructure.

Safely dealing with magical text

Boy, what a week it’s been. A remote-code-execution bug was discovered in Ruby on Rails, and we’ve all been scrambling to patch our servers (please patch your apps before reading any further, there is an automated exploit out there that gives people a shell on your boxes otherwise).

What the Ruby community, and those of other dynamic languages, must realize from recent Rails security blunders is that very similar problems can easily exist in any non-trivial web application. Indeed, I found a remote-execution bug in my own open-source project Faye yesterday, 3.5 years into the life of the project (again: patch before reading on).

There are a lot of lessons to be had from recent Rails security blunders, since they involve so many co-operating factors: excessive trust of user input, insufficient input validation and output encoding, the behavioural capabilities of Ruby objects and certain Rails classes, ignorance of cryptography and the computational complexity of data transport formats. In this post I’d like to focus on one in particular: safely encoding data for output and execution.

Ugh, do I have to?

I know, I know, booooooring, but so many people are still getting this really badly wrong and it continues to punish end users by exposing their data to malicious manipulation.

Robert Hansen and Meredith Patterson have a really good slide deck on stopping injection attacks with computational theory. One core message in that paper is that injection exploits (including SQL injection and cross-site scripting) involve crafting input such that it creates new and unexpected syntactic elements in code executed by the software, essentially introducing new instructions for the software to execute. Let’s look at a simple example.

Learn you a query string

I found the code that prompted me to write this post while updating some Google Maps URLs on our site this afternoon. Some of this code was constructing URLs by doing something like this:

def maps_url(lat, lng, zoom, width, height)
  params = [ "center=#{lat},#{lng}",
             "zoom=#{zoom}",
             "size=#{width}x#{height}" ]

  "http://maps.google.com/?" + params.join("&amp;")
end

maps_url(51.4651204, -0.1148897, 15, 640, 220)

# => "http://maps.google.com/?center=51.4651204,-0.1148897&amp; ...
#                             zoom=15&amp; ...
#                             size=640x220"

You can see the intent here: whoever wrote this code assumes the URL is going to end up being embedded in HTML, and so they have encoded the query string delimiters as &amp; entities. But this doesn’t fix the problem entities are designed to solve, namely: safely representing characters that usually have special meaning in HTML. What is telling is that the comma in the query string should really also be encoded as %2C, but isn’t.

So although the ampersands are being encoded, the actual query data is not, and that means anyone calling this function can use it to inject HTML, for example:

link = '<a href="' +
           maps_url(0, 0, 1, 0, '"><script>alert("Hello!")</script>') +
           '">Link text</a>'

# => '<a href="http://maps.google.com/?center=0,0&amp; ...
#                                      zoom=1&amp; ...
#                                      size=0x"> ...
#         <script>alert("Hello!")</script> ...
#         ">Link text</a>'

By abusing the maps_url() function, I have managed to inject characters with special meaning — <, >, etc. — into the output and thereby added new HTML elements to the output that shouldn’t be there. By passing unexpected input I’ve created a lovely little cross-site scripting exploit and stolen all your users’ sessions!

Note that you cannot cleanly fix this by using an HTML-escaping function like ERB::Util.h() on the output of maps_url(), because this would serve to re-encode the ampersands, leaving strings like &amp;amp; in the href attribute.
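
To make that double-encoding concrete, here’s a quick illustration using the maps_url() defined above (output shown in comments):

require 'erb'

url = maps_url(0, 0, 1, 640, 220)
# => "http://maps.google.com/?center=0,0&amp;zoom=1&amp;size=640x220"

ERB::Util.h(url)
# => "http://maps.google.com/?center=0,0&amp;amp;zoom=1&amp;amp;size=640x220"
# The already-encoded &amp; is escaped again, leaving the literal
# text "&amp;amp;" inside the href attribute.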

Stacks of languages

Meredith Patterson of the above-linked paper gave another presentation at 28C3 called The Science of Insecurity. I’ve been telling absolutely everyone to watch it recently, so here it is.

This talk describes how we should think of data transfer formats, network protocols and the like as languages, because in fact that’s what they are. It covers the different levels of language power – regular languages, context-free languages and Turing-complete languages – and how use of each affects the security of our systems. It also explains why, if your application relies on Turing-complete protocols, it will take an infinite amount of time to secure it.

When you build HTML pages, you are using a handful of languages that all run together in the same document. There’s HTML itself, and embedded URLs, and CSS, and JavaScript, and JavaScript embedded in CSS, and CSS selectors embedded in CSS and JavaScript, and base64 encoded images, and … well this list is long. All of these are languages and have formal definitions about how to parse them, and your browser needs to know which type of data it’s dealing with whenever it’s parsing your code.

Every character of output you generate is an instruction that tells the browser what to do next. If it’s parsing an HTML attribute and sees the " character, it truncates the attribute at that point. If it thinks it’s reading a text node and sees a <, it starts parsing the input as an HTML tag.

Instead of thinking of your pages as data, you should think of them as executable language.

Back to reality

Let’s apply this idea to our URL:

http://maps.google.com/?center=51.4651204,-0.1148897&amp;zoom=15&amp;size=640x220

Outside of an HTML document, the meaning of this list of characters changes: those &amp; blobs only have meaning when interpreting HTML, and if we treat this query string verbatim we get these parameters out:

{
  'center'   => '51.4651204,-0.1148897',
  'amp;zoom' => '15',
  'amp;size' => '640x220'
}

(This assumes your URL parser doesn’t treat ; as a value delimiter, or complain that the comma is not encoded.)

We’ve seen what happens when we embed HTML-related characters in the URL: inserting the characters "> chops the <a> tag short and allows injection of new HTML elements. But that behaviour comes from HTML, not from anything about URLs; when the browser is parsing an href attribute, it just reads until it hits the closing quote symbol and then HTML-decodes whatever it read up to that point to get the attribute value. It could be a URL, or any other text value, the browser does not care. At that level of parsing, it only matters that the text is HTML-encoded.

In fact, you could have a query string like foo=true&bar="> and parsing it with a URL parser will give you the data {'foo' => 'true', 'bar' => '">'}. The characters "> mean something in the HTML language, but not in the query string language.

So, we have a stack of languages, each nested inside the other. Symbols with no special meaning at one level can gain meaning at the next. What to do?

Stacks of encodings

What we’re really doing here is taking a value and putting it into a query string inside a URL, then putting that URL inside an HTML document.

                                +-------------------------+
                                | "51.4651204,-0.1148897" |
                                +------------+------------+
                                             |
    +----------------------------------------|--------+
    |                                +-------V------+ |
    | http://maps.google.com/?center=| centre_value | |
    |                                +--------------+ |
    +------------------------+------------------------+
                             |
                       +-----V-----+
              <a href="| url_value |">Link>/a>
                       +-----------+

At each layer, the template views the value being injected in as an opaque string — it doesn’t care what it is, it just needs to make sure it’s encoded properly. The problem with our original example is that it pre-emptively applies HTML encoding to data because it anticipates that the value will be used in HTML, but does not apply encodings relevant to the task at hand, namely URL construction. This is precisely backwards: considering the problem as above, we see that we should instead:

  1. Decide what type of string we’re creating — is it a URL, an HTML doc, etc.
  2. Apply all encoding relevant to the type of string being made
  3. Do not apply encodings for languages further up the stack

In other words, we should make a URL-constructing function apply URL-related encoding to its inputs, and an HTML-constructing function should apply HTML encoding. This means each layer’s functions can be recombined with others and still work correctly, because their outputs don’t make assumptions about where they will be used. So we would rewrite our code as:

def maps_url(lat, lng, zoom, width, height)
  params = { "center" => "#{lat},#{lng}",
             "zoom"   => zoom,
             "size"   => "#{width}x#{height}" }

  query = params.map do |key, value|
    "#{CGI.escape key.to_s}=#{CGI.escape value.to_s}"
  end
  "http://maps.google.com/?" + query.join("&")
end

url = maps_url(51.4651204, -0.1148897, 15, 640, 220)

# => "http://maps.google.com/?center=51.4651204%2C-0.1148897& ...
#                             zoom=15& ...
#                             size=640x220"

html = '<a href="' + ERB::Util.h(url) + '">Link</a>'

# => '<a href="http://maps.google.com/?center=51.4651204%2C-0.1148897&amp; ...
#                                      zoom=15&amp; ...
#                                      size=640x220">Link</a>'

Now we see that we get two valid pieces of data: url is a valid URL with all its query parameters correctly encoded but no HTML entities present, and html is a valid HTML fragment with its attributes correctly entity-encoded.

Also, note how we have treated all incoming data as literal (i.e. not already encoded for the task at hand), and we have not hand-written any encoding ourselves (e.g. hand-writing entities like &amp;). You should deal with data assuming it contains the literal information it represents and use library functions to encode it correctly. There’s a very good chance you don’t know all the text transformations required by each layer.
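
For instance, Ruby’s standard library already knows the escaping rules for each of these layers, so you rarely need to hand-write an entity or a percent-escape yourself:

require 'cgi'
require 'erb'

CGI.escape("Fish & Chips")   # => "Fish+%26+Chips"   (query-string encoding)
ERB::Util.h("Fish & Chips")  # => "Fish &amp; Chips" (HTML encoding)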

Thinking in types

At this point you’re probably thinking that I’ve made something quite simple seem very complicated. But thinking in terms of types of strings, treating your output as a language stack and following the bullet list above is a good discipline to follow if you want to make sure you handle data safely.

There are some systems that do this for you, for example Rails 3 automatically HTML-escapes any value you insert into an ERB template by default. I’m working on a more general version of this idea: Coping is a templating language that checks your templates conform to the language you’re producing, and doesn’t let input introduce new syntactic elements.

If you’re feeling very brave, I recommend taking the Coursera Compilers course. Although it doesn’t seem immediately relevant to web devs, many concepts from parser theory, type checking and code generation can be applied to security and are well worth learning.

Above all, learn from other people’s security failures and consider where you may have made similar mistakes.

Introducing Aspec: A black box API testing DSL

Caltrak is the service that stores Songkick users’ tracked artists and cities. It has no other service dependencies. You put data into the Caltrak box, then you get it back out.

For instance, you might make two POST requests to store artist trackings, and then want to retrieve them, which would look like this:

# create and retrieve artist trackings
POST /users/7/artists/1    204
POST /users/7/artists/2    204
 GET /users/7/artists      200    application/json   [1, 2]

Did you understand basically what that was saying? I hope so, because that’s an executable spec from the Caltrak tests.

It’s pretty simple. Every line is both a request and an assertion. Every line says “If I make this request then I expect to get this back”.

This works because the behaviour of this service can be entirely described through the REST API. There are no “side effects” that are not visible through the API itself.

Here is a longer portion from the aspec file.

# no users have pending notifications
   GET /users/with-pending-notifications                200  application/json  []

# users with events on their calendar have pending notifications
  POST /users/764/metro-areas/999                       204
  POST /users/764/artists/123                           204
  POST /events/5?artist_ids=123&metro_area_id=999       204
  POST /events/5/enqueue-notifications                  204
   GET /users/with-pending-notifications                200  application/json  [[764, "ep"]]

# users are unique in the response
  POST /users/764/artists/123                           204
  POST /users/764/artists/456                           204
  POST /users/764/metro-areas/999                       204
  POST /events/5?artist_ids=123,456&metro_area_id=999   204
  POST /events/5/enqueue-notifications                  204
   GET /users/with-pending-notifications                200  application/json  [[764, "ep"]]

Some aspects:

  • Each line has the format Verb, Url (with Params), Status, Content Type, Body separated by whitespace. These are the only things that can be asserted about the service responses.
  • Each “paragraph” is a separate test. The database is cleared in-between.
  • Lines beginning with # are comments.
  • Aspec stubs time, so that the first line of the test occurs precisely on the epoch and each subsequent line occurs 2s after that. This allows us to test responses with creation timestamps in them.

Motivation

When we began developing Caltrak, I wasn’t happy with the process of writing tests for this service.

I wanted the test framework to expose the simple nature of the API. You could make something almost as simple in RSpec or Cucumber with judicious use of helpers and so on, but you would end up with a DSL that obscured the underlying REST API.
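
For a sense of the trade-off, here is roughly what the first example above would look like as a plain RSpec spec using Rack::Test, without any helpers (hypothetical code – the Caltrak application class name is made up):

require 'rack/test'
require 'json'

describe "artist trackings" do
  include Rack::Test::Methods

  # Hypothetical: whatever Rack application Caltrak exposes.
  def app
    Caltrak::Application
  end

  it "returns the artists a user tracks" do
    post "/users/7/artists/1"
    post "/users/7/artists/2"
    get  "/users/7/artists"

    expect(last_response.status).to eq(200)
    expect(JSON.parse(last_response.body)).to eq([1, 2])
  end
end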

In an Aspec file, there is no syntax that does not express real data either sent or received from the service. You’re basically writing down the actual HTTP requests and responses with lots of parts omitted. It is technical, but it is very readable. I think it is better documentation than most service tests.

Also, there is no context that is not immediately visible, as there might be with nested RSpec contexts, for example, where in a large test file the setup may be very distant from the test and assertion.

Implementation

NB This project is very immature. Use at your own risk.

Aspec assumes your project uses Rack, and uses Rack::Test to talk to it. The code is published on GitHub and there is a tiny example API project.

It is very similar to RSpec in operation. You write a .aspec file, and put an aspec_helper.rb next to it.

Then run

aspec aspec/my_service.aspec

I’d be interested in hearing your thoughts on this testing style.

A Second Here, A Second There…

A Mysterious Performance Problem

In production, we were seeing lots of requests to our backend services taking a long time. They would typically be over 1000ms when we would expect them to be much, much faster. It was intermittent too. Sometimes the same endpoints were fast, sometimes they were slow.

When we investigated, we were able to reduce the performance problem to the following mystifying test case, which should happen on your system too.

With this Sinatra app:

require 'sinatra'

post "/foo" do
  status 200
  "There are #{params.length} params"
end

Making this request:

curl -d "a=1" "localhost:3000/foo"

takes almost exactly 0 seconds.

(Note, we use libcurl to talk to services instead of Ruby’s Net::HTTP, so this is very close to a real service request from one of our applications.)

On the other hand, making this request:

curl -id "a=$A_KB_OF_DATA" "localhost:8080/foo"

takes almost exactly 1 second.

Now, it shouldn’t take 1 second to make such a simple request just because I’m sending 1K of data.

So what on earth is happening here? If you already know the answer then much respect to you; but this was a head scratcher for me.

After spending a long time in the Sinatra, Rack and Unicorn stack, I was able to trace the long wait to the socket read call in the Unicorn request object.

At which point, Graham (our Head of Tech Ops) suggested that we dump the packets to examine what Curl was actually sending to the server and receiving back. The command we used for this was:

tcpdump -i lo -s 0 port 8080

(Which says, capture all packets on the loopback interface, in their entirety, to and from port 8080.)

Inspecting the contents of these packets led to us (or at least me) learning some new things about HTTP.

I Suck At HTTP

Here’s something I didn’t know about HTTP, from the RFC. An HTTP client can omit the request body and add this header to the request:

Expect: 100-continue

Which is a way of saying “I’m not sure if you want this request body, do you?”

If the server does want the request body, it should respond with a 100 response, and the client can then upload the body and the request/response continues as normal from there.

If it does not want the request body, it can respond with whatever other HTTP code.

Now, this is handy if there are circumstances where you want to reject a request before you go to the trouble of uploading a perhaps massive request body. (Maybe an enormous image file being posted to an endpoint that is not accepting uploads right now.)

It’s entirely up to the client whether to include this header. You can not bother at all, or only bother for very large files, or do it all the time. All valid approaches.

Curl says: if your post data is larger than 1K, I’ll send the Expect header. I say: OK Curl. 1 kilobyte seems pretty small, but whatever, fine.

So far so good.

Here’s the problem: Ruby web servers do not implement 100 Continue automatically.

And they shouldn’t, either. It’s up to the application to make that choice, receive or not receive, so application authors need to implement that themselves in general. And we didn’t.

So what happens? Curl sits there waiting. “Come on, come on, do you want this request body or not?”

Curl makes a Weird Choice

So yes, our app was technically broken. It happens. And we would have fixed it if Curl had timed out waiting for the 100, as you might expect, because that would have been an error we would have noticed and fixed.

But Curl doesn’t timeout waiting. If it has sent the Expect header, Curl will wait for the 100 response, but only for exactly one second. After one second has passed, it will send the request body anyway.

So from our app’s point of view, everything is fine. It was just a bit slow. No errors raised or reported. Just a slow service. Sometimes.

I guess from the point of view of Curl, it’s best to make every effort to have a successful request, even with performance degradation. From our point of view, failing hard would have been better than months of silent performance regression.

Working Around It

So we now have two options. The first and best is to actually implement the 100 Continue behaviour correctly. The second is to make Curl not behave like this.

Because we control all the clients to our internal services and wanted a quick fix, we decided to go with the second approach, which I’ll describe first.

If you set the Expect header yourself, even to a blank string, that overrides Curl’s behaviour and forces it to send the entire request immediately:

Expect: 

We did this and fixed all the slow requests we were seeing instantly.
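
In Ruby this is a one-line change wherever the request is built. For example, with the curb gem it could look something like the sketch below (illustrative only, not our exact client code):

require 'curb'

curl = Curl::Easy.new("http://localhost:8080/foo")
# An empty Expect header stops libcurl adding "Expect: 100-continue"
# for request bodies over 1K, so the body is sent immediately.
curl.headers["Expect"] = ""
curl.http_post(Curl::PostField.content("a", "1" * 1024))
puts curl.response_code  # => 200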

Fixing it Properly (With Unicorn)

It’s not entirely obvious how to implement the required behaviour in a Sinatra application. How would you respond with 100 and then continue processing a request in Sinatra?

Fortunately, Unicorn has support for 100 Continue built in. If you respond with 100 from your application, it will return that to the client, and then immediately turn around and call your app a second time with the full data.

So we can make our application detect the Expect header and respond with 100 if appropriate.

We’d like to be able to make this decision from Sinatra, but I believe that Sinatra eagerly reads all of the request, so the delay is triggered before your action block is run.

So instead, include the logic in a Rack middleware, like this. Although here there is no real decision logic: we simply request the body, always and immediately, if we detect that the client is waiting for a 100 response:

class AlwaysRequestBody
  def initialize(app)
    @app = app
  end

  def call(env)
    if env["HTTP_EXPECT"] =~ /100-continue/
      [100, {}, [""]]
    else
      @app.call(env)
    end
  end
end

use AlwaysRequestBody
run Sinatra::Application

With this in place, the previously slow curl request returns instantly, and you can see here Curl logging the intermediate 100 response as well.

$ time curl -id "a=$A_KB_OF_DATA" "localhost:8080/foo"
HTTP/1.1 100 Continue

HTTP/1.1 200 OK
Date: Tue, 27 Nov 2012 17:32:52 GMT
Status: 200 OK
Connection: close
X-Frame-Options: sameorigin
Content-Type: text/html;charset=utf-8
X-XSS-Protection: 1; mode=block
Content-Length: 18
There are 1 params

real    0m0.008s
user    0m0.008s
sys     0m0.008s

Problem solved.

Conclusion

Hence my initial assertion that your app is probably broken. If a client sends the Expect header, it has every right to expect a correct response.

Streaming modules in Rails 3

Rails 3 has limited built-in support for progressive rendering (or template streaming). It can stream your header, page, and footer when each has completed rendering. This can have a big impact on your pages’ speed. In our case it’s brought our average first response time from 1s to 400ms.

However, this doesn’t go as far as we’d like. In previous versions of songkick.com, we used a plugin that activated module-by-module template streaming. This means that the page streams each module of the page as it is rendered, not just in three big blocks.

This doesn’t seem to be possible in Rails 3 (although it is apparently planned for Rails 4), but with a bit of fiddling we have managed to make it work. It’s currently activated on most pages on www.songkick.com, if you head over there and find a slow page you can watch the page come in a module at a time.

I’m sharing it here both because I thought it might be useful, and because it’s a bit of a hack and I’d like feedback on it.

In the action you want to stream, you can turn on standard Rails 3 streaming with:

def show
  render :stream => true
end

But to enable module by module streaming there are two things to do. First, in your layout:

<% $streaming_output_buffer = @output_buffer %>
<%= yield %>

And then in the page template (here show.html.erb) wrap the entire contents in a block:

<% with_output_buffer($streaming_output_buffer) do %>
  … all of ERB template in here
<% end %>

This will cause this ERB template to write to the streaming buffer rather than its own output buffer each time it renders a portion of the page.

You should only add this block in the top-level page templates. Don’t put it in partials or whatever, otherwise weird things will happen. Also, if your entire page is a render call to just one other template (say the page just contained one big partial) then you won’t see any benefit, because it’s only this template that streams, not recursively called ones.

Told you it was a bit of a hack. Hopefully it will tide us over until Rails 4 implements this properly.