Catalog: Increasing visibility for our Android UI tests

Getting automatic feedback from tests is extremely important when building any kind of software. At Songkick, our code is tested, validated, and reported through Jenkins CI.
The pipeline around our Android app includes static analysis, unit tests and instrumentation tests running on real devices and emulators.
Previously, we used square/spoon to run our instrumentation tests. It did a great job, with support for screenshots and LogCat recordings. But recently we had to drop it: it conflicted with another library, LogCat recording stopped working, and it was taking too long to run all of our tests (around 15 minutes for the entire suite).
So we moved to the official connected{Variant}AndroidTest tasks. While these were much faster (around 8 minutes for the same test suite), we were missing the logs: when a test failed, we couldn’t check the logs for more details. So we started re-running our tests and losing trust in them.

Introducing Catalog

Catalog is a Gradle plugin for Android. When added to your project, it runs with connected{Variant}AndroidTest tasks. At the end of the tests, it generates a report per device in app/build/outputs/androidTest-results/:

[Screenshot: an example Catalog report]

Why should I use it?

  • Catalog is built on top of the Android build tools, so it doesn’t introduce any new test tasks
  • It will give you more confidence in your tests
  • It is lightweight (basically 8 simple classes)
  • It is fast and won’t add any significant overhead to your build time

Get started

To include the plugin in your project, just add these lines to your app/build.gradle:

buildscript {
    repositories {
        jcenter()
    }
    dependencies {
        classpath 'com.songkick:catalog:0.1.1'
    }
}

apply plugin: 'com.android.application'
apply plugin: 'com.songkick.catalog'

How does it work?

Catalog consists of two Gradle tasks:

  • recordConnected{Variant}AndroidTest: runs before connected{Variant}AndroidTest and connects to adb to record the LogCat output for the current application.
  • printConnected{Variant}AndroidTest: runs after connected{Variant}AndroidTest, gathers the recorded logs, and writes a .txt and an .html file into app/build/outputs/androidTest-results/.
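
If you’re curious how tasks can be wired around the test tasks like that, here is a minimal Gradle sketch of the general technique. This is an illustration rather than Catalog’s actual source; the task bodies are stubbed out with comments.

project.afterEvaluate {
    project.tasks.matching {
        it.name.startsWith('connected') && it.name.endsWith('AndroidTest')
    }.all { testTask ->
        def record = project.task("record${testTask.name.capitalize()}") {
            doLast {
                // start streaming `adb logcat` for the app under test
            }
        }
        def report = project.task("print${testTask.name.capitalize()}") {
            doLast {
                // stop recording and write the .txt/.html reports
            }
        }
        testTask.dependsOn(record)    // recording starts before the tests run
        testTask.finalizedBy(report)  // reports are written even if tests fail
    }
}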

Going forward

We are starting small with Catalog, but we would love suggestions and feedback. If you like the plugin, please post an issue or create a pull request. We have a few ideas to make it even more awesome, like:

  • show the status of each test (failure/success/ignored)
  • generate an HTML file listing all devices
  • add support for screenshots

Anything is possible, feel free to contribute: https://github.com/songkick/catalog

How Docker is changing the way we develop, test & ship apps at Songkick

We’re really excited to have shipped our first app that uses Docker throughout our entire release cycle; from development, through to running tests on our CI server, and finally to our production environment. This article explains a bit about why we came to choose Docker, how we’re using it, and the benefits it brings.

Since Songkick and Crowdsurge merged last year we’ve had a mix of infrastructures, and in a long-term quest to consolidate platforms we’ve been looking at how to create a great development experience that would work cross-platform. We started by asking what a great development environment looks like, and came up with the following requirements:

  • Isolate dependencies (trying to run two different versions of a language or database on the same machine isn’t fun!)
  • Match production accurately
  • Fast to set up, and fast to work with day-to-day
  • Simple to use (think make run)
  • Easy for developers to change

We’ve aspired to create a development environment that gets out of the way and allows developers to focus on building great products. We believe that if you want a happy, productive development team, it’s essential to get this right, and with the right decisions and a bit of work, Docker is a great tool to achieve that.

We’ve broken down some advice and examples of how we’re using Docker for one of our new internal apps.

Install the Docker Toolbox

The Docker Toolbox provides you with all the right tools to work with Docker on Mac or Windows.

A few of us have also been playing with Docker for Mac, which provides a more native experience. It’s still in beta, but it’s a fantastic step forward compared to the Docker Toolbox and docker-machine.

Use VMware Fusion instead of VirtualBox

Although the Docker Toolbox comes with VirtualBox included, we chose to use VMware Fusion instead. File change notifications are significantly better under VMware Fusion, allowing features like Rails auto-reloading to work properly.

Creating a different Docker machine is simple:

$ docker-machine create --driver vmwarefusion default

Use existing services where possible

In development we connect directly to our staging database, removing a set of dependencies (running a local database, seeding structure and data) and giving us a useful, rich dataset to develop against.

Having a production-like set of data to develop and test against is really important, helping us catch bugs, edge-cases and data-related UX problems early.

Test in isolation

For testing we use docker-compose to run the tests against an ephemeral local database, making our tests fast and reliable.
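
As a rough illustration, the docker-compose file for the test environment might look something like this; the service names, image and environment variables here are hypothetical rather than our actual configuration:

version: '2'
services:
  db:
    image: mysql:5.6
    environment:
      MYSQL_ALLOW_EMPTY_PASSWORD: "yes"

  test:
    build:
      context: .
      dockerfile: Dockerfile.dev
    command: bundle exec rspec
    environment:
      DATABASE_HOST: db
    depends_on:
      - db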

Because you may not want to run your entire test suite each time, we also have a test shell ideal for running specific sets of tests:

$ make shell ENV=test
$ rspec spec/controllers/

Proper development tooling

As well as running the Ruby web server through Docker, we also provide a development shell container, aliased for convenience. This is great for trying out commands in the Rails console or installing new gems without needing Ruby or other dependencies on your Mac.

$ make shell ENV=dev
$ bundle install
$ rails console
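
For convenience, we wrap these invocations in make targets. A sketch of how such a target could look (the compose file layout here is illustrative):

ENV ?= dev

shell:
	docker-compose -f docker-compose.$(ENV).yml run --rm app bash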

Use separate Dockerfiles for development and production

We build our development and production images slightly differently. They both declare the same system dependencies but differ in how they install gems and handle assets. Let’s run through each one and see how they work:

Dockerfile.dev

FROM ruby:2.3.1-slim

RUN mkdir -p /app

RUN apt-get update && \
 apt-get install -y \
 build-essential \
 pkg-config \
 libxml2-dev \
 libxslt-dev \
 libmysqlclient-dev \
 mysql-client \
 libssl-dev \
 libreadline-dev \
 git \
 libfontconfig \
 wget && \
 apt-get clean && \
 rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# Add our Gemfile to the app directory, this is here so if it changes
# then the bundle install is triggered again
WORKDIR /app
COPY Gemfile* /app/
COPY vendor/cache /app/vendor/cache

RUN bundle config build.nokogiri --use-system-libraries \
 && bundle install --local

COPY . /app

EXPOSE 8080

CMD ["rails", "server", "-b", "0.0.0.0", "-p", "8080"]

Here we deliberately copy the Gemfile, corresponding lock file and the vendor/cache directory first, then run bundle install.

When steps in the Dockerfile change, Docker only re-runs that step and steps after. This means we only run `bundle install` when there’s a change to the Gemfile or the cached gems, but when other files in the app change we can skip this step, significantly speeding up build time.

We deliberately chose to cache the gems rather than install them afresh from Rubygems.org each time, for three reasons. First, it removes a deployment dependency: when you’re deploying several times a day, it’s not great having to rely on more external services than necessary. Second, it means we don’t have to authenticate to install private or Git-based gems from inside containers. Finally, it’s also much faster to install gems from the filesystem, using the --local flag to avoid hitting Rubygems altogether.

Dockerfile.prod

FROM ruby:2.3.1-slim

# Create our app directory
RUN mkdir -p /app

RUN apt-get update && \
 apt-get install -y \
 build-essential \
 ...
 apt-get clean && \
 rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

WORKDIR /app
COPY . /app

RUN bundle config build.nokogiri --use-system-libraries \
 && bundle install --local --without development test

RUN RAILS_ENV=production bundle exec rake assets:precompile

EXPOSE 8080

CMD ["rails", "server", "-b", "0.0.0.0", "-p", "8080", "--pid", "/tmp/rails.pid"]

For production we install our gems differently, skipping test and development groups and precompiling assets into the image.

Deployment

To release this image we tag it as the latest version, as well as the git SHA. This is then pushed to our private ECR.
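
In plain docker commands, that amounts to something like the following; the registry URL and image name are illustrative:

$ docker build -f Dockerfile.prod -t internal-app .
$ docker tag internal-app 123456789012.dkr.ecr.eu-west-1.amazonaws.com/internal-app:latest
$ docker tag internal-app 123456789012.dkr.ecr.eu-west-1.amazonaws.com/internal-app:$(git rev-parse HEAD)
$ docker push 123456789012.dkr.ecr.eu-west-1.amazonaws.com/internal-app:latest
$ docker push 123456789012.dkr.ecr.eu-west-1.amazonaws.com/internal-app:$(git rev-parse HEAD)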

We deliberately deploy that specific version of the image, meaning rolling back is as simple as re-deploying a previous version from Jenkins.

Running in production

For running containers in production, we’re doing the simplest possible thing: using Docker to solve a dependency management problem only.

We’re running one container per node, using host networking and managing the process using upstart. When deploying we simply tell the upstart service to restart, which pulls the relevant image from the registry, stops the existing container and starts the new one.
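
An upstart job for this pattern can be very small. Here is a hypothetical sketch; the job name, image and registry are made up for illustration:

# /etc/init/internal-app.conf
description "internal-app Docker container"
start on started docker
stop on runlevel [!2345]
respawn

script
  docker pull 123456789012.dkr.ecr.eu-west-1.amazonaws.com/internal-app:latest
  docker rm -f internal-app || true
  exec docker run --name internal-app --net=host \
    123456789012.dkr.ecr.eu-west-1.amazonaws.com/internal-app:latest
end script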

This isn’t the most scalable or resource-efficient way of running containers but for a low-traffic internal app it’s a great balance of simplicity and effectiveness.

Next steps

One thing we’re still missing on production is downtime-less deploys. Amazon’s ECS handles this automatically (by spinning up a new pool of containers before automatically swapping them out in the load balancer) so we’re looking to move towards using that instead.

We’re still learning a lot about using Docker but so far it’s been a powerful, reliable and enjoyable tool to use for both developers and ops.

Ingredients for a healthy Android codebase

Getting started in Android development is pretty straightforward: there are plenty of tutorials and documentation provided by Google. But Google will teach you to build a tent, not a solid, sustainable house. As it’s still a very young platform with a very young community, the Android world has been lacking direction on how to properly architect an app. Recently, some teams have started to take the problem more seriously, under the shiny tagline “Clean architecture for Android”.

At Songkick, we had the chance to rebuild the Android client from scratch 7 months ago. The previous version was working very well, but the codebase had not been touched for almost 3 years, which left us with old practices, old libraries, and Eclipse. We wanted to take a good direction, so we spent a week designing the general architecture of the app, trying to apply the following principles from Uncle Bob’s clean architecture:

Systems should be

  • Independent of Frameworks. The architecture does not depend on the existence of a particular library. This allows you to use such frameworks as tools, rather than having to design your system around their limited constraints.
  • Testable. The business rules can be tested without the UI, Database, Web Server, or any other external element.
  • Independent of UI. The UI can change easily, without changing the rest of the system. A Web UI could be replaced with a console UI, for example, without changing the business rules.
  • Independent of Database. You can swap out Oracle or SQL Server, for Mongo, BigTable, CouchDB, or something else. Your business rules are not bound to the database.
  • Independent of any external agency. In fact your business rules simply don’t know anything at all about the outside world.

…and this is what we ended up with:

[Diagram: the high-level architecture of the app]

Layers

Data layer

The data layer acts as a mediator between data sources and the domain logic. It should be a pure Java layer. We divide the data layer into different buckets following the repository pattern. In short, a repository is an abstraction layer that isolates business objects from the data sources.

[Diagram: the repository pattern]

For example, a repository can expose a searchArtist() method, but the domain layer will not (and should not) know where the data comes from. In fact, one day we could swap the data source from a database to a web API and the domain layer would not see the difference.

When the data source is the Songkick REST API, we usually follow the format of the endpoint to know where data access belongs. That way we have a UserRepository, an ArtistRepository, an EventRepository, and so on.
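
To make that concrete, here is a hedged sketch of what such a repository could look like; the interface, the SongkickApi class and the method names are illustrative, not our actual code:

import java.util.List;

import rx.Observable;

// The domain layer only ever sees this abstraction.
public interface ArtistRepository {
    Observable<List<Artist>> searchArtist(String query);
}

// This implementation happens to be backed by the REST API; it could be
// swapped for a database-backed one without the domain layer noticing.
public class ApiArtistRepository implements ArtistRepository {
    private final SongkickApi api;

    public ApiArtistRepository(SongkickApi api) {
        this.api = api;
    }

    @Override
    public Observable<List<Artist>> searchArtist(String query) {
        return api.searchArtist(query);
    }
}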

Domain layer

The role of the domain layer is to orchestrate the flow of data and offer its services to the presentation layer. The domain layer is application specific; this is where the core business logic belongs. It is divided into use cases. A use case should not be directly linked to any external agency, and it should also be pure Java.

Presentation layer

At the top of the stack, we have the presentation layer which is responsible for displaying information to the user.

That’s where things get tricky because of this class:

[Screenshot: the Android Activity class]

When I started developing for Android, I found that an Activity is a very convenient place where everything can happen:

  • it’s tied to the view lifecycle
  • it can receive user inputs
  • it’s a Context so it gives access to many data sources (ContentResolver, SharedPreferences, …)

On top of that, most of the samples provided by Google have everything in an Activity, so what could go wrong? If you follow that pattern, I can guarantee that your Activity will be huge and untestable.

We took the decision to treat our activities/fragments as views and make them as dumb as possible. The view-related logic lives in presenters that communicate with the domain layer. Presenters should only contain simple logic related to the presentation of the data, not to the data itself.

Models vs. View models

This architecture moves a lot of logic away from the presentation layer, but there is one last thing we haven’t considered: models. The models we get from the data sources are very rarely what we want to display to the user. It’s very common to do some extra processing just before binding the data to the view. We’ve seen apps with 300 lines of code in onBindViewHolder(), resulting in very slow view recycling. This is unacceptable: why add that overhead on the main thread when you can move it to the same background thread you used to fetch the data?

In the Songkick Android app, the presentation layer barely knows what the original model is: it only deals with view models. A view model is the view representation of the content your data layer fetched. In the domain layer, each use case has a transformer that converts models to view models. To respect the clean architecture rules, the presentation layer provides the transformer to the domain layer, and the domain layer uses it without really knowing what it does.

So say that you have the following Artist model:

[Code: the Artist model]

If we just want to show the name and whether the artist is on tour, our ArtistViewModel is as follows:

[Code: the ArtistViewModel]

So that we can efficiently bind it to our view:

[Code: binding the ArtistViewModel in onBindViewHolder()]
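
The original snippets were posted as images; here is a hedged reconstruction of roughly what they showed, with field and class names guessed from the surrounding text:

import java.util.Date;

// The model returned by the data layer.
public class Artist {
    private final long id;
    private final String displayName;
    private final Date onTourUntil;

    public Artist(long id, String displayName, Date onTourUntil) {
        this.id = id;
        this.displayName = displayName;
        this.onTourUntil = onTourUntil;
    }

    public long getId() { return id; }
    public String getDisplayName() { return displayName; }
    public Date getOnTourUntil() { return onTourUntil; }
}

// The view model holds exactly what the view needs, precomputed on the
// same background thread that fetched the data.
public class ArtistViewModel {
    private final String name;
    private final boolean onTour;

    public ArtistViewModel(String name, boolean onTour) {
        this.name = name;
        this.onTour = onTour;
    }

    public String getName() { return name; }
    public boolean isOnTour() { return onTour; }
}

// Binding is now trivial and cheap on the main thread.
@Override
public void onBindViewHolder(ViewHolder holder, int position) {
    ArtistViewModel artist = artists.get(position);
    holder.name.setText(artist.getName());
    holder.onTourBadge.setVisibility(artist.isOnTour() ? View.VISIBLE : View.GONE);
}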

Communication

To communicate between these layers, we use RxJava by:

  • exposing Observables in repositories
  • exposing methods to subscribe/unsubscribe to an Observable that emits ViewModels in the use case
  • subscribing/unsubscribing to the use case in the Presenter
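
Putting those pieces together, a presenter could look something like this sketch (RxJava 1-era API, with illustrative class names rather than our actual code):

import java.util.List;

import rx.Subscription;
import rx.android.schedulers.AndroidSchedulers;

// The Activity/Fragment implements this and stays dumb.
interface ArtistSearchView {
    void showArtists(List<ArtistViewModel> artists);
    void showError();
}

public class ArtistSearchPresenter {
    private final SearchArtistUseCase useCase;
    private ArtistSearchView view;
    private Subscription subscription;

    public ArtistSearchPresenter(SearchArtistUseCase useCase) {
        this.useCase = useCase;
    }

    public void attach(ArtistSearchView view) {
        this.view = view;
    }

    public void search(String query) {
        // The use case does its work on a background thread and emits
        // ready-to-display view models.
        subscription = useCase.execute(query)
                .observeOn(AndroidSchedulers.mainThread())
                .subscribe(
                        viewModels -> view.showArtists(viewModels),
                        error -> view.showError());
    }

    public void detach() {
        // Unsubscribing on detach avoids leaking the Activity/Fragment.
        if (subscription != null) {
            subscription.unsubscribe();
        }
        view = null;
    }
}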

Structure

To structure our app we are using Dagger in the following way:

[Diagram: how repositories, use cases and presenters are scoped with Dagger]

Repositories are unique per application as they should be stateless and shared across activities. Use cases and presenters are unique per Activity/Fragment. Presenters are stateful and should be linked to a unique Activity/Fragment.

We are also trying to follow the quote by Erich Gamma:

“Program to an interface, not an implementation”

  • It decouples the client from the implementation
  • It defines the vocabulary of the collaboration
  • It makes everything easier to test

Testing

Most of the pieces in this stack are pure Java classes, so they can be unit tested without Robolectric. The only bit that needs Robolectric is the Activity/Fragment.

We usually prefer testing the presentation layer with pure UI tests using Espresso. The good thing is that we can mock the data layer to expose observables emitting entities from a JSON file, and we’re good to go:

[Code: an Espresso test with a mocked data layer]
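
As a rough idea of the shape of such a test, here is a sketch using Mockito and Espresso; artistRepository, activityRule and the artistsFromJson helper are hypothetical stand-ins for the injected mock, the usual ActivityTestRule and a small fixture loader:

import static android.support.test.espresso.Espresso.onView;
import static android.support.test.espresso.assertion.ViewAssertions.matches;
import static android.support.test.espresso.matcher.ViewMatchers.isDisplayed;
import static android.support.test.espresso.matcher.ViewMatchers.withText;
import static org.mockito.Matchers.anyString;
import static org.mockito.Mockito.when;

import android.content.Intent;

import org.junit.Test;

import rx.Observable;

@Test
public void displaysArtistsFromTheMockedDataLayer() {
    // The mocked repository replays a canned JSON fixture instead of
    // hitting the real API.
    when(artistRepository.searchArtist(anyString()))
            .thenReturn(Observable.just(artistsFromJson("search_artists.json")));

    activityRule.launchActivity(new Intent());

    onView(withText("Mogwai")).check(matches(isDisplayed()));
}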

Of course there are drawbacks to only testing the domain and presentation layers without checking that they are compliant with the external agencies, but we generally found that tests were much more stable and accurate with this pattern. End-to-end tests are also valuable, and we could imagine adding a separate category running through some important user journeys by providing the default sources to our data layer.

Conclusion

We’ve now run the new app for 4 months and it has proved very stable and very maintainable. We’re also in a great place with good test coverage across both unit and UI tests. The codebase scales well when it comes to adding new features.

Although it works for us, we are not saying that everyone should go for this architecture. We’re just at the first iteration of “Clean architecture” for Android, and are looking forward to seeing what it will be in the future.

Here’s a link to the talk I gave about the same topic: https://youtu.be/-oZswd1j5H0 (slides: https://speakerdeck.com/romainpiel/ingredients-for-a-healthy-codebase)

References

  • Uncle Bob’s clean architecture: http://fernandocejas.com/2014/09/03/architecting-android-the-clean-way
  • https://github.com/android10/Android-CleanArchitecture
  • Martin Fowler – The repository pattern
  • Erich Gamma – Design Principles from Design Patterns

Move fast, but test the code

At Songkick we believe code only starts adding value when it’s out in production, and being used by real users. Using Continuous Deployment helps us ship quickly and frequently. Code is pushed to Git, automatically built, checked, and if all appears well, deployed to production.

Automated pipelines make sure that every release goes through all of our defined steps. We don’t need to remember to trigger test suites, and we don’t need to merge features between branches. Our pipeline contains enough automated checks for us to be confident releasing the code to production.

However, our automated checks are not enough to confirm if a feature is actually working as it should be. For that we need to run through all our defined acceptance criteria and implicit requirements, and see the feature being used in the real world by real users.

In a previous life we used to try to perform all of our testing in the build/test/release pipeline. Not only was this slow and inefficient, depending on lots of different people being available at the same time, but we often found that features behaved very differently in production. Real users do unexpected things, and it’s difficult to create truly realistic test environments.

Our motivation to get features out to real users as quickly as possible drove our adoption of Continuous Deployment. Having manual acceptance testing within the release pipeline slowed us down and made processes unpredictable. It was hard to define a process that relied on so many different people. We treated everyday events such as meetings and other work priorities as exceptional events which made things even more delay-prone and frustrating.

Eventually we decided that the build and release pipeline must be fully automated. We wanted developers to be able to push code and know that if Jenkins passed the build, it was safe for them to deploy to production. Attempting to automate all testing is never going to be achievable, or desirable. Firstly, automated tests are expensive to build and maintain. Secondly, testing, as opposed to checking, is not something that can be automated.

When we check something we are comparing the system against a known outcome. For example checking a button launches the expected popup when clicked, or checking a date displays in the specified format. Things like this can be, and should be automated.

Testing is more involved and relies on a human making a judgement. Testing involves exploring the system in creative ways in order to discover the things that you forgot about, the things that are unexpected, or difficult to completely define. It’s hard to predict how time and specific data combinations will affect computer systems, testing is a good way to try and uncover what actually happens. Removing the constraint of needing fully defined expected outcomes allows us to explore the system as a user might.

In practical terms this means running automated checks in our release pipeline and performing testing before code is committed, and post release. Taking testing out of the release pipeline removes the time pressures and allows us freedom to test everything as deeply as we require.

[Diagram: Songkick’s test and release process]

Small, informal meetings called kick-offs help involve everyone in defining and designing the feature. We discuss what we’re building and why, plan how to test and release the code, and consider ways to measure success. Anything more complicated than a simple bug fix gets a kick-off before we start writing code. Understanding the context is important for helping us do the right thing. If we know that there are deadlines or business risks associated with a change, then we’re likely to act differently than in a situation that has technical risks.

Coming out of the kick-off meeting we know how risky we consider the feature to be. We will have decided on the best approach to testing and releasing the code. As part of developing the feature we’ll also write or update our automated checks to make sure we don’t break the feature further down the line. Our process is intentionally flexible to allow us to treat each change appropriately depending on risk and need to ship.

Consider a recently released feature to store promoter details against ticket allocations as an example. The feature kick-off meeting identified risks and we discussed what and how to test the feature. We identified ways to break down the work into smaller pieces that could be developed and released independently; each hidden behind a feature flipper to keep it invisible from real users.

Developers and testers paired together to decide on specific areas to test. The tester’s testing expertise, and the developer’s deep understanding of the code feed into an informal collection of test ideas based on risk. Usually these are represented in a visual mind map for easy reference.

The developers, guided by the mind map, tested the feature and added automated unit and integration tests as they went. Front-end changes were overseen by a designer working closely with one of the developers to come up with the best, feasible, design. Once we had all the pieces of the feature the whole team jumped in to do some testing, and update our automated acceptance tests.

The feature required a bit of data backfilling so the development team were able to use the functionality in production, in ways we expect real users to use it. Of course we found some bugs but by working with small releases we were able to quickly locate the source of the problem. Fast release pipelines allow fixes to be deployed within minutes, making the cost of most bugs tolerably low.

Once the feature had been fully released and switched on for all users we used monitoring to check for unexpected issues. Reviewing features after a week or two of real world usage allows us to make informed decisions about the technical implementation and user experience. Taking the time to review how much value features are adding allows us to quickly spot and respond to problems.

Testing a feature involves many experts. Testers must be on hand to aid the developers in their testing, often by creating a mindmap of test ideas to guide testing. We try to use our previous experience of releasing similar features to focus the testing on areas that are typically complex or easy to break. Designers and UX people get involved to make sure the UX works as hoped, and the design looks good on all our supported devices and browsers. Product managers make sure the features actually do what they want them to do. High risk features have additional deep testing from the test team and in certain cases we throw in some focused performance or security testing.

Most of our bugs come from forgetting use cases or not understanding existing functionality in the system. Testing gives us a chance to use the system in an investigative way to hopefully find these bugs. Moving testing outside of our release pipeline gives us space to perform enough testing for each feature whilst maintaining a fully automated, and fast, release pipeline.

Apple tvOS Tech Talks, London 2016

Apple tvOS Tech Talks
London 2016
by Michael May

[Photo: the opening slide]

As part of Apple’s plan to get more apps onto the Apple TV platform they instigated one of their irregular Tech Talks World Tours. It came to London on January 11th 2016 and I got a golden ticket to attend the one day event.

The agenda for the day was:

Apple TV Tech Talks Kickoff
Designing for Apple TV
Focus Driven Interfaces with UIKit
Break
Siri Remote & Game Controllers
On-Demand Resources & Data Storage
Lunch
Media Playback
Leveraging TVML for Media Apps
Best Practices for Designing tvOS Apps
Break
Tuning Your tvOS App
Making the Most Out of the Top Shelf
App Store Distribution
Reception

All sample code was in Swift, as you might expect, but they made a point of saying that you can develop tvOS apps in Objective-C, C++, and C too. I think these are especially important for the gaming community where frameworks such as Unity are so important (despite Metal and SpriteKit).

I won’t go through each session, as I don’t think that really serves any useful purpose (the videos will be released, so I am told). Instead I’ll expand on some of my notes from the day, as they were the points I thought were interesting.

The day started with a brief intro session that included a pre-amble about how TV is so entrenched in our lives and yet so behind the times. This led into a slide that simply said…

“The Future of TV is Apps”

That’s probably the most bullish statement of intent that I’ve heard from Apple, so far, about their shiny new little black box. I think that if we can change user behaviour in the coming months and years then I might agree (see my piece at the end).

Then they pointed out that, as this is the very first iteration of this product, there are no permutations to worry about – the baseline for your iOS app might be an iPhone 4S running iOS 8 but for tvOS it’s just the latest and greatest – one box, one OS.

This is a device for which you can assume

  • It is always connected (most of the time)
  • It has a high speed connection (most of the time)
  • It has a fast dual-core processor
  • It has a decent amount of memory
  • It has a decent amount of storage (and mechanisms for maintaining that)

They then went on to explain that the principles for a television app are somewhat different from those for a phone app. Apple specifically called out three principles that you should consider when designing your app:

  • Connected
    Your users must feel connected to the content of your app. As your app is likely some distance from the user, with no direct contact between finger and content, this is a different experience from touching the glass of an iPhone UI.
  • Clear
    Your app should be legible and the user should never get lost in the user interface. If the user leaves the room for a moment then comes back, can they pick up where they left off?
  • Immersive
    Just like watching a movie or TV series, your app should be wholly immersive whilst on-screen.

If you had said these things to me casually, I would probably have said, “well, yeah, obviously”, but when you have it spelled out to you, it gives you pause for thought:

“If I did port my app, how would I make an experience that works with the new remote and also makes sense on everything from a small flat-screen in a studio flat to an insanely big projector in a penthouse?”

Add to that the fact that the TV is a shared experience – from watching content together to different people using your app at different times – and it’s not the intimate experience we have learned to facilitate on iOS. It should still be personal, but it’s not personal to the same person all the time. Think of Netflix with its user picker at startup, or the tvOS Airbnb app with its avatar picker at the bottom of the screen.

Next was the Siri Remote and interactions via it. This is one complex device packed into a deceptively small form factor – from the microphone to the trackpad, gyroscope and accelerometer, this is not your usual television remote. We can now touch, swipe, swing, shake, click and talk to our media centre. The exciting thing for us as app developers is that almost all of this is open for us to use, either out of the box (for apps) or as custom interactions from raw event streams (particularly useful for games).

As you might expect from Apple, they were keen to stress that there are expectations for certain buttons that you should respect – specifically, the menu and play/pause buttons. I like that they are encouraging conformity – it’s very much what people expect from Apple – but I found it a bit silly when they demonstrated how one might use the remote in landscape as a controller for a racing game. This, to me, felt a bit like dogma. If you want this to become a great gaming device, accept the natural limitations of the remote and push game controllers as the right choice here. Instead they kept going on about the remote and controllers being first-class citizens in all circumstances.

Speaking to an indie game developer friend about the potential of the device, he said that he would really like at least three things from Apple before hopping on board:

  • Stats on Apple TV sales to evaluate the size of the market
  • A games pack style version that comes with two controllers to put the device on a par with the consoles
  • Removal of the requirement to support the remote as an option in games. Trying to design a game that must also work with the remote is just too limiting and hopefully Apple will realise this as they talk to more games companies.

A key component of the new way of interacting with tvOS (versus iOS) is the inability to set the focus for the user. Instead you guide the “focus engine” as it changes the focus for the user, in response to their gestures. This gives uniformity, again, and also means that apps cannot become bad citizens and switch the focus under the user. One could imagine the temptation to do this being hard to resist for some kinds of apps – breaking news or the latest posts in a social stream, perhaps.

Instead you use invisible focus guides between views, and focusable properties on views, to help the engine know what the right thing to do is. At one point in the presentations, the speaker said:

“Some people think they need a cursor on the Apple TV…they are wrong”

It seems clear to me that the focus engine is designed specifically to overcome this kind of hack and is a much better solution. If you’ve ever tried to use the cursor remote on some “Smart” TVs then you’ll know how that feels. If not, imagine a mouse with a low battery after one too many happy hour cocktails.

With the expansive but still limited resources of the Apple TV hardware, there will be times when there simply is not enough storage for everything the user wants to install. The same, in fact, already holds true for iOS. Putting aside my rant about how cheap memory and storage are, and how much Apple cashes in on both by making them premium features, their solution is On-Demand Resources (ODR).

With ODR you can mark resources as being one of three types, which change when, and if, they are downloaded, and how they may be purged under low-resource conditions. Apple want you to bundle up your resources (images, videos, data, etc, but not code) into resource packs and to tag them as one of:

  • Install
  • Prefetch
  • Download only on demand

Install resources come bundled with the app itself (splash screen, on-boarding, first levels, etc). Prefetch resources are downloaded automatically, but after the app is launched. On-demand resources are, as you might expect, fetched on demand by the app, and they can be purged using various heuristics as to how likely they are to affect the user/app – things like last accessed date and priority flags.

Although not talked about that much as far as I can tell, to me TVML is one of the big stories of tvOS. Apple have realised that writing a full blown native app is both expensive and overkill for some. If you’re all about content then you probably need little more than a grid of content to navigate, a single content drill down view and some play/pause of that streaming content. TVML gives you an XML markup language, powered by a JavaScript engine, that vends native components in a native app. It can interact with your custom app code too, through bridges between the JavaScript DOM and the native wrapper. This makes a lot of sense if you are Netflix, Amazon Prime Video, Mubi, Spotify or, as they pointed out, Apple Music and the tvOS App Store.

It’s highly specific, but it’s highly specific to exactly the type of content provider Apple so desperately need to woo, and who are likely wondering if they can afford to commit time and effort to an untested platform. As we’ve seen with watchOS 2, developers are feeling somewhat wary of investing a lot of time in new platforms when they also have to maintain their existing ones, start moving to Swift, adopt the latest iOS 9 features, and so on.

I think this is a big deal because what Apple are providing is what so many third parties have been offering for years, to differing degrees of success. This is their Cordova, their PhoneGap or, perhaps most closely, their React Native. This is a fully Apple approved, and Apple supported, hybrid app development solution that your tranche of web developers are going to be able to use. If this ever comes to iOS it could open up apps to developers and businesses that just cannot afford a native app team, or the services of an app agency (assuming your business is all about vending content and you can live with a template look and feel). I think this could be really big in the future and in typical Apple fashion they are keeping it very low key for now.

They kept teasing that we were all there to find out how to get featured (certainly people were taking more photos there than anywhere else), but before that they spoke about tuning your apps for the TV. This included useful tricks and tips for the well-documented frustrations of trying to enter text with the tvOS remote (make sure to mark email fields as such – Apple will offer a recently used email list if you do), and examples of using built-in technologies to share data instead of asking the user to do work.

To the delight of my friends who work there, they demonstrated the Not On The High Street app and its use of Bonjour to discover the user’s iPhone/iPad and push the product they want to buy into the basket of the app on that platform. From there the user can complete their purchase very quickly – something that would be fiddly to do on the TV (slow keyboard, no Apple Pay, no credit card scanner).

Next came another feature that I think could hint at new directions for iOS in the future – the top shelf. If the user chooses to put your app in the top row of apps, then, when it’s selected, that app gets to run a top shelf extension that populates the shelf with static or dynamic image content. This is the closest thing to a Windows Phone live tile experience that we’ve seen so far and, as I say, I think it could signpost a future “live” experience for iOS too. A blend of a Today Widget and a Top Shelf Widget could be very interesting.

Finally came the session they were promising: App Store Distribution. The key takeaways for me were:

  • Don’t forget other markets (after the US the biggest app stores are Japan, China, UK, Australia, Canada and Germany)
  • Keep your app title short (typing is hard on tvOS)
  • Spend time getting your keywords right (and avoid wasting space with things like plurals)
  • Let Apple know 3-4 weeks before a major release of your app (appstorepromotion@apple.com)
  • Make your app the very best it can be and mindful of the tvOS platform

[Slide: the biggest App Store markets]

Then it was on to a reception with some delicious canapés and a selection of drinks. This wasn’t what made it great, though. What made it great were all the Apple people in the room, giving time to everyone who wanted it. This was not the Apple of old, and it was all the better for it. The more of this kind of interaction they can facilitate, the stronger their platform will be for us.

The Future of TV is Apps?

I think the future of consumer electronics is a multi-screen ecosystem where the user interface and, of course, the form factor itself follow the function to which they are in service.

Clearly, the television could become a critical screen in this future. I believe that, even as we get new immersive entertainment and story-telling options (virtual reality, 3D, and who knows what else), the passive television experience will persist. Sometimes all you want to do is just sit back and be entertained with nothing more taxing than the pause button.

A TV with apps allows this but also, perhaps, makes this more complex. When all I want to do is binge on Archer, a system with apps might not be what I want to navigate. That being said, if all I want to do is binge on Archer, and this can be done with a simple “Hey Siri, play Archer from my last unplayed episode”, then it’s a step ahead of my passive TV of old. It had better know I use Netflix and it had better not log me out of Netflix every few weeks like the Fire TV Stick does.

If I then get a notification (that hunts for my attention from watch to phone to television to who knows what else) that reminds me I have to be in town in an hour and that there are problems on the Northern Line so I should leave extra time, I might hit pause, grab my stuff and head out. As I sit on the tube with 20 minutes to kill, I might then say “Hey Siri, continue playing Archer”.

Just as I get to my appointment I find my home has noticed a lack of people and gone into low power mode, via a push notification. If I want, I can quickly reply with my expected arrival home time, so that it can put on the heating in time and also be on high alert for anyone else in my house during that period.

I suspect most of these transactions are being powered by apps, not the OS itself, but I do not expect to interact with the apps in most cases anymore. Apps will become simply the containers for the means of serving me these micro-interactions as I need/want them.

One only has to look at the Media Player shelf, Notification Actions, Today Widgets, Watch Apps, Glances, Complications, 3D Touch Quick Actions, and now the tvOS Top Shelf to see that this is already happening, and it will only increase as time goes on. Your app will power multiple screen experiences and be tailored for each, with multiple view types and multiple interactions. Sometimes these will be immersive and last for minutes or hours (games, movie watching, book reading, etc) but other times these will be micro-interactions of seconds at most (reply to a tweet, check the weather, plan a journey, start a music stream, buy a ticket, complete a checkout). Apps must evolve or die.

That situation is probably a few years off yet, but in the more immediate term, if we want the future of TV to be apps (beyond simply streaming content) then users will need to be persuaded that their TV can be a portal to a connected world.

From playing games to checking the weather to getting a travel report, these are all things for which an app-powered TV could be very useful. It’s frequently on, always connected, and has a nice big screen on which to view what you want to know. Whether users find this easier than picking up their iPhone or iPad remains to be seen.

I think Apple see the Apple TV as a Trojan horse. Many years ago, Steve Jobs introduced the iMac as the centre of your digital world; a hub into which you plugged things. I think the Apple TV is the new incarnation of that idea – except the cables have now gone (replaced with the likes of HomeKit, AirPlay and Bonjour), the storage is iCloud and the customisation is through small, focused apps, and not the fully fledged applications of old.

It’s early days and if the iPhone has taught us anything it’s that the early model will rapidly change and improve. Where it actually goes is hard to say, but where it could go is starting to become clear.

Is the future of the TV apps? Probably so, but probably not in the way we think of apps right now. The app is dying, long live the app.


Recent talks on Songkick Engineering

Since I joined Songkick a little over four years ago, our development team has done some amazing things. Our technology, process and culture have improved an enormous amount.

We’ve always been eager to share our progress on this blog and elsewhere, and we often talk about what we’ve learned and where we are still trying to improve.

Here are some recent talks given by members of our team discussing various aspects of how we work.

How we do product discovery

A few weeks ago, I gave a talk at the Future of Web Apps conference on how we do product discovery at Songkick. I had such an overwhelming response to it that I thought it might be useful to share it with the rest of the world.

Apologies for the whack formatting, but SlideShare doesn’t support Keynote files and I didn’t have time to redo the slides in PowerPoint to include the notes in a better way.

I’d love to hear how you guys go about product discovery and any tips / tricks on how to do it better.

Testing iOS apps

We recently released an update to our iPhone app. The app was originally developed by a third party, so releasing an update required bringing the app development and testing in-house. We develop our projects in a continuous build environment, with automated builds, unit and acceptance tests. This allows us to develop fast and release often, and we wanted the iPhone project to work in the same way.

This article covers some of the tools we used to handle building and testing the app.

Build Automation

We use Git for our version control system, and Jenkins for our continuous integration server. Automating the project build (i.e. building the project to check for compilation errors) seemed like a basic step and a good place to start.

A prerequisite to this was to create a Mac Jenkins build slave, which is outside the scope of this blog post (but if you’re interested, I followed the “master launches slave agent via SSH” instructions on the Jenkins site).

A quick search of the Jenkins plugins page revealed an Xcode plugin which allows for building Objective-C applications. Setting up the plugin was a snap: search for and install the “Xcode integration” plugin from the Jenkins server plugin page, point the plugin at your project directory on the build slave, enable keychain access, and save.

Now for every commit I made to the project, this task would automatically run, and send me a rude email if project compilation failed. In practice I found this was an excellent way of reminding me of any files I had forgotten to check in to Git; the project would compile on my laptop but fail on the CI server due to missing classes, images, etc.

Unit testing

I looked briefly into the unit testing framework Apple provides, which ships with Xcode. I added a unit test project to the Songkick app, and looked into creating mocks using OCMock, an Objective-C implementation of mock objects.

We already have fairly extensive API tests to test for specific iPhone-related user-flows (such as signing up, tracking an artist, etc), and due to time constraints we opted to concentrate on building acceptance tests, and revisit unit tests if we had time.

Acceptance Testing

There are a bunch of acceptance testing applications available for iOS apps. Here’s a few of the tools I looked into in detail:

Frank

Frank is an iOS acceptance testing application which supports a Cucumber-style test syntax. I was interested in Frank as we already make use of Cucumber to test our Ruby projects, so the familiarity of the domain-specific language would have been a benefit.

I downloaded the project and got a sample test up-and-running fairly quickly. Frank ships with some useful tools, including a web inspector (“Symbiote”) which allows for inspecting app UI elements using the browser, and a “Frank console” for running ad-hoc commands against an iPhone simulator from the command line.

Frank seems to be a pretty feature-rich application. The drawbacks for me were that Frank could not be run on real hardware (as of March 2013, this appears to now be possible), and that Frank requires recompiling your application to make a special “Frankified” version to work with the testing framework.

Instruments

Apple provides an application called Instruments to handle testing, profiling and analysis of applications written with Xcode. Instruments allows for recording and editing UIAutomation scripts – runnable JavaScript test files for use against a simulated iOS app or a real hardware install.

[Screenshot: recording a UIAutomation script in Instruments]

Being able to launch your app with Instruments, perform some actions from within the app, and have those actions automatically converted into a runnable test script was a really quick and easy way of defining tests. Instruments also supports running scripts via the command line.

The drawback of test scripts created with Instruments is that they can be particularly verbose, and Instruments does not provide a convenient way of formatting and defining individual test files (outside of a single UIAutomation script per unique action).

Tuneup_js

Designed to be used as an accompaniment to UIAutomation scripts created using Instruments, Tuneup_js is a JavaScript library that helps to ease the pain of working with the long-winded UIAutomation syntax.

It provides a basic test structure for organising test steps, and a bunch of user-friendly assertions built on top of the standard ones supported by Instruments.

[Screenshot: a Tuneup_js test]
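
The screenshot in the original post showed a test along these lines; this is a hedged reconstruction from memory, so treat the names and details as approximate:

#import "tuneup.js"

test("Login screen", function(target, app) {
  // tuneup passes in the UIATarget and UIAApplication for you
  assertNotNull(app.mainWindow().buttons()["Login"], "no Login button found");
});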

I found that recording tests in Instruments, and then converting them into the Tuneup_js test syntax was a really quick way of building acceptance tests for iOS apps. These tests could then be run using a script provided with the Tuneup_js package.

Scenarios

I settled on using Instruments and Tuneup_js to handle acceptance testing. Instruments because of the ability to quickly record acceptance test steps, and Tuneup_js because it could be used to wrap recorded test steps into repeatable tests and allowed for a nicer test syntax than offered out-of-the-box with UIAutomation. What was missing with these applications was a way to handle running the test files in an easily repeatable fashion, and against the iOS simulator as well as hardware devices.

I couldn’t find an existing application to do this, so I wrote Scenarios (Scenar-iOS, see what I did there?) to handle this task. Scenarios is a simple console Ruby app that performs the following steps:

  • Cleans any previous app installs from the target test device
  • Builds the latest version of the app
  • Installs the app on the target test device
  • Runs Tuneup_js-formatted tests against the installed app
  • Reports the test results

Scenarios accepts command-line parameters, such as the option to target the simulator or a hardware device (with the option of auto-detecting the hardware, or supplying a device ID). Scenarios also adds a couple of extra functions on top of the UIAutomation library:

  • withTimeout – can be used for potentially long-running calls (e.g. a button click to login, where the API call may be slow):
    withTimeout(function(){
      app.mainWindow().buttons()["Login"].tap();
    });
  • slowTap – allows for slowing down the speed at which taps are executed. Instruments can run test steps very fast, and sometimes it helps to slow tests down to see what they are doing, and to create a more realistic simulated user experience:
    app.toolbar().buttons()["Delete"].slowTap();

Scenarios ships with a sample project (app and tests) that can be run using the simulator or hardware. Here’s a video of the sample running on a simulator:

[Video: the Scenarios sample project running on the iOS simulator]

Jenkins Pipeline

Now I had build and acceptance tests in place, it was time to hook the tests up to Jenkins. I created the following Jenkins projects:

  • “ios-app” – runs the build automation
  • “ios-app-acceptance-tests-simulator” – runs the app (via Scenarios) on a simulator
  • “ios-app-acceptance-tests-iPhone3GS” – runs the app (via Scenarios) on an iPhone 3GS

[Image: the Jenkins pipeline]

Committing a code change to the iOS app Git repo caused the projects in the Jenkins pipeline to build the app, run the acceptance tests against the simulator, and finally run the acceptance tests on an iPhone 3GS. If any stage of the pipeline failed, I received an email informing me I had broken something.

[Photo: acceptance tests running on an iPhone]

Manual testing with TestFlight

As well as an automated setup, we also made use of the excellent TestFlight service, which enables over-the-air distribution of apps to testers. We had 12 users and 16 devices set up in TestFlight, and I was releasing builds (often daily) over-the-air. It enabled us to get some real-user feedback on the app, something that build and acceptance tests cannot replace.

Jenkins also has a TestFlight plugin, which enables you to automatically deploy a build to TestFlight as part of the pipeline. Very cool, but as we were committing code changes often throughout the day (and only wanted to release to TestFlight once a day), we decided to skip this step for the time being.

Overall, I think that the tools (both open-source and proprietary) available today for automated testing of iOS apps are feature rich (even if some are still in their infancy), and I’m pretty happy with our development setup at Songkick.

Migrating to a new Puppet certification authority

At Songkick all our servers are managed using Puppet, an open source configuration management tool. We use it in client-server mode and recently had the need to replace the certification authority certificates on all our nodes. I couldn’t find much information on how to do this without logging onto every machine, so I’ve documented my method.

What is this Puppet CA anyway?

If you’re using puppet in its typical client-server or agent-master setup, then when the puppet master is first started it will create a certification authority (CA), which all clients that connect to it must trust and be trusted by. This usually happens transparently, so often people aren’t aware that this certification authority exists.

The CA is an attempt to establish trust between the agents and the master, so that an attacker cannot set up malicious puppet masters and tell puppet agents to do their bidding, and so that malicious clients cannot see configuration data for other clients. Agents should only connect to masters that have certificates signed by the CA, and masters should only send configuration information to clients that have certificates signed by the same CA.

There’s a more comprehensive explanation of Puppet SSL written by Brice Figureau which goes into far more detail than we have space for. The main thing to understand is that the CA is an important part of maintaining security and that you can only have one CA across a set of machines that access the same secured resources.

Why would I want to migrate to a new CA?

  • Your current CA certificate is about to expire. By default, CA certificates have a validity period of 5 years, so fairly early adopters of puppet will need to replace them.
  • You’ve had multiple CAs in use and need to consolidate on one.
  • You believe that your certificates and private keys are in the hands of people who could cause mischief with them.
  • You have fallen foul of bugs relating to the fact that you use a CA created in an older version of puppet.

It was in fact the second of these reasons that applied to Songkick; we’d previously been using multiple puppet masters, each with their own CA. We wanted to start using exported resources, stored in the same PuppetDB instance for all nodes. This meant that each master needed to be trusted by the same CA that signed the PuppetDB instance; hence we needed to consolidate on one CA.

How do I do it?

Set up new puppet master(s)

Set up at least one new puppet master server, with a new CA certificate.

If you have a lot of existing hosts managed by puppet, then it’s worth considering enabling the autosign option, even if only temporarily, as you’ll have a number of certificate requests to approve manually otherwise.

Configure agents to connect to the new master(s)

We’re assuming here that you’re managing the puppet agent configuration through puppet, and that changes to the puppet configuration cause an automatic restart of the puppet agent.

Change the configuration of your puppet agents, to connect to the new master(s) and use a different ssldir:

[main]
server = <new server hostname> 
ssldir = /var/lib/puppet/ssl2

Be careful not to apply this change to your newly created puppet master.

Your clients should reconfigure themselves and restart; when they start up, they will connect to your new puppet master, forgetting their old ssl configuration, including the CA certificates.
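
If puppet.conf is itself managed by puppet, the change can be shipped with something like the following hypothetical sketch (your paths and module layout will differ):

file { '/etc/puppet/puppet.conf':
  ensure  => file,
  content => template('puppet/puppet.conf.erb'),
}

service { 'puppet':
  ensure    => running,
  subscribe => File['/etc/puppet/puppet.conf'],
}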

If you have autodiscovery records for puppet in DNS, e.g. an A record for ‘puppet’ or the SRV records, then you should leave them in place for now. Agents that have not been migrated to the new CA may need it.

It is a good idea to test this on a handful of nodes and check that it works in a completely automated fashion before applying to every node.

Tidying up (part 1)

Once every node has checked in with the new master and been issued with a new certificate, it’s time to start tidying up. It’s a good idea to revert to the default ssldir, so that agents that bootstrap themselves with the default config do not then switch to the new ssldir and thus forget their old certificates. This would cause the master to refuse to talk to them, as it looks like a spoofing attempt.

On each client, we mirror the new ssldir to the old one:

file { '/var/lib/puppet/ssl': 
  source => 'file:///var/lib/puppet/ssl2',
  recurse => true, 
  purge => true, 
  force => true, 
}

Be careful not to apply this change to your newly created puppet master.

Tidying up (part 2)

Once that’s shipped everywhere, we remove the ssldir configuration, falling back on the default ssldir, and remove the above resource definition that copies the ssldir.

Tidying up (part 3)

You can now update your autodiscovery DNS entries, to point to the new servers and remove the autosign configuration, if desired.

Finally, we ship a change to the clients that removes the temporary /var/lib/puppet/ssl2 directory.
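
That final change is a one-line resource, something like this sketch:

file { '/var/lib/puppet/ssl2':
  ensure => absent,
  force  => true,
  backup => false,
}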

And that’s it, everything has been migrated to the new CA, with no need to do anything outside of puppet.

Testing your database backups: the test environment database refresh pattern

When did you last try restoring your database backups? A month ago, a week ago? A year ago? Never? When was the last time you refreshed the data in your test environments? When I joined Songkick, one of the first things I asked was when we had last tested a restore of our database backups. The answer, pleasingly, was at 03:00 UK time that morning and, not coincidentally, that’s also when we had last refreshed the data in our test environments.

Here’s how we get the warm and fuzzy feeling of knowing that our backups contain data that can be restored and makes sense.

  1. Every morning, our database servers run their scheduled backups, copying the resulting images to a backup server in the data centre.
  2. Overnight those backups get copied to the office, giving us an offsite copy.
  3. In the small hours, when most of us are asleep, each of the database servers in our staging environment retrieves the backups, erases its local data files and then restores the production backups over the top of them.
  4. We perform sanitisation on the data, to make it suitable for use in a testing environment.
  5. And finally, and most importantly, we use the databases in our testing.

By doing this, we identified one case where our backup process appeared to work and produced plausible-looking backups, but MySQL failed to apply InnoDB log records during recovery. It was inconvenient to discover this problem in our staging environment, but far less inconvenient than discovering it when we needed the backups to put our production system back into operation.

Here are some practical tips based on our experience implementing and managing this system at Songkick:

Back all databases up at the same time

If your system is composed of services backed by independent databases on different machines, it’s possible that there’s some implicit consistency between them. For example, a common situation at Songkick is to have an accounts service responsible for storing user accounts, and another service that stores user data keyed against a user; there’s an expectation that those databases have some degree of consistency.

If you back them up at different times, you’ll find inconsistencies, such as a service holding a reference to a user that doesn’t yet exist. If the ID of the user is exposed to other services and that ID can be reused, you may find that newly created users in your test environment have existing data associated with them, and this can cause significant problems in testing.

It’s worth noting that, in the case of a production restore, these issues would need to be diagnosed and solved in the heat of the moment. By finding them in your test environment, you’re giving yourself the space to solve them earlier, under less pressure.

Design the backups to be regularly exercised

Some types of backups are more amenable to being restored regularly in test environments. For example, our initial MongoDB backups were snapshots of our MongoDB database path. These proved difficult to restore, because they included the local databases, which contained information on replica set membership. This meant that on startup, our staging MongoDB server would forget its existing replica set membership and try to talk to the production servers instead.

We switched to using mongodump to take a logical export of the database, simply so that we could restore it on the primary member of our existing staging replica set and update the entire replica set.
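
The dump/restore cycle then looks roughly like this; the hostnames and paths are illustrative:

$ mongodump --host production-primary --out /backups/mongodb/latest
$ mongorestore --host staging-primary --drop /backups/mongodb/latest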

Sanitisation tips

After we’ve restored the databases, there are certain things we do to make them safe and usable in our testing environments.

  • Remove or obfuscate email addresses. We’re not fond of accidentally emailing people with test events we’ve created in staging, so we change people’s email addresses to be unusable, so that can’t happen. We leave the addresses alone for people who work at Songkick, so we can test email features by emailing ourselves. (There’s a sketch of what this can look like after this list.)
  • Remove or obfuscate payment tokens. If it’s uncool to accidentally email people, accidentally charging them is positively hostile. Anything that’s used for payment needs to be removed.
  • Fix or replace information about the environment. It’s best to avoid keeping references to your technical environment in the same database as your application data, but sometimes it’s tricky to work around. For example, our MogileFS installation needs to be kept in sync with our production one, to avoid problems with missing media. This means that we need to manually update the database to substitute the hostnames of the MogileFS servers.
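
To give a flavour of the email step, a hypothetical MySQL statement might look like this; the table and column names are made up:

UPDATE users
SET email = CONCAT('user-', id, '@example.invalid')
WHERE email NOT LIKE '%@songkick.com';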

Write code that can withstand the database going away

Unless you’ve put some work in, almost no database driver will gracefully handle the disappearance of a database server and then its re-appearance some time later. If the restore in your test environment is the first time you’ve tried this, you may find that you need to manually restart services, even after the database re-appears on the network.

The solution will vary depending on the database client being used, but often it’s a case of catching an exception, or changing some options when you establish the connection.
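
For example, the Ruby mysql2 driver can be asked to reconnect automatically when you create the client. This is one approach under our assumptions; the right option depends on your driver, so check its documentation:

require 'mysql2'

# Ask the client to transparently reconnect after the server goes away.
client = Mysql2::Client.new(
  host:      'db.example.com',
  username:  'app',
  reconnect: true
)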

By making your applications reconnect to the database with no manual input, you are again fixing a problem that will eventually occur in production – a much more stressful time for it to be diagnosed and fixed.

Summary

Testing your database backups by restoring them automatically and regularly in your test environments is a great way to battle-harden your backups and applications and to make sure that your test environment looks like the real production environment.


If you’ve liked what you’ve read, why not head over to our jobs page? We’re looking for a Systems Engineer to add more touches like these to our infrastructure.