Optimising platform spend with cluster autoscaling

Eric Escalante · Songkick Tech · May 26, 2023

This post was originally written by Phil Hope while we worked together in the Platform Team; I just helped with a bit of editing.


In this blog post, we’ll explore how we leveraged cluster autoscaling capabilities to dynamically allocate resources as needed, resulting in more efficient use of infrastructure and lower costs. Let’s get started!

Background

Kubernetes workloads at Songkick can be categorised into three types:

  • Applications: front-end and back-end services
  • Daemons: continuous processes such as campaign subscribers
  • Cron jobs: regularly scheduled actions such as backups and data transformations

In the case of both application and daemon pods, processes run continuously. However, the pods launched by cron jobs run specific tasks and terminate on completion. These tasks include our ETL pipeline (detailed in Eric's blog post), event data ingestion from third parties, and scheduled internal maintenance tasks.

On investigating our workload performance and spend, it was clear we faced two problems. First, we were overspending during the portions of the day when no cron jobs were running: jobs launched in specific windows, yet we always had to provision the total number of nodes needed at that peak. Second, during periods of high application demand, pods competed for resources, which meant some cron-related pods were terminated because of their lower priority.

After looking into migrating to a serverless approach such as Cloud Run, or to a fully automated cluster like GKE Autopilot, we settled on GKE Cluster Autoscaling (CA). It required the least amount of change to our existing deployment and infrastructure codebases, while giving us the flexibility of paying only for what we actually used.

Cluster Autoscaling — How does it work?

CA adjusts the size of the cluster based on demand for resources. It monitors the CPU and memory requested by pods against the capacity available on the cluster's nodes and adjusts the node count accordingly. We can configure how aggressively it scales up or down, within the minimum and maximum node counts we set for the cluster.

CA creates a watch on the Kubernetes API server and polls for unschedulable pods every 10 seconds. When a scale-up decision is made, a new node is launched from the node pool's template, and a check is run to ensure the new machine has enough capacity for the pending pod. If all conditions are satisfied, the workload is scheduled onto it.

In the case of scale-down, a similar watch polls for unneeded nodes every 10 seconds. If the sum of CPU and memory requests on a node is less than 50% of its allocatable resources, a scale-down process is triggered: existing pods are moved to another node with spare capacity and the original node is marked for deletion. If no new workloads are assigned to it after 10 minutes, it is terminated.
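To make the mechanics concrete, here is a minimal sketch (using the official Kubernetes Python client, and assuming a working kubeconfig) that reads the same signals CA reacts to: pending pods for scale-up and per-node allocatable resources for scale-down. It is purely illustrative; CA does all of this itself.

```python
from kubernetes import client, config

# Illustrative only: CA watches these signals itself against the API server.
config.load_kube_config()
v1 = client.CoreV1Api()

# Scale-up signal: pods stuck in Pending, typically because no node can fit them.
pending = v1.list_pod_for_all_namespaces(field_selector="status.phase=Pending")
for pod in pending.items:
    print(f"pending: {pod.metadata.namespace}/{pod.metadata.name}")

# Scale-down reference point: CA compares the sum of requests scheduled on a
# node against roughly 50% of these allocatable figures.
for node in v1.list_node().items:
    alloc = node.status.allocatable
    print(f"{node.metadata.name}: cpu={alloc['cpu']} memory={alloc['memory']}")
```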

An alternative to this approach would have been to control our workload types using node pools and configure pod affinity so workloads always launch and scale within those pools. We were already using node pools to provide different machine types based on performance needs, e.g. high-memory nodes for memory-intensive notification processes.
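For illustration, pinning a workload to one of those pools is just a nodeSelector on GKE's built-in node pool label; a sketch with the Python client, where the pool name and image are hypothetical:

```python
from kubernetes import client

# Hypothetical pod spec pinned to a high-memory pool via GKE's node pool label.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="notifications-worker"),
    spec=client.V1PodSpec(
        node_selector={"cloud.google.com/gke-nodepool": "high-memory-pool"},
        containers=[
            client.V1Container(
                name="worker",
                image="europe-docker.pkg.dev/example/notifications:latest",  # placeholder image
            )
        ],
    ),
)
```

The same selector works on Deployments and CronJobs, since GKE applies the cloud.google.com/gke-nodepool label to every node automatically.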

We did consider scaling pools up and down on a schedule for cron workloads. However, this would have involved greater management overhead to ensure pools scaled as expected, and scripting scaling actions manually rather than having them react automatically. With CA being a well-trodden path towards exactly our aim of cost reduction, and given its flexibility of configuration and ease of integration with our existing stack, it was ultimately the obvious choice.

Implementation


Provisioning a cluster

We began with Terraform, as our application cluster is already defined as code in the same way. This gave us the perfect opportunity to refactor and introduce a lightweight, generic module for creating Kubernetes clusters. Cluster features can now be configured from a single point on each module call, such as toggling autoscaling and logging, setting logging levels or attaching IAM policies. This also simplified testing, as a new cluster could be created and torn down with a small footprint.

We provisioned the new cluster for Songkick cron jobs with a single node group, with upper and lower node count bounds that took some trial and error to balance. It's worth noting that in our case, with GKE and a persistent Luigi deployment on the cluster, we always need a single node running, making our lower bound 1. For the maximum count, a conservative configuration is less important: with GKE you pay for the nodes you use, not for CA's potential to scale up.
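We drive this through Terraform, but for illustration the same bounds can be set directly through the GKE API. A rough sketch with the google-cloud-container client, where the project, location, cluster and pool names are placeholders:

```python
from google.cloud import container_v1

gke = container_v1.ClusterManagerClient()

# Placeholder resource name for the node pool we want CA to manage.
pool = "projects/my-project/locations/europe-west2/clusters/crons/nodePools/default-pool"

# Keep one node for the persistent Luigi deployment; let CA grow the pool on demand.
gke.set_node_pool_autoscaling(
    request=container_v1.SetNodePoolAutoscalingRequest(
        name=pool,
        autoscaling=container_v1.NodePoolAutoscaling(
            enabled=True,
            min_node_count=1,
            max_node_count=5,
        ),
    )
)
```

In Terraform, the equivalent values live on the node pool's autoscaling block.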

We currently set the maximum node count around 30% above our average peak scale, which leaves enough headroom for any surges that might occur out of hours.

In order to maximise cost efficiency, we tested several machine types for the cluster, including high-CPU and high-memory types, using loads that simulated the resource requests of both our ETL cron jobs and our third-party API ingestions. The best balance between cost and performance for our use case was actually the default e2-standard-4 machine, which comprises 4 vCPUs and 16 GB of memory.
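The load tests themselves were nothing sophisticated: throwaway Jobs whose resource requests mirrored our real workloads. A hypothetical sketch along those lines (the image, names and request sizes are illustrative, not our actual figures):

```python
from kubernetes import client, config

config.load_kube_config()

# Throwaway Job whose requests mimic one of our heavier ETL tasks.
job = client.V1Job(
    metadata=client.V1ObjectMeta(name="etl-load-sim"),
    spec=client.V1JobSpec(
        backoff_limit=0,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="sim",
                        image="busybox:1.36",
                        command=["sh", "-c", "sleep 600"],  # hold the resources for ten minutes
                        resources=client.V1ResourceRequirements(
                            requests={"cpu": "2", "memory": "6Gi"}
                        ),
                    )
                ],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```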

Does it Scale?

In initial tests we observed new nodes being added to the cluster and pods scheduled correctly onto them; however, the cluster would not scale down as expected. This is where Pod Disruption Budgets (PDBs) came to the rescue.

When a node needs to be terminated, Kubernetes attempts to reschedule the affected pods onto other nodes. However, if there are not enough resources available elsewhere, or if rescheduling takes longer than expected, pods can become unavailable. In the context of autoscaling, where nodes are added and removed as traffic and workload change, this risk is ever-present. PDBs mitigate it by ensuring pods are not disrupted beyond a certain threshold; by setting them for critical workloads, we can ensure those workloads remain available even as the cluster size changes.
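As a concrete example, a minimal PDB created with the Python client might look like the sketch below; the namespace, name and label selector are hypothetical.

```python
from kubernetes import client, config

config.load_kube_config()

# Allow at most one pod of this workload to be voluntarily disrupted at a time,
# so CA can still drain under-used nodes without taking the workload down.
pdb = client.V1PodDisruptionBudget(
    metadata=client.V1ObjectMeta(name="luigi-scheduler-pdb", namespace="crons"),
    spec=client.V1PodDisruptionBudgetSpec(
        max_unavailable=1,
        selector=client.V1LabelSelector(match_labels={"app": "luigi-scheduler"}),
    ),
)

client.PolicyV1Api().create_namespaced_pod_disruption_budget(namespace="crons", body=pdb)
```

The equivalent YAML manifest is simply a policy/v1 PodDisruptionBudget with the same selector and maxUnavailable.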

As CA polled for a scale-down decision, we noticed the log message "NotEnoughPDB". In the same way PDBs protect applications from disruption, they also signal to the autoscaler whether disruptions are allowed. As shown in the image below, we needed to add these not only to our own workloads but also to the kube-system pods responsible for system processes such as DNS and pod-scheduling metrics collection:

kube-system Pod Disruption Budgets

After this addition we were greeted with the behaviour we had originally expected: our cluster successfully scaled down to a single node after 10 minutes of inactivity. Success!

Performance

Below you can see an example of usage after the migration, with the cluster scaling frequently while jobs run, reaching a maximum of eight cores here (four nodes). Due to the daily schedule of varying processes, there's a distinctive pattern of usage: a peak in the late afternoon, a constant six nodes overnight, and a fluctuating load throughout the rest of the day.

For this iteration we didn't modify any of the cron job schedules from their original times, as the business expects data to be loaded at certain times. One potential improvement for the future would be to group schedules so that a single scale-up covers as many jobs as possible, with the cluster returning to minimal scale once they complete.

Cluster monitoring of node autoscaling

Conclusion

We were now able to scale down our original cluster's total node count. Bearing in mind that at maximum scale our crons cluster currently uses five nodes, we were able to safely reduce the number of nodes running on the original cluster by 15, saving hundreds of dollars a month. Furthermore, by isolating this type of workload in its own cluster, we can test and make changes without fear of affecting our critical services, giving us greater confidence when progressing changes and managing incidents.

This work has also brought cross-functional benefits: the overall availability and reliability of our platform has improved. By automatically scaling the number of nodes based on demand, we can ensure our workloads are always running on nodes with available resources, reducing the risk of downtime and the need for manual intervention.

Hopefully you have gained some insight into the process of adopting GKE's autoscaler and how it might help you. Overall, I would recommend this solution if, like us, you are a small platform team with limited resources. While this post has focused on our cron job architecture, this approach could potentially improve all workloads running on our Kubernetes clusters, and that's something we'll be investigating next!

Keep an eye on our tech blog for more reads like this.
