Now that a little more than half of 2017 has gone by, I want to talk about some of the infrastructure changes we put in place beginning December 2016, based on our experience of running clevertap.com on AWS for the past three years. These changes were prompted by the requirement to support clevertap.com in multiple geographical data-centers, to meet the compliance requirements of some of our large customers, and by the need to be agile. On interesting days, we handle a 4x increase in incoming write traffic, going from 33K req/sec to 166K req/sec within twenty minutes.
Background Context
Most of our AWS infrastructure was set up in May 2014. Back then, I was aware of CloudFormation but didn’t go with it, so it was clickety-click all over the console to set up the components: AutoScaling groups, launch configurations, ELBs, subnets, routes, security groups, etc. Inside the EC2 instances, Puppet was our tool for configuration management. Nagios (also managed by Puppet via exported resources) faithfully monitored everything for us.
Fast forward two years, here’s what we learnt and how we implemented the same infrastructure again.
From Clickety-Click to CloudFormation
When tasked with setting up a parallel data center to support a Fortune 500 customer, there was no way in hell I was going to click all over the console again. Besides, I knew this was the tip of the iceberg. At the rate at which we were growing, more customers would ask for their data to be hosted in geographically local data-centers. We had to have a way to reliably reproduce all of our AWS configuration.
This is when we got really serious about CloudFormation.
Fortunately, the learning curve for CloudFormation is not so steep. Don’t be scared by sample CF templates, especially if you are just beginning with it; they just look really long. If you have used the console and are familiar with AWS concepts, CF just feels like an alternative to clickety-clicks, aka infrastructure-as-code. Within my first hour with CF, I had the VPC provisioned. Over the next week, we managed to port all of our AWS resources into a template. At first it made sense to have everything in a single template, until we hit the maximum number of resources per template: 200. At that point we were forced to break our dearly loved template into independent stacks.
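To give a feel for what one of those independent stacks can look like, here is a minimal sketch in Python that builds a VPC-only template and prints it as JSON. This is not our actual template: the logical ID `AppVpc`, the export name `network-VpcId` and the CIDR block are made up for illustration, and the limit constant simply mirrors the 200-resource figure mentioned above.

```python
import json

# Per-template resource limit as quoted in this post (an assumption for this sketch).
MAX_RESOURCES_PER_TEMPLATE = 200


def build_vpc_template(cidr_block: str) -> dict:
    """Return a minimal CloudFormation template that declares a single VPC."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Hypothetical network stack: VPC only",
        "Resources": {
            # "AppVpc" is an illustrative logical ID, not one from our stacks.
            "AppVpc": {
                "Type": "AWS::EC2::VPC",
                "Properties": {
                    "CidrBlock": cidr_block,
                    "EnableDnsSupport": True,
                    "EnableDnsHostnames": True,
                },
            }
        },
        "Outputs": {
            # Exported so that other, independent stacks can import the VPC ID.
            "VpcId": {
                "Value": {"Ref": "AppVpc"},
                "Export": {"Name": "network-VpcId"},
            }
        },
    }


def fits_in_one_template(template: dict, limit: int = MAX_RESOURCES_PER_TEMPLATE) -> bool:
    """Check the resource count against the per-template limit."""
    return len(template["Resources"]) <= limit


if __name__ == "__main__":
    template = build_vpc_template("10.0.0.0/16")
    assert fits_in_one_template(template)
    print(json.dumps(template, indent=2))
```

Exporting the VPC ID is one way for independent stacks to refer to each other (through `Fn::ImportValue`); how you draw the boundaries between stacks is a judgment call that depends on your setup.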
From Puppet to Docker Containers
Bootstrapping an instance during a scale-up event with Puppet turned out to be slow for us. It’s not that Puppet is slow; it just didn’t fit our use case. There were too many moving parts that we were getting Puppet to put together for us. This is where Docker fit right in. We build a container image with the application and its execution environment and push it to ECR. During a scale-up event, the image is pulled by CodeDeploy and boom! It’s up and running, ready to serve. We use standalone Docker coupled with docker-compose and host networking to keep things simple and fast. This has considerably improved our confidence in AutoScaling. I could be sleeping or sitting on a beach in Goa sipping beer and AutoScaling reliably manages traffic 🙂
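For the curious, a compose definition with host networking can be very small. The sketch below generates one from Python; the service name `app`, the restart policy and the ECR image URI are hypothetical placeholders, not our real configuration. Since JSON is also valid YAML, the printed output should be usable as a compose file as-is.

```python
import json


def compose_definition(image: str) -> dict:
    """Build a single-service compose definition that uses the host's network stack."""
    return {
        # Older docker-compose releases expect a version key; newer ones ignore it.
        "version": "2",
        "services": {
            # "app" is an illustrative service name.
            "app": {
                "image": image,
                # Host networking: no bridge network or port mappings involved.
                "network_mode": "host",
                "restart": "always",
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical ECR image URI, for illustration only.
    image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/example-app:latest"
    print(json.dumps(compose_definition(image), indent=2))
```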
From Freestyle Jenkins Jobs to Scripted Pipelines to Declarative Multi-branch Pipeline Jobs
Our journey with Jenkins began with freestyle jobs, the ones where you put in shell commands and Jenkins reliably executes them, every single time. We had two jobs per application, one to build and the other to deploy. Of course, this is not CI/CD. Since then, we’ve moved to scripted pipeline jobs. This was a step in the right direction: code and Jenkins instructions lived together. More importantly, job configuration was version controlled. In our latest iteration, we have moved from scripted to declarative multi-branch pipeline jobs. Most of the heavy lifting is done in libraries. This gives us clean-looking Jenkinsfiles that are easier for a larger audience to read and understand.
From SSH Loops for Deployment to CodeDeploy
Deploys were handled by bash for-loops that ran commands on remote hosts after querying the AWS API for available hosts. This worked surprisingly well for more than two years. It never failed on us. Team members would build, deploy and sometimes revert multiple times a day. But as we grew, we needed funky things like deploying to all hosts at once or to a percentage of hosts at a time. Besides, with multiple data centers, this was going to be difficult to manage. We needed something that natively understood instance states and hooked into AutoScaling lifecycle hooks.
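To make the "percentage of hosts at a time" idea concrete, here is a small Python sketch of how a host list could be split into rollout batches. It is only an illustration of the batching logic, not our old bash scripts and not how any deployment service works internally; the function name and host names are made up.

```python
from typing import Iterator, List


def deployment_batches(hosts: List[str], percent_at_once: int) -> Iterator[List[str]]:
    """Yield batches of hosts, each holding at most percent_at_once of the fleet."""
    if not 1 <= percent_at_once <= 100:
        raise ValueError("percent_at_once must be between 1 and 100")
    if not hosts:
        return
    # Integer ceiling of len(hosts) * percent / 100, with at least one host per batch.
    batch_size = max(1, (len(hosts) * percent_at_once + 99) // 100)
    for start in range(0, len(hosts), batch_size):
        yield hosts[start:start + batch_size]


if __name__ == "__main__":
    fleet = ["host-1", "host-2", "host-3", "host-4", "host-5"]  # hypothetical hosts
    for batch in deployment_batches(fleet, percent_at_once=50):
        print("deploying to", batch)
```

Passing 100 gives the all-at-once case as a single batch. What this sketch cannot do is react to instances being launched or terminated mid-rollout, which is exactly the part we wanted handled for us.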
AWS CodeDeploy is our new hero. It integrates nicely with other AWS services.
The one thing I would really like to see CodeDeploy implement is a way to hook into the startup lifecycle events for an EC2 host not in an AutoScaling group. This way when a static EC2 host comes to life, deployment groups that qualify it wil