AWS Docker Execution System
Problem Statment:
How to design a system that can be used to run docker on AWS? The system should be flexible enough to launch the docker task within milliseconds and powerful enough to run dockers of huge sizing.
Infrastructure cost should also be taken care of considering the tasks may be sparsely throughout the day.
Prerequisites:
Before we begin going through the system let's visit the managed services that we are going to use.
- Storage Services:
a. Amazon DynamoDB:
b. Amazon Elastic File Storage - Compute Services:
a. Amazon Elastic Container Service
b. Amazon Elastic Compute Cloud
c. Amazon Fargate
d. Amazon Elastic Container Repository
e. Amazon AutoScaling and Launch Config
f. Amazon Lambda - Scaling And Management Services:
a. Amazon Cloud Watch
b. Amazon Capacity Provider
Architectural Overview:
This architecture has 2 processing cores
- ECS TASK PROCESSOR
a. EC2 + Spot Instances
b. Fargate - LAMBDA TASK PROCESSOR
Based on the request packet received by SQS, the main driver lambda decides which processing core to use.
The state management is done by storing task_id mapping in a simple DynamoDB table. An extension of Dynamo DB with app sync can be used to provide QraphQL API for state monitoring.
Driver Lambda
Based on the request packet received by SQS, the main driver lambda decides which processing core to use. It also decides to send a task to Fargate if in case the cluster is fully occupied.
Once the docker is all set to be executed the task request is then sent to the AWS ECS agent. This first request is to execute the task on an EC2 instance. Under any circumstances if the resources are not available of scale-out is in process, the ECS agent returns an error. This triggers another request to AWS ECS Agent to execute the task on Fargate. Fargate being infinitely scalable the task is bound to be picked up under any scenario.
ECS TASK PROCESSOR
Scaling And Management Services
A limitation of the CloudWatch alarm is that it allows you to choose only one metric at a time. It’s possible to set up two different alarms (one for CPU and one for memory) and trigger the AutoScaling Group to scale out. But when both metrics are used to scale in, you run into troubles. Imagine when you have a high CPU but low memory reservation. One alarm tries to scale out while the other wants to scale in. You end up in a situation where a new container instance is launched and another terminated again and again.
This gets even worse because tasks are not re-scheduled when a new container instance gets launched. But when the AutoScaling group terminates an instance, it normally chooses the oldest one (usually with many tasks running). That makes the cluster unstable as the tasks need to be scheduled over and over again.
It’s impossible to scale in and out with two different metrics.
When two metrics are problematic, the solution must be one metric, right? Ideally, a new metric, as the default metrics (CPU or memory reservation) can’t be used. Also, it should be easy to set up when the cluster should scale in and out.
The constraint is still, that it should be possible to start the largest container.
Keep enough resources to schedule the largest container.
The solution is a metric that shows the number of largest containers that could be scheduled. It’s calculated based on CPU and memory.
A cron-based lambda can be used to calculate this number. First, the available CPU is used to calculate how many of the largest containers can be scheduled. Second, the same happens for memory. The lowest number of these two calculations wins and in the end, it will be summed up for all container instances.
More interesting is which values have to be set as Threshold.
To scale out, the threshold of the SchedulableContainersLowAlert should be 1 to make sure that at least one instance of the largest possible container can be scheduled. Greater thresholds can make sense when it’s likely that multiple of the largest containers are started at the same time.
To scale in, the threshold of the SchedulableContainersHighAlert needs to be calculated. It has to be greater than the number of maximum containers that can be placed on one container instance.
Threshold = min ( cpu(ec2) / cpu (container), memory(ec2) / memory (container))
In our example the container instance is an m4.2xlarge instance (32.0 GB / 8 vCPUs):
Threshold = min ((8192 / 1024), (32,768 / 4096))
Threshold = min (8, 8)
Threshold = 8
Because 8 containers can be placed on one container instance, the threshold when the cluster should scale in needs to be greater than 8. Otherwise, it starts a loop of scaling out and in.
Alternate: ECS has a recently launched service called capacity provider that can be clubbed with the auto-scaling group to manage cluster sizing automatically. This will also enable queuing of submitted tasks.
Spot Instances
A Spot Instance is an unused EC2 instance that is available for less than the On-Demand price.
Disadvantages: Spot instances can be taken back any time by AWS on a short notice of 120 seconds.
Fail-Safe: Although the spot termination scenario is very unlikely to occur, we have a fail-safe mechanism that re-allocates executions onto other available resources and puts up another instance in place. The probability of such a scenario is expected to less than 0.1% in our use case.
Spot Bidding: Determination of bid value is very important to prevent spot termination and get the most out of it. As recommended by AWS we use the on-demand price as the bid out value. This instance will only be taken back once there are no such instances in that particular availability zone.
LAMBDA TASK PROCESSOR
It’s a combination of EC2 instances and Fargate. To reduce cost drastically spot instances came into the picture. This will further boost availability and can be used to run instance-dependent workloads.
- Compute Cluster: The infrastructure holds a hybrid cluster i.e. a combination of ECS Optimized C5.2x Large Instances and Fargate.
- EC2 Instances: We use Spot EC2 instances backed by Launch Config and Auto Scaling Group. This gives a huge saving over on-demand instances. Spot instances are also coupled with a fail-safe mechanism to support terminations failures.
- Fargate: We use AWS Fargate as last resort to run an algorithm in case EC2 Instances do not accept a given task due to a lack of resources.
- ECR: Elastic container repository is used to hold docker images that hold the algorithm logic. This algorithm takes input as a reference in the form of Environment Variables as override parameters.
- Task Definitions: Each algorithm is coupled with a task definition that determines docker images, resources, permission, limits, and more. This task definition can be used to run the task on both EC2 and Fargate.
- Auto Scaling Group: The auto-scaling group helps integrate EC2 launch configurations, update unhealthy instances, schedule instance turndowns, define scaling limits. and more.
References:
Fargate Pricing: https://www.trek10.com/blog/fargate-pricing-vs-ec2/