Metrics Driven Development – What I did to reduce AWS EC2 costs to 27% and improve latency by 25%

Recently, I did some work on auto-scaling and performance tuning. As a result, our EC2 costs dropped to 27% of their previous level and service latency improved by 25%.

[Figure: Overall Instance Count and Service Latency Change]

Takeaways

  • React server-side rendering does not perform well under the Node.js Cluster module; consider using a reverse proxy such as Nginx instead
  • React v16 server-side rendering is much faster than v15 (about 40% in our case)
  • Use smaller instances to get better scaling granularity if possible, e.g. change c4.2xlarge to c4.large
  • AWS t2.large is about 3 times slower than c4.large for React server-side rendering
  • AWS Lambda is about 3 times slower than c4.large for React server-side rendering
  • There is a race condition in the Nginx HTTP upstream keepalive module that generates 502 Bad Gateway errors (104: Connection reset by peer)

Background

Here’s the background of the service before optimization:

How To: Create Subscription Filter in CloudWatch using serverless

Recently, I worked on a task that needed to collect all CloudWatch logs into a Kinesis stream. The project uses Serverless for deployment. There are some plugins that create CloudWatch Logs subscription filters, but none of them use Kinesis as the destination.

Then, by using the serverless-scriptable-plugin, I was able to do this very easily. The following code finds all CloudWatch LogGroups and creates a SubscriptionFilter for each of them.

Create a file at build/serverless/add-log-subscriptions.js; a sketch of what such a script could look like follows.

Continue reading “How To: Create Subscription Filter in CloudWatch using serverless”
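This is only a sketch of the idea, not the script from the original post. It assumes serverless-scriptable-plugin runs the file after the CloudFormation template has been compiled (for example on a hook such as before:package:finalize) and exposes the serverless object to it; the Kinesis stream ARN and IAM role ARN are placeholders.

```javascript
// Sketch only: assumes `serverless` is in scope (provided by serverless-scriptable-plugin)
// and that the CloudFormation template has already been compiled.
const template = serverless.service.provider.compiledCloudFormationTemplate;
const resources = template.Resources;

// Placeholder ARNs: replace with the real Kinesis stream and IAM role.
const DESTINATION_ARN = 'arn:aws:kinesis:us-east-1:123456789012:stream/cloudwatch-logs';
const ROLE_ARN = 'arn:aws:iam::123456789012:role/CloudWatchToKinesisRole';

Object.keys(resources)
  // Find every CloudWatch LogGroup defined in the compiled template
  .filter(name => resources[name].Type === 'AWS::Logs::LogGroup')
  .forEach(name => {
    // Add one SubscriptionFilter per LogGroup, pointing it at the Kinesis stream
    resources[`${name}SubscriptionFilter`] = {
      Type: 'AWS::Logs::SubscriptionFilter',
      DependsOn: name,
      Properties: {
        LogGroupName: resources[name].Properties.LogGroupName,
        FilterPattern: '', // an empty pattern forwards every log event
        DestinationArn: DESTINATION_ARN,
        RoleArn: ROLE_ARN,
      },
    };
  });
```

Because the filters are added as CloudFormation resources in this sketch, they are created and removed together with the rest of the stack.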

Troubleshooting of blocked requests when fetching messages from hundreds of SQS queues

I’m working on a project that needs to fetch messages from hundreds of SQS queues. We use SQS long polling to reduce the number of empty responses. At first, with only a dozen queues, responses came back quickly. As we added more and more queues, performance got worse and worse: with 300 queues and WaitTimeSeconds set to 10 seconds, it now takes 60 seconds to get a response.

We run Node.js in single-threaded mode, and I believed it could handle ten thousand connections without any problem, since most of the work is I/O. We also opened an AWS support case, but never got a clear answer.

Using the AWS SDK to reproduce the issue

I started troubleshooting by reproducing the issue with a small, self-contained piece of code, which makes it much easier to pin down the cause; a sketch of such a script follows.

Continue reading “Troubleshooting of blocked requests when fetching messages from hundreds of SQS queues”
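A minimal reproduction along those lines could look like the sketch below. It is not the author's original script: it assumes the aws-sdk v2 client, a region, and a pre-built list of queue URLs (QUEUE_URLS is a placeholder), and it simply fires one long-polling receiveMessage call per queue while logging how long each request takes to come back.

```javascript
// Sketch only: assumes aws-sdk v2 and existing SQS queues.
const AWS = require('aws-sdk');

const sqs = new AWS.SQS({ region: 'us-east-1' }); // region is an assumption
const QUEUE_URLS = [ /* ...hundreds of queue URLs... */ ];

function pollQueue(queueUrl) {
  const start = Date.now();
  return sqs.receiveMessage({
    QueueUrl: queueUrl,
    WaitTimeSeconds: 10,     // long polling, as in the original setup
    MaxNumberOfMessages: 10,
  }).promise().then(() => {
    // Log how long each long-polling request takes to come back
    console.log(`${queueUrl} responded after ${Date.now() - start} ms`);
  });
}

// Fire one long-polling request per queue concurrently; with enough queues,
// the slowest responses arrive far later than WaitTimeSeconds would suggest.
Promise.all(QUEUE_URLS.map(pollQueue))
  .then(() => console.log('all queues responded'))
  .catch(console.error);
```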