Many applications built at Mystique Unicorn use modern serverless architectures like AWS Lambda, Fargate, Amazon ECS, or EKS. These applications can have a large number of open connections to the database server, and may open and close database connections at a high rate, exhausting database memory and compute resources.
Relational databases were built for a lower number of long-running compute instances. They were not designed for spiky character of an event driven workload. You will quickly exhaust the database connections pools by trying to open a lot of connections in a very short period of time. Another item that is of interest is reducing the applications recovery time after database failures. Even if you have two nodes, either you update the dns entries of the database endpoint or let your clients be aware of the newer endpoint to connect.
Can you help the Miztiik Unicorn team with PoC on how these two database problems can be solved?
Amazon RDS Proxy1 can front your Amazon RDS for MySQL/PostgreSQL and Aurora MySQL/PostgreSQL databases. It allows you to manage an applicationβs access to the database and provides connection pooling, multiplexing, and graceful failover. It helps you to scale beyond database connection limits and manage bursts of connections and requests from applications. This post focuses on the failover benefits of RDS Proxy.
Failover occurs when the primary database instance becomes inaccessible and another instance takes over as the new primary. This disrupts client connections. DNS propagation delay is the largest contributor to overall failover time.
With RDS Proxy, the applications can avoid the complexity associated with failover and experience faster recovery. RDS Proxy actively monitors database instances and automatically connects clients to the right target. It also maintains idle client connections through database failover without dropping them. Idle connections in this context are connections that donβt have outstanding requests.
We will build the above architecture in this demo. We will use PostgreSQL 11.6
2 for our RDS database and use pgbench
3
-
This demo, instructions, scripts and cloudformation template is designed to be run in
us-east-1
. With few modifications you can try it out in other regions as well(Not covered here).- π AWS CLI Installed & Configured - Get help here
- π AWS CDK Installed & Configured - Get help here
- π Python Packages, Change the below commands to suit your OS, the following is written for amzn linux 2
- Python3 -
yum install -y python3
- Python Pip -
yum install -y python-pip
- Virtualenv -
pip3 install virtualenv
- Python3 -
-
-
Get the application code
git clone https://github.com/miztiik/improve-db-perf-with-rds-proxy cd improve-db-perf-with-rds-proxy
-
-
We will use
cdk
to make our deployments easier. Lets go ahead and install the necessary components.# You should have npm pre-installed # If you DONT have cdk installed npm install -g aws-cdk # Make sure you in root directory python3 -m venv .venv source .venv/bin/activate pip3 install -r requirements.txt
The very first time you deploy an AWS CDK app into an environment (account/region), youβll need to install a
bootstrap stack
, Otherwise just go ahead and deploy usingcdk deploy
.cdk bootstrap cdk ls # Follow on screen prompts
You should see an output of the available stacks,
improve-db-perf-with-rds-proxy-vpc-stack store-events-db-stack store-events-consumer-stack store-events-db-proxy-stack
-
Let us walk through each of the stacks,
-
Stack: store-events-db-stack
For the RDS database, we will use
PostgreSQL 11.6
2. This stack will also create the vpc, subnets, route tables using the stackimprove-db-perf-with-rds-proxy-vpc-stack
. We will be using the same VPC for hosting our RDS Consumer on EC2 server. The RDS instance will be hosted in the private subnet and the security group is configured to allow port5432
connections from within the VPC. The database is has the following configurations,- Database Name:
store_events
- DB Engine Version:
11.6.x
- DB Username:
mystiquemaster
- DB Password: Auto generated and stored in AWS Secrets Manager
- Parameter & Option Group: Defaults
- Performance Insights:
Enabled
- Multi-AZ:
Enabled
Initiate the deployment with the following command,
cdk deploy improve-db-perf-with-rds-proxy-vpc-stack cdk deploy store-events-db-stack
After successfully deploying the stack, Check the
Outputs
section of the stack. You will find the DB ResourceStoreEventsDatabase
. You will find the credentials to connect to the database in secrets manager arnStoreEventsDatabaseSecretArn
. The sample command to connect to the db with a cli can also be found there. - Database Name:
-
Stack: store-events-consumer-stack
To perform database tests on the RDS db, We will use an ec2 instance. The EC2 instance will be launched in the public subnet along with a IAM Role that supports SSM Session Manager access2. This Ec2 instance is configured to be bootstrapped with PostgreSQL client libraries to enable connections to RDS. Take a look at the userdata scripts at
stacks/back_end/store_events_consumer_on_ec2_stack/bootstrap_scripts/deploy_app.sh
if you want to modify them or improve them to add your own libraries.Initiate the deployment with the following command,
cdk deploy store-events-consumer-stack
After successfully deploying the stack, Check the
Outputs
section of the stack. You will find the instanceEc2ConsumerInstance
. -
Stack: store-events-db-proxy-stack
This stack will create Amazon RDS Proxy1 in the same VPC as the RDS database and within the same SG as the DB. In production you want to consider using different SGs for each of them. I have configured the proxy to maintain idle connections for a maximum of
10
minutes. After that those connections will be released.Initiate the deployment with the following command,
cdk deploy store-events-db-proxy-stack
After successfully deploying the stack, Check the
Outputs
section of the stack. You will find theStoreEventsDbProxy
resource.
-
-
-
Connect To OLTP consumer ON EC2: Connect to the
Ec2ConsumerInstance
instance using SSM Session Manager2. Get the rds credentials and connection information from AWS Secrets ManagerStoreEventsDatabaseSecretArn
. Although you can type the hostname, port and db name everything, it is better to set them as variables and re-use them for testing.DB_ENDPOINT="ssc2j5w2whgoua.ca4q8ficuhu.us-east-1.rds.amazonaws.com" DB_PROXY_ENDPOINT="store-events-db-proxy.proxy-ca4q8ficuhu.us-east-1.rds.amazonaws.com" DB_PORT="5432" DB_NAME="store_events" DB_USER_NAME="mystiquemaster" DB_PASSWORD="OBVIOUSLY_YOUR_PASSWORD_GOES_HERE" S3_PREFIX="store_events" # Initialize the database with some data for testing PGPASSWORD=${DB_PASSWORD} pgbench -h ${DB_ENDPOINT} -p ${DB_PORT} -U ${DB_USER_NAME} -d ${DB_NAME} -i -s 500
The initialization will take a few minutes to complete. Until it completes the loading process, you can check out the RDS Database Metrics in the console. You should be able to observe increased CPU usage and the free storage space decreasing.
After the initialization is complete, We will perform two tests. The first one will connect directly to the database endpoint and in the second one we will connect to the db using the proxy endpoint and see how the connection scaling works.
- Test: Connections-To-DB
We will set up
70
clients using-c
option and run the test for a duration of60
seconds. If you want you can increase no. of threads using-j
option. For now let us begin,> PGPASSWORD=${DB_PASSWORD} pgbench -h ${DB_ENDPOINT} -p ${DB_PORT} -U ${DB_USER_NAME} -d ${DB_NAME} -c 70 -T 60 ... transaction type: TPC-B (sort of) scaling factor: 500 query mode: simple number of clients: 70 number of threads: 1 duration: 60 s number of transactions actually processed: 32308 tps = 537.058231 (including connections establishing) tps = 546.439557 (excluding connections establishing)
- Test: Connections-To-Proxy
> PGPASSWORD=${DB_PASSWORD} pgbench -h ${DB_PROXY_ENDPOINT} -p ${DB_PORT} -U ${DB_USER_NAME} -d ${DB_NAME} -c 70 -T 60 ... transaction type: TPC-B (sort of) scaling factor: 500 query mode: simple number of clients: 70 number of threads: 1 duration: 60 s number of transactions actually processed: 30484 tps = 507.638955 (including connections establishing) tps = 520.950778 (excluding connections establishing)
If you run the tests multiple times, you will notice the number of transactions to settle around the same number, but when connecting to the proxy you should be able to support more client connections. For example, You can notice that I have chosen the number of clients to be
70
anything more with the direct to DB will fail, whereas you can have more clients with the proxy setup. Go ahead try increasing them and play around with them.In the above snapshot, The second spike is from running pgbench against the proxy endpoint. You can observe that the rate of increase is spread over time giving your db the much needed breather to catch up with spiky requests. You can also
10
different tests in a loop. Each time we will increase the number of connections(following a fibonacci pattern) and then observe the connections in cloudwatchfor i in 1 2 3 5 8 13 21 34 55 89; do sleep 5; PGPASSWORD=${DB_PASSWORD} pgbench -h ${DB_PROXY_ENDPOINT} -p ${DB_PORT} -U ${DB_USER_NAME} -d ${DB_NAME} -c $i -T 60;done
You can use RDS Performance Insights to identify the queries that are being run and the ones that is causing lot of load on CPU or waiting for data from disk to uncover potential performance issues.
You can see that there are no tables under our database
store_events
. -
FailOver Recovery:
- Test: Connections-To-DB
Let us perform a failover test, by manually triggering a reboot with failover option from the RDS console. Before you do that, initiate the following command from the consumer in EC2. This should run a query against the database server to get the current time in a infinite loop. We will use those timestamps to find out how long it takes to recover from a failure.
while true; \ do \ PGPASSWORD=${DB_PASSWORD} \ psql \ -h ${DB_ENDPOINT} \ -p ${DB_PORT} \ -U ${DB_USER_NAME} \ -d ${DB_NAME} \ -c "SELECT CURRENT_TIME;"; \ sleep .5; \ done
current_time ------------------- 18:31:42.46637+00 (1 row) current_time -------------------- 18:31:42.985845+00 (1 row) current_time -------------------- 18:31:43.505596+00 (1 row) psql: could not connect to server: Connection timed out Is the server running on host "ssc2j5w2whgoua.ca64q8ficuhu.us-east-1.rds.amazonaws.com" (10.10.2.135) and accepting TCP/IP connections on port 5432? current_time -------------------- 18:33:55.083264+00 (1 row) current_time -------------------- 18:33:55.606506+00 (1 row) (1 row)
You can see that, It has taken about
132
seconds to recover from failure- Test: Connections-To-Proxy
Now run this query, this will query the time against the proxy endpoint and trigger a reboot with failover from the console.
while true; \ do \ PGPASSWORD=${DB_PASSWORD} \ psql \ -h ${DB_PROXY_ENDPOINT} \ -p ${DB_PORT} \ -U ${DB_USER_NAME} \ -d ${DB_NAME} \ -c "SELECT CURRENT_TIME;"; \ sleep .5; \ done
current_time -------------------- 18:41:12.679197+00 (1 row) current_time -------------------- 18:41:13.209632+00 (1 row) current_time -------------------- 18:41:13.741972+00 (1 row) SSL connection has been closed unexpectedly connection to server was lost current_time -------------------- 18:41:34.426217+00 (1 row) current_time -------------------- 18:41:34.986858+00 (1 row)
You can see that, It has taken about
21
seconds to recover from failure. Definitely proxy has reduced the time to connect to the active node.
-
-
Here we have demonstrated how to use RDS Proxy to improve performance and recover times of your database. You can extend this further by using proxy the pin sessions.
-
If you want to destroy all the resources created by the stack, Execute the below command to delete the stack, or you can delete the stack from console as well
- Resources created during Deploying The Application
- Delete CloudWatch Lambda LogGroups
- Any other custom resources, you have created for this demo
# Delete from cdk cdk destroy # Follow any on-screen prompts # Delete the CF Stack, If you used cloudformation to deploy the stack. aws cloudformation delete-stack \ --stack-name "MiztiikAutomationStack" \ --region "${AWS_REGION}"
This is not an exhaustive list, please carry out other necessary steps as maybe applicable to your needs.
This repository aims to show how to use RDS Proxy for database performance to new developers, Solution Architects & Ops Engineers in AWS. Based on that knowledge these Udemy course #1, course #2 helps you build complete architecture in AWS.
Thank you for your interest in contributing to our project. Whether it is a bug report, new feature, correction, or additional documentation or solutions, we greatly value feedback and contributions from our community. Start here
Buy me a coffee β.
- AWS Docs: RDS Proxy
- AWS Docs: RDS Proxy prerequisites and limitations apply to PostgreSQL
- Docs: PostgreSQL Pgbench
Level: 300