Whether you are managing a single-node RedShift cluster or a giant multi-node cluster, you are responsible for its performance. RedShift performance optimization starts with table design. The amazing RedShift Utilities repo gives us a bunch of SQL queries to check a cluster's status, but unfortunately we often end up running many queries whose results are all positive, with nothing to optimize.
The goal of this project is to concentrate only on the bottlenecks in your RedShift cluster and to provide a complete health check recipe in one place. Once it identifies the issues in the cluster, you don't need to go elsewhere to find the solutions: the utility points you to a documentation page that explains the cause of each issue and how to solve it. These doc pages also contain an External Links section with scripts from other trusted repositories such as AWS RedShift Utilities, AWS Blogs, and further documentation to help you understand and fix the issue.
- Design - Find wrong distribution and sort keys, compression issues, etc.
- Table - Table-related issues such as missing sort keys, unused tables, etc.
- WLM - Find max connection hits, check whether concurrency is adequate, count the number of queues.
- Vacuum - Tables that were never vacuumed, ghost rows, tombstone blocks, etc.
- Performance - Disk-based queries, high-CPU queries, etc.
Category | Check ID | Findings | Threshold-Red 🚨 | Threshold-Yellow :question: | Threshold-Green :+1: |
---|---|---|---|---|---|
Design | 1 | Tables without sort keys | Greater than 10 | Between 1 and 10 | 0 |
Design | 2 | Sort key column compressed | Greater than 10 | Between 1 and 10 | 0 |
Design | 3 | Sort key skew greater than 4 | Greater than 10 | Between 1 and 10 | 0 |
Design | 4 | Tables with high skew | Greater than 10 | Between 1 and 10 | 0 |
Design | 5 | Tables without compression | Greater than 10 | Between 1 and 10 | 0 |
WLM | 6 | WLM queue wait time greater than 1 min | Greater than 900 | Between 180 and 900 | Less than 180 |
WLM | 7 | WLM max connection hit | Greater than 400 | Between 100 and 400 | Less than 100 |
WLM | 8 | Number of WLM queues | 1 | Between 2 and 3 | Greater than 3 |
WLM | 9 | Auto WLM enabled | - | 0 | Greater than 1 |
WLM | 10 | Max concurrency for a slot | Greater than 20 | Between 15 and 20 | Less than 15 |
WLM | 11 | WLM commit queue wait | Greater than 120 | Between 60 and 120 | Less than 60 |
Vacuum | 12 | Ghost rows | Greater than 100000 | Between 1000 and 100000 | Less than 1000 |
Vacuum | 13 | Tables never vacuumed (based on STL_Vacuum) | Greater than 5 | Between 1 and 5 | 0 |
Vacuum | 14 | Tables last vacuumed more than 5 days ago | Greater than 10 | Between 5 and 10 | Less than 5 |
Vacuum | 15 | Tables with tombstone blocks | Greater than 5 | Between 1 and 5 | 0 |
Vacuum | 16 | Tables with missing stats | Greater than 5 | Between 1 and 5 | 0 |
Vacuum | 17 | Tables with stale stats (greater than 5 percent) | Greater than 5 | Between 1 and 5 | 0 |
Table | 18 | Top size tables | Greater than 2 | Between 1 and 2 | 0 |
Table | 19 | Tables with a high number of alerts (greater than 3 alerts) | Greater than 5 | Between 1 and 5 | 0 |
Table | 20 | Non-scanned tables (based on STL_Scan) | Greater than 5 | Between 1 and 5 | 0 |
Table | 21 | Tables without backup | Greater than 0 | - | 0 |
Table | 22 | Tables with fragmentation | Greater than 5 | Between 1 and 5 | 0 |
Performance | 23 | Disk-based queries | Greater than 500 | Between 300 and 500 | Less than 300 |
Performance | 24 | COPY not optimized | Greater than 5 | Between 1 and 5 | 0 |
Performance | 25 | High-CPU queries (greater than 80 percent) | Greater than 10 | Between 5 and 10 | Less than 5 |
Performance | 26 | Most frequent alerts (greater than 500 times) | Greater than 10 | Between 5 and 10 | Less than 5 |
Performance | 27 | Long-running queries (greater than 30 mins) | Greater than 300 | Between 100 and 300 | Less than 100 |
Performance | 28 | Max temp space used by queries | Greater than 10 | Between 5 and 10 | Less than 5 |
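To make the thresholds above concrete, here is a minimal Python sketch of what a single check can look like. It mirrors the idea of check 1 (tables without sort keys) but is not the actual query from health-check.sql; the connection details are placeholders, and psycopg2 is assumed to be installed.

```python
# Hypothetical sketch, not the actual query from health-check.sql:
# reproduce the spirit of check 1 (tables without sort keys).
# Assumes psycopg2 is installed; connection details are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="redshift-endpoint", port=5439,
    dbname="db_name", user="user", password="password",
)
with conn.cursor() as cur:
    # svv_table_info.sortkey1 is NULL for tables with no sort key.
    cur.execute("""
        SELECT "schema", "table"
        FROM svv_table_info
        WHERE sortkey1 IS NULL
        ORDER BY size DESC;
    """)
    missing = cur.fetchall()

# More than 10 such tables falls into the red band above.
print(f"{len(missing)} tables without a sort key")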
We have optimized these queries as much as possible.
- We recommend running this during your non-peak hours.
- If you have a single-node cluster, keep an eye on the CPU (but it should not go beyond 50% or 60%).
- If your cluster was launched within the last 3 days (a new cluster, or after an elastic resize), the system tables may not have enough data to find the bottlenecks. At least 3 days of data in the STL and SVV tables is recommended before running this script; the Python sketch below shows a quick way to check.
- You can simply download the health-check.sql script from this repo and run it in your favourite UI tool such as pgAdmin, DBeaver, or Aginity Workbench.
- If you are using a command line tool such as the psql client, you can run it from the command line:

```bash
wget https://raw.githubusercontent.com/searceinc/RStoolKit/master/health-check.sql
psql -h redshift-endpoint -U user -p 5439 -d db_name < health-check.sql
```
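If you are unsure whether your cluster has accumulated enough history yet, a quick probe like the following hypothetical Python sketch (psycopg2 and placeholder credentials assumed) can tell you how far back the STL tables go:

```python
# Rough sketch (psycopg2 assumed, placeholder credentials): find out how far
# back STL history goes before running the full health check.
import psycopg2

conn = psycopg2.connect(host="redshift-endpoint", port=5439,
                        dbname="db_name", user="user", password="password")
with conn.cursor() as cur:
    # STL tables keep only a few days of history; MIN(starttime) shows
    # the oldest query the cluster still remembers.
    cur.execute("SELECT MIN(starttime) FROM stl_query;")
    print("STL history starts at:", cur.fetchone()[0])
```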
If you want to run this health check every week or at some other frequency, you can use AWS Lambda to run it and send the report over email with SES. We have created a CloudFormation template to create this Lambda function. The RedShift and SES details can be passed through Lambda environment variables. Please follow the instructions below to deploy it.
No changes are needed at the cluster level, but Lambda needs access to the RedShift cluster, so the IP ranges of the subnets you are going to use for Lambda must be whitelisted in RedShift's security group.
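For illustration only, the whitelisting could be scripted roughly like this (boto3 assumed; the security group ID and CIDR are placeholders, not values from this repo):

```python
# Hypothetical sketch (boto3 assumed, placeholder IDs): whitelist a Lambda
# subnet CIDR on the RedShift security group for the cluster port.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # RedShift's security group
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 5439,
        "ToPort": 5439,
        "IpRanges": [{"CidrIp": "10.0.1.0/24",
                      "Description": "Lambda subnet"}],
    }],
)
```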
- We strongly recommend using SES with VPC endpoints for security reasons.
- Create a VPC endpoint in an SES-supported region and attach it to your RedShift VPC (a scripted sketch follows this list).
- While creating this endpoint, it will ask you to pick subnets where SES endpoints are available (many AZs still do not support this).
- Create a security group for AWS Lambda and allow port 25, 465, 587, 2465, or 2587 to the IP range of the subnets you picked in the previous step.
- More detailed information is in the AWS doc: https://docs.aws.amazon.com/ses/latest/DeveloperGuide/send-email-set-up-vpc-endpoints.html
- Create an AWS IAM user for SES to send emails. Follow this link for more information.
- An IAM role for Lambda.
- A custom policy for Lambda [AWSLambdaVPCAccessExecutionRole and AWSLambdaBasicExecutionRole].
- A Lambda function with Python 3.7.
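As a sketch of the endpoint step above (boto3 assumed; all IDs and the region are placeholders, not values from this repo):

```python
# Hypothetical sketch (boto3 assumed, placeholder IDs): create the SES SMTP
# interface endpoint in the RedShift VPC. Pick subnets/AZs where the
# com.amazonaws.<region>.email-smtp service is actually available.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",           # the RedShift VPC
    ServiceName="com.amazonaws.us-east-1.email-smtp",
    SubnetIds=["subnet-0123456789abcdef0"],  # subnets with SES support
    SecurityGroupIds=["sg-0123456789abcdef0"],
)
```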
This is an additional step for security. The function will work without this change, but we recommend enabling KMS encryption here.
- This CloudFormation stack passes the RedShift and SES credentials as environment variables.
- They are all in plain text. To secure them, you can encrypt them using KMS. For more information, please follow this link.
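As a rough sketch of how the function could then read an encrypted value at runtime (boto3 assumed; this presumes you encrypted the variable under a key the Lambda execution role is allowed to decrypt with):

```python
# Hypothetical sketch: read a KMS-encrypted environment variable inside the
# Lambda function. Assumes the value was encrypted under a key the Lambda
# execution role may call Decrypt on; variable name from the table below.
import os
from base64 import b64decode

import boto3

kms = boto3.client("kms")
redshift_passwd = kms.decrypt(
    CiphertextBlob=b64decode(os.environ["REDSHIFTpasswd"])
)["Plaintext"].decode("utf-8")
```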
Parameter name | Purpose |
---|---|
REDSHIFTendpoint | RedShift cluster endpoint |
REDSHIFTdatabase | RedShift Database name |
REDSHIFTuser | User Name for RedShift Cluster |
REDSHIFTpasswd | Password for RedShift Cluster |
REDSHIFTport | RedShift port |
SESregion | SES region |
SESendpoint | SES SMTP endpoint |
SESusername | SES Access Key |
SESpassword | SES Secret Key |
SESsendermail | SES verified sender email |
SESrecipient | Email recipients |
S3codebucket | Lambda code bucket |
S3codekey | Lambda zip file location |
lambdaSecurityGroup | Security group for lambda |
lambdaSubnetIds | Subnets for the Lambda |
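To show how these parameters fit together, here is an illustrative sketch of sending the report through the SES SMTP interface using the environment variables above; it is not the exact code in the Lambda function.

```python
# Illustrative sketch only: send the report through SES SMTP using the
# environment variables from the parameter table above.
import os
import smtplib
from email.mime.text import MIMEText

msg = MIMEText("RedShift health check report body")
msg["Subject"] = "RStoolKit health check report"
msg["From"] = os.environ["SESsendermail"]
msg["To"] = os.environ["SESrecipient"]

# 587 is one of the SES SMTP ports opened in the Lambda security group.
with smtplib.SMTP(os.environ["SESendpoint"], 587) as smtp:
    smtp.starttls()
    smtp.login(os.environ["SESusername"], os.environ["SESpassword"])
    smtp.send_message(msg)
```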
The above parameter list assumes a region where SES is officially supported. You can still deploy this Lambda function in other regions, but then you need to take care of the email part in the Lambda code yourself (we have not tested this option).
- Or use SES in another region:
- Create an SES user in any supported region and verify an email address.
- Create a NAT gateway and attach it to a new route table.
- Create or choose any existing subnets to use this NAT gateway.
- While launching the CF template, select the subnet that has the NAT gateway (see the sketch below).
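A minimal sketch of the NAT gateway steps above (boto3 assumed; all IDs are placeholders):

```python
# Hypothetical sketch (boto3 assumed, placeholder IDs): NAT gateway plus a
# default route, so the Lambda subnet can reach SES in another region.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
nat = ec2.create_nat_gateway(
    SubnetId="subnet-0123456789abcdef0",        # a public subnet
    AllocationId="eipalloc-0123456789abcdef0",  # an Elastic IP
)["NatGateway"]

# Routing only works once the NAT gateway is available.
ec2.get_waiter("nat_gateway_available").wait(
    NatGatewayIds=[nat["NatGatewayId"]])

ec2.create_route(
    RouteTableId="rtb-0123456789abcdef0",       # the new route table
    DestinationCidrBlock="0.0.0.0/0",
    NatGatewayId=nat["NatGatewayId"],
)
```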
- Download the Lambda code
- Download the CloudFormation template
RStoolKit is free and open sourced under the MIT license.
- We appreciate and are happy to accept pull requests from anyone. Read the contributing guide before submitting your pull request.
- If you want to add your own query or modify something on a documentation page, feel free to edit the documents under `/src/doc/`.