yan-bev/emr_jstat_project

Filtered Jstat

Filtered Jstat graphs old-generation space utilization (O%), the number of full garbage collections (FGC), and full garbage collection time (FGCT) for all applicable processes on all instances in the desired EMR cluster.

Description

From the master node, main.py runs jstat_starter, then sleeps for one hour and extracts the jstat output from each instance, repeating in an endless loop. To graph O%, FGCT, and FGC, run grapher.py. The resulting graph is uploaded to the S3 bucket specified in config.ini.

If desired, all jstat processes, including main.py, can be killed by running jstat_killer.py.
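
To make the three graphed metrics concrete, here is a minimal sketch of pulling O, FGC, and FGCT out of jstat text output. It assumes the output comes from `jstat -gcutil` (the README does not say which jstat option the project uses), and `parse_gcutil` is a hypothetical helper, not a function from this repository. The sample rows are made-up illustration values.

```python
from typing import Dict, List

# Columns of interest in `jstat -gcutil` output (assumed option):
# O = old space utilization (%), FGC = full GC count, FGCT = full GC time (s).
WANTED = ("O", "FGC", "FGCT")


def parse_gcutil(text: str) -> List[Dict[str, float]]:
    """Return one dict per sample row, keyed by the WANTED column names."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    # Look columns up by name so extra columns in newer JDKs do not break parsing.
    index = {name: header.index(name) for name in WANTED}
    samples = []
    for row in rows:
        if row == header:
            continue  # jstat repeats the header when run with -h
        samples.append({name: float(row[i]) for name, i in index.items()})
    return samples


# Made-up sample output, for illustration only.
sample = """\
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  96.88  41.32  12.50  94.71  89.22      5    0.062     2    0.110    0.172
  0.00  96.88  63.10  12.50  94.71  89.22      5    0.062     2    0.110    0.172
"""

if __name__ == "__main__":
    for entry in parse_gcutil(sample):
        print(entry)
```

Looking columns up by header name rather than by fixed position is a deliberate choice here, since the set of columns jstat prints differs between JDK versions.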


Getting Started

To use Filtered Jstat, there must be at least one running EMR cluster with both a master node and one or more worker nodes. The security group of the master node must allow SSH (port 22). The SSH key must be saved in AWS Secrets Manager.


Executing Program

  1. (required): allow inbound ssh (port 22) from the local machine in the security group of the master node.
  2. run aws configure on the local machine and ensure that the region name matches the EMR cluster region:
    aws configure
  3. change SecretName in config.ini to the name of the ssh key stored in AWS Secrets Manager.
  4. change RegionName in config.ini to the region the key is stored in.
  5. grant the master node access to AWS Secrets Manager by using IAM. This can be done by adding the policy below (with <secret_arn> replaced by the ARN of the secret) to the master node's role.
        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Sid": "VisualEditor0",
                    "Effect": "Allow",
                    "Action": "secretsmanager:GetSecretValue",
                    "Resource": "<secret_arn>"
                }
            ]
        }
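
Once the policy above is attached, code running on the master node can read the key with the `secretsmanager:GetSecretValue` action it grants. The sketch below shows one way this could look with boto3. `fetch_ssh_key` is a hypothetical helper (not taken from this repository), and it assumes the key is stored as a plaintext secret string rather than as binary data.

```python
def fetch_ssh_key(secret_name, region_name, client=None):
    """Return the SSH private key text stored in AWS Secrets Manager.

    secret_name and region_name correspond to SecretName and RegionName
    in config.ini. A client can be passed in for testing; otherwise a
    boto3 Secrets Manager client is created.
    """
    if client is None:
        # Imported lazily so the function can be exercised with a stub client.
        import boto3

        client = boto3.client("secretsmanager", region_name=region_name)
    response = client.get_secret_value(SecretId=secret_name)
    # Assumption: the key was saved as a plaintext secret, so it is
    # returned under "SecretString" (binary secrets use "SecretBinary").
    return response["SecretString"]
```

The returned text could then be handed to an SSH library such as paramiko to connect to the worker nodes. If the call fails with an access-denied error, the role on the master node is the first thing to check.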
  1. ssh into the master node of the cluster:
    ssh -i <path_to_key> <desired_user>@<MasterNode_PublicIP>
  2. install git:
    sudo yum install git
  3. clone emr_jstat_project to the master node:
    git clone https://github.com/yan-bev/emr_jstat_project
  4. install python modules:
    pip install paramiko pandas matplotlib boto3
  5. (optional): create an s3 bucket to hold the graphs. (required): replace S3BucketName in config.ini with the name of the desired s3 bucket.
  6. run main.py:
    python3 main.py > /dev/null &
  7. run grapher.py:
    python3 grapher.py
  8. (optional): run jps_killer.py to stop all jps processes and remove the jps output files on the worker node(s), as well as remove the csv files from the master node. This will also kill the main.py process.
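
The steps above edit three keys in config.ini: SecretName, RegionName, and S3BucketName. As a rough illustration of how such a file can be read and checked before starting main.py, here is a sketch using Python's standard configparser. The section name (`DEFAULT`), the example values, and the `load_settings` helper are all assumptions for illustration; the actual layout of config.ini in this repository may differ.

```python
import configparser

# The three keys the steps above ask you to set.
REQUIRED = ("SecretName", "RegionName", "S3BucketName")

# Hypothetical file contents; section name and values are placeholders.
EXAMPLE_CONFIG = """\
[DEFAULT]
SecretName = my-emr-ssh-key
RegionName = us-east-1
S3BucketName = my-jstat-graphs
"""


def load_settings(text, section="DEFAULT"):
    """Parse config text and return the required settings as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    values = parser[section]
    # configparser matches option names case-insensitively.
    missing = [key for key in REQUIRED if key not in values]
    if missing:
        raise ValueError("missing config keys: " + ", ".join(missing))
    return {key: values[key] for key in REQUIRED}


if __name__ == "__main__":
    print(load_settings(EXAMPLE_CONFIG))
```

Failing early on a missing key is cheaper than discovering it an hour into the main.py loop.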

Graph Output With Test Data

expected output
