Skip to content

dabidgs3/flink-metrics

Repository files navigation

Introduction

The current solution shows how to collect, store and plot time-series data about metrics related to resources consumed by jobs and tasks executed by flink. For this purpose, a general scheme had beeen tested, it is presented in the figure below.

alt text

File Scheduled Reporter

Flink Configuration

In the next configuration metrics scopes may be vanished.


#=================================
#
#METRICS
#
#=================================
metrics.scope.jm: localhost.myhost
metrics.scope.jm.job: localhost.jobmanager.myjm
metrics.scope.tm: localhost.taskmanager.mytm
metrics.scope.tm.job: localhost.mytm.myjob
metrics.scope.task: localhost.taskmanager.mytm.myjob.mytask.idtask

metrics.reporters: inet_reporter
# #specifying the class for the internal reporter
metrics.reporter.inet_reporter.class: berlin.bbdc.inet.flinkReporter.FileFlinkReporter
metrics.reporter.inet_reporter.interval: 10 MILLISECONDS
#path for writing the metric file
metrics.reporter.inet_reporter.path: /tmp/metrics/
#Graphite server configuration
metrics.reporter.inet_reporter.host: localhost
metrics.reporter.inet_reporter.port: 2003
metrics.reporter.inet_reporter.protocol: TCP

Package project

mvn package -DskipTests

Debug flink

File ${FLINK_DIR}/conf/log4j.properties

log4j.rootLogger=DEBUG, file

Flink lib dependency

cp target/wordcountMetrics-0.1.jar $FLINK/lib/

Expectation

2017-02-11 15:15:58,099 DEBUG org.apache.hadoop.metrics2.impl.MetricsSystemImpl             - UgiMetrics, User and group related metrics
2017-02-11 15:15:58,427 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  Classpath: /home/dgu/developer/myflink/lib/flink-dist_2.10-1.2-SNAPSHOT.jar:/home/dgu/developer/myflink/lib/flink-metrics-graphite-1.2-SNAPSHOT.jar:/home/dgu/developer/myflink/lib/flink-python_2.10-1.2-SNAPSHOT.jar:/home/dgu/developer/myflink/lib/log4j-1.2.17.jar:/home/dgu/developer/myflink/lib/slf4j-log4j12-1.7.7.jar:/home/dgu/developer/myflink/lib/wordcountMetrics-0.1.jar:::
2017-02-11 15:15:58,616 DEBUG org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.scope.jm, localhost.myhost
2017-02-11 15:15:58,617 DEBUG org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.scope.jm.job, localhost.jobmanager.myjm
2017-02-11 15:15:58,617 DEBUG org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.scope.tm, localhost.taskmanager.mytm
2017-02-11 15:15:58,617 DEBUG org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.scope.tm.job, localhost.mytm.myjob
2017-02-11 15:15:58,617 DEBUG org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.scope.task, localhost.taskmanager.mytm.myjob.mytask.idtask
2017-02-11 15:15:58,617 DEBUG org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.reporters, inet_reporter
2017-02-11 15:15:58,617 DEBUG org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.reporter.inet_reporter.class, berlin.bbdc.inet.flinkReporter.FileFlinkReporter
2017-02-11 15:15:58,617 DEBUG org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.reporter.inet_reporter.interval, '10 MILLISECONDS'
2017-02-11 15:15:58,633 DEBUG org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.scope.jm, localhost.myhost
2017-02-11 15:15:58,633 DEBUG org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.scope.jm.job, localhost.jobmanager.myjm
2017-02-11 15:15:58,634 DEBUG org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.scope.tm, localhost.taskmanager.mytm
2017-02-11 15:15:58,634 DEBUG org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.scope.tm.job, localhost.mytm.myjob
2017-02-11 15:15:58,634 DEBUG org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.scope.task, localhost.taskmanager.mytm.myjob.mytask.idtask
2017-02-11 15:15:58,634 DEBUG org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.reporters, inet_reporter
2017-02-11 15:15:58,634 DEBUG org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.reporter.inet_reporter.class, berlin.bbdc.inet.flinkReporter.FileFlinkReporter
2017-02-11 15:15:58,634 DEBUG org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.reporter.inet_reporter.interval, '10 MILLISECONDS'

2017-02-11 17:14:52,986 DEBUG berlin.bbdc.inet.flinkReporter.FileFlinkReporter              - INET - Metrics - added: localhost.taskmanager.mytm.Status.JVM.ClassLoader.ClassesUnloaded
2017-02-11 17:14:52,986 DEBUG berlin.bbdc.inet.flinkReporter.FileFlinkReporter              - INET - Metrics - added: localhost.taskmanager.mytm.Status.JVM.GarbageCollector.PS Scavenge.Count
2017-02-11 17:14:52,986 DEBUG berlin.bbdc.inet.flinkReporter.FileFlinkReporter              - INET - Metrics - added: localhost.taskmanager.mytm.Status.JVM.GarbageCollector.PS Scavenge.Time
2017-02-11 17:14:52,986 DEBUG berlin.bbdc.inet.flinkReporter.FileFlinkReporter              - INET - Metrics - added: localhost.taskmanager.mytm.Status.JVM.GarbageCollector.PS MarkSweep.Count
2017-02-11 17:14:52,986 DEBUG berlin.bbdc.inet.flinkReporter.FileFlinkReporter              - INET - Metrics - added: localhost.taskmanager.mytm.Status.JVM.GarbageCollector.PS MarkSweep.Time

Results

localhost.myhost.numRunningJobs.csv
localhost.myhost.Status.JVM.ClassLoader.ClassesLoaded.csv
localhost.myhost.Status.JVM.ClassLoader.ClassesUnloaded.csv
localhost.myhost.Status.JVM.CPU.Load.csv
localhost.myhost.Status.JVM.CPU.Time.csv
localhost.myhost.Status.JVM.GarbageCollector.PS MarkSweep.Count.csv
localhost.myhost.Status.JVM.GarbageCollector.PS MarkSweep.Time.csv
localhost.myhost.Status.JVM.GarbageCollector.PS Scavenge.Count.csv
localhost.myhost.Status.JVM.GarbageCollector.PS Scavenge.Time.csv
localhost.myhost.Status.JVM.Memory.direct.Count.csv
localhost.myhost.Status.JVM.Memory.direct.MemoryUsed.csv
localhost.myhost.Status.JVM.Memory.direct.TotalCapacity.csv
localhost.myhost.Status.JVM.Memory.Heap.Committed.csv
localhost.myhost.Status.JVM.Memory.Heap.Max.csv
localhost.myhost.Status.JVM.Memory.Heap.Used.csv
localhost.myhost.Status.JVM.Memory.mapped.Count.csv
localhost.myhost.Status.JVM.Memory.mapped.MemoryUsed.csv
localhost.myhost.Status.JVM.Memory.mapped.TotalCapacity.csv
localhost.myhost.Status.JVM.Memory.NonHeap.Committed.csv

Measurements

@:/tmp$ cat localhost.myhost.numRunningJobs.csv
t,value
1486899737,0
1486899738,0
1486899739,0
1486899740,0

Graphite

Pre-requisites

Python and pip

sudo apt install python-setuptools python-dev build-essential
sudo apt install python-pip
pip install --upgrade pip

Repositories

mkdir ~/graphite
cd ~/graphite
git clone https://github.com/graphite-project/graphite-web.git
git clone https://github.com/graphite-project/carbon.git
git clone https://github.com/graphite-project/whisper.git

Libraries

pip install whisper
pip install pyparsing==1.5.1
pip install tagging
pip install cairocffi
pip install Django==1.9
pip install django-cms
pip install django-tagging==0.4.3
pip install pytz
pip install whitenoise
pip install mod_wsgi
pip install Twisted==11.1.0
pip install carbon
apt install apache2
apt install apache2-dev libapache2-mod-wsgi
pip install scandir
pip install graphite-web
pip install mod_wsgi

Installation

In ~/graphite/graphite-web

cd ~/graphite/graphite-web
python setup.py install

Default installation $GRAPHITE_ROOT==/opt/graphite

sudo chmod -R 777 /opt/graphite/
sudo chown -R <your username>:staff /opt/graphite
cp /opt/graphite/conf/carbon.conf{.example,}
cp /opt/graphite/conf/storage-schemas.conf{.example,}
cp /opt/graphite/conf/graphite.wsgi{.example,}
cp /opt/graphite/conf/graphTemplates.conf{.example,}
cp /opt/graphite/conf/dashboard.conf{.example,}
cp /opt/graphite/conf/whitelist.conf{.example,}
cd /opt/graphite/webapp/graphite
cp local_settings.py{.example,}
PYTHONPATH=/opt/graphite/webapp django-admin.py migrate --settings=graphite.settings --run-syncdb

Testing

python /opt/graphite/bin/carbon-cache.py start
python /opt/graphite/bin/run-graphite-devel-server.py /opt/graphite

Browsing

http://localhost:8080/

Creating a graph

Login 
User: hduser
Passwd: hduser

Dashboard

Graph example

http://localhost:8087/dashboard/#Example-SocketWindow-Data-Processing

In the following URL, we can configure all the numerical series we would like to display just by clicking on it. In the menu Dashboard we can find the options related to Graphs Edit, Save, Import, etc.

http://localhost:8087/dashboard

About

No description or website provided.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages