The current solution shows how to collect, store and plot time-series data about metrics related to resources consumed by jobs and tasks executed by flink. For this purpose, a general scheme had beeen tested, it is presented in the figure below.
In the next configuration metrics scopes may be vanished.
#=================================
#
#METRICS
#
#=================================
metrics.scope.jm: localhost.myhost
metrics.scope.jm.job: localhost.jobmanager.myjm
metrics.scope.tm: localhost.taskmanager.mytm
metrics.scope.tm.job: localhost.mytm.myjob
metrics.scope.task: localhost.taskmanager.mytm.myjob.mytask.idtask
metrics.reporters: inet_reporter
# #specifying the class for the internal reporter
metrics.reporter.inet_reporter.class: berlin.bbdc.inet.flinkReporter.FileFlinkReporter
metrics.reporter.inet_reporter.interval: 10 MILLISECONDS
#path for writing the metric file
metrics.reporter.inet_reporter.path: /tmp/metrics/
#Graphite server configuration
metrics.reporter.inet_reporter.host: localhost
metrics.reporter.inet_reporter.port: 2003
metrics.reporter.inet_reporter.protocol: TCP
mvn package -DskipTests
File ${FLINK_DIR}/conf/log4j.properties
log4j.rootLogger=DEBUG, file
cp target/wordcountMetrics-0.1.jar $FLINK/lib/
2017-02-11 15:15:58,099 DEBUG org.apache.hadoop.metrics2.impl.MetricsSystemImpl - UgiMetrics, User and group related metrics
2017-02-11 15:15:58,427 INFO org.apache.flink.runtime.jobmanager.JobManager - Classpath: /home/dgu/developer/myflink/lib/flink-dist_2.10-1.2-SNAPSHOT.jar:/home/dgu/developer/myflink/lib/flink-metrics-graphite-1.2-SNAPSHOT.jar:/home/dgu/developer/myflink/lib/flink-python_2.10-1.2-SNAPSHOT.jar:/home/dgu/developer/myflink/lib/log4j-1.2.17.jar:/home/dgu/developer/myflink/lib/slf4j-log4j12-1.7.7.jar:/home/dgu/developer/myflink/lib/wordcountMetrics-0.1.jar:::
2017-02-11 15:15:58,616 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.scope.jm, localhost.myhost
2017-02-11 15:15:58,617 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.scope.jm.job, localhost.jobmanager.myjm
2017-02-11 15:15:58,617 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.scope.tm, localhost.taskmanager.mytm
2017-02-11 15:15:58,617 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.scope.tm.job, localhost.mytm.myjob
2017-02-11 15:15:58,617 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.scope.task, localhost.taskmanager.mytm.myjob.mytask.idtask
2017-02-11 15:15:58,617 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporters, inet_reporter
2017-02-11 15:15:58,617 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.inet_reporter.class, berlin.bbdc.inet.flinkReporter.FileFlinkReporter
2017-02-11 15:15:58,617 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.inet_reporter.interval, '10 MILLISECONDS'
2017-02-11 15:15:58,633 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.scope.jm, localhost.myhost
2017-02-11 15:15:58,633 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.scope.jm.job, localhost.jobmanager.myjm
2017-02-11 15:15:58,634 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.scope.tm, localhost.taskmanager.mytm
2017-02-11 15:15:58,634 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.scope.tm.job, localhost.mytm.myjob
2017-02-11 15:15:58,634 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.scope.task, localhost.taskmanager.mytm.myjob.mytask.idtask
2017-02-11 15:15:58,634 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporters, inet_reporter
2017-02-11 15:15:58,634 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.inet_reporter.class, berlin.bbdc.inet.flinkReporter.FileFlinkReporter
2017-02-11 15:15:58,634 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.inet_reporter.interval, '10 MILLISECONDS'
2017-02-11 17:14:52,986 DEBUG berlin.bbdc.inet.flinkReporter.FileFlinkReporter - INET - Metrics - added: localhost.taskmanager.mytm.Status.JVM.ClassLoader.ClassesUnloaded
2017-02-11 17:14:52,986 DEBUG berlin.bbdc.inet.flinkReporter.FileFlinkReporter - INET - Metrics - added: localhost.taskmanager.mytm.Status.JVM.GarbageCollector.PS Scavenge.Count
2017-02-11 17:14:52,986 DEBUG berlin.bbdc.inet.flinkReporter.FileFlinkReporter - INET - Metrics - added: localhost.taskmanager.mytm.Status.JVM.GarbageCollector.PS Scavenge.Time
2017-02-11 17:14:52,986 DEBUG berlin.bbdc.inet.flinkReporter.FileFlinkReporter - INET - Metrics - added: localhost.taskmanager.mytm.Status.JVM.GarbageCollector.PS MarkSweep.Count
2017-02-11 17:14:52,986 DEBUG berlin.bbdc.inet.flinkReporter.FileFlinkReporter - INET - Metrics - added: localhost.taskmanager.mytm.Status.JVM.GarbageCollector.PS MarkSweep.Time
localhost.myhost.numRunningJobs.csv
localhost.myhost.Status.JVM.ClassLoader.ClassesLoaded.csv
localhost.myhost.Status.JVM.ClassLoader.ClassesUnloaded.csv
localhost.myhost.Status.JVM.CPU.Load.csv
localhost.myhost.Status.JVM.CPU.Time.csv
localhost.myhost.Status.JVM.GarbageCollector.PS MarkSweep.Count.csv
localhost.myhost.Status.JVM.GarbageCollector.PS MarkSweep.Time.csv
localhost.myhost.Status.JVM.GarbageCollector.PS Scavenge.Count.csv
localhost.myhost.Status.JVM.GarbageCollector.PS Scavenge.Time.csv
localhost.myhost.Status.JVM.Memory.direct.Count.csv
localhost.myhost.Status.JVM.Memory.direct.MemoryUsed.csv
localhost.myhost.Status.JVM.Memory.direct.TotalCapacity.csv
localhost.myhost.Status.JVM.Memory.Heap.Committed.csv
localhost.myhost.Status.JVM.Memory.Heap.Max.csv
localhost.myhost.Status.JVM.Memory.Heap.Used.csv
localhost.myhost.Status.JVM.Memory.mapped.Count.csv
localhost.myhost.Status.JVM.Memory.mapped.MemoryUsed.csv
localhost.myhost.Status.JVM.Memory.mapped.TotalCapacity.csv
localhost.myhost.Status.JVM.Memory.NonHeap.Committed.csv
@:/tmp$ cat localhost.myhost.numRunningJobs.csv
t,value
1486899737,0
1486899738,0
1486899739,0
1486899740,0
Python and pip
sudo apt install python-setuptools python-dev build-essential
sudo apt install python-pip
pip install --upgrade pip
Repositories
mkdir ~/graphite
cd ~/graphite
git clone https://github.com/graphite-project/graphite-web.git
git clone https://github.com/graphite-project/carbon.git
git clone https://github.com/graphite-project/whisper.git
Libraries
pip install whisper
pip install pyparsing==1.5.1
pip install tagging
pip install cairocffi
pip install Django==1.9
pip install django-cms
pip install django-tagging==0.4.3
pip install pytz
pip install whitenoise
pip install mod_wsgi
pip install Twisted==11.1.0
pip install carbon
apt install apache2
apt install apache2-dev libapache2-mod-wsgi
pip install scandir
pip install graphite-web
pip install mod_wsgi
In ~/graphite/graphite-web
cd ~/graphite/graphite-web
python setup.py install
Default installation $GRAPHITE_ROOT==/opt/graphite
sudo chmod -R 777 /opt/graphite/
sudo chown -R <your username>:staff /opt/graphite
cp /opt/graphite/conf/carbon.conf{.example,}
cp /opt/graphite/conf/storage-schemas.conf{.example,}
cp /opt/graphite/conf/graphite.wsgi{.example,}
cp /opt/graphite/conf/graphTemplates.conf{.example,}
cp /opt/graphite/conf/dashboard.conf{.example,}
cp /opt/graphite/conf/whitelist.conf{.example,}
cd /opt/graphite/webapp/graphite
cp local_settings.py{.example,}
PYTHONPATH=/opt/graphite/webapp django-admin.py migrate --settings=graphite.settings --run-syncdb
python /opt/graphite/bin/carbon-cache.py start
python /opt/graphite/bin/run-graphite-devel-server.py /opt/graphite
http://localhost:8080/
Login
User: hduser
Passwd: hduser
Graph example
http://localhost:8087/dashboard/#Example-SocketWindow-Data-Processing
In the following URL, we can configure all the numerical series we would like to display just by clicking on it. In the menu Dashboard we can find the options related to Graphs Edit, Save, Import, etc.
http://localhost:8087/dashboard