[SPARK-29064][CORE] Add PrometheusResource to export Executor metrics #25770
Closed
Conversation
srowen reviewed on Sep 12, 2019
srowen reviewed on Sep 12, 2019
Test build #110530 has finished for PR 25770 at commit
Hi, All. Could you review this once more, please?
Gentle ping~
LGTM. Merged into master. Thanks.
PavithraRamachandran pushed a commit to PavithraRamachandran/spark that referenced this pull request on Sep 15, 2019
### What changes were proposed in this pull request?

In Apache Spark 3.0.0, [SPARK-23429](apache#21221) added the ability to collect executor metrics via heartbeats and to expose them as a REST API. This PR aims to extend it to additionally support the `Prometheus` format.

### Why are the changes needed?

Prometheus.io is a CNCF project used widely with K8s.
- https://github.com/prometheus/prometheus

### Does this PR introduce any user-facing change?

Yes. New web interfaces are added along with the existing JSON API.

|        | JSON End Point                       | Prometheus End Point           |
| ------ | ------------------------------------ | ------------------------------ |
| Driver | /api/v1/applications/{id}/executors/ | /metrics/executors/prometheus/ |

### How was this patch tested?

Manually connect to the new end-points with `curl` and compare with JSON.

**SETUP**
```
$ sbin/start-master.sh
$ sbin/start-slave.sh spark://`hostname`:7077
$ bin/spark-shell --master spark://`hostname`:7077 --conf spark.ui.prometheus.enabled=true
```

**JSON (existing after SPARK-23429)**
```
$ curl -s http://localhost:4040/api/v1/applications/app-20190911204823-0000/executors
[ {
  "id" : "driver",
  "hostPort" : "localhost:52615",
  "isActive" : true,
  "rddBlocks" : 0,
  "memoryUsed" : 0,
  "diskUsed" : 0,
  "totalCores" : 0,
  "maxTasks" : 0,
  "activeTasks" : 0,
  "failedTasks" : 0,
  "completedTasks" : 0,
  "totalTasks" : 0,
  "totalDuration" : 0,
  "totalGCTime" : 0,
  "totalInputBytes" : 0,
  "totalShuffleRead" : 0,
  "totalShuffleWrite" : 0,
  "isBlacklisted" : false,
  "maxMemory" : 384093388,
  "addTime" : "2019-09-12T03:48:23.875GMT",
  "executorLogs" : { },
  "memoryMetrics" : {
    "usedOnHeapStorageMemory" : 0,
    "usedOffHeapStorageMemory" : 0,
    "totalOnHeapStorageMemory" : 384093388,
    "totalOffHeapStorageMemory" : 0
  },
  "blacklistedInStages" : [ ],
  "peakMemoryMetrics" : {
    "JVMHeapMemory" : 229995952,
    "JVMOffHeapMemory" : 145872280,
    "OnHeapExecutionMemory" : 0,
    "OffHeapExecutionMemory" : 0,
    "OnHeapStorageMemory" : 0,
    "OffHeapStorageMemory" : 0,
    "OnHeapUnifiedMemory" : 0,
    "OffHeapUnifiedMemory" : 0,
    "DirectPoolMemory" : 75891,
    "MappedPoolMemory" : 0,
    "ProcessTreeJVMVMemory" : 0,
    "ProcessTreeJVMRSSMemory" : 0,
    "ProcessTreePythonVMemory" : 0,
    "ProcessTreePythonRSSMemory" : 0,
    "ProcessTreeOtherVMemory" : 0,
    "ProcessTreeOtherRSSMemory" : 0,
    "MinorGCCount" : 8,
    "MinorGCTime" : 82,
    "MajorGCCount" : 3,
    "MajorGCTime" : 128
  },
  "attributes" : { },
  "resources" : { }
}, {
  "id" : "0",
  "hostPort" : "127.0.0.1:52619",
  "isActive" : true,
  "rddBlocks" : 0,
  "memoryUsed" : 0,
  "diskUsed" : 0,
  "totalCores" : 16,
  "maxTasks" : 16,
  "activeTasks" : 0,
  "failedTasks" : 0,
  "completedTasks" : 0,
  "totalTasks" : 0,
  "totalDuration" : 0,
  "totalGCTime" : 0,
  "totalInputBytes" : 0,
  "totalShuffleRead" : 0,
  "totalShuffleWrite" : 0,
  "isBlacklisted" : false,
  "maxMemory" : 384093388,
  "addTime" : "2019-09-12T03:48:25.907GMT",
  "executorLogs" : {
    "stdout" : "http://127.0.0.1:8081/logPage/?appId=app-20190911204823-0000&executorId=0&logType=stdout",
    "stderr" : "http://127.0.0.1:8081/logPage/?appId=app-20190911204823-0000&executorId=0&logType=stderr"
  },
  "memoryMetrics" : {
    "usedOnHeapStorageMemory" : 0,
    "usedOffHeapStorageMemory" : 0,
    "totalOnHeapStorageMemory" : 384093388,
    "totalOffHeapStorageMemory" : 0
  },
  "blacklistedInStages" : [ ],
  "attributes" : { },
  "resources" : { }
} ]
```

**Prometheus**
```
$ curl -s http://localhost:4040/metrics/executors/prometheus
metrics_app_20190911204823_0000_driver_executor_rddBlocks_Count 0
metrics_app_20190911204823_0000_driver_executor_memoryUsed_Count 0
metrics_app_20190911204823_0000_driver_executor_diskUsed_Count 0
metrics_app_20190911204823_0000_driver_executor_totalCores_Count 0
metrics_app_20190911204823_0000_driver_executor_maxTasks_Count 0
metrics_app_20190911204823_0000_driver_executor_activeTasks_Count 0
metrics_app_20190911204823_0000_driver_executor_failedTasks_Count 0
metrics_app_20190911204823_0000_driver_executor_completedTasks_Count 0
metrics_app_20190911204823_0000_driver_executor_totalTasks_Count 0
metrics_app_20190911204823_0000_driver_executor_totalDuration_Value 0
metrics_app_20190911204823_0000_driver_executor_totalGCTime_Value 0
metrics_app_20190911204823_0000_driver_executor_totalInputBytes_Count 0
metrics_app_20190911204823_0000_driver_executor_totalShuffleRead_Count 0
metrics_app_20190911204823_0000_driver_executor_totalShuffleWrite_Count 0
metrics_app_20190911204823_0000_driver_executor_maxMemory_Count 384093388
metrics_app_20190911204823_0000_driver_executor_usedOnHeapStorageMemory_Count 0
metrics_app_20190911204823_0000_driver_executor_usedOffHeapStorageMemory_Count 0
metrics_app_20190911204823_0000_driver_executor_totalOnHeapStorageMemory_Count 384093388
metrics_app_20190911204823_0000_driver_executor_totalOffHeapStorageMemory_Count 0
metrics_app_20190911204823_0000_driver_executor_JVMHeapMemory_Count 230406336
metrics_app_20190911204823_0000_driver_executor_JVMOffHeapMemory_Count 146132592
metrics_app_20190911204823_0000_driver_executor_OnHeapExecutionMemory_Count 0
metrics_app_20190911204823_0000_driver_executor_OffHeapExecutionMemory_Count 0
metrics_app_20190911204823_0000_driver_executor_OnHeapStorageMemory_Count 0
metrics_app_20190911204823_0000_driver_executor_OffHeapStorageMemory_Count 0
metrics_app_20190911204823_0000_driver_executor_OnHeapUnifiedMemory_Count 0
metrics_app_20190911204823_0000_driver_executor_OffHeapUnifiedMemory_Count 0
metrics_app_20190911204823_0000_driver_executor_DirectPoolMemory_Count 97049
metrics_app_20190911204823_0000_driver_executor_MappedPoolMemory_Count 0
metrics_app_20190911204823_0000_driver_executor_ProcessTreeJVMVMemory_Count 0
metrics_app_20190911204823_0000_driver_executor_ProcessTreeJVMRSSMemory_Count 0
metrics_app_20190911204823_0000_driver_executor_ProcessTreePythonVMemory_Count 0
metrics_app_20190911204823_0000_driver_executor_ProcessTreePythonRSSMemory_Count 0
metrics_app_20190911204823_0000_driver_executor_ProcessTreeOtherVMemory_Count 0
metrics_app_20190911204823_0000_driver_executor_ProcessTreeOtherRSSMemory_Count 0
metrics_app_20190911204823_0000_driver_executor_MinorGCCount_Count 8
metrics_app_20190911204823_0000_driver_executor_MinorGCTime_Count 82
metrics_app_20190911204823_0000_driver_executor_MajorGCCount_Count 3
metrics_app_20190911204823_0000_driver_executor_MajorGCTime_Count 128
metrics_app_20190911204823_0000_0_executor_rddBlocks_Count 0
metrics_app_20190911204823_0000_0_executor_memoryUsed_Count 0
metrics_app_20190911204823_0000_0_executor_diskUsed_Count 0
metrics_app_20190911204823_0000_0_executor_totalCores_Count 16
metrics_app_20190911204823_0000_0_executor_maxTasks_Count 16
metrics_app_20190911204823_0000_0_executor_activeTasks_Count 0
metrics_app_20190911204823_0000_0_executor_failedTasks_Count 0
metrics_app_20190911204823_0000_0_executor_completedTasks_Count 0
metrics_app_20190911204823_0000_0_executor_totalTasks_Count 0
metrics_app_20190911204823_0000_0_executor_totalDuration_Value 0
metrics_app_20190911204823_0000_0_executor_totalGCTime_Value 0
metrics_app_20190911204823_0000_0_executor_totalInputBytes_Count 0
metrics_app_20190911204823_0000_0_executor_totalShuffleRead_Count 0
metrics_app_20190911204823_0000_0_executor_totalShuffleWrite_Count 0
metrics_app_20190911204823_0000_0_executor_maxMemory_Count 384093388
metrics_app_20190911204823_0000_0_executor_usedOnHeapStorageMemory_Count 0
metrics_app_20190911204823_0000_0_executor_usedOffHeapStorageMemory_Count 0
metrics_app_20190911204823_0000_0_executor_totalOnHeapStorageMemory_Count 384093388
metrics_app_20190911204823_0000_0_executor_totalOffHeapStorageMemory_Count 0
```

Closes apache#25770 from dongjoon-hyun/SPARK-29064.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
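As a small illustration of how this plain-text endpoint can be consumed without extra tooling, the one-liner below polls it and filters a single metric. This is only a sketch: it assumes the driver UI is at localhost:4040 as in the SETUP above, and the 5-second interval and grep pattern are arbitrary choices.
```
$ # Watch the memoryUsed counters of all executors every 5 seconds (illustrative only)
$ while true; do curl -s http://localhost:4040/metrics/executors/prometheus | grep memoryUsed_Count; sleep 5; done
```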
dongjoon-hyun added a commit that referenced this pull request on Oct 10, 2019
### What changes were proposed in this pull request?

[SPARK-29064](#25770) introduced `PrometheusResource` to expose `ExecutorSummary`. This PR aims to improve it further by making it more `Prometheus`-friendly through the use of [Prometheus labels](https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels).

### Why are the changes needed?

**BEFORE**
```
metrics_app_20191008151432_0000_driver_executor_rddBlocks_Count 0
metrics_app_20191008151432_0000_driver_executor_memoryUsed_Count 0
metrics_app_20191008151432_0000_driver_executor_diskUsed_Count 0
```

**AFTER**
```
$ curl -s http://localhost:4040/metrics/executors/prometheus/ | head -n3
metrics_executor_rddBlocks_Count{application_id="app-20191008151625-0000", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_memoryUsed_Count{application_id="app-20191008151625-0000", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_diskUsed_Count{application_id="app-20191008151625-0000", application_name="Spark shell", executor_id="driver"} 0
```

### Does this PR introduce any user-facing change?

No, but `Prometheus` understands the new format and displays it more intelligently.

<img width="735" alt="ui" src="https://user-images.githubusercontent.com/9700541/66438279-1756f900-e9e1-11e9-91c7-c04c6ce9172f.png">

### How was this patch tested?

Manually.

**SETUP**
```
$ sbin/start-master.sh
$ sbin/start-slave.sh spark://`hostname`:7077
$ bin/spark-shell --master spark://`hostname`:7077 --conf spark.ui.prometheus.enabled=true
```

**RESULT**
```
$ curl -s http://localhost:4040/metrics/executors/prometheus/ | head -n3
metrics_executor_rddBlocks_Count{application_id="app-20191008151625-0000", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_memoryUsed_Count{application_id="app-20191008151625-0000", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_diskUsed_Count{application_id="app-20191008151625-0000", application_name="Spark shell", executor_id="driver"} 0
```

Closes #26060 from dongjoon-hyun/SPARK-29400.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
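The labeled form makes the series easy to aggregate and filter. As a rough, hypothetical illustration (not part of this PR): assuming a separate Prometheus server at localhost:9090 has already been configured to scrape the endpoint above, its HTTP query API could sum memory usage per application using the new labels.
```
$ # Hypothetical query against an assumed Prometheus server scraping /metrics/executors/prometheus/
$ curl -s http://localhost:9090/api/v1/query \
    --data-urlencode 'query=sum by (application_name) (metrics_executor_memoryUsed_Count)'
```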
atronchi pushed a commit to atronchi/spark that referenced this pull request on Oct 23, 2019
dongjoon-hyun added a commit that referenced this pull request on Jan 27, 2024
### What changes were proposed in this pull request?

`spark.ui.prometheus.enabled` has been available since Apache Spark 3.0.0.
- #25770

This PR aims to enable `spark.ui.prometheus.enabled` by default in Apache Spark 4.0.0, like the driver `JSON` API.

|        | JSON End Point                       | Prometheus End Point           |
| ------ | ------------------------------------ | ------------------------------ |
| Driver | /api/v1/applications/{id}/executors/ | /metrics/executors/prometheus/ |

### Why are the changes needed?

**BEFORE**
```
$ bin/spark-shell
$ curl -s http://localhost:4040/metrics/executors/prometheus | wc -l
0
```

**AFTER**
```
$ bin/spark-shell
$ curl -s http://localhost:4040/metrics/executors/prometheus | wc -l
20
```

### Does this PR introduce _any_ user-facing change?

No, this only enables an existing endpoint by default.

### How was this patch tested?

Pass the CIs and do a manual test.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44912 from dongjoon-hyun/SPARK-46886.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
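Because the endpoint is now on by default, no `--conf` flag is needed for the common case; the same flag can still be used to opt out. A minimal sketch, reusing the commands above and assuming the pre-change behavior (an empty response) when the flag is off:
```
$ bin/spark-shell --conf spark.ui.prometheus.enabled=false
$ curl -s http://localhost:4040/metrics/executors/prometheus | wc -l
0
```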
szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request on Feb 7, 2024
What changes were proposed in this pull request?
In Apache Spark 3.0.0, SPARK-23429 added the ability to collect executor metrics via heartbeats and to expose them as a REST API. This PR aims to extend it to additionally support the `Prometheus` format.

Why are the changes needed?
Prometheus.io is a CNCF project used widely with K8s.

Does this PR introduce any user-facing change?
Yes. New web interfaces are added along with the existing JSON API.

|        | JSON End Point                       | Prometheus End Point           |
| ------ | ------------------------------------ | ------------------------------ |
| Driver | /api/v1/applications/{id}/executors/ | /metrics/executors/prometheus/ |

How was this patch tested?
Manually connect to the new end-points with `curl` and compare with JSON. The full SETUP, JSON (existing after SPARK-23429), and Prometheus outputs are reproduced in the referenced commit message above; a condensed check is sketched below.
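For convenience, here is a condensed version of that manual check. It reuses the SETUP commands and endpoints quoted earlier in this thread; the `jq` filter is an external tool added here only for brevity, and the application id is the one from the example run above.
```
$ sbin/start-master.sh
$ sbin/start-slave.sh spark://`hostname`:7077
$ bin/spark-shell --master spark://`hostname`:7077 --conf spark.ui.prometheus.enabled=true

$ # Driver memoryUsed via the existing JSON API (first array element is the driver)
$ curl -s http://localhost:4040/api/v1/applications/app-20190911204823-0000/executors | jq '.[0].memoryUsed'
0

$ # The same value via the new Prometheus endpoint
$ curl -s http://localhost:4040/metrics/executors/prometheus | grep driver_executor_memoryUsed
metrics_app_20190911204823_0000_driver_executor_memoryUsed_Count 0
```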