Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] add more metrics to help locate hotspot issues #53490

Merged
merged 11 commits into from
Feb 7, 2025

Conversation

silverbullet233
Copy link
Contributor

@silverbullet233 silverbullet233 commented Dec 3, 2024

Why I'm doing:

What I'm doing:

Add more metrics to improve the observability of the system, mainly including the following parts:

  1. add TableMetrics to observe table-level's queries and loads so that we can find hot spots
  2. add IOMetrics to observe the disk io occupied by different components.
  3. refactor the metrics related to pipeline executor: DriverExecutor of different resource groups will share global metrics, and use core local value instead of atomic variables as much as possible to reduce competition.
  4. add more metrics to ScanExecutor and DriverExecutor.

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.4
    • 3.3
    • 3.2
    • 3.1
    • 3.0

dirtysalt
dirtysalt previously approved these changes Dec 3, 2024
@silverbullet233 silverbullet233 changed the title [WIP][Enhancement] add more metrics [Enhancement] add more metrics Dec 3, 2024
@silverbullet233 silverbullet233 requested a review from a team as a code owner December 12, 2024 01:31
@silverbullet233 silverbullet233 force-pushed the add_metrics branch 2 times, most recently from e5959ba to 96dfa85 Compare December 12, 2024 11:35
murphyatwork
murphyatwork previously approved these changes Dec 13, 2024
@silverbullet233 silverbullet233 changed the title [Enhancement] add more metrics [Enhancement] add more metrics to help locate hotspot issues Dec 13, 2024
@silverbullet233 silverbullet233 enabled auto-merge (squash) December 13, 2024 02:54
ZiheLiu
ZiheLiu previously approved these changes Dec 13, 2024
kevincai
kevincai previously approved these changes Dec 24, 2024
Copy link
Contributor

@kevincai kevincai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious, why disable the metrics under UT mode?

Signed-off-by: silverbullet233 <3675229+silverbullet233@users.noreply.github.com>
Signed-off-by: silverbullet233 <3675229+silverbullet233@users.noreply.github.com>
Signed-off-by: silverbullet233 <3675229+silverbullet233@users.noreply.github.com>
Signed-off-by: silverbullet233 <3675229+silverbullet233@users.noreply.github.com>
Signed-off-by: silverbullet233 <3675229+silverbullet233@users.noreply.github.com>
Signed-off-by: silverbullet233 <3675229+silverbullet233@users.noreply.github.com>
Signed-off-by: silverbullet233 <3675229+silverbullet233@users.noreply.github.com>
Signed-off-by: silverbullet233 <3675229+silverbullet233@users.noreply.github.com>
Signed-off-by: silverbullet233 <3675229+silverbullet233@users.noreply.github.com>
Signed-off-by: silverbullet233 <3675229+silverbullet233@users.noreply.github.com>
Signed-off-by: silverbullet233 <3675229+silverbullet233@users.noreply.github.com>
Copy link

github-actions bot commented Feb 5, 2025

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

Copy link

github-actions bot commented Feb 5, 2025

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

Copy link

github-actions bot commented Feb 5, 2025

[BE Incremental Coverage Report]

fail : 167 / 210 (79.52%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 be/src/common/daemon.cpp 0 1 00.00% [171]
🔵 be/src/util/table_metrics.cpp 6 27 22.22% [34, 37, 38, 39, 40, 41, 43, 44, 45, 46, 48, 49, 50, 51, 52, 53, 55, 58, 59, 61, 62]
🔵 be/src/runtime/lake_tablets_channel.cpp 1 4 25.00% [306, 568, 569]
🔵 be/src/service/staros_worker.cpp 3 10 30.00% [84, 85, 86, 87, 104, 138, 139]
🔵 be/src/exec/pipeline/pipeline_driver_executor.cpp 12 14 85.71% [40, 76]
🔵 be/src/exec/pipeline/pipeline_metrics.cpp 21 24 87.50% [30, 33, 34]
🔵 be/src/io/io_profiler.cpp 15 17 88.24% [96, 165]
🔵 be/src/exec/pipeline/pipeline_driver_queue.cpp 9 10 90.00% [80]
🔵 be/src/util/table_metrics.h 24 26 92.31% [45, 58]
🔵 be/src/exec/workgroup/scan_executor.cpp 12 13 92.31% [101]
🔵 be/src/util/system_metrics.cpp 14 14 100.00% []
🔵 be/src/exec/workgroup/work_group.cpp 2 2 100.00% []
🔵 be/src/runtime/exec_env.cpp 2 2 100.00% []
🔵 be/src/storage/tablet.cpp 2 2 100.00% []
🔵 be/src/exec/pipeline/pipeline_driver.cpp 1 1 100.00% []
🔵 be/src/util/metrics.h 2 2 100.00% []
🔵 be/src/exec/pipeline/pipeline_metrics.h 5 5 100.00% []
🔵 be/src/exec/pipeline/scan/olap_chunk_source.cpp 4 4 100.00% []
🔵 be/src/exec/workgroup/pipeline_executor_set.cpp 9 9 100.00% []
🔵 be/src/exec/pipeline/pipeline_driver_poller.h 2 2 100.00% []
🔵 be/src/util/starrocks_metrics.h 3 3 100.00% []
🔵 be/src/exec/spill/log_block_manager.cpp 5 5 100.00% []
🔵 be/src/util/starrocks_metrics.cpp 2 2 100.00% []
🔵 be/src/util/system_metrics.h 2 2 100.00% []
🔵 be/src/exec/pipeline/pipeline_driver_poller.cpp 2 2 100.00% []
🔵 be/src/exec/pipeline/pipeline_driver_queue.h 3 3 100.00% []
🔵 be/src/runtime/local_tablets_channel.cpp 4 4 100.00% []

@silverbullet233
Copy link
Contributor Author

silverbullet233 commented Feb 6, 2025

just curious, why disable the metrics under UT mode?

@kevincai because the registration of table metrics depends on TableMetricsManager.
In the unit test, many codes are relatively independent, and many global variables will not be initialized.
we need to change a lot of test codes if we want it to work well. so, for simplicity, I just disabled it.

@@ -65,6 +65,7 @@
#include "util/compression/compression_utils.h"
#include "util/defer_op.h"
#include "util/stack_util.h"
#include "util/starrocks_metrics.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not used?

be/src/service/staros_worker.cpp Show resolved Hide resolved
be/src/util/metrics.h Show resolved Hide resolved
@andyziye andyziye disabled auto-merge February 7, 2025 02:20
@andyziye andyziye merged commit 000cdd2 into StarRocks:main Feb 7, 2025
61 of 62 checks passed
Copy link

github-actions bot commented Feb 7, 2025

@Mergifyio backport branch-3.4

@github-actions github-actions bot removed the 3.4 label Feb 7, 2025
Copy link
Contributor

mergify bot commented Feb 7, 2025

backport branch-3.4

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Feb 7, 2025
Signed-off-by: silverbullet233 <3675229+silverbullet233@users.noreply.github.com>
(cherry picked from commit 000cdd2)

# Conflicts:
#	be/src/exec/pipeline/pipeline_driver_poller.cpp
#	be/src/service/staros_worker.cpp
#	be/src/service/staros_worker.h
#	be/test/exec/pipeline/schedule/observer_test.cpp
silverbullet233 added a commit that referenced this pull request Feb 11, 2025
…#53490) (#55644)

Signed-off-by: silverbullet233 <3675229+silverbullet233@users.noreply.github.com>
Co-authored-by: eyes_on_me <nopainnofame@sina.com>
Co-authored-by: silverbullet233 <3675229+silverbullet233@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants