Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify tracking of task counts stats in TaskQueue #16423

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

kfaraz
Copy link
Contributor

@kfaraz kfaraz commented May 9, 2024

Description

The current model of tracking task counts for the TaskCountStatsMonitor is too limiting and verbose.

Changes

  • Remove deprecated methods from TaskCountStatsProvider
  • Rename CoordinatorRunStats to DruidRunStats as it is now being extensively used outside the coordinator.
  • Simplify collection of stats in TaskQueue.
    • Remove methods getRunningTaskCount, getWaitingTaskCount, etc.
    • Collect all stats in a single DruidRunStats instance which is reset upon invocation of getQueueStats()
  • Add dimension taskType to task count metrics

Classes to review

  • TaskQueue
  • TaskCountStatsProvider
  • TaskCountStatsMonitor
  • TaskMaster

Further work

  • There are several other count stats providers which may benefit from a similar cleanup. I will try to consolidate all of them in a follow up PR.

Design considerations

The few advantages of the previous design were that:

  • The metrics emitted by the monitor were clearly listed out in the monitor itself. This can still be retained by adding logic in the monitor to filter out the metrics contained in the DruidRunStats.
  • The individual methods in the interface made it necessary for the concrete implementation to provide those metrics. But with a plain getStats() method which returns a packaged DruidRunStats, it is possible for the concrete implementation to return any or no metrics inside the DruidRunStats. Still thinking of a clean way to enforce this. I suppose we could add logic in the monitor to ensure that all required metrics have been sent by the implementation.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@kfaraz kfaraz requested a review from abhishekrb19 May 9, 2024 14:50
Copy link

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions bot added the stale label Jul 22, 2024
@kfaraz kfaraz removed the stale label Jul 23, 2024
Copy link

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions bot added the stale label Sep 22, 2024
@kfaraz kfaraz removed the stale label Sep 23, 2024
Copy link

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions bot added the stale label Nov 23, 2024
@kfaraz
Copy link
Contributor Author

kfaraz commented Dec 12, 2024

I don't intend to merge this PR as it has too many conflicts and the approach might need to be changed as well.
But keeping it open as there are some changes here that I might be able to salvage.

@kfaraz kfaraz removed the stale label Dec 12, 2024
@kfaraz kfaraz marked this pull request as draft December 12, 2024 13:07
Copy link

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions bot added the stale label Feb 11, 2025
@kfaraz kfaraz removed the stale label Feb 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant