Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add indexer level task metrics to provide more visibility in the task distribution #15991

Merged
merged 9 commits into from
Mar 21, 2024

Conversation

rbankar7
Copy link
Contributor

@rbankar7 rbankar7 commented Feb 28, 2024

Adds indexer level task metrics

Description

This PR adds indexer level task metrics-

"worker/task/running/count"
"worker/task/assigned/count"
"worker/task/completed/count"

These metrics will provide more visibility into the tasks distribution across indexers (We often see a task skew issue across indexers and with this issue it would be easier to catch the imbalance)

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your first PR, @rbankar7 ! I have left some suggestions.

@Before
public void setUp()
{
task = createMock(Task.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use NoopTask instead of creating a mock.

taskDetails = createMock(WorkerTaskManager.TaskDetails.class);
EasyMock.expect(taskDetails.getDataSource()).andReturn("dummy_DS2");
EasyMock.replay(taskDetails);
taskAnnouncement = createMock(TaskAnnouncement.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to use concrete instances of TaskAnnouncement and TaskDetails (if needed) instead of mocks.

EasyMock.replay(taskAnnouncement);
taskRunner = createMock(TaskRunner.class);
taskConfig = createMock(TaskConfig.class);
overlordClient = createMock(OverlordClient.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use NoopOverlordClient instead.

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update, @rbankar7 .
I have left some comments that need to be addressed.
Please update the docs to add info about the new metrics.

@rbankar7
Copy link
Contributor Author

rbankar7 commented Mar 7, 2024

Thanks for the update, @rbankar7 . I have left some comments that need to be addressed. Please update the docs to add info about the new metrics.

Thanks @kfaraz for the review
I have addressed the comments and updated the doc with the new metrics

@rbankar7 rbankar7 requested a review from kfaraz March 7, 2024 09:33
Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the new metric and docs, @rbankar7 !

public interface IndexerTaskCountStatsProvider
{
/**
* The number of running tasks on an indexer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestion for the javadocs but this can be addressed later.

Suggested change
* The number of running tasks on an indexer
* Map from datasource name to the number of running tasks on the Indexer.

@kfaraz
Copy link
Contributor

kfaraz commented Mar 8, 2024

@rbankar7 , the checkstyle and WorkerTaskCountStatsMonitorTest are failing. Please take a look.

@kfaraz
Copy link
Contributor

kfaraz commented Mar 9, 2024

@rbankar7 , there are more checkstyle failures. To ensure that you catch all of them, please run the following on your local machine before you commit:

mvn checkstyle:checkstyle

@rbankar7
Copy link
Contributor Author

@kfaraz I fixed the issues and verified that there are no checkstyle errors
Thanks!

@kfaraz
Copy link
Contributor

kfaraz commented Mar 15, 2024

@rbankar7 , I ran the workflow. The testWorkerTaskCountStatsMonitorTest.testMonitorNotMiddleManager seems to be failing.

Error:  Errors: 
Error:  org.apache.druid.server.metrics.WorkerTaskCountStatsMonitorTest.testMonitorNotMiddleManager
Error:    Run 1: WorkerTaskCountStatsMonitorTest.testMonitorNotMiddleManager:288 » Configuration Guice configuration errors:

1) No implementation for org.apache.druid.server.metrics.IndexerTaskCountStatsProvider was bound.
  while locating org.apache.druid.server.metrics.IndexerTaskCountStatsProvider

The code coverage also seems insufficient for WorkerTaskManager.
Screenshot 2024-03-15 at 4 55 00 PM

@rbankar7
Copy link
Contributor Author

rbankar7 commented Mar 15, 2024

I ran the workflow. The testWorkerTaskCountStatsMonitorTest.testMonitorNotMiddleManager seems to be failing.

@kfaraz will fix it
Thanks

…nitorIndexer) and add tests for getting worker stats
@rbankar7
Copy link
Contributor Author

@kfaraz looks like build is passing now
could you please take a look
Thanks!

@kfaraz kfaraz merged commit 3d8b0ff into apache:master Mar 21, 2024
85 checks passed
@kfaraz
Copy link
Contributor

kfaraz commented Mar 21, 2024

Thanks a lot for the fix, @rbankar7 !
Congrats on your first commit to Apache Druid! 🎉

rbankar7 added a commit to confluentinc/druid that referenced this pull request Mar 22, 2024
… distribution (apache#15991)

Changes:

Add the following indexer level task metrics:
- `worker/task/running/count`
- `worker/task/assigned/count`
- `worker/task/completed/count`

These metrics will provide more visibility into the tasks distribution across indexers
(We often see a task skew issue across indexers and with this issue it would be easier
to catch the imbalance)
rbankar7 added a commit to confluentinc/druid that referenced this pull request Mar 22, 2024
… distribution (apache#15991)

Changes:

Add the following indexer level task metrics:
- `worker/task/running/count`
- `worker/task/assigned/count`
- `worker/task/completed/count`

These metrics will provide more visibility into the tasks distribution across indexers
(We often see a task skew issue across indexers and with this issue it would be easier
to catch the imbalance)
rbankar7 added a commit to confluentinc/druid that referenced this pull request Mar 26, 2024
…re visibility (#191)

* Add indexer level task metrics to provide more visibility in the task distribution (apache#15991)

Changes:

Add the following indexer level task metrics:
- `worker/task/running/count`
- `worker/task/assigned/count`
- `worker/task/completed/count`

These metrics will provide more visibility into the tasks distribution across indexers
(We often see a task skew issue across indexers and with this issue it would be easier
to catch the imbalance)
rbankar7 added a commit to confluentinc/druid that referenced this pull request Apr 2, 2024
… distribution (apache#15991) (#192)

Changes:

Add the following indexer level task metrics:
- `worker/task/running/count`
- `worker/task/assigned/count`
- `worker/task/completed/count`

These metrics will provide more visibility into the tasks distribution across indexers
(We often see a task skew issue across indexers and with this issue it would be easier
to catch the imbalance)
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants