[receiver/databricksreceiver] add skeleton and initial README #20891
Conversation
LGTM, please rebase and run `make gendependabot` to fix the dependabot conflict.
[cluster.id, spark.app.id, pipeline.id, pipeline.name]
databricks.spark.block_manager.memory.off_heap.max:
  enabled: true
  description: n/a
Will we keep it `n/a` going forward?
Unfortunately, not even Spark documents these metrics. It would be nice to have a description here, but I'm not sure where to get this information.
unit: "{jobs}"
gauge:
  value_type: int
databricks.jobs.schedule.status:
Is this metric just an aggregation of `databricks.tasks.schedule.status`? If that's the case, I have a couple of Qs:
- Do we need to expose them both? If so, we should make one of them optional, otherwise, we expose duplicated data (just in different forms). Any backend (or collector) can do the aggregation.
- If the aggregation method is sum, they should be sums, not gauges.
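If the job-level metric really is a sum over its tasks, it could be declared as a non-monotonic sum rather than a gauge. A rough sketch of what that might look like in metadata.yaml (the description, unit, and exact field names here are assumptions based on the mdatagen schema conventions, not taken from this PR):

```
databricks.jobs.schedule.status:
  enabled: true
  description: Schedule status of the job, aggregated over its tasks.
  unit: "{status}"
  sum:
    value_type: int
    monotonic: false  # status can move in either direction, so non-monotonic
```

Declaring it as a sum also lets backends re-aggregate it correctly across dimensions, which a gauge does not guarantee.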
The status comes from the job, and a job has multiple tasks, so the job status is simply propagated to each of its tasks. I suppose the task status metric could be made optional or removed altogether. What do you think?
Looking good. The only blocker from me would be deciding whether to enable all metrics by default, since that's customer-facing. The `MetricsBuilderConfig` comment is additive and can be decided later, if we want it at all.
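If we decide not to enable everything by default, an individual metric can be shipped as opt-in by flipping its `enabled` flag in metadata.yaml. A sketch using the task-level metric discussed above (description and unit are illustrative assumptions, not from this PR):

```
databricks.tasks.schedule.status:
  enabled: false  # opt-in: off by default to avoid duplicating the job-level metric
  description: Schedule status of each task within a job.
  unit: "{status}"
  gauge:
    value_type: int
```

Users who need per-task granularity can then enable it explicitly in the receiver's `metrics` config section.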
Sure, we need a rebase and we can get this in. The receiver needs more codeowners, and probably someone from Databricks to help out, if anyone is interested.

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@MovieStoreGuy rebased, ready to go in

This PR was marked stale due to lack of activity. It will be closed in 14 days.

This PR was marked stale due to lack of activity. It will be closed in 14 days.

Closed as inactive. Feel free to reopen if this PR is still being worked on.

Any updates here? @atoulme

Not much - we still want to do it, just no time. Interested?
Description: Adds Databricks receiver skeleton.
Link to tracking Issue: #18724
Testing: Unit tests added.
Documentation: Initial README added.