Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the Ray integration #15441

Merged
merged 1 commit into from
Sep 12, 2023
Merged

Conversation

FlorentClarret
Copy link
Member

@FlorentClarret FlorentClarret commented Aug 1, 2023

What does this PR do?

Add the Ray integration

Motivation

2023Q3 OKR

Additional Notes

I will add the documentation and the assets in follow-up PRs to keep this PR as small as possible to focus on the implementation.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • PR title must be written as a CHANGELOG entry (see why)
  • Files changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
  • PR must have changelog/ and integration/ labels attached
  • If the PR doesn't need to be tested during QA, please add a qa/skip-qa label.

@github-actions
Copy link

github-actions bot commented Aug 1, 2023

Test Results

2 files  2 suites   4m 47s ⏱️
7 tests 7 ✔️ 0 💤 0
9 runs  7 ✔️ 2 💤 0

Results for commit 4cda273.

♻️ This comment has been updated with latest results.

@FlorentClarret FlorentClarret force-pushed the florentclarret/ray/new_integration branch from ec7b399 to 45d0ef9 Compare August 1, 2023 08:53
@codecov
Copy link

codecov bot commented Aug 1, 2023

Codecov Report

Merging #15441 (4cda273) into master (713d6b4) will increase coverage by 0.12%.
Report is 20 commits behind head on master.
The diff coverage is 96.45%.

Flag Coverage Δ
activemq ?
cassandra ?
hive ?
hivemq ?
hudi ?
ignite ?
jboss_wildfly ?
kafka ?
presto ?
ray 96.45% <96.45%> (?)
solr ?
tomcat ?
weblogic ?

Flags with carried forward coverage won't be shown. Click here to find out more.

@github-actions
Copy link

github-actions bot commented Aug 3, 2023

The validations job has failed; please review the Files changed tab for possible suggestions to resolve.

1 similar comment
@github-actions
Copy link

github-actions bot commented Aug 4, 2023

The validations job has failed; please review the Files changed tab for possible suggestions to resolve.

@FlorentClarret FlorentClarret force-pushed the florentclarret/ray/new_integration branch 2 times, most recently from 721223f to 3cfefce Compare August 14, 2023 14:05
@FlorentClarret FlorentClarret force-pushed the florentclarret/ray/new_integration branch 5 times, most recently from ce5119f to fb1f75c Compare August 18, 2023 12:59
@FlorentClarret FlorentClarret force-pushed the florentclarret/ray/new_integration branch 2 times, most recently from 71fd9df to 59decb6 Compare September 4, 2023 12:11
@FlorentClarret FlorentClarret marked this pull request as ready for review September 4, 2023 12:12
@FlorentClarret FlorentClarret requested review from a team as code owners September 4, 2023 12:12
@FlorentClarret FlorentClarret force-pushed the florentclarret/ray/new_integration branch 2 times, most recently from 5397704 to c94210f Compare September 5, 2023 09:48
@github-actions
Copy link

github-actions bot commented Sep 5, 2023

The validations job has failed; please review the Files changed tab for possible suggestions to resolve.

Copy link
Contributor

@estherk15 estherk15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FlorentClarret This does not need Documentation review yet correct?

ray/README.md Show resolved Hide resolved
ray/CHANGELOG.md Show resolved Hide resolved
'ray_internal_num_infeasible_scheduling_classes': 'internal_num.infeasible_scheduling_classes',
'ray_internal_num_processes_skipped_job_mismatch': 'internal_num.processes.skipped.job_mismatch',
# There is a typo in their metric name, adding another entry to be ready as soon as they fix it
'ray_internal_num_processes_skipped_runtime_enviornment_mismatch': 'internal_num.processes.skipped.runtime_environment_mismatch', # noqa: E501
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: is there a doc for these metrics? I know some are system metrics but not all. I was trying to find where the misspelled ray_internal_num_processes_skipped_runtime_enviornment_mismatch is documented so we can check that doc in the future to know it has been fixed.

Copy link
Member Author

@FlorentClarret FlorentClarret Sep 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.ray.io/en/latest/ray-observability/reference/system-metrics.html#system-metrics

These metrics are all system metrics. "Application metrics" are manually added by the customer in their own code, so they would have to use the extras_metrics option

This particular metric is not documented and this is why I added both versions so we are already ready when a patch will be out

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They aren't all documented on that page though? I do not see ray_internal_num_infeasible_scheduling_classes or ray_grpc_server_req_process_time_ms for example, so I wasn't sure if there's a separate page for some of these metrics.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the metrics are not documented. I had to take a look at the code to get them 😭

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sad to hear 😢

ray/manifest.json Outdated Show resolved Hide resolved
ray/assets/service_checks.json Outdated Show resolved Hide resolved
@FlorentClarret
Copy link
Member Author

@FlorentClarret This does not need Documentation review yet correct?

@estherk15 Yes, I will open a PR dedicated to the documentation update to make it easier on our side if this sounds good to you.

@FlorentClarret FlorentClarret force-pushed the florentclarret/ray/new_integration branch 2 times, most recently from bd712ab to 7fc36d6 Compare September 6, 2023 07:59
Copy link
Contributor

@ofek ofek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ray/tests/docker/serve_tasks/__init__.py needs a license header

🥇

[build-system]
requires = [
"hatchling>=0.13.0",
"setuptools>=66; python_version > '3.0'",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only required for editable installations on Python 2

Suggested change
"setuptools>=66; python_version > '3.0'",

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean we can drop it from the templates? #15766

ray/tests/common.py Outdated Show resolved Hide resolved
ray/tests/conftest.py Outdated Show resolved Hide resolved
Comment on lines +8 to +11
command:
- /bin/bash
- -c
- |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha nice I never thought to use multi-line strings to send to bash before

ray/manifest.json Show resolved Hide resolved
ray/tests/test_e2e.py Outdated Show resolved Hide resolved
ray/tests/test_unit.py Outdated Show resolved Hide resolved
ray/tests/test_e2e.py Outdated Show resolved Hide resolved
ray/tests/common.py Show resolved Hide resolved
@FlorentClarret FlorentClarret force-pushed the florentclarret/ray/new_integration branch 3 times, most recently from 76a2cb8 to bdf9b76 Compare September 7, 2023 11:06
@FlorentClarret FlorentClarret force-pushed the florentclarret/ray/new_integration branch from bdf9b76 to 4cda273 Compare September 8, 2023 09:36
Copy link
Contributor

@yzhan289 yzhan289 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@yzhan289 yzhan289 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM again!

@FlorentClarret FlorentClarret merged commit 8749d34 into master Sep 12, 2023
@FlorentClarret FlorentClarret deleted the florentclarret/ray/new_integration branch September 12, 2023 11:13
github-actions bot pushed a commit that referenced this pull request Sep 12, 2023
@bgoldberg122
Copy link
Contributor

bgoldberg122 commented Sep 21, 2023

@FlorentClarret Not sure if you're aware, but it looks like the metrics metadata sync task is failing from this PR: see logs.
I think you need to change percentage to percent in the unit_name for ray.node.gpus_utilization

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants