
[core] telemetry for when memory monitor is enabled #29152

Closed
wants to merge 9 commits

Conversation

clarng
Contributor

@clarng clarng commented Oct 7, 2022

Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>

Why are these changes needed?

Add telemetry to record, at worker init time, whether the Ray cluster is using the memory monitor.

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Manual test per https://docs.google.com/document/d/1XSWzx9ZatnlttsurCXzvV4vVsb2g1KxJ2T4zpf33bI8/edit

cat /tmp/ray/session_latest/usage_stats.json

{
  "usage_stats": {
    "ray_version": "3.0.0.dev0",
    "python_version": "3.7.7",
    "schema_version": "0.1",
    "source": "OSS",
    "session_id": "330fafc5-718b-48c5-92d0-dbe8dbc784a5",
    "git_commit": "{{RAY_COMMIT_SHA}}",
    "os": "linux",
    "collect_timestamp_ms": 1665116028212,
    "session_start_timestamp_ms": 1665115658078,
    "cloud_provider": null,
    "min_workers": null,
    "max_workers": null,
    "head_node_instance_type": null,
    "worker_node_instance_types": null,
    "total_num_cpus": 12,
    "total_num_gpus": null,
    "total_memory_gb": 10.672229005023837,
    "total_object_store_memory_gb": 5.336114501580596,
    "library_usages": ["dataset", "train", "tune"],
    "total_success": 302,
    "total_failed": 0,
    "seq_number": 302,
    "extra_usage_tags": {"gcs_storage": "memory", "memory_monitor_enabled": "false"},
    "total_num_nodes": 1,
    "total_num_running_jobs": 1
  },
  "success": true,
  "error": null
}
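The field being verified here is extra_usage_tags.memory_monitor_enabled. A minimal, self-contained sketch (not part of this PR; the helper name is my own) of checking that flag from a dumped stats payload:

```python
import json

# Example payload shaped like /tmp/ray/session_latest/usage_stats.json,
# trimmed to the fields relevant to this PR.
raw = (
    '{"usage_stats": {"extra_usage_tags": '
    '{"gcs_storage": "memory", "memory_monitor_enabled": "false"}}, '
    '"success": true, "error": null}'
)

def memory_monitor_enabled(stats_json: str) -> bool:
    """Return True if the memory monitor tag is recorded as enabled."""
    stats = json.loads(stats_json)
    tags = stats.get("usage_stats", {}).get("extra_usage_tags", {})
    # The tag is serialized as the string "true"/"false", not a JSON bool.
    return tags.get("memory_monitor_enabled") == "true"

print(memory_monitor_enabled(raw))  # False for the payload above
```

This mirrors the manual test: the monitor was off in that session, so the tag reads "false".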

Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
@clarng clarng requested review from scv119 and rkooo567 October 7, 2022 04:15
@clarng clarng marked this pull request as ready for review October 7, 2022 04:15
@scv119
Contributor

scv119 commented Oct 7, 2022

Need to fix CI, though.

clarng added 3 commits October 6, 2022 21:54
Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Contributor

@rkooo567 rkooo567 left a comment


Can we update this from usage_lib.py? I think we can do it this way:

  1. Add a method that fetches the full internal config via GetInternalConfigRequest.
  2. When the dashboard first starts, fetch the internal config and update only the relevant fields (in this case the memory monitor config), once.
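The two steps above could be simulated as follows. This is a hypothetical, self-contained sketch of the suggested flow; fetch_internal_config, RELEVANT_KEYS, and the in-memory tag dict are stand-ins, not Ray's actual dashboard or GCS client code:

```python
# Hypothetical key; Ray's memory monitor is driven by a refresh-interval
# config, and a nonzero interval means the monitor is enabled.
RELEVANT_KEYS = {"memory_monitor_refresh_ms"}

recorded_tags = {}  # stand-in for the recorded extra usage tags

def fetch_internal_config() -> dict:
    # Stand-in for issuing a GetInternalConfigRequest RPC to the GCS.
    return {"memory_monitor_refresh_ms": "250", "other_setting": "1"}

def record_relevant_tags_once() -> None:
    """Fetch the internal config and record relevant fields exactly once."""
    if recorded_tags:  # already recorded; later calls are no-ops
        return
    config = fetch_internal_config()
    for key in RELEVANT_KEYS & config.keys():
        recorded_tags["memory_monitor_enabled"] = str(config[key] != "0").lower()

record_relevant_tags_once()  # e.g. at dashboard startup
print(recorded_tags)  # {'memory_monitor_enabled': 'true'}
```

The point of the guard is that the tag is derived from cluster-level config, so it only needs to be recorded once per session rather than at every worker init.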

Two review threads on python/ray/_private/worker.py (outdated, resolved)
clarng added 4 commits October 7, 2022 10:18
Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
@rkooo567
Contributor

Hmm, I think this is not going to work if you do ray start --head. Can you add a test for this case?

@rkooo567 rkooo567 added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Oct 13, 2022
@rkooo567
Contributor

I'd love to talk in person about how I'd approach this

@clarng
Contributor Author

clarng commented Oct 13, 2022

> I'd love to talk in person about how I'd approach this

sounds good, should be back on this very soon

…metrymm

Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
@clarng
Contributor Author

clarng commented Nov 23, 2022

Fixed in #30472

@clarng clarng closed this Nov 23, 2022
Labels
@author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer.

4 participants