[core] telemetry for when memory monitor is enabled #29152
Conversation
Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Need to fix CI though.
Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Can we update this from usage_lib.py? I think we can do it this way (see the sketch below):
- Generate a method that gets all the internal config from GetInternalConfigRequest.
- When the dashboard first starts, get the internal config and update only the relevant fields, and only once (in this case the memory monitor config).
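A minimal sketch of that flow, assuming the dashboard holds an async GCS service stub and that usage_lib exposes record_extra_usage_tag and a TagKey enum; the reply field name (reply.config) and the memory_monitor_refresh_ms config key are illustrative assumptions, not necessarily what this PR uses:

```python
import json

from ray._private.usage.usage_lib import TagKey, record_extra_usage_tag
from ray.core.generated import gcs_service_pb2


async def update_memory_monitor_telemetry(gcs_stub) -> None:
    # One-time fetch of the internal (system) config from the GCS when the
    # dashboard head starts up.
    reply = await gcs_stub.GetInternalConfig(
        gcs_service_pb2.GetInternalConfigRequest()
    )
    config = json.loads(reply.config)  # reply field name is an assumption

    # Treat the monitor as enabled when its refresh interval is positive
    # (an assumption about the config's semantics).
    enabled = config.get("memory_monitor_refresh_ms", 0) > 0

    # The recorded tag is what later shows up under extra_usage_tags in
    # usage_stats.json as "memory_monitor_enabled": "true"/"false".
    record_extra_usage_tag(TagKey.MEMORY_MONITOR_ENABLED, str(enabled).lower())
```

Recording the tag once at dashboard startup would keep the telemetry update out of the per-worker init path.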
Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Hmm, I think it is not going to work if you do ray start --head. Can you add a test for this case?
I'd love to talk in person about how I'd approach this.
Sounds good, should be back on this very soon.
…metrymm Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Fixed in #30472.
Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Why are these changes needed?
Add telemetry to record, at worker init time, whether the Ray cluster has the memory monitor enabled.
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- Manual test per https://docs.google.com/document/d/1XSWzx9ZatnlttsurCXzvV4vVsb2g1KxJ2T4zpf33bI8/edit
cat /tmp/ray/session_latest/usage_stats.json
{"usage_stats": {"ray_version": "3.0.0.dev0", "python_version": "3.7.7", "schema_version": "0.1", "source": "OSS", "session_id": "330fafc5-718b-48c5-92d0-dbe8dbc784a5", "git_commit": "{{RAY_COMMIT_SHA}}", "os": "linux", "collect_timestamp_ms": 1665116028212, "session_start_timestamp_ms": 1665115658078, "cloud_provider": null, "min_workers": null, "max_workers": null, "head_node_instance_type": null, "worker_node_instance_types": null, "total_num_cpus": 12, "total_num_gpus": null, "total_memory_gb": 10.672229005023837, "total_object_store_memory_gb": 5.336114501580596, "library_usages": ["dataset", "train", "tune"], "total_success": 302, "total_failed": 0, "seq_number": 302, "extra_usage_tags": {"gcs_storage": "memory", "memory_monitor_enabled": "false"}, "total_num_nodes": 1, "total_num_running_jobs": 1}, "success": true, "error": null}