
chore(deps): update dependency ray to v2.54.0 [security] #6633

Open
renovate[bot] wants to merge 1 commit into develop from renovate/pypi-ray-vulnerability

Conversation


@renovate renovate bot commented Feb 20, 2026

ℹ️ Note

This PR body was truncated due to platform limits.

This PR contains the following updates:

Package | Change
--- | ---
ray | 2.53.0 → 2.54.0

GitHub Vulnerability Alerts

CVE-2026-27482

Summary

Ray’s dashboard HTTP server blocks browser-origin POST/PUT requests but does not cover DELETE, and key DELETE endpoints are unauthenticated by default. If the dashboard/agent is reachable (e.g., --dashboard-host=0.0.0.0), a malicious web page (via DNS rebinding) or same-network access can issue DELETE requests that shut down Serve or delete jobs without user interaction. This is a drive-by availability impact.

Details

  • Middleware: python/ray/dashboard/http_server_head.py#get_browsers_no_post_put_middleware only checks POST/PUT via is_browser_request (UA/Origin/Sec-Fetch heuristics). DELETE is not gated.
  • Endpoints lacking browser protection/auth by default:
    • python/ray/dashboard/modules/serve/serve_head.py: @​routes.delete("/api/serve/applications/") calls serve.shutdown().
    • python/ray/dashboard/modules/job/job_head.py: @​routes.delete("/api/jobs/{job_or_submission_id}").
    • python/ray/dashboard/modules/job/job_agent.py: @​routes.delete("/api/job_agent/jobs/{job_or_submission_id}") (not wrapped with deny_browser_requests either).
  • Dashboard token auth is optional and off by default; binding to 0.0.0.0 is common for remote access.
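The gap described above can be illustrated with a minimal stand-in for the middleware's gating logic. This is hypothetical code, not Ray's actual implementation; `is_browser_request` is a simplified version of the UA/Origin/Sec-Fetch heuristics:

```python
# Illustrative sketch of the vulnerable gating logic (not Ray's code):
# only POST and PUT are checked, so DELETE slips through unexamined.
BLOCKED_METHODS = {"POST", "PUT"}  # note: DELETE is absent

def is_browser_request(headers: dict) -> bool:
    # Simplified stand-in for Ray's UA/Origin/Sec-Fetch heuristics.
    ua = headers.get("User-Agent", "")
    return ua.startswith("Mozilla/") or "Origin" in headers or "Sec-Fetch-Mode" in headers

def should_block(method: str, headers: dict) -> bool:
    # The flaw: a browser-origin request is blocked only if its method
    # is in BLOCKED_METHODS, so DELETE is never gated.
    return method in BLOCKED_METHODS and is_browser_request(headers)

browser_headers = {"User-Agent": "Mozilla/5.0", "Origin": "http://evil.example"}
print(should_block("POST", browser_headers))    # True  -- gated
print(should_block("DELETE", browser_headers))  # False -- passes through
```

A denylist of methods is fragile for exactly this reason; the fix (#​60526) moves to an allowlist that blocks all mutation methods by default.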

PoC

Prereqs: dashboard reachable (e.g., ray start --head --dashboard-host=0.0.0.0), no token auth.

  1. Start Serve (or have jobs present).
  2. From any browser-reachable origin (DNS rebinding or same-LAN page), issue a DELETE fetch:
fetch("http://<dashboard-host>:8265/api/serve/applications/", {
  method: "DELETE",
  headers: { "User-Agent": "Mozilla/5.0" }  // browsers set this automatically
});

Result: Serve shuts down.

  3. Similarly, delete jobs:

fetch("http://<dashboard-host>:8265/api/jobs/<job_or_submission_id>", { method: "DELETE" });
fetch("http://<dashboard-agent>:52365/api/job_agent/jobs/<job_or_submission_id>", { method: "DELETE" });

Browsers will send the Mozilla UA and Origin/Sec-Fetch headers, but DELETE is not blocked by the middleware, so the requests succeed.
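The same check can be reproduced from the command line against a cluster you control, by sending a DELETE with browser-like headers. This is a CLI sketch with a placeholder host; a vulnerable dashboard accepts the request, while 2.54.0+ rejects browser-origin mutation requests:

```shell
# Placeholder host: substitute your own test cluster's dashboard address.
curl -i -X DELETE \
  -H "User-Agent: Mozilla/5.0" \
  -H "Origin: http://attacker.example" \
  "http://<dashboard-host>:8265/api/serve/applications/"
```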

Impact

  • Availability loss: Serve shutdown; job deletion. Triggerable via drive-by browser requests if the dashboard/agent ports are reachable and auth is disabled (default).
  • No code execution from this vector, but breaks isolation/trust assumptions for “developer-only” endpoints.

Fix

The fix for this vulnerability is to update to Ray 2.54.0 or higher.

Fix PR: https://github.com/ray-project/ray/pull/60526
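If upgrading immediately is not possible, exposure can be reduced by not binding the dashboard to untrusted interfaces. A sketch, using the same `--dashboard-host` flag shown in the PoC prerequisites; "user" and "head-node" are placeholders:

```shell
# Keep the dashboard on loopback instead of 0.0.0.0:
ray start --head --dashboard-host=127.0.0.1

# Reach it remotely through an SSH tunnel instead of exposing port 8265:
ssh -N -L 8265:127.0.0.1:8265 user@head-node
```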


Release Notes

ray-project/ray (ray)

v2.54.0

Compare Source

Ray Data

🎉 New Features

  • Add checkpointing support to Ray Data (#​59409)
  • Compute Expressions: list operations (#​59346), fixed-size arrays (#​58741), string padding (#​59552), logarithmic (#​59549), trigonometric (#​59712), arithmetic (#​59678), and rounding (#​59295)
  • Add sql_params support to read_sql (#​60030)
  • Add AsList aggregation (#​59920)
  • Support CountDistinct aggregate (#​59030)
  • Add credential provider abstraction for Databricks UC datasource (#​60457)
  • Support callable classes for UDFExpr (#​56725)
  • Add autoscaler metrics to Data Dashboard (#​60472)
  • Add optional filesystem parameter to download expression (#​60677)
  • Allow specifying partitioning style or flavor in write_parquet() (#​59102)
  • New cluster autoscaler enabled by default (#​60474)

💫 Enhancements

  • Improve numerical stability in scalers by handling near-zero values (#​60488)
  • Export dataset operator output schema to event logger (#​60086)
  • Iceberg: add retry policy for Storage + Catalog writes (#​60620)
  • Iceberg: remove calls to Catalog Table in write tasks (#​60476)
  • Expose logical operators and rules via package exports (#​60297, #​60296)
  • Demote Sort from requiring preserve_order (#​60555)
  • Improve appearance of repr(dataset) (#​59631)
  • Allow configuring DefaultClusterAutoscalerV2 thresholds via env vars (#​60133)
  • Use Arrow IPC for Arrow Schema serialization/deserialization (#​60195)
  • Store _source_paths in object store to prevent excessive spilling during read task serialization (#​59999)
  • Add more shuffle fusion rules (#​59985)
  • Enable and tune DownstreamCapacityBackpressurePolicy (#​59753)
  • Enable concurrency cap backpressure with tuning (#​59392)
  • Set default actor pool scale up threshold to 1.75 (#​59512)
  • Don't downscale actors if the operator hasn't received any inputs (#​59883)
  • Don't reserve GPU budget for non-GPU tasks (#​59789)
  • Only return selected data columns in hive-partitioned Parquet files (#​60236)
  • Ordered + FIFO bundle queue (#​60228)
  • Add node_id, pid, attempt number for hanging tasks (#​59793)
  • Revise resource allocator task scheduling to factor in pending task outputs (#​60639)
  • Track block serialization time (#​60574)
  • Use metrics from OpRuntimeMetrics for progress (#​60304)
  • Tabular form for streaming executor op metrics (#​59774)
  • Info-log cluster scale-up decisions (#​60357)
  • Use plain mode instead of grid mode for OpMetrics logging (#​59907)
  • Progress reporting refactors (#​59350, #​59629, #​59880)
  • Remove deprecated TENSOR_COLUMN_NAME constant (#​60573)
  • Remove meta_provider parameter (#​60379)
  • Decouple Ray Train from Ray Data by removing top-level ray.data imports (#​60292)
  • Move extension types to ray.data (#​59420)
  • Skip upscaling validation warning for fixed-size actor pools (#​60569)
  • Make StatefulShuffleAggregation.finalize allow incremental streaming (#​59972)
  • Revisit OutputSplitter semantics to avoid unnecessary buffer accumulation (#​60237)
  • Update to PyArrow 23 (#​60739, #​59489)
  • Add BackpressurePolicy to streaming executor progress bar (#​59637)
  • Support Arrow-based transformations for preprocessors (#​59810)
  • StandardScaler preprocessor with Arrow format (#​59906)
  • OneHotEncoder with Arrow format (#​59890)

🔨 Fixes

  • Fuse MapBatches even if they modify the row count (#​60756)
  • Don't push limit past map_batches by default (#​60448)
  • Fix wrong type hint of other dataset in zip and union (#​60653)
  • Fix ActorPoolMapOperator to guarantee dispatch of all given inputs (#​60763)
  • Fix ArrowInvalid error when backfilling missing fields from map tasks (#​60643)
  • Fix attribute error in UnionOperator.clear_internal_output_queue (#​60538)
  • Fix DefaultClusterAutoscalerV2 raising KeyError: 'CPU' (#​60208)
  • Fix ReorderingBundleQueue handling of empty output sequences (#​60470)
  • Fix task completion time without backpressure grafana panel metric name (#​60481)
  • Fix Union operator blocking when preserve_order is set (#​59922)
  • Fix autoscaler requesting empty resources instead of previous allocation when not scaling up (#​60321)
  • Fix autoscaler not respecting user-configured resource limits (#​60283)
  • Fix DefaultAutoscalerV2 not scaling nodes from zero (#​59896)
  • Fix Iceberg warning message (#​60044)
  • Fix Parquet datasource path column support (#​60046)
  • Fix ProgressBar with use_ray_tqdm (#​59996)
  • Fix stale stats on refit for preprocessors (#​60031)
  • Fix StreamingRepartition hang with empty upstream results (#​59848)
  • Fix operator fusion bug to preserve UDF modifying row count (#​59513)
  • Fix AutoscalingCoordinator double-allocating resources for multiple datasets (#​59740)
  • Fix DownstreamCapacityBackpressurePolicy issues (#​59990)
  • Fix AutoscalingCoordinator crash when requesting 0 GPUs on CPU-only cluster (#​59514)
  • Fix TensorArray to Arrow tensor conversion (#​59449)
  • Fix resource allocator not respecting max resource requirement (#​59412)
  • Fix GPU autoscaling when max_actors is set (#​59632)
  • Fix checkpoint filter PyArrow zero-copy conversion error (#​59839)
  • Restore class aliases to fix deserialization of existing datasets (#​59828, #​59818)
  • Fix DataContext deserialization issue with StatsActor (#​59471)

📖 Documentation

  • Sort references in "Loading data and Saving data" pages (#​60084)
  • Fix inconsistent heading levels in "How to write tests" guide (#​60706)
  • Clarify resource_limits refers to logical resources (#​60109)
  • Update read_lance doc (#​59673)
  • Fix broken link in read_unity_catalog docstring (#​59745)
  • Fix bug in docs for enable_true_multi_threading (#​60515)
  • Add more education around transformations (#​59415)

Ray Serve

🎉 New Features

  • Queue-based autoscaling for TaskConsumer deployments (phase 1). Introduces a QueueMonitor actor that queries message brokers (Redis, RabbitMQ) for queue length, enabling TaskConsumer scaling based on pending tasks rather than HTTP load. (#​59430)
  • Default autoscaling parameters for custom policies. New apply_autoscaling_config decorator allows custom autoscaling policies to automatically benefit from Ray Serve's standard parameters (delays, scaling factors, bounds) without reimplementation. (#​58857)
  • label_selector and bundle_label_selector in Serve deployments. Deployments can now specify node label selectors for scheduling and bundle-level label selectors for placement groups, useful for targeting specific hardware (e.g., TPU topologies). (#​57694)
  • Deployment-level autoscaling observability. The controller now emits a structured JSON serve_autoscaling_snapshot log per autoscaling-enabled deployment each control-loop tick, with an event summarizer that reduces duplicate logs. (#​56225)
  • Batching with multiplexing support. Batching now guarantees each batch contains requests for the same multiplexed model, enabling correct multiplexed model serving with @serve.batch. (#​59334)

💫 Enhancements

  • Replica routing data structure optimizations. O(1) pending-request lookups, cached replica lists, lazy cleanup, optimized retry insertion, and metrics throttling yield significant routing performance improvements. (#​60139)
  • New operational metrics suite. Added long-poll metrics, replica lifecycle metrics, app/deployment status metrics, proxy health and request routing delay metrics, event loop utilization metrics, and controller health metrics — greatly improving monitoring and debugging capabilities. (#​59246, #​59235, #​59244, #​59238, #​59535, #​60473)
  • Autoscaling config validation. lookback_period_s must now be greater than metrics_interval_s, preventing silent misconfigurations. (#​59456)
  • Cross-version root_path support for uvicorn. root_path now works correctly across all uvicorn versions, including >=0.26.0 which changed how root_path is processed. (#​57555)
  • Preserve user-set gRPC status codes. When deployments raise exceptions after setting a gRPC status code on the context, that code is now correctly propagated to the client instead of being overwritten with INTERNAL. Error messages are truncated to 4 KB to respect HTTP/2 trailer limits. (#​60482)
  • Replica ThreadPoolExecutor capped to num_cpus. The user-code event loop's default ThreadPoolExecutor is now limited to the deployment's num_cpus, preventing oversubscription when using asyncio.to_thread. (#​60271)
  • Generic actor registration API for shutdown cleanup. Deployments can register auxiliary actors (e.g., PrefixTreeActor) with the controller for automatic cleanup on serve.shutdown(), eliminating cross-library import dependencies. (#​60067)
  • Deployment config logging in controller. Deployment configurations are now logged in the controller for easier debugging and auditability. (#​59222, #​59501)
  • Pydantic v1 deprecation warning. A FutureWarning is now emitted at ray.init() when Pydantic v1 is detected, as support will be removed in Ray 2.56. (#​59703)
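The executor cap described in the enhancements above can be sketched in plain asyncio; the `num_cpus` value and setup are illustrative, not Serve's actual code:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

async def main(num_cpus: int = 2):
    # asyncio.to_thread submits work to the loop's default executor;
    # capping that executor bounds concurrent blocking calls to the
    # deployment's CPU budget instead of the unbounded default.
    loop = asyncio.get_running_loop()
    loop.set_default_executor(ThreadPoolExecutor(max_workers=num_cpus))
    return await asyncio.gather(
        *(asyncio.to_thread(pow, 2, n) for n in range(4))
    )

print(asyncio.run(main()))  # [1, 2, 4, 8]
```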

🔨 Fixes

  • Fixed tracing signature mismatch across processes. Resolved TypeError: got an unexpected keyword argument _ray_trace_ctx when calling actors from a different process than the one that created them (e.g., serve start + dashboard interaction). (#​59634)
  • Fixed ingress deployment name collision. Ingress deployment name was incorrectly modified when a child deployment shared the same name, causing routing failures. (#​59577)
  • Fixed downstream deployment over-provisioning. Downstream deployments no longer over-provision replicas when receiving DeploymentResponse objects. (#​60747)
  • Fixed replicas hanging forever during draining. Replicas no longer hang indefinitely when requests are stuck during the draining phase. (#​60788)
  • Fixed TaskProcessorAdapter shutdown during rolling updates. Removed shutdown() from __del__, which was broadcasting a kill signal to all Celery workers instead of just the local one, breaking rolling updates. (#​59713)
  • Fixed Windows test failures. Resolved tracing file handle cleanup on Windows, skipped incompatible gRPC and tracing tests on Windows. (#​60078, #​60356, #​60393, #​59771)
  • Fixed flaky tests. Addressed gauge throttling race in test_router_queue_len_metric, ensured proxy replica queue cache is populated before GCS failure tests, and added metrics server readiness checks. (#​60333, #​60466, #​60468)
  • Fixed distilbert test segfault. Worked around a pyarrow/jemalloc crash triggered by specific import ordering of FastAPI, torch, and TensorFlow. (#​60478)

📖 Documentation

  • Improved autoscaling documentation. Clarified the relationship between delays, metric push intervals, and the autoscaling control loop. (#​59475)
  • New example: video analysis inference. End-to-end notebook demonstrating a Serve application for scene change detection, tagging, and video description. (#​59859)
  • New examples: model multiplexing and model composition. Published workload-based examples for forecasting with model multiplexing and recommendation systems with model composition. (#​59166)
  • Model registry integration guide. Added documentation for integrating Serve with model registries (e.g., MLflow). (#​59080)
  • Fixed broken documentation links. Resolved 404 errors for async inference, MLflow registry example, and LLM code examples. (#​59917, #​60071, #​59520, #​59521, #​60181)
  • Fixed monitoring docs. Corrected target replicas metric emission to enable time-series comparison with actual replicas. (#​59571)
  • Async inference template. Added an end-to-end template for building asynchronous inference applications with Ray Serve. (#​58393, #​59926)

🏗 Architecture refactoring

  • Environment variable cleanup (5-part series). Removed deprecated and redundant env vars (RAY_SERVE_DEFAULT_HTTP_HOST, RAY_SERVE_DEFAULT_HTTP_PORT, RAY_SERVE_DEFAULT_GRPC_PORT, RAY_SERVE_HTTP_KEEP_ALIVE_TIMEOUT_S, RAY_SERVE_REQUEST_PROCESSING_TIMEOUT_S, RAY_SERVE_ENABLE_JSON_LOGGING, RAY_SERVE_ALWAYS_RUN_PROXY_ON_HEAD_NODE), cleaned up legacy constant fallbacks, and added documentation for previously undocumented env vars (e.g., RAY_SERVE_CONTROLLER_MAX_CONCURRENCY, RAY_SERVE_ROOT_URL, proxy health check settings, and fault tolerance params). Users relying on removed env vars should migrate to the Serve config API (http_options, grpc_options, LoggingConfig). (#​59470, #​59619, #​59647, #​59963, #​60093)

Ray Train

🎉 New Features

  • Add TPU multi-slice support to JaxTrainer (#​58629)
  • Update async validation API (#​59428)
  • Add a CallbackManager and guardrail some callback hooks (#​60117)
  • Add inter-execution file shuffling for deterministic multi-epoch training (#​59528)
  • Resume validations on driver restoration (#​59270)

💫 Enhancements

  • Pass ray remote args to validation task (#​60203)
  • Deprecate Predictor API (#​60305)
  • Increase worker group start default timeout to 60s (#​60376)
  • Unify PlacementGroup and SlicePlacementGroup interface in WorkerGroup (#​60116)
  • Cleanup zombie RayTrainWorker actors (#​59872)
  • Add usage telemetry for checkpointing and validation (#​59490)
  • Validate that validation is called with a checkpoint (#​60548)
  • Replace pg.ready() with pg.wait() in worker group (#​60568)
  • Rename DatasetsSetupCallback to DatasetsCallback (#​59423)
  • Update "Checkpoint Report Time" metric title to "Cumulative Checkpoint Report Time" (#​58470)
  • Add training failed error back to failure policy log (#​59957)
  • Decouple Ray Train from Ray Data by removing top-level imports (#​60292)

🔨 Fixes

  • Add try-except for pg.wait() (#​60743)
  • TrainController reraises AsyncioActorExit (#​59461)

📖 Documentation

  • Add a JaxTrainer template (#​59842)
  • Update Jax doc to include GPU and multi-slice TPU support (#​60593)
  • Document checkpoint_upload_fn backend and cuda:nccl backend support (#​60541)
  • Rename checkpoint_upload_func to checkpoint_upload_fn in docs (#​60390)
  • Fix Ray Train workloads and PyTorch with ASHA templates (#​60537)
  • Publish Ray Train workload example (#​58936)

Ray Tune

🔨 Fixes

  • Avoid file deletion race by using unique tmp file names (#​60556)

Ray LLM

🎉 New Features

  • Add /tokenize and /detokenize endpoints (#​59787)
  • Add /collective_rpc endpoint for RLHF weight synchronization (#​59529)
  • Add Control Plane API for Sleep/Wakeup (#​59455)
  • Add Pause/Resume Control Plane API (#​59523)
  • Add support for classification and scoring models (#​59499)
  • Add pooling parameter (#​59534)
  • Support vLLM structured outputs with backward-compat for guided_decoding (#​59421)
  • Add CPU support to Ray Serve LLM (#​58334)
  • Add should_continue_on_error support for ServeDeploymentStage (#​59395)
  • Support configuring HttpRequestUDF resources (#​60313)

💫 Enhancements

  • Upgrade vLLM to 0.15.0 (#​60679)
  • Unify schema of success and failure rows (#​60572)
  • Prefer uniproc executor over mp executor when world_size==1 (#​60403)
  • Use compute instead of concurrency to specify ActorPool size (#​59645)
  • Remove DataContext overrides in Ray Data LLM Processor (#​60142)
  • Use numpy arrays for embeddings to avoid torch.Tensor serialization overhead (#​59919)
  • Make PrefixCacheAwareRouter imbalance threshold less surprising (#​59390)
  • Allow tokenized_prompt without prompt in vLLMEngineStage (#​59801)
  • Avoid passing enums through fn_constructor_kwargs (#​59806)
  • Refactor Control Plane endpoints into mixins (#​59502)
  • Remove CUDA_VISIBLE_DEVICES deletion workaround (#​60502)

🔨 Fixes

  • Fix nested dict to Namespace conversion in vLLM engine initialization (#​60380)
  • Fix JSON non-serializable ndarray exception in http_request_stage (#​60299)
  • Exit actor on EngineDeadError to enable recovery (#​60145)
  • Fix NIXL port conflict in prefill-decode disaggregation test (#​60057)

📖 Documentation

  • Batch inference docs reorg and update to reflect per-stage config refactor (#​59214)
  • Add resiliency section and refine doc code (#​60594)
  • Add video/audio examples for vLLMEngineProcessor (#​59446)
  • Add SGLang integration example (#​58366)
  • Remove inaccurate statement in docs (#​60425)

Ray RLlib

🎉 New Features

  • Add TQC (Truncated Quantile Critics) algorithm implementation (#​59808)
  • Add LR scheduling ability to BC and MARWIL (#​59067)
  • RLlib and Ray Tune: Hyperparameter Optimisation example (#​60182)

💫 Enhancements

  • 🔥 APPO improvements: learner pipeline performance improvements (#​59544)
  • Improve stateful model training on offline data (#​59345)
  • Create resource bundle per learner (#​59620)
  • Improve env runner sampling by replacing recursive solution with iterative solution (#​56082)
  • Improve IMPALA examples and premerge (#​59927)
  • Remove MLAgents dependency (#​59524)
  • Upgrade to gymnasium v1.2.2 (#​59530)
  • Decrease log quantity for learning tests (#​59005)
  • Update learner state warnings to the debug level (#​60178)
  • Don't log np.nanmean warnings in EMA stats (#​60408)

🔨 Fixes

  • Fix DQN RLModule forward methods to handle dict spaces (#​60451)
  • Fix LearnerGroup.load_module_state() and mark as deprecated (#​60354)
  • Fix static dimension issue in ONNX export of Torch attention models (#​60102)
  • Fix Multi-Agent Episode concatenation for sequential environments (#​59895)
  • Fix module episode returns metrics accumulation for shared module IDs (#​60234)
  • Fix rollout fragment length calculation in AlgorithmConfig (#​59438)
  • Fix checkpointable issues with cloud storages (#​60440)
  • Update flatten_observations.py for nested spaces for ignored multi-agent (#​59928)

Ray Core

🎉 New Features

  • Resource Isolation: unify config construction, add public docs, and expose cgroup_path in ray.init() (#​59372, #​60183, #​60726)
  • Support tensor-level deduplication for NIXL (#​60509)
  • Add CUDA IPC transport for RDT (#​59838)
  • Register custom transport at runtime for RDT (#​59255)
  • Support TPU v7x accelerator type for device discovery (#​60338)
  • Introduce local port service discovery (#​59613)
  • Cancel sync actor by checking is_canceled() (#​58914)
  • Support labels for ray job submit --entrypoint-resource (#​59735)
  • Add --ip option in ray attach (#​59931)
  • Add bearer token support for remote URI downloads (#​60050)
  • Support HTTP redirection download (#​59384)
  • Add ray kill-actor --name/--namespace for force/graceful shutdown (#​60258)

💫 Enhancements

  • Bound object spilling file size to avoid disk increase pressure (#​60098)
  • Replace SHA-1 with SHA-256 for internal hash operations (#​60242)
  • Use whitelist approach to block mutation requests from browser (#​60526)
  • Pass authentication headers to WebSocket connections in tail_job_logs (#​60346)
  • Add auth to Dashboard HTTP agent and client (#​59891)
  • Use dedicated service account path for Ray auth tokens (#​60409)
  • Update Kubernetes token auth verb to ray:write (#​60411)
  • Replace RAY_AUTH_MODE=k8s with separate config for Kubernetes token auth (#​59621)
  • Optimize token auth: use shared_ptr caching and avoid per-RPC construction (#​59500)
  • Optimize OpenTelemetry metric recording calls (#​59337)
  • Throttle infeasible resource warning (#​59790)
  • Add default excludes for working_dir uploads (#​59566)
  • Tell users why objects cannot be reconstructed (#​59625)
  • Extend instance allocation timeout in autoscaler v2 (#​60392)
  • Remove GCS centralized scheduling (#​59979, #​60121, #​60188)
  • Demote stale sync message drop log to DEBUG in RaySyncer (#​59616)
  • Migrate remaining std::unordered_map to absl::flat_hash_map (#​59921)
  • Add missing fields to NodeDefinitionEvent proto (#​60314)
  • Add actor and task event missing fields (#​60287)
  • Add node id to the base event (#​59242)
  • Add repr_name to actor_lifecycle_event (#​59925)
  • Support ALL in exposable event config (#​59878)
  • Support publishing events from aggregator to GCS (#​55781)
  • Update the attempt number of actor creation task when actor restarts (#​58877)
  • Unify node feasibility and availability checking for GPU fractions (#​59278)
  • Update TPU utils for multi-slice compatibility (#​59136)
  • Improve SubprocessModuleHandle.destroy_module() resource cleanup (#​60172)
  • Support viewing PIDs for Dashboard and Runtime Env Agent (#​58701)
  • Optimize autoscaler monitor by moving resource demand parsing outside loop (#​59190)
  • Avoid GCS query for is_head in dashboard agent startup (#​59378)
  • Skip reporter and event aggregator client creation in minimal mode (#​59846)
  • Support out-of-order actors by extracting metadata when creating (RDT) (#​59610)
  • Synchronize CUDA stream before registering for NIXL (#​60072)
  • Atomically send/recv for two-sided ordering (RDT) (#​60202)
  • Add get_session_name() to RuntimeContext (#​59469)
  • Make MAX_APPLICATION_ERROR_LEN configurable via env var (#​59543)
  • Preserve function signatures through Ray decorators (#​60479)

🔨 Fixes

  • Fix idle_time_ms resetting for nodes not running tasks (#​60581)
  • Fix task event loss during shutdown (#​60247)
  • Filter bad subscriber messages from taking down GCS publisher (#​60252)
  • Fix RAY_EXPERIMENTAL_NOSET_* environment variable parsing in accelerator managers (#​60577)
  • Fix ray start --no-redirect-output crash (#​60394)
  • Fix drain state propagation race condition (#​59536)
  • Fix use-after-free race condition in OpenTelemetry gauge metric callback during shutdown (#​60048)
  • Fix PSUTIL_PROCESS_ATTRS returning empty list on Windows (#​60173)
  • Fix deadlock in garbage collection when holding lock (#​60014)
  • Fix incorrect error handling in autoscaler for available_node_types on on-prem clusters (#​60184)
  • Fix invalid status transitions in autoscaler v2 (#​60412, #​59550)
  • Fix GCS crash from race condition in MetricsAgentClient exporter initialization (#​59611)
  • Fix tracing signature mismatch when calling actors from different processes (#​59634)
  • Fix crash when killing actor handle from previous session (#​59425)
  • Fix multiple deployment same name resolve (#​59577)
  • Handle dual task errors with read-only args (#​59507)
  • Handle exceptions raised by internal_ip() within StandardAutoscaler (#​57279)
  • Fix uv_runtime_env_hook.py to pin worker Python version (#​59768)
  • Fix STRICT_PACK placement groups ignoring bundle label selectors (#​60170)
  • Fix logging bug when log value is an empty string (#​59434)
  • Fix aggregator-to-GCS event conversion (#​59783)
  • Raise error on tail log job error in newer Ray versions (#​59506)
  • Fix num retries left message (#​59829)
  • Fix psutil internal API usage in dashboard disk usage reporting (#​59659)
  • Fix event exporter init ray check (#​60073)
  • Prevent use-after-free error in core worker shutdown (#​58435)
  • Fix task name inconsistency in RUNNING vs FINISHED metrics (#​59893)
  • Fix symmetric_run using wrong condition to check GCS readiness (#​59794)
  • Preserve Pydantic details when serialization fails (#​59401)
  • Retry GCP project metadata updates on HTTP 412 errors (#​60429)
  • Fix v1 autoscaler TypeError when using bundle_label_selectors (#​59850)
  • Shorten SHA-256 hex with base32 to comply with GCP label limits (#​60722)
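The base32 shortening in the last fix can be illustrated in a few lines of Python; the input string is arbitrary and this mirrors the idea rather than Ray's exact encoding:

```python
import base64
import hashlib

def short_label(name: str) -> str:
    # A SHA-256 digest is 32 bytes: 64 characters as hex, but only 52
    # as unpadded base32, which fits GCP's 63-character label limit.
    digest = hashlib.sha256(name.encode()).digest()
    return base64.b32encode(digest).decode().rstrip("=").lower()

print(len(hashlib.sha256(b"ray-node").hexdigest()))  # 64
print(len(short_label("ray-node")))                  # 52
```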

📖 Documentation

  • Add initial user guide for Ray resource isolation with writable cgroups (#​59051)
  • Add token authentication internals documentation (#​59299)
  • Update metric exporter docs (#​59874)
  • Add internal documentation for Port Service Discovery (#​59844)
  • Update misleading Ray job diagram (#​59940)
  • Add debugging logs related to pinned argument size limit (#​60175)
  • Add slow startup tip to podman troubleshooting docs (#​59942)
  • Clarify ray.shutdown() behavior for local vs remote clusters (#​59845)
  • Improve placement group fault tolerance doc (#​59830)
  • Add head-node memory growth and OOM guidance (#​58695)
  • Add documentation for RAY_RUNTIME_ENV_BEARER_TOKEN env var (#​60136)

Dashboard

💫 Enhancements

  • Support more panels in dashboard (#​60018)
  • Add autoscaler metrics to Data Dashboard (#​60472)
  • Support viewing PIDs for Dashboard and Runtime Env Agent (#​58701)

🔨 Fixes

  • Update total for dark mode color (#​60106)

Ray Wheels and Images

Documentation

  • Add committership documentation (#​60069)
  • Update contribution guide with common labels (#​59473)
  • Add KubeRay & Volcano integration docs update (#​59636)
  • Add RayJob InTreeAutoscaling with Kueue docs after Kueue 0.16.0 release (#​59648)
  • Refactor LLM batch inference template (#​59897)
  • Add async inference template (#​58393)
  • Add RunLLM chat widget for Ray docs (#​59126)
  • Fix various typos and broken links (#​60249, #​59901, #​60181)
  • Replace Ray Tune + Train example with vanilla Ray Tune in homepage (#​60229)
  • Add Ray technical charter (#​60068)

Thanks

Thank you to everyone who contributed to this release!
@​KaisennHu, @​MiXaiLL76, @​slfan1989, @​krisselberg, @​JasonLi1909, @​Priya-753, @​pseudo-rnd-thoughts, @​zzchun, @​ZacAttack, @​pushpavanthar, @​jjyao, @​ryanaoleary, @​pcmoritz, @​akshay-anyscale, @​HassamSheikh, @​yurekami, @​Hyunoh-Yeo, @​ruoliu2, @​nrghosh, @​wxwmd, @​myandpr, @​J-Meyers, @​trilamsr, @​kouroshHakha, @​limarkdcunha, @​manhld0206, @​jreiml, @​preneond, @​yuchen-ecnu, @​Yicheng-Lu-llll, @​AchimGaedkeLynker, @​vaishdho1, @​israbbani, @​OneSizeFitsQuorum, @​Sathyanarayanaa-T, @​nadongjun, @​xinyuangui2, @​Rob12312368, @​as-jding, @​lee1258561, @​popojk, @​coqian, @​rajeshg007, @​jeffreywang-anyscale, @​kamil-kaczmarek, @​alexeykudinkin, @​Aydin-ab, @​mgchoi239, @​dragongu, @​edoakes, @​smortime, @​tk42, @​abrarsheikh, @​jakubzimny, @​Future-Outlier, @​axreldable, @​owenowenisme, @​g199209, @​cem-anyscale, @​dayshah, @​akelloway, @​daiping8, @​dlwh, @​robertnishihara, @​400Ping, @​matthewdeng, @​antoine-galataud, @​cristianjd, @​Partth101, @​goutamvenkat-anyscale, @​codope, @​seanlaii, @​andrew-anyscale, @​andrewsykim, @​liulehui, @​simonsays1980, @​Sparks0219, @​yifanmai, @​landscapepainter, @​win5923, @​kangwangamd, @​srinarayan-srikanthan, @​KeeProMise, @​srinathk10, @​my-vegetable-has-exploded, @​MengjinYan, @​yancanmao, @​yuhuan130, @​ArturNiederfahrenhorst, @​akyang-anyscale, @​rushikeshadhav, @​kongjy, @​harshit-anyscale, @​justinvyu, @​dancingactor, @​Vito-Yang, @​cr7258, [@​marwan116](https://redirect.github.com/marwan1


Configuration

📅 Schedule: Branch creation - "" (UTC), Automerge - At any time (no schedule defined).

🚦 Automerge: Enabled.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about these updates again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@renovate renovate bot added the changelog/chore A trivial change label Feb 20, 2026
@renovate renovate bot enabled auto-merge (squash) February 20, 2026 21:40