Insights: ray-project/ray
Overview
124 Pull requests merged by 41 people
-
[RLlib] Fix `MultiAgentEnvRunner` env check bug.
#50891 merged
Feb 25, 2025 -
[data] add iceberg write support through pyiceberg
#50590 merged
Feb 25, 2025 -
[ray.llm.batch] Ensure the type of `prompt_token_ids`
#50865 merged
Feb 25, 2025 -
[serve.llm] remove backoff dependency
#50822 merged
Feb 25, 2025 -
[core] Implement log rotation for worker
#50759 merged
Feb 25, 2025 -
[docs][core] Improve Compiled Graphs docs
#50627 merged
Feb 25, 2025 -
[llm.serving] Reconfigure router to better perform under high concurrency (#50876)
#50884 merged
Feb 25, 2025 -
[core] Cover cpplint for `ray/src/ray/internal`
#50662 merged
Feb 25, 2025 -
[ci] convert the if-else change in cond testing script into a config file
#50776 merged
Feb 25, 2025 -
[llm.serving] Reconfigure router to better perform under high concurrency
#50876 merged
Feb 25, 2025 -
[Dashboard] Added a proper message when Ray jobs submission is used a…
#49409 merged
Feb 25, 2025 -
[core] [cgroup] Interface for cgroup setup and its fake implementation
#50787 merged
Feb 25, 2025 -
[1/N] bazel lint C++ code
#50869 merged
Feb 25, 2025 -
Change conda env validity checking to activate and run Python
#48552 merged
Feb 25, 2025 -
[Serve] Faster bulk imperative Serve Application deploys
#49168 merged
Feb 25, 2025 -
[data/preprocessors] feat: allow vectorizers to be executed in append mode
#50847 merged
Feb 24, 2025 -
[data/preprocessors] feat: allow tokenizer to execute in append mode
#50848 merged
Feb 24, 2025 -
cherrypick #50860
#50867 merged
Feb 24, 2025 -
[CI] pass cloud into service.status calls
#50860 merged
Feb 24, 2025 -
[ray.llm][Batch] Support LoRA
#50804 merged
Feb 24, 2025 -
[train v2] Fix trainer import deserialization when captured within a Ray tasks
#50862 merged
Feb 24, 2025 -
[core] Remove unused `release_resources` flag
#50854 merged
Feb 24, 2025 -
[llm.serving] Fix using uni executor when world size == 1 (#50849)
#50863 merged
Feb 24, 2025 -
[llm.serving] Fix using uni executor when world size == 1
#50849 merged
Feb 24, 2025 -
[cleanup] Remove `ray.storage._load_class` indirection
#50857 merged
Feb 24, 2025 -
Docs: add suspend to RayJob Configuration in RayJob QuickStart
#50641 merged
Feb 24, 2025 -
[docs] add shell completion instructions for kubectl plugin
#50808 merged
Feb 24, 2025 -
Add recommendations for when to use `make local` vs `make develop`
#50773 merged
Feb 24, 2025 -
[core] Avoid creating a boost::asio::thread_pool with 0 threads
#50837 merged
Feb 24, 2025 -
[data/preprocessors] feat: allow discretizers to be used in append mode
#50584 merged
Feb 24, 2025 -
[core] Remove test_events_with_crash
#50780 merged
Feb 24, 2025 -
[doc] unify the look and feel of anyscale buttons
#50800 merged
Feb 24, 2025 -
Make core team codeowners for workflows
#50855 merged
Feb 24, 2025 -
[Core] Add CancelTaskWithResourceShapes to Node Raylet Client and Node Manager
#50200 merged
Feb 24, 2025 -
[ADAG]Redundant code optimization
#50305 merged
Feb 24, 2025 -
[core][cgraph] Fix compiled graph buffer release issues
#50434 merged
Feb 24, 2025 -
[serve.llm] remove asyncache and cachetools from dependencies.
#50806 merged
Feb 23, 2025 -
[core] Guard concurrent access to generator IDs with a mutex
#50845 merged
Feb 23, 2025 -
[core] Guard concurrent access to generator IDs with a mutex
#50740 merged
Feb 23, 2025 -
[core] Cover cpplint for `ray/tree/master/src/ray/rpc`
#50727 merged
Feb 23, 2025 -
[core] Cover cpplint for `src/ray/pubsub`
#50732 merged
Feb 23, 2025 -
[ci] more project id fixes for release test launching
#50841 merged
Feb 23, 2025 -
cherrypick #50841
#50842 merged
Feb 23, 2025 -
cherrypick #50836
#50840 merged
Feb 23, 2025 -
[ci] allow specifying project id for release test
#50836 merged
Feb 23, 2025 -
[doc] replace PNGs
#50810 merged
Feb 22, 2025 -
cherrypick #50816 and #50829
#50834 merged
Feb 22, 2025 -
[ci] fix release test launching
#50829 merged
Feb 22, 2025 -
[RLlib] APPO accelerate (vol 19): Torch clip util enhancements.
#50791 merged
Feb 22, 2025 -
cherrypick #50805
#50832 merged
Feb 22, 2025 -
[release] change version to 2.43.0
#50831 merged
Feb 22, 2025 -
[Core] Update Error Message and Anti-Pattern for the Case of Forking New Processes in Worker Processes
#50705 merged
Feb 22, 2025 -
[release test] upgrade anyscale cli version
#50816 merged
Feb 22, 2025 -
Fix minor typo in ray-client.rst
#50809 merged
Feb 21, 2025 -
[deps] add anyscale CLI dependency into resolving
#50700 merged
Feb 21, 2025 -
Various improvements to Serve Request Batching tutorial
#50400 merged
Feb 21, 2025 -
[Core] Fix windows compilation error
#50805 merged
Feb 21, 2025 -
[RLlib] Implement vectorization for `MultiAgentEnv` (new API stack).
#50437 merged
Feb 21, 2025 -
[Release Tests][Data] Print Resource Manager stats in release tests
#50801 merged
Feb 21, 2025 -
[Core] Split gcs into smaller targets
#50764 merged
Feb 21, 2025 -
Fix indefinite article in CONTRIBUTING rst
#50651 merged
Feb 21, 2025 -
[llm] removes unused aiobotocore dependency
#50797 merged
Feb 21, 2025 -
[Core] Record core usage
#50756 merged
Feb 21, 2025 -
[data] Adding in `TaskDurationStats` and `on_execution_step` callback
#50766 merged
Feb 21, 2025 -
[llm.serving] add requirements sections to overview page
#50788 merged
Feb 21, 2025 -
[train][ci] Bump up the timeout of `test_data_parallel_trainer`
#50796 merged
Feb 21, 2025 -
[llm.serving] Update default batch timeout to 50ms
#50786 merged
Feb 21, 2025 -
[core] Run redis tests on postmerge only
#50795 merged
Feb 21, 2025 -
[core] Cover cpplint for ray/src/ray/scheduling
#50686 merged
Feb 21, 2025 -
[core] [easy] [noop] Move pid_t alias to compatibility header
#50789 merged
Feb 21, 2025 -
[llm.serving] Address dependencies related issues for llm serving
#50785 merged
Feb 21, 2025 -
[llm.serving] fix telemetry test related to autoscaling changes
#50779 merged
Feb 21, 2025 -
[LLM Serve] Fix docs example for vision language model
#50782 merged
Feb 21, 2025 -
[docs][data/llm] Minor http docstring improvement
#50777 merged
Feb 21, 2025 -
[serve.llm] Post dogfooding changes to the docs
#50781 merged
Feb 21, 2025 -
[Data] Fixing aggregation protocol to be appropriately associative
#50757 merged
Feb 21, 2025 -
[doc] Add Anyscale as an option to get started with Ray
#50772 merged
Feb 21, 2025 -
[llm.serving] Refactor `LLMConfig#get_serve_options`
#50753 merged
Feb 21, 2025 -
[serve.llm] Rename APIs to LLMRouter and VLLMService
#50775 merged
Feb 21, 2025 -
[deps] remove indirect referencing
#50742 merged
Feb 21, 2025 -
[ci] better line splitting on diff output
#50765 merged
Feb 20, 2025 -
[llm.batching] fix doc
#50774 merged
Feb 20, 2025 -
[Core] Add a Function to Get Per Node Infeasible Request Resource Shapes in GCS Autoscaling State Manager
#50085 merged
Feb 20, 2025 -
[llm.serving] validate null response content
#50771 merged
Feb 20, 2025 -
[ray.serve.llm][docs] Added llm serving docs
#50675 merged
Feb 20, 2025 -
[core] Fix windows build from ray.wait race pr
#50758 merged
Feb 20, 2025 -
[doc][core][cgraph] Passing filename to viz
#50752 merged
Feb 20, 2025 -
[serve][tests] Print the wrk error to the driver logs if it exists
#50755 merged
Feb 20, 2025 -
[RLlib] `TorchLearner`: Don't call `no_sync` (DDP/multi-GPU) on non-torch modules.
#50760 merged
Feb 20, 2025 -
[core] Fix missing type alias on windows
#50716 merged
Feb 20, 2025 -
[ray.llm] Minor improvements
#50748 merged
Feb 20, 2025 -
Revert "[core] Implement redirection for core worker (#50398)"
#50738 merged
Feb 20, 2025 -
[CI] Enable check-json hooks in pre-commit
#50366 merged
Feb 20, 2025 -
[llm.serving] fix more diff on test_usage
#50744 merged
Feb 20, 2025 -
[data/llm/docs] LLM Batch API documentation improvements
#50747 merged
Feb 20, 2025 -
[core][cgraph] Revive max_buffered_result arg
#50725 merged
Feb 20, 2025 -
[data/preprocessors] feat: allow simple imputer to execute on append mode
#50713 merged
Feb 20, 2025 -
[deps] upgrade certifi
#50701 merged
Feb 19, 2025 -
[deps] fix bash styling in install-deps script
#50734 merged
Feb 19, 2025 -
[data] Simplify Operator.__repr__
#50620 merged
Feb 19, 2025 -
[RLLib] Pass large AlgorithmConfig by reference to RolloutWorker
#50688 merged
Feb 19, 2025 -
[deps] explicitly add dl-cpu requirements in ray-ml docker build file
#50733 merged
Feb 19, 2025 -
[data] Make iter_torch_batches release test only use GPU for the head node
#50712 merged
Feb 19, 2025 -
[deps] compile dependencies in the repo root dir
#50702 merged
Feb 19, 2025 -
[doc][cgraph] Reference compiled graph in Ray docs
#50654 merged
Feb 19, 2025 -
[data/preprocessors] feat: allow normalizer to be used in append mode
#50714 merged
Feb 19, 2025 -
[llm.serving] fix usage test
#50730 merged
Feb 19, 2025 -
[core] Avoid logger flush at every write
#50722 merged
Feb 19, 2025 -
[ray.serve.llm] Fix setting up AutoProcessor
#50715 merged
Feb 19, 2025 -
Update working-with-llms.rst
#50717 merged
Feb 19, 2025 -
[core] Remove unused mock class
#50719 merged
Feb 19, 2025 -
[release] Try fix data loss
#50709 merged
Feb 19, 2025 -
[core] Implement redirection for core worker
#50398 merged
Feb 19, 2025 -
[ray.llm] Support assistant prefill in chat template stage
#50628 merged
Feb 19, 2025 -
[data/llm/docs] Initial draft of user guide for Data LLM APIs
#50674 merged
Feb 19, 2025 -
[LLM] OSS LLM Serving
#50643 merged
Feb 19, 2025 -
[ray.llm][Batch] Add __init__.py
#50699 merged
Feb 18, 2025 -
[deps] add more dependencies for ray[llm]
#50692 merged
Feb 18, 2025 -
[Data] Replace `AggregateFn` with `AggregateFnV2`, cleaning up Aggregation infrastructure
#50585 merged
Feb 18, 2025 -
[core][cgraph] Only show type on visualize with `channel_details`
#50689 merged
Feb 18, 2025 -
[data][dashboard] allow reusing of dashboard conftest
#50614 merged
Feb 18, 2025 -
[deps] upgrade h11, httpcore and httpx
#50691 merged
Feb 18, 2025 -
Add nightly Python 3.13 wheels to documentation and build aarch64 wheels
#50667 merged
Feb 18, 2025 -
[core] Add back workflow test
#50660 merged
Feb 18, 2025
39 Pull requests opened by 24 people
-
minor improvements to hyperopt tutorial
#50697 opened
Feb 18, 2025 -
various improvements to lightgbm tutorial:
#50704 opened
Feb 19, 2025 -
[Autoscaler][V2] Check IM instance_status before terminating nodes
#50707 opened
Feb 19, 2025 -
[doc] Add documentation for Asynchronous HyperBand Example in Tune
#50708 opened
Feb 19, 2025 -
[core] Adding warnings for non zero-copy serialization
#50731 opened
Feb 19, 2025 -
[WIP][core] Use `DestroyWorker` to disconnect raylet client when killing leased workers
#50736 opened
Feb 19, 2025 -
[doc][core][cgraph] Add Compiled Graph API
#50754 opened
Feb 20, 2025 -
[core] [wip attempt] StatusOr union construction sometimes breaks windows build
#50761 opened
Feb 20, 2025 -
docs: add quickstart button to examples
#50763 opened
Feb 20, 2025 -
[data] Add dataset/operator state, progress, total metrics
#50770 opened
Feb 20, 2025 -
[core] Unit tests for tensor serialization
#50778 opened
Feb 21, 2025 -
[WIP] Ray Collective Communication Lib Support HCCL Backend
#50790 opened
Feb 21, 2025 -
[docs] add missing step to install KubeRay in gke-gcs-bucket.md
#50811 opened
Feb 21, 2025 -
[wip] Detect socket closed
#50812 opened
Feb 21, 2025 -
[serve.llm] remove asyncio_timeout from dependencies
#50815 opened
Feb 22, 2025 -
[serve.llm] import boto3 and google lazily but make them part of requirements
#50820 opened
Feb 22, 2025 -
[serve.llm] Made json validator a singleton and jsonref packages lazy imported
#50821 opened
Feb 22, 2025 -
[core] Subprocess to cleanup resource for parent process
#50830 opened
Feb 22, 2025 -
[core][cgraph] Keep cgraph reference alive for teardown
#50851 opened
Feb 24, 2025 -
[wip] remove block/unblock calls to see what breaks
#50852 opened
Feb 24, 2025 -
[wip] Try removing `NotifyUnblocked`
#50853 opened
Feb 24, 2025 -
[data/preprocessors] feat: allow transformer to be executed in append mode
#50856 opened
Feb 24, 2025 -
Add perf metrics for 2.43.0
#50864 opened
Feb 24, 2025 -
Improvements to PBT example
#50870 opened
Feb 25, 2025 -
[llm.serving] Import AutoscalingConfig & DeploymentConfig from Serve
#50871 opened
Feb 25, 2025 -
[train] Remove ray storage dependency and deprecate `RAY_STORAGE` env var configuration option
#50872 opened
Feb 25, 2025 -
[llm.serving] Made accelerator_type an optional field in LLMConfig
#50873 opened
Feb 25, 2025 -
[Train V2] Hide the private functions of train context to avoid abuse
#50874 opened
Feb 25, 2025 -
[core] Setup log rotation for runtime env agent
#50877 opened
Feb 25, 2025 -
Various improvements to the Getting Started page
#50878 opened
Feb 25, 2025 -
[WIP] Rebasing materialized dataset on iterator back-pressure is active upon materialization
#50880 opened
Feb 25, 2025 -
[core] Split object manager into small C++ targets
#50885 opened
Feb 25, 2025 -
[Core] Logic Added to Cancel Infeasible Tasks in GCS Based on Autoscaler State
#50886 opened
Feb 25, 2025 -
[compiled graphs] Remove unused external stream in ExecutableTask
#50887 opened
Feb 25, 2025 -
[serve] Wait for proxy to start before running wrk trial
#50888 opened
Feb 25, 2025 -
bazel lint all cpp folder
#50889 opened
Feb 25, 2025 -
[data] mark pyarrow nightly tests as soft fail
#50892 opened
Feb 25, 2025 -
Minor readability improvements to Ray Data Quickstart
#50893 opened
Feb 25, 2025 -
Add perf metrics for 2.43.0
#50894 opened
Feb 25, 2025
93 Issues closed by 16 people
-
[Serve] Can't autoscale deployment when target ongoing requests is 1
#24793 closed
Feb 25, 2025 -
[serve] Prefer stopping pending replicas instead of running ones
#12929 closed
Feb 25, 2025 -
[Serve] Handle prefers replicas on the same node
#13108 closed
Feb 25, 2025 -
[serve] Extend multi-app unit tests
#34450 closed
Feb 25, 2025 -
[Core] Extend Ray Data with IcebergDataSink for Distributed Writes to Iceberg Tables
#49032 closed
Feb 25, 2025 -
CI test linux://rllib:learning_tests_stateless_cartpole_appo_gpu is flaky
#47295 closed
Feb 25, 2025 -
[Core] Ray creates a Issue with catboost
#50843 closed
Feb 25, 2025 -
CI test linux://rllib:learning_tests_cartpole_dqn_gpu is flaky
#46683 closed
Feb 25, 2025 -
[core] Cover cpplint for ray/src/ray/internal
#50608 closed
Feb 25, 2025 -
CI test linux://rllib:examples/connectors/mean_std_filtering_ppo is consistently_failing
#47435 closed
Feb 25, 2025 -
CI test linux://python/ray/train/v2:test_v2_api is consistently_failing
#50879 closed
Feb 25, 2025 -
CI test windows://python/ray/tests:test_reference_counting_2 is flaky
#45964 closed
Feb 25, 2025 -
CI test windows://python/ray/tests:test_actor_retry2 is flaky
#47415 closed
Feb 25, 2025 -
Ray cluster launcher on existing GCP VMs, head node unable to SSH to worker nodes
#34838 closed
Feb 25, 2025 -
[Ray Data] read_binary_files does not load data from S3 in parallel
#44215 closed
Feb 25, 2025 -
[Tune] pbt_transformers.py failing to run - error code
#37688 closed
Feb 25, 2025 -
[Train] Ray air benchmark example test break with nightly build
#43091 closed
Feb 25, 2025 -
[Ray[Data]] Put failure: FileNotFoundError
#43282 closed
Feb 25, 2025 -
RayTaskError in my Trainable class.
#43137 closed
Feb 25, 2025 -
[<Ray component: Train>] OOM when loading 70B LLAMA2
#43318 closed
Feb 25, 2025 -
tune.with_parameters() throws resource exhaustion exception
#42590 closed
Feb 25, 2025 -
[Feature] Add support for GAIL/AIRL
#19394 closed
Feb 25, 2025 -
How to enable login and logout button in Ray Dashboard
#42606 closed
Feb 25, 2025 -
Ray tune - _TrainingRunMetadata generates incorrect results
#42628 closed
Feb 25, 2025 -
[CI] `linux://python/ray/air:test_resource_manager_placement_group` is failing/flaky on master.
#40505 closed
Feb 25, 2025 -
[<Ray component: Tune>] TensorflowCheckpoint Temp Folder is hard coded to system variable
#41205 closed
Feb 25, 2025 -
[Train] Add user guide for configuring cuda devices
#41958 closed
Feb 25, 2025 -
[Train] LightGBM stuck with more workers
#41790 closed
Feb 25, 2025 -
Experiment class and search algorithm
#41295 closed
Feb 25, 2025 -
Ray 2.6.1 error: pydantic doesn't have attribute `__version__`
#38343 closed
Feb 25, 2025 -
[docs][rllib] Broken link to PyTorch Dynamo /rllib/rllib-torch2x.html
#40868 closed
Feb 25, 2025 -
[RLlib] "RuntimeError: The learner thread died while training!" after a view training cycles.
#36801 closed
Feb 25, 2025 -
[Doc] Help every user understand what a "driver" is
#41046 closed
Feb 25, 2025 -
[Doc] Update AIR examples to not use the `preprocessor` arg to Trainer
#36757 closed
Feb 25, 2025 -
[Dashboard] Provide GPU stats per actor or per worker process in the node table
#31998 closed
Feb 25, 2025 -
[RLlib] timesteps per minute and cpu/gpu load logging
#25847 closed
Feb 25, 2025 -
[Dashboard] Confusing `workers` terminology
#24277 closed
Feb 25, 2025 -
Callback URL
#40288 closed
Feb 25, 2025 -
PPO centralized critic example with more than two agents?
#40040 closed
Feb 25, 2025 -
Unable to read '/sys/fs/cgroup/cpuset/cpuset.cpus' through jupyter notebook
#40351 closed
Feb 25, 2025 -
[Core] specifying runtime_env using conda env fullpath no work
#50720 closed
Feb 25, 2025 -
custom stage name in map_batches
#40008 closed
Feb 25, 2025 -
[Core] Crash using C++ API when calling a task and passing an actor handle
#29879 closed
Feb 25, 2025 -
Ray
#39855 closed
Feb 25, 2025 -
[RLlib] Error in executing demo example with attention net.
#39769 closed
Feb 25, 2025 -
[RLlib] Error importing the library
#39583 closed
Feb 25, 2025 -
[Rllib] critic reguliarized regression: wrong action clipping for critic update
#39683 closed
Feb 25, 2025 -
ray.get() can't get a large-scale object store
#39720 closed
Feb 25, 2025 -
[core] Flaky test `test_release_resources_race` in test_basic when Ray Client is enabled
#39688 closed
Feb 25, 2025 -
[RLlib] PPO training on Atari Environment using standard hyperparameters gives poor results
#39255 closed
Feb 25, 2025 -
[RLlib] Framework "tf2" raises error in `MLPEncoderConfig`
#37413 closed
Feb 25, 2025 -
[Tune] `Tuner.restore` does not fully work with WandB Callback
#38894 closed
Feb 25, 2025 -
[tune] Can't move file with different disk types
#38486 closed
Feb 25, 2025 -
Dreamer error: frame_skip unexpected
#32508 closed
Feb 25, 2025 -
[RLlib] PPO speed and performance issues
#29623 closed
Feb 25, 2025 -
[External Doc] Update `pytorch_tutorials_hyperparameter_tuning_tutorial.py` in the `pytorch/tutorials` repo
#38956 closed
Feb 25, 2025 -
[tune] 'tune.utils.wait_for_gpu' typo
#38721 closed
Feb 25, 2025 -
[Doc] Warning about batch size on page /data/examples/huggingface_vit_batch_prediction.html
#38762 closed
Feb 25, 2025 -
[Tune] TensorflowCheckpoint cannot save subclassed Keras model because it uses legacy H5 format
#44804 closed
Feb 25, 2025 -
[RFC] Ray Serve model multiplexing support
#33253 closed
Feb 25, 2025 -
serve run and serve deploy for multiple users
#42622 closed
Feb 25, 2025 -
[Serve] Tidy up lifecycle of gRPC context
#42347 closed
Feb 25, 2025 -
[Serve] Pull out `disconnected_task` from `ProxyResponseGenerator`
#40796 closed
Feb 25, 2025 -
CI test linux://python/ray/dag:tests/experimental/test_accelerated_dag is flaky
#45922 closed
Feb 24, 2025 -
[core] Cover cpplint for `ray/tree/master/src/ray/rpc`
#50647 closed
Feb 23, 2025 -
[core] Cover cpplint for `src/ray/pubsub`
#50728 closed
Feb 23, 2025 -
Please delete
#50835 closed
Feb 22, 2025 -
[core] Split giant ray core C++ targets into small ones (GCS)
#50685 closed
Feb 21, 2025 -
CI test darwin://python/ray/tests:test_runtime_env_working_dir_3 is flaky
#44765 closed
Feb 21, 2025 -
CI test linux://python/ray/train/v2:test_data_parallel_trainer is flaky
#50698 closed
Feb 21, 2025 -
[core] Cover cpplint for ray/src/ray/scheduling
#50679 closed
Feb 21, 2025 -
Minio as S3 storage
#50762 closed
Feb 21, 2025 -
CI test windows://python/ray/tests:test_object_spilling_debug_mode is flaky
#43796 closed
Feb 21, 2025 -
[core] release test degradation: multi_client_put_gigabytes
#50769 closed
Feb 20, 2025 -
CI test windows://python/ray/tests:test_task_metrics is consistently_failing
#43770 closed
Feb 20, 2025 -
CI test darwin://python/ray/tests:test_gcs_fault_tolerance is consistently_failing
#43777 closed
Feb 20, 2025 -
CI test darwin://python/ray/tests:test_logging is consistently_failing
#50724 closed
Feb 20, 2025 -
CI test windows://python/ray/serve/tests:test_logging is consistently_failing
#46043 closed
Feb 20, 2025 -
[Core] ray distributed debugger, always connecting to cluster..
#50682 closed
Feb 20, 2025 -
[core][compiled graphs] Ray 2.42 got RayCgraphCapacityExceeded in vLLM
#50381 closed
Feb 20, 2025 -
[Core] Whether the environment variables set in runtime_env can be directly added to ray client?
#50659 closed
Feb 20, 2025 -
[Core] Can the path of IGNORE_GITIGNORE be set as configurable?
#50658 closed
Feb 20, 2025 -
CI test linux://python/ray/tests:test_network_failure_e2e is consistently_failing
#49556 closed
Feb 19, 2025 -
Ray Data checkpoint
#49438 closed
Feb 19, 2025 -
CI test darwin://python/ray/tests:test_threaded_actor is flaky
#44663 closed
Feb 19, 2025 -
[Core] Learner not respecting Catalog causes warning since 2.39
#50690 closed
Feb 19, 2025 -
CI test linux://rllib:learning_tests_multi_agent_stateless_cartpole_ppo_multi_gpu is flaky
#47332 closed
Feb 19, 2025 -
Release test serve_microbenchmarks.aws failed
#50606 closed
Feb 19, 2025 -
CI test linux://doc:source/serve/doc_code/object_detection is consistently_failing
#49142 closed
Feb 19, 2025 -
CI test linux://python/ray/serve/tests/unit:test_pow_2_replica_scheduler is flaky
#48736 closed
Feb 18, 2025 -
CI test linux://python/ray/workflow:tests/test_events_with_crash is consistently_failing
#50193 closed
Feb 18, 2025
36 Issues opened by 22 people
-
[Data] Ordering of blocks after map and map_batches
#50890 opened
Feb 25, 2025 -
[Serve] Ray Serve APIs for users to define when the Ray Serve applications are ready to serve requests
#50883 opened
Feb 25, 2025 -
[core] Split object manager into smaller C++ targets
#50882 opened
Feb 25, 2025 -
CI test linux://doc/source/llm/examples/batch:vllm-with-lora is consistently_failing
#50881 opened
Feb 25, 2025 -
bazel-lint all BUILD files
#50875 opened
Feb 25, 2025 -
[Autoscaler][V2] Updating max replicas while Pods are pending causes v2 autoscaler to hang
#50868 opened
Feb 24, 2025 -
CI test darwin://python/ray/tests:test_reconstruction_2 is consistently_failing
#50859 opened
Feb 24, 2025 -
Installing dependencies from pyproject.toml
#50858 opened
Feb 24, 2025 -
CI test linux://python/ray/data:test_transform_pyarrow is flaky
#50827 opened
Feb 22, 2025 -
CI test linux://python/ray/data:test_strict_mode is flaky
#50826 opened
Feb 22, 2025 -
CI test linux://python/ray/data:test_numpy_support is flaky
#50825 opened
Feb 22, 2025 -
[Train] Unable to gain long-term access to S3 storage for training state/checkpoints when running on AWS EKS
#50823 opened
Feb 22, 2025 -
CI test linux://python/ray/air:test_tensor_extension is flaky
#50819 opened
Feb 22, 2025 -
CI test linux://python/ray/air:test_object_extension is flaky
#50818 opened
Feb 22, 2025 -
CI test linux://python/ray/air:test_arrow is flaky
#50817 opened
Feb 22, 2025 -
[Core] Ray Data job hanging with flooded Cancelling stale RPC with seqno 125 < 127 error
#50814 opened
Feb 21, 2025 -
[Core] Handle transient network error for pushing object chunks
#50803 opened
Feb 21, 2025 -
[core] Serve microbenchmarks occasionally crash with segfault or invalid memory access
#50802 opened
Feb 21, 2025 -
[Data] supper passing `pyarrow.dataset.Expression`s to `Dataset.filter`'s `expr`
#50799 opened
Feb 21, 2025 -
Enable Ray debug logging when installing env packages with pip/uv
#50798 opened
Feb 21, 2025 -
[Data]: Categorizer fails with non uniform distributions
#50792 opened
Feb 21, 2025 -
[Core] Negative available resources
#50739 opened
Feb 19, 2025 -
[Core] Stale ray_cluster_<state>_nodes metrics
#50735 opened
Feb 19, 2025 -
[core] Cover cpplint for `src/ray/object_manager/plasma`
#50729 opened
Feb 19, 2025 -
slow torch.distributed with non-default CUDA_VISIBLE_DEVICES
#50723 opened
Feb 19, 2025 -
how to solve this problem
#50721 opened
Feb 19, 2025 -
[core] Fix mock dependency
#50718 opened
Feb 19, 2025 -
[nsys plugin] How about add an option `name` to nsys dumped file
#50711 opened
Feb 19, 2025 -
[Serve] Serve no longer retries deployments after 3 failures
#50710 opened
Feb 19, 2025 -
[Serve] group requests by `model_id` in Model Multiplexing
#50695 opened
Feb 18, 2025 -
Make sure precommit hook linter and CI matches
#50694 opened
Feb 18, 2025
293 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
(WIP) [core][compiled graphs] Unify code paths for NCCL P2P and collectives scheduling
#48649 commented on
Feb 24, 2025 • 54 new comments -
[core][compiled graphs]Support reduce scatter and all gather collective for GPU communicator in compiled graph
#50624 commented on
Feb 25, 2025 • 23 new comments -
[Data] Adding in per node metrics
#49705 commented on
Feb 25, 2025 • 13 new comments -
[Core][Autoscaler] Refactor v2 Log Formatting
#49350 commented on
Feb 25, 2025 • 10 new comments -
[core] Utils to cleanup cgroup folder
#49941 commented on
Feb 25, 2025 • 8 new comments -
[core][1/N] Make executor to be a long-running Python thread
#50644 commented on
Feb 24, 2025 • 7 new comments -
Various enhancements to the Gradio Ray Serve tutorial
#50276 commented on
Feb 18, 2025 • 6 new comments -
[Fix][Core] Execute user requested actor exit in C++ side
#49918 commented on
Feb 25, 2025 • 6 new comments -
[data/preprocessors] feat: allow hasher to run on append mode
#50632 commented on
Feb 25, 2025 • 4 new comments -
Discover TPU logs in Ray Dashboard
#47737 commented on
Feb 25, 2025 • 4 new comments -
[train][v2] implement state export
#50622 commented on
Feb 25, 2025 • 4 new comments -
[Feat][Dashboard] Remove StateHead's dependencies on DataSource
#50605 commented on
Feb 25, 2025 • 3 new comments -
[core] Implement dup2 wrapper
#50439 commented on
Feb 19, 2025 • 3 new comments -
Add Semi-Random Weighting to AutoScaler Node Scheduler
#49983 commented on
Feb 19, 2025 • 2 new comments -
[core] add RAY_IGNORE_VERSION_MISMATCH when ray start --address
#50513 commented on
Feb 22, 2025 • 2 new comments -
[doc][core] fix a wrong url in ray-dag.rst
#49980 commented on
Feb 25, 2025 • 2 new comments -
[Core] Split stats_metric into smaller targets to improve build performance
#50595 commented on
Feb 24, 2025 • 2 new comments -
[core] Cover cpplint for ray/src/ray/stats
#50678 commented on
Feb 25, 2025 • 2 new comments -
[data] add ClickHouse sink
#50377 commented on
Feb 19, 2025 • 1 new comment -
[Docs] Update Volcano Integration with The New Flag
#47911 commented on
Feb 25, 2025 • 1 new comment -
[bazel] move python rules up
#47260 commented on
Feb 25, 2025 • 0 new comments -
[Doc] RayServe Single-Host TPU v6e Example with vLLM
#47814 commented on
Feb 19, 2025 • 0 new comments -
[core][dashboard] Make updates to DataSource.(node_workers|core_worker_stats) on delta.
#47186 commented on
Feb 25, 2025 • 0 new comments -
[core] GcsPublisher bindings
#47062 commented on
Feb 25, 2025 • 0 new comments -
[core][dashboard] Change the StateDataSourceClient from using gRPC stub -> NewGcsClient.
#47056 commented on
Feb 25, 2025 • 0 new comments -
Bump aiohttp from 3.9.5 to 3.10.2 in /release
#47050 commented on
Feb 25, 2025 • 0 new comments -
Bump tensorflow from 2.11.0 to 2.12.1 in /rllib_contrib/simple_q
#47001 commented on
Feb 25, 2025 • 0 new comments -
[ci][core] GCS FT Chaos test
#46996 commented on
Feb 25, 2025 • 0 new comments -
Bump tensorflow from 2.11.0 to 2.12.1 in /rllib_contrib/td3
#46978 commented on
Feb 25, 2025 • 0 new comments -
Bump keras from 2.7.0 to 2.13.1 in /python/requirements/compat
#46977 commented on
Feb 25, 2025 • 0 new comments -
[RLlib] Deprecate algo config (python) dicts; must be `AlgorithmConfig` objects.
#46896 commented on
Feb 25, 2025 • 0 new comments -
[POC] A Reactor style GCS. #1: GcsNodeManager
#46891 commented on
Feb 25, 2025 • 0 new comments -
Add docs link to Serve page of Ray Dashboard
#46812 commented on
Feb 25, 2025 • 0 new comments -
[Docs][hotfix] Correct the desc of nums of blocks
#47741 commented on
Feb 24, 2025 • 0 new comments -
[wip] revive zero copy torch tensor serialization
#47665 commented on
Feb 25, 2025 • 0 new comments -
[ADAG]Enable NPU (hccl) communication for CG
#47658 commented on
Feb 21, 2025 • 0 new comments -
Bump send and express in /python/ray/dashboard/client
#47643 commented on
Feb 25, 2025 • 0 new comments -
Bump serve-static and express in /python/ray/dashboard/client
#47641 commented on
Feb 25, 2025 • 0 new comments -
Bump body-parser and express in /python/ray/dashboard/client
#47589 commented on
Feb 25, 2025 • 0 new comments -
Bump path-to-regexp and express in /python/ray/dashboard/client
#47588 commented on
Feb 25, 2025 • 0 new comments -
[Do not merge] Run release tests for export API
#47568 commented on
Feb 25, 2025 • 0 new comments -
Improvements and Artificial Intelligence-based Improvements for Ray Cross-Language Functionality Testing
#47499 commented on
Feb 25, 2025 • 0 new comments -
[core][dashboard] make a flamegraph on event loop lag.
#47491 commented on
Feb 25, 2025 • 0 new comments -
Bump webpack from 5.76.2 to 5.94.0 in /python/ray/dashboard/client
#47428 commented on
Feb 25, 2025 • 0 new comments -
[todo] Migrate redis kv get sync
#47348 commented on
Feb 25, 2025 • 0 new comments -
[Core][aDAG] Remove busy waiting semaphore acquire in linux
#47322 commented on
Feb 25, 2025 • 0 new comments -
idempotent replies by seq_no for sequential actors.
#47314 commented on
Feb 25, 2025 • 0 new comments -
Bump micromatch from 4.0.5 to 4.0.8 in /python/ray/dashboard/client
#47310 commented on
Feb 25, 2025 • 0 new comments -
[RLlib; docs] New API stack docs: Add `ConnectorV2` documentation
#47278 commented on
Feb 24, 2025 • 0 new comments -
Update py_modules.py AttributeError: module has no attribute '__path__'
#46302 commented on
Feb 25, 2025 • 0 new comments -
[WIP] CI: jemalloc & mimalloc
#46271 commented on
Feb 25, 2025 • 0 new comments -
Pinterest/release 2.9.3+pinterest5
#46175 commented on
Feb 25, 2025 • 0 new comments -
[ADAG] Detect if ADAG is at capacity for execution
#46158 commented on
Feb 25, 2025 • 0 new comments -
Revert "[doc]Make vllm example works with latest vllm version"
#46094 commented on
Feb 25, 2025 • 0 new comments -
[RLlib] - `"Synchronized"` sampling for multi-agent buffers.
#46083 commented on
Feb 25, 2025 • 0 new comments -
[serve] fix lossy serve config
#45938 commented on
Feb 25, 2025 • 0 new comments -
Fix malformed `temp_dir` path when connecting Windows workers to cluster with Linux head
#45930 commented on
Feb 25, 2025 • 0 new comments -
[spark] Fix nvidia-smi hanging issue
#45896 commented on
Feb 25, 2025 • 0 new comments -
Xgui/test subdataset
#45889 commented on
Feb 25, 2025 • 0 new comments -
Adds new working dir upload protocol PLASMA, and use it in job submission.
#45880 commented on
Feb 25, 2025 • 0 new comments -
MADDPG framework should be TensorFlow
#45863 commented on
Feb 25, 2025 • 0 new comments -
Improve code snippet in docs to set up `ray[serve]` gRPC service
#45862 commented on
Feb 25, 2025 • 0 new comments -
[WIP] Benchmark data shuffle
#45847 commented on
Feb 25, 2025 • 0 new comments -
[Core] Add warning when uploading large working dirs
#45818 commented on
Feb 25, 2025 • 0 new comments -
release 2.9.3+pinterest4
#45807 commented on
Feb 25, 2025 • 0 new comments -
remove cleaned metadata
#45643 commented on
Feb 25, 2025 • 0 new comments -
Update rllib-env.rst
#46750 commented on
Feb 25, 2025 • 0 new comments -
Introducing StaleTaskError
#46705 commented on
Feb 25, 2025 • 0 new comments -
[Core] If possible, force flush the trace when the worker ends.
#46654 commented on
Feb 25, 2025 • 0 new comments -
[dashboard] Place the submit job on a separate page
#46613 commented on
Feb 25, 2025 • 0 new comments -
[ADAG] Fix DAG input
#46604 commented on
Feb 25, 2025 • 0 new comments -
[util] remove pygloo support
#46590 commented on
Feb 25, 2025 • 0 new comments -
Fix mlflow artifact logging
#46570 commented on
Feb 25, 2025 • 0 new comments -
[dont review] release 2.10.0+pinterest1
#46551 commented on
Feb 25, 2025 • 0 new comments -
[Core]Fix the issue of actor tasks hanging during resubmission
#46539 commented on
Feb 25, 2025 • 0 new comments -
[RLlib] Optimize rnn_sequencing performance
#46502 commented on
Feb 25, 2025 • 0 new comments -
fix performance bug in arrow to numpy transform
#46433 commented on
Feb 25, 2025 • 0 new comments -
avoid merge errors when blocks contain different type in DelegatingBl…
#46407 commented on
Feb 25, 2025 • 0 new comments -
[Core] Add ray-start option 'session-name'
#46404 commented on
Feb 25, 2025 • 0 new comments -
[test] cpp20
#46380 commented on
Feb 25, 2025 • 0 new comments -
[Docker] Upgrade base deps docker python env to 3.9.7
#46353 commented on
Feb 25, 2025 • 0 new comments -
fixed a typo in ValueError message for contains_tensor
#46348 commented on
Feb 25, 2025 • 0 new comments -
[Doc] Update directory path for installation
#46318 commented on
Feb 25, 2025 • 0 new comments -
Update dyn-req-batch.md with style edits
#49725 commented on
Feb 25, 2025 • 0 new comments -
changes to get ray serve responding on REST API calls when distribute…
#49730 commented on
Feb 25, 2025 • 0 new comments -
[Ray Data Optimization] Remove block merging
#49743 commented on
Feb 25, 2025 • 0 new comments -
[RLlib] Flatten dict-typed observations before comparing them.
#49758 commented on
Feb 25, 2025 • 0 new comments -
[KubeRay] support suspending worker groups in KubeRay autoscaler
#49768 commented on
Feb 25, 2025 • 0 new comments -
[ci] remove pins in runtime_env usage in train examples
#49772 commented on
Feb 25, 2025 • 0 new comments -
Pass checkpointable args through in tf_learner
#49861 commented on
Feb 24, 2025 • 0 new comments -
[dashboard] Add SubprocessModules to the Dashboard routes, and convert HealthzHead.
#49864 commented on
Feb 25, 2025 • 0 new comments -
Fix complex (dict) observation concatenation in single agent episode
#49913 commented on
Feb 24, 2025 • 0 new comments -
Fix Databricks host URL handling in Ray Data
#49926 commented on
Feb 24, 2025 • 0 new comments -
[RFC][dashboard] Use aiohttp client for inter dependencies.
#49932 commented on
Feb 24, 2025 • 0 new comments -
[core] minor optimization for JoinPaths
#49946 commented on
Feb 24, 2025 • 0 new comments -
[RLlib; Offline] - Add single learner gpu training with preloading in `OfflinePreLearner`.
#49960 commented on
Feb 24, 2025 • 0 new comments -
Explicit comm
#49979 commented on
Feb 24, 2025 • 0 new comments -
[kuberay] fix deserialisation of custom resources in autoscaler config
#49993 commented on
Feb 24, 2025 • 0 new comments -
[dashboard] Remove the dashboard grpc server.
#50021 commented on
Feb 24, 2025 • 0 new comments -
[WIP] Move execution loop to the same thread as the constructor of an actor
#50032 commented on
Feb 24, 2025 • 0 new comments -
[core] Upgrade boost
#50039 commented on
Feb 24, 2025 • 0 new comments -
tsan
#50105 commented on
Feb 25, 2025 • 0 new comments -
[RLlib] Disable callbacks callable check for new api stack
#50157 commented on
Feb 22, 2025 • 0 new comments -
fix: WandbLogger crashing silently on a FileNotFoundError
#50308 commented on
Feb 19, 2025 • 0 new comments -
[Autoscaler][V2] Use running node instances to rate-limit upscaling
#50414 commented on
Feb 18, 2025 • 0 new comments -
[tune] Remove loguniform's base
#50415 commented on
Feb 20, 2025 • 0 new comments -
[agent] Ray metrics use the name parsing as Prometheus
#50443 commented on
Feb 25, 2025 • 0 new comments -
[Core][Dashboard] Convert JobHead to subprocess module
#50483 commented on
Feb 20, 2025 • 0 new comments -
[core] Don't build cpp api on pip install
#50499 commented on
Feb 20, 2025 • 0 new comments -
Various enhancements to Tune Keras example:
#50581 commented on
Feb 18, 2025 • 0 new comments -
[RLlib] Enable splitting and zero padding of Dict observation
#50589 commented on
Feb 21, 2025 • 0 new comments -
Integrate Ray Dataset with Daft Dataframe
#50630 commented on
Feb 20, 2025 • 0 new comments -
[WIP; RLlib] APPO accelerate (vol 17): `LearnerGroup` should not pickle remote functions on each update-call; Refactor `LearnerGroup` and `Learner` APIs.
#50665 commented on
Feb 24, 2025 • 0 new comments -
[WIP / try out] Use UV for Python 3.13 tests
#50669 commented on
Feb 25, 2025 • 0 new comments -
[core] Improve/Fix plasma object store wait logic
#50680 commented on
Feb 24, 2025 • 0 new comments -
Add new serve autoscaling parameter `scaling_function`
#47837 commented on
Feb 20, 2025 • 0 new comments -
(WIP) [ADAG] Support dag.experimental_compile(_custom_nccl_group= nccl_group) in aDAG
#47987 commented on
Feb 25, 2025 • 0 new comments -
[doc] Remove unused/unmaintained `doc/source/templates` folder
#48295 commented on
Feb 25, 2025 • 0 new comments -
[doc] fix: Typo and missing import in doc
#48311 commented on
Feb 20, 2025 • 0 new comments -
Fix invalid type for progress_reporter parameter of RunConfig
#48439 commented on
Feb 25, 2025 • 0 new comments -
remove redundant bazel dependencies
#48464 commented on
Feb 25, 2025 • 0 new comments -
Sjoshi/push manager patch
#48475 commented on
Feb 25, 2025 • 0 new comments -
[data] add opensearch datasource
#48555 commented on
Feb 24, 2025 • 0 new comments -
[Fix][GCS] Implement reconnection for RedisContext
#48781 commented on
Feb 25, 2025 • 0 new comments -
[Build][Deps] Add new `ray[azure]` extra package
#48847 commented on
Feb 25, 2025 • 0 new comments -
[train] Make dataset argument covariant
#48999 commented on
Feb 24, 2025 • 0 new comments -
[Jobs] Add metric to track duration of jobs
#49035 commented on
Feb 25, 2025 • 0 new comments -
[data] fix nodeName When the network in KubeRay is set to hostnetwork
#49188 commented on
Feb 25, 2025 • 0 new comments -
[Dashboard] stop ray submitted job through ui
#49201 commented on
Feb 25, 2025 • 0 new comments -
Fix unpacking zip package treats "../" as the top_level_directory
#49204 commented on
Feb 25, 2025 • 0 new comments -
[wandb] Use wandb Run as a context manager
#49307 commented on
Feb 25, 2025 • 0 new comments -
[Core] fail to download s3 py modules
#49332 commented on
Feb 19, 2025 • 0 new comments -
[Serve] Improve serve deploy ignore behavior
#49336 commented on
Feb 25, 2025 • 0 new comments -
Update tune-search-spaces.rst to correct outdated api use
#49386 commented on
Feb 25, 2025 • 0 new comments -
[core][compiled graphs] Support reduce scatter and all gather collective in compiled graph
#49404 commented on
Feb 25, 2025 • 0 new comments -
[Core] Persist the Driver Console Log When Job Execution Not Through Job API
#49452 commented on
Feb 25, 2025 • 0 new comments -
[data] add dataloader for lance datasource
#49459 commented on
Feb 25, 2025 • 0 new comments -
[core][cgraph] Use threadpool and one io_context for mutable object provider
#49500 commented on
Feb 25, 2025 • 0 new comments -
[ci] Remove redundant ML doctests from running in unit test pipelines
#49516 commented on
Feb 25, 2025 • 0 new comments -
[RLlib] Add NPU and HPU support to RLlib
#49535 commented on
Feb 25, 2025 • 0 new comments -
[core] Minor improvements to core worker get
#49567 commented on
Feb 25, 2025 • 0 new comments -
[core] Move observable store client logic into in memory store
#49570 commented on
Feb 25, 2025 • 0 new comments -
[core] Don't get dashboard address after each dashboard connection failure
#49584 commented on
Feb 25, 2025 • 0 new comments -
[Core] Streaming generator supports num_returns
#49586 commented on
Feb 20, 2025 • 0 new comments -
[Dashboard] Support multiple accelerator monitoring and flexible display
#49610 commented on
Feb 25, 2025 • 0 new comments -
[Core] Add virtual cluster
#49717 commented on
Feb 24, 2025 • 0 new comments -
[RLlib] Fix broken stats accumulation for 'MeanStdFilter' connector.
#49718 commented on
Feb 25, 2025 • 0 new comments -
[Distributed Debugger] Newly added breakpoint not works: Breakpoint in file that does not exist
#48778 commented on
Feb 23, 2025 • 0 new comments -
[vLLM] The ray serve using vLLM example on the website does not work.
#50275 commented on
Feb 22, 2025 • 0 new comments -
Test issue (please ignore) - more text then even more text
#49867 commented on
Feb 22, 2025 • 0 new comments -
[Ray Serve] Expose public interface for user to customize the router
#50465 commented on
Feb 21, 2025 • 0 new comments -
[tune] Can't reload a past experiment (pickling error in pyarrow?)
#46740 commented on
Feb 21, 2025 • 0 new comments -
[Jobs] Add metric to track duration of jobs
#48962 commented on
Feb 21, 2025 • 0 new comments -
[RLlib] Adding transformer models
#50648 commented on
Feb 21, 2025 • 0 new comments -
[RLlib] [Solution Found] Action masking example not working when LSTM enabled.
#50526 commented on
Feb 21, 2025 • 0 new comments -
CI test linux://rllib:test_offline_prelearner is flaky
#50340 commented on
Feb 21, 2025 • 0 new comments -
Cannot Install ray[rllib] on Python 3.13
#50226 commented on
Feb 21, 2025 • 0 new comments -
[RLlib] MetricsLogger API problem with series-based data
#50294 commented on
Feb 21, 2025 • 0 new comments -
Release test rllib_learning_tests_pong_appo_torch.aws failed
#50217 commented on
Feb 21, 2025 • 0 new comments -
[RLlib] PPO algorithm can't be trained from checkpoint
#50136 commented on
Feb 21, 2025 • 0 new comments -
[RLlib] Callbacks class input check is invalid for new api stack
#50135 commented on
Feb 21, 2025 • 0 new comments -
[RLlib] "TypeError: 'int' object is not iterable when using from_jsonable with nested Discrete in Dict"
#50131 commented on
Feb 21, 2025 • 0 new comments -
[RLlib] TFLearner does not correctly implement `restore_from_path`
#49860 commented on
Feb 21, 2025 • 0 new comments -
[RFC] LLM APIs for Ray Data and Ray Serve
#50639 commented on
Feb 20, 2025 • 0 new comments -
[Core] Python 3.13 wheel
#49738 commented on
Feb 20, 2025 • 0 new comments -
[AIR: TransformersTrainer] Local rank detection from Deepspeed conflicts with Ray multi-node provision
#37212 commented on
Feb 24, 2025 • 0 new comments -
`gcp_cluster_launcher_full` release test failure
#37276 commented on
Feb 24, 2025 • 0 new comments -
[core] Dedup session name/cluster ID
#37546 commented on
Feb 24, 2025 • 0 new comments -
Object error when running /ray-air/examples/torch_image_example.html on cluster
#37623 commented on
Feb 24, 2025 • 0 new comments -
[Tune] Unset signal catching event after tune.run() finished.
#37737 commented on
Feb 24, 2025 • 0 new comments -
[AIR] Multiple API Changes in 2.7
#37868 commented on
Feb 24, 2025 • 0 new comments -
[data] RefBundle doesn't always eagerly free data
#37910 commented on
Feb 24, 2025 • 0 new comments -
page /serve/tutorials/streaming.html adding Token Streaming LLM example for serve
#38094 commented on
Feb 24, 2025 • 0 new comments -
[core] Recent windows test flakiness
#38413 commented on
Feb 24, 2025 • 0 new comments -
Registering & Using OpenAI Gym environments from the RoboHive API in Ray RLlib & Tune
#37231 commented on
Feb 24, 2025 • 0 new comments -
[Core|Data] read_csv(): Exception from as task of operator "ReadCSV->SplitBlocks(100)"
#47839 commented on
Feb 24, 2025 • 0 new comments -
Unexpected node deaths cannot be recovered from checkpoints
#46814 commented on
Feb 24, 2025 • 0 new comments -
[aDAG] Support torch profiling with configurable parameters
#47745 commented on
Feb 24, 2025 • 0 new comments -
[environments] Support PEX executables
#15518 commented on
Feb 24, 2025 • 0 new comments -
Make infeasible tasks error much more obvious
#45909 commented on
Feb 24, 2025 • 0 new comments -
[Ray Compiled Graph] NCCL Internal Error
#49827 commented on
Feb 24, 2025 • 0 new comments -
[RLlib] Unable to replicate original PPO performance
#45655 commented on
Feb 24, 2025 • 0 new comments -
[Core] Pluggable storage backend besides Redis
#50656 commented on
Feb 24, 2025 • 0 new comments -
Fatal Python error: Segmentation fault
#49998 commented on
Feb 20, 2025 • 0 new comments -
[Core] Spot preemption related retries do not count towards the max retries
#50640 commented on
Feb 19, 2025 • 0 new comments -
RAY_IGNORE_VERSION_MISMATCH=True has no effect on ray start --addr.
#50511 commented on
Feb 19, 2025 • 0 new comments -
Ray's execution of the command "start --head" does not perform deduplication.
#50510 commented on
Feb 19, 2025 • 0 new comments -
Exception: The current node timed out during startup. This could happen because some of the Ray processes failed to startup.
#50474 commented on
Feb 19, 2025 • 0 new comments -
Compiled Graphs torch.Tensor serialization device
#50452 commented on
Feb 19, 2025 • 0 new comments -
Fatal Python error: Floating point exception when running on H20
#50418 commented on
Feb 19, 2025 • 0 new comments -
[StateAPI] StateAPI request truncates recent elements
#50378 commented on
Feb 19, 2025 • 0 new comments -
【bug】Ray.data.write_parquet will write twice when use fsspec local filesystem
#49741 commented on
Feb 19, 2025 • 0 new comments -
[data] sort large dataset by ray.data.Dataset always fail
#49679 commented on
Feb 19, 2025 • 0 new comments -
[Data, Train] ray::SplitCoordinator is very slow at every epoch + takes up too much memory
#49190 commented on
Feb 19, 2025 • 0 new comments -
[Data]Fuse operator
#49587 commented on
Feb 19, 2025 • 0 new comments -
CI test linux://rllib:learning_tests_multi_agent_cartpole_ppo_multi_cpu is flaky
#47465 commented on
Feb 19, 2025 • 0 new comments -
[Agent] Make sure Ray metrics perform same validation as Prometheus to weed out invalid names early on
#40586 commented on
Feb 19, 2025 • 0 new comments -
[Air] WandB logger / _WandbLoggingActor crashes silently when logging a video with relative path or if it cannot find the given file.
#50307 commented on
Feb 19, 2025 • 0 new comments -
[core][logging] Customizable Python standard log attributes.
#49502 commented on
Feb 19, 2025 • 0 new comments -
[core] Add a util function to initialize NCCL communicator
#50681 commented on
Feb 19, 2025 • 0 new comments -
trial_name_creator does not change name in get_dataframe()
#50635 commented on
Feb 19, 2025 • 0 new comments -
[Data] ray.data.from_torch fails on datasets with variable shaped images
#50229 commented on
Feb 18, 2025 • 0 new comments -
[RLlib] 'MeanStdFilter' connector is broken.
#49716 commented on
Feb 20, 2025 • 0 new comments -
[<Ray component: Core|RLlib|etc...>]
#49609 commented on
Feb 20, 2025 • 0 new comments -
The mistake of using multi-head networks
#49582 commented on
Feb 20, 2025 • 0 new comments -
[RLlib][Windows] Windows Invalid Directory Name Error in Ray RLlib
#49477 commented on
Feb 20, 2025 • 0 new comments -
[RLlib|Custom Policy]Custom Policy Implementation in Reinforcement Learning
#49334 commented on
Feb 20, 2025 • 0 new comments -
[usability][Feature] Throw error message if resolved ip address doesn't match the localhost
#19052 commented on
Feb 20, 2025 • 0 new comments -
[Core] Getting node id for usage in NodeAffinitySchedulingStrategy
#28195 commented on
Feb 20, 2025 • 0 new comments -
[Core] Remove pg resource notation from user-facing APIs.
#31064 commented on
Feb 20, 2025 • 0 new comments -
[Core] classmethod support for actors
#36986 commented on
Feb 20, 2025 • 0 new comments -
[core] Check if a ray task has errored without calling `ray.get` on it
#45229 commented on
Feb 20, 2025 • 0 new comments -
[Serve] Use concurrency group
#20054 commented on
Feb 20, 2025 • 0 new comments -
[Clusters][Azure] Custom ARM template for Azure Clusters
#50684 commented on
Feb 20, 2025 • 0 new comments -
[Core] Too many threads in ray worker
#36936 commented on
Feb 20, 2025 • 0 new comments -
[Ray debugger] Unable to use debugger on Ray Cluster on k8s
#45541 commented on
Feb 20, 2025 • 0 new comments -
[Core] Please provide better message where 'RuntimeError: Failed to unpickle serialized exception'
#49885 commented on
Feb 20, 2025 • 0 new comments -
[core] Cover cpplint for ray/src/ray/raylet
#50687 commented on
Feb 20, 2025 • 0 new comments -
[Ray Core] Task/resource not properly reclaimed after ray job exception or stopped
#47531 commented on
Feb 20, 2025 • 0 new comments -
[Core] Why does the statistical information of node report that the message is too large?
#50661 commented on
Feb 20, 2025 • 0 new comments -
[WIP] poc / hack relpath
#45003 commented on
Feb 25, 2025 • 0 new comments -
[testing] cumulative pinterest upgrades for 2.10
#45000 commented on
Feb 25, 2025 • 0 new comments -
add more execution and iteration metrics to prometheus
#44971 commented on
Feb 25, 2025 • 0 new comments -
[Core] Profile Ray start
#44818 commented on
Feb 25, 2025 • 0 new comments -
Update bert.ipynb
#44455 commented on
Feb 25, 2025 • 0 new comments -
[RLlib] Cleanup `examples` folder 02: Add shared value function example script for MultiAgentRLModule.
#44421 commented on
Feb 25, 2025 • 0 new comments -
change naming to intel gaudi habana for ray train example
#44412 commented on
Feb 25, 2025 • 0 new comments -
[misc] Reformat RLLib BUILD files
#44153 commented on
Feb 25, 2025 • 0 new comments -
[misc] Reformat train/tune BUILD files
#44151 commented on
Feb 25, 2025 • 0 new comments -
verify windows wheels.
#43442 commented on
Feb 25, 2025 • 0 new comments -
[docs][clusters] Improve instructions for GPU autodetection and manual cluster launching
#43219 commented on
Feb 25, 2025 • 0 new comments -
[RLlib] ConnectorV2 API: Add heuristic action logits mixin example script.
#43107 commented on
Feb 25, 2025 • 0 new comments -
[docs][Serve] add text about pip-pack installation
#41088 commented on
Feb 25, 2025 • 0 new comments -
[docs] Documentation fixes (logging and profiling)
#40915 commented on
Feb 25, 2025 • 0 new comments -
[docs] [train] copy edits to E2E ingest example
#40176 commented on
Feb 25, 2025 • 0 new comments -
[Logging] Fix Deduplication URL
#39830 commented on
Feb 25, 2025 • 0 new comments -
Upgrade default AWS DLAMI
#39721 commented on
Feb 25, 2025 • 0 new comments -
[ci] remove is_automated_build in setup.py
#36547 commented on
Feb 25, 2025 • 0 new comments -
Unpin setproctitle
#45640 commented on
Feb 25, 2025 • 0 new comments -
[core][2/2] Kill worker on root detached actor died.
#45638 commented on
Feb 25, 2025 • 0 new comments -
[core][1/2] Add SubscribeAllActors to GcsClient.
#45637 commented on
Feb 25, 2025 • 0 new comments -
[core] Eagerly kill idle workers on job finish.
#45633 commented on
Feb 25, 2025 • 0 new comments -
Create a singleton io context and thread, and standalone gcs client on it.
#45524 commented on
Feb 25, 2025 • 0 new comments -
[train] Update Torch default timeout_s to use Torch's default timeout
#45501 commented on
Feb 25, 2025 • 0 new comments -
blind try on ubuntu upgrade ..
#45427 commented on
Feb 25, 2025 • 0 new comments -
[Data] Allow configuration of MAX_IMAGE_PIXELS in ImageDatasource
#45415 commented on
Feb 25, 2025 • 0 new comments -
add ray debugger references to ray docs
#45414 commented on
Feb 25, 2025 • 0 new comments -
[RLlib] Enhance callbacks test case for EnvRunners; Add (optional) explicit `enable_multi_agent` setting to AlgorithmConfig.
#45385 commented on
Feb 25, 2025 • 0 new comments -
[POC][core] GcsClient async binding, aka remove PythonGcsClient.
#45289 commented on
Feb 25, 2025 • 0 new comments -
[core] Change all object_size to uint64_t and use 0 for unknown. Also adds a method `ray.experimental.get_local_object_locations`
#45247 commented on
Feb 25, 2025 • 0 new comments -
[RFC] Splitted Dashboard Heads.
#45175 commented on
Feb 25, 2025 • 0 new comments -
[RLlib] Fix async (multiprocessing) gymnasium vector envs in `SingleAgentEnvRunner`.
#45144 commented on
Feb 25, 2025 • 0 new comments -
[RLlib; Tune] Fix default behavior of default tune `CLIReporter` (based on `Algorithm._progress_metrics`).
#45122 commented on
Feb 25, 2025 • 0 new comments -
[wip][train][tune] handle s3fs permissions
#45100 commented on
Feb 25, 2025 • 0 new comments -
Add roundtrip (ping-pong) microbenchmarks for accelerated DAG channels
#45064 commented on
Feb 25, 2025 • 0 new comments -
[WIP] add env var to enable debug
#45009 commented on
Feb 25, 2025 • 0 new comments -
lonnie's workspace
#36406 commented on
Feb 25, 2025 • 0 new comments -
[Feature] Remote call timeout required.
#18916 commented on
Feb 25, 2025 • 0 new comments -
[Dashboard] Explain Disk usage for KubeRay
#36362 commented on
Feb 25, 2025 • 0 new comments -
[Doc] Broken links due to the o11y refactoring
#36330 commented on
Feb 25, 2025 • 0 new comments -
[Train] Provide a list of models for people to choose from in the HF transformer example
#36837 commented on
Feb 25, 2025 • 0 new comments -
[Serve| Observability] Show the duration of each request
#36633 commented on
Feb 25, 2025 • 0 new comments -
[RLlib] Deprecate methods/configs for RLlib 2.0
#36592 commented on
Feb 24, 2025 • 0 new comments -
Tests are not being skipped when they fail more than 5 times in a row.
#36499 commented on
Feb 24, 2025 • 0 new comments -
[data] Add migration guide for DataConfig
#36668 commented on
Feb 24, 2025 • 0 new comments -
[Workflow] Logger can't handle PyTorch Lightning progress bar
#33106 commented on
Feb 24, 2025 • 0 new comments -
Rllib - is_training never true
#35833 commented on
Feb 24, 2025 • 0 new comments -
[Serve| Observability] The "Logs" tab is very overwhelming
#36632 commented on
Feb 24, 2025 • 0 new comments -
[runtime env] make the `working_dir` field more flexible
#31588 commented on
Feb 24, 2025 • 0 new comments -
GradioIngress do not support Blocks.queue()
#36977 commented on
Feb 24, 2025 • 0 new comments -
random.sample
#35245 commented on
Feb 24, 2025 • 0 new comments -
Issue on page /ray-core/objects/object-spilling.html
#37002 commented on
Feb 24, 2025 • 0 new comments -
Issue on page /cluster/vms/user-guides/community/slurm.html
#35711 commented on
Feb 24, 2025 • 0 new comments -
[Core] [RLlib] single machine inside kubernetes/docker hangs at ray init
#37144 commented on
Feb 24, 2025 • 0 new comments -
CheckpointConfig does not work on Windows
#37226 commented on
Feb 24, 2025 • 0 new comments -
[serve] Make latency buckets configurable
#38223 commented on
Feb 25, 2025 • 0 new comments -
CI test linux://python/ray/data:test_arrow_block is flaky
#48859 commented on
Feb 25, 2025 • 0 new comments -
[<Ray component: Serve>] Support strawberry-graphql in serve with FastAPI
#50677 commented on
Feb 25, 2025 • 0 new comments -
[Ray Core] After the ray job is finished, it will stably trigger resource leakage
#49999 commented on
Feb 25, 2025 • 0 new comments -
[Core] ray.actor.exit_actor() does not seem to work from within an async background thread
#49451 commented on
Feb 25, 2025 • 0 new comments -
[Epic][CI] Migrate linter to Ruff
#47991 commented on
Feb 25, 2025 • 0 new comments -
[RLlib] Attribute error when trying to compute action after training Multi Agent PPO with New API Stack
#44475 commented on
Feb 25, 2025 • 0 new comments -
[core] Cover cpplint for all C++ folders
#50583 commented on
Feb 25, 2025 • 0 new comments -
[Dashboard] Update speedscope from 1.5.3 to the latest 1.13.0
#23118 commented on
Feb 25, 2025 • 0 new comments -
[KubeRay] Documentation of using custom docker images with KubeRay needs improvements
#31641 commented on
Feb 25, 2025 • 0 new comments -
[Core] Collecting and monitoring metrics documentation should reflect different possible grafana.ini file paths
#42430 commented on
Feb 25, 2025 • 0 new comments -
[core] Split giant ray core C++ targets into small ones
#50586 commented on
Feb 25, 2025 • 0 new comments -
[core][runtime env] Remove code_search_path for job config and unify this functionality in Runtime Env
#26784 commented on
Feb 25, 2025 • 0 new comments -
[serve] Status should switch from UPDATING to UPSCALING when in [min, max_replica] range
#41038 commented on
Feb 25, 2025 • 0 new comments -
[RLlib] Moving all Envs to have the c'tor signature
#21611 commented on
Feb 25, 2025 • 0 new comments -
[RFC] More programmable API that has same output as `ray memory` command
#13792 commented on
Feb 25, 2025 • 0 new comments -
[Rllib] InvalidArgumentError: cannot compute ConcatV2 as input #1(zero-based) was expected to be a double tensor but is a float tensor
#36364 commented on
Feb 25, 2025 • 0 new comments -
How to get started with the ray docker image?
#36533 commented on
Feb 25, 2025 • 0 new comments