[RLlib] Unexpected KeyError while training SAC #54284

@Vamsi-lg

Description

What happened + What you expected to happen

Hello Ray team,

I am opening this issue here as suggested by @Lars_Simon_Zehnder in one of the discussions linked below. I am coming from this post on the Ray discussion forum, where multiple posts report a similar issue. The links are below.

Ray Discussion Links:

  1. Discussion-1
  2. Discussion-2
  3. Discussion-3

Issue Details:

What I expected?

I am running the setup below for 100k steps across 1000 iterations and expected it to complete. Every run does get through the first few iterations successfully. The config is given under "Reproduction script".

What happened?

Almost every time, after reaching roughly 40k-50k steps, training fails with a KeyError. The key is always the same number, 2097151, including in all the posts/discussions linked above. The full error trace is given under "Trace".
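A note on the number itself, offered as a hint rather than a diagnosis: 2097151 is exactly 2^21 - 1. The short check below confirms that arithmetic. It also shows that, assuming a replay buffer capacity of 1,000,000 (the config in this report does not set one) and assuming the buffer's segment tree is sized at twice the next power of two, 2097151 would be the last index of that tree. Both assumptions concern RLlib internals and are not confirmed in this report; `assumed_capacity` and `tree_size` are illustrative names.

```python
import math

key = 2097151  # the key reported in the KeyError

# Fact: the key is exactly one less than a power of two.
exponent = int(math.log2(key + 1))
is_power_of_two_minus_one = (key + 1) == 2 ** exponent

# ASSUMPTION: the buffer capacity is 1,000,000 (not stated in this report).
assumed_capacity = 1_000_000
next_pow2 = 2 ** math.ceil(math.log2(assumed_capacity))

# ASSUMPTION: the segment tree holds 2 * next_pow2 entries.
tree_size = 2 * next_pow2
last_index = tree_size - 1

print(exponent, is_power_of_two_minus_one, next_pow2, last_index)
```

If those assumptions hold, the failing lookup is hitting the very last slot of the tree, which would explain why the same number shows up across unrelated setups.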

Versions / Dependencies

  • Ray version: 2.47.1
  • Python version: 3.11.11
  • OS: macOS Sequoia 15.5

Reproduction script

Config:

from torch import nn

from ray.rllib.algorithms.sac import SACConfig
from ray.rllib.core.rl_module.default_model_config import DefaultModelConfig

# 'custom_env' and `env_config` are defined elsewhere and not shown here.
config = (
    SACConfig()
    .environment(
        env='custom_env',
        env_config=env_config,
    )
    .training(
        actor_lr=7e-5,
        critic_lr=2e-5,
        alpha_lr=2e-4,
        initial_alpha=0.1,
        target_entropy="auto",
        n_step=1,
        tau=0.005,
        train_batch_size_per_learner=512,
        target_network_update_freq=1,
        num_steps_sampled_before_learning_starts=3000,
        grad_clip=1.0,
    )
    .learners(
        num_learners=1,
        num_cpus_per_learner=7,
    )
    .rl_module(
        model_config=DefaultModelConfig(
            fcnet_hiddens=[360, 250, 180],
            fcnet_activation="relu",
            fcnet_kernel_initializer=nn.init.xavier_uniform_,
            fcnet_kernel_initializer_kwargs={"gain": 1.0},
            head_fcnet_hiddens=[100, 30],
            head_fcnet_activation='relu',
            head_fcnet_kernel_initializer=nn.init.xavier_uniform_,
            head_fcnet_kernel_initializer_kwargs={"gain": 0.01},
        )
    )
    .evaluation(
        evaluation_interval=3,
        evaluation_num_env_runners=1,
        evaluation_duration=5,
        evaluation_parallel_to_training=True,
    )
    .reporting(
        metrics_num_episodes_for_smoothing=5,
        min_sample_timesteps_per_iteration=200,
    )
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
)

Trace:

File "code/.venv/lib/python3.11/site-packages/ray/tune/trainable/trainable.py", line 330, in train
    raise skipped from exception_cause(skipped)
  File "/code/.venv/lib/python3.11/site-packages/ray/tune/trainable/trainable.py", line 327, in train
    result = self.step()
             ^^^^^^^^^^^
  File "/code/.venv/lib/python3.11/site-packages/ray/rllib/algorithms/algorithm.py", line 1035, in step
    train_results, train_iter_ctx = self._run_one_training_iteration()
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/.venv/lib/python3.11/site-packages/ray/rllib/algorithms/algorithm.py", line 3309, in _run_one_training_iteration
    training_step_return_value = self.training_step()
                                 ^^^^^^^^^^^^^^^^^^^^
  File "/code/.venv/lib/python3.11/site-packages/ray/rllib/algorithms/dqn/dqn.py", line 644, in training_step
    return self._training_step_new_api_stack()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/.venv/lib/python3.11/site-packages/ray/rllib/algorithms/dqn/dqn.py", line 686, in _training_step_new_api_stack
    episodes = self.local_replay_buffer.sample(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/.venv/lib/python3.11/site-packages/ray/rllib/utils/replay_buffers/prioritized_episode_buffer.py", line 477, in sample
    index_triple = self._indices[self._tree_idx_to_sample_idx[idx]]
                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^
KeyError: 2097151
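
The failing line maps a tree index to a sample index through a dictionary. One way such a lookup can fail, stated here as an assumption and not as a confirmed root cause, is that the prefix-sum search in a sum segment tree is given a value that is not strictly below the tree's total (for example through floating-point rounding). The search then walks to the last leaf, which may hold no sample. The toy tree below is a minimal sketch of that mechanism; `ToySumTree` and `leaf_to_sample` are hypothetical names and this is not RLlib's implementation.

```python
class ToySumTree:
    """Toy sum segment tree for illustration; NOT RLlib's implementation."""

    def __init__(self, capacity):
        # Capacity must be a power of two for this simple layout.
        assert capacity > 0 and capacity & (capacity - 1) == 0
        self.capacity = capacity
        # Node 1 is the root; leaves sit at capacity .. 2 * capacity - 1.
        self.values = [0.0] * (2 * capacity)

    def set(self, leaf, priority):
        # Write the leaf, then refresh the sums on the path to the root.
        node = leaf + self.capacity
        self.values[node] = priority
        node //= 2
        while node >= 1:
            self.values[node] = self.values[2 * node] + self.values[2 * node + 1]
            node //= 2

    def total(self):
        return self.values[1]

    def find_prefixsum_idx(self, prefixsum):
        # Descend from the root towards the leaf whose cumulative range
        # contains `prefixsum`.
        node = 1
        while node < self.capacity:
            left = 2 * node
            if self.values[left] > prefixsum:
                node = left
            else:
                prefixsum -= self.values[left]
                node = left + 1
        return node - self.capacity


tree = ToySumTree(capacity=8)
leaf_to_sample = {}  # plays the role of a tree-index -> sample-index map
for leaf, priority in enumerate([1.0, 2.0, 3.0]):
    tree.set(leaf, priority)
    leaf_to_sample[leaf] = f"sample-{leaf}"

in_range = tree.find_prefixsum_idx(2.5)               # strictly below the total
out_of_range = tree.find_prefixsum_idx(tree.total())  # equal to the total

print(in_range, out_of_range, out_of_range in leaf_to_sample)
```

In this toy setup only leaves 0 to 2 are filled, so a query equal to the total ends at leaf 7, the last leaf, and `leaf_to_sample[out_of_range]` would raise a KeyError. Whether the same thing happens inside `prioritized_episode_buffer.py` would need to be checked against the actual source.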

Issue Severity

High: It blocks me from completing my task.

Labels

  • P2: Important issue, but not time-critical
  • bug: Something that is supposed to be working; but isn't
  • rllib: RLlib related issues
  • rllib-algorithms: An RLlib algorithm/Trainer is not learning.
  • stability