
[RLlib] Attribute error when trying to compute action after training Multi Agent PPO with New API Stack #44475

Open
@Dr-IceCream

Description

What happened + What you expected to happen

After training multi-agent PPO with the new API stack, following the how-to-use-the-new-api-stack guide,
I tried to compute actions:

    saved_algorithm = Algorithm.from_checkpoint(
        checkpoint=algorithm_path,
        policy_ids={"controlled_vehicle_0", "controlled_vehicle_1"},
        policy_mapping_fn=lambda agent_id, episode, **kwargs: f"controlled_vehicle_{agent_id}",
    )
    print("saved_algorithm type:", type(saved_algorithm))
    # Evaluate the model
    obs, info = env.reset()
    print("obs:", obs)
    actions = {}
    for agent_id, agent_obs in obs.items():
        policy_id = f"controlled_vehicle_{agent_id}"
        action = saved_algorithm.get_policy(policy_id).compute_single_action(agent_obs)
        actions[agent_id] = action
    print("action", actions)

but I get the error message:

AttributeError: 'MultiAgentEnvRunner' object has no attribute 'get_policy'

I also tried another approach:

    action = saved_algorithm.compute_single_action(agent_obs, policy_id)

but I still get the same error message: AttributeError: 'MultiAgentEnvRunner' object has no attribute 'get_policy'.
I have seen a similar issue in #40312; are these two issues the same?
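
For reference, this is the kind of new-stack inference I was expecting to use instead of get_policy(). It is a rough, unverified sketch: it assumes the local MultiAgentEnvRunner exposes its MultiAgentRLModule as .module, that agent_obs is a flat numpy array, and that the default PPO module returns discrete-action logits under the "action_dist_inputs" key. I have not confirmed any of this on Ray 2.10:

    import torch

    # Unverified sketch: pull the per-policy RLModule off the local (new-stack)
    # EnvRunner and run inference on a single observation.
    marl_module = saved_algorithm.workers.local_worker().module
    rl_module = marl_module["controlled_vehicle_0"]

    obs_batch = torch.from_numpy(agent_obs).float().unsqueeze(0)  # add batch dim
    with torch.no_grad():
        out = rl_module.forward_inference({"obs": obs_batch})

    # For the default PPO RLModule, the output is expected to contain the
    # action-distribution inputs (logits for a discrete action space).
    logits = out["action_dist_inputs"]
    action = int(torch.argmax(logits, dim=-1)[0])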

The detailed error message from the get_policy() call is as follows:

    Traceback (most recent call last):
      File "test_evaluate.py", line 151, in <module>
        evaluate_agent(saved_algorithm, env)
      File "test_evaluate.py", line 112, in evaluate_agent
        action = saved_algorithm.get_policy(policy_id).compute_single_action(agent_obs)
      File "C:\Users\Ice Cream\miniconda3\envs\env_highway\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
        return method(self, *_args, **_kwargs)
      File "C:\Users\Ice Cream\miniconda3\envs\env_highway\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 2051, in get_policy
        return self.workers.local_worker().get_policy(policy_id)
    AttributeError: 'MultiAgentEnvRunner' object has no attribute 'get_policy'

Before calling this method, I also printed the relevant info, and this part looks normal:

    saved_algorithm type: <class 'ray.rllib.algorithms.ppo.ppo.PPO'>
    saved_algorithm.get_config() <ray.rllib.algorithms.ppo.ppo.PPOConfig object at 0x0000014B0E4D4370>

The output above was produced by the following code:

    print("saved_algorithm type:", type(saved_algorithm))
    print("saved_algorithm.get_config()",saved_algorithm.get_config())

Versions / Dependencies

Ray 2.10.0
Python 3.8.18
Windows 11

Reproduction script

The code used for training is as follows:

    register_env("ray_dict_highway_env", create_env)
    config = (
        PPOConfig().environment(env="ray_dict_highway_env")
        .experimental(_enable_new_api_stack=True)
        .rollouts(env_runner_cls=MultiAgentEnvRunner)
        .resources(
            num_learner_workers=0,
            num_gpus_per_learner_worker=1,
            num_cpus_for_local_worker=1,
        )
        .training(model={"uses_new_env_runners": True})
        .multi_agent(
            policies={
                "controlled_vehicle_0",
                "controlled_vehicle_1"
            },
            policy_mapping_fn=lambda agent_id, episode, **kwargs: f"controlled_vehicle_{agent_id}",
        )
        .framework("torch")
    )
    current_script_directory = os.path.dirname(os.path.abspath(__file__))
    ray_result_path = os.path.join(current_script_directory, folder_path, "ray_results")
    tuner = tune.Tuner(
        "PPO",
        run_config=RunConfig(
            storage_path=ray_result_path,
            name="2-agent-PPO",
            stop={"timesteps_total": 5e5}
        ),
        param_space=config.to_dict() 
    )
    results = tuner.fit()
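
For completeness, these snippets assume roughly the following imports (module paths as I understand them for Ray 2.10; create_env is my own factory for the multi-agent highway env and is omitted here):

    import os

    from ray import tune
    from ray.rllib.algorithms.algorithm import Algorithm
    from ray.rllib.algorithms.ppo import PPOConfig
    from ray.rllib.env.multi_agent_env_runner import MultiAgentEnvRunner
    from ray.train import RunConfig  # ray.air.RunConfig should also work on 2.10
    from ray.tune.registry import register_env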

And the code for loading the checkpoint:

algorithm_path = r"D:\DRL_Project\DRL_highway\experiments\hw-fast-ma-dict-v0_rllib-mappo\2024-04-01_01-28\ray_results\2-agent-PPO\PPO_ray_dict_highway_env_1c7ab_00000_0_2024-04-01_01-28-38\checkpoint_000000"
saved_algorithm = Algorithm.from_checkpoint(
        checkpoint=algorithm_path,
        policy_ids={"controlled_vehicle_0", "controlled_vehicle_1"},
        policy_mapping_fn=lambda agent_id, episode, **kwargs: f"controlled_vehicle_{agent_id}",
    )

Issue Severity

High: It blocks me from completing my task.
