[RLlib] Attribute error when trying to compute action after training Multi Agent PPO with New API Stack #44475
Description
What happened + What you expected to happen
After training multi-agent PPO with the new API stack, following the how-to-use-the-new-api-stack guide, I tried to compute actions:
saved_algorithm = Algorithm.from_checkpoint(
    checkpoint=algorithm_path,
    policy_ids={"controlled_vehicle_0", "controlled_vehicle_1"},
    policy_mapping_fn=lambda agent_id, episode, **kwargs: f"controlled_vehicle_{agent_id}",
)
print("saved_algorithm type:", type(saved_algorithm))

# Evaluate the model
obs, info = env.reset()
print("obs:", obs)
actions = {}
for agent_id, agent_obs in obs.items():
    policy_id = f"controlled_vehicle_{agent_id}"
    action = saved_algorithm.get_policy(policy_id).compute_single_action(agent_obs)
    actions[agent_id] = action
print("action", actions)
but I get the error message:
AttributeError: 'MultiAgentEnvRunner' object has no attribute 'get_policy'
I also tried another way:
action = saved_algorithm.compute_single_action(agent_obs, policy_id)
but still got the same error message: AttributeError: 'MultiAgentEnvRunner' object has no attribute 'get_policy'.
I have seen a similar issue in #40312; are these two issues the same?
The detailed error message is as follows:
Traceback (most recent call last):
File "test_evaluate.py", line 151, in
evaluate_agent(saved_algorithm, env)
File "test_evaluate.py", line 112, in evaluate_agent
action = saved_algorithm.get_policy(policy_id).compute_single_action(agent_obs)
File "C:\Users\Ice Cream\miniconda3\envs\env_highway\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
return method(self, *_args, **_kwargs)
File "C:\Users\Ice Cream\miniconda3\envs\env_highway\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 2051, in get_policy
return self.workers.local_worker().get_policy(policy_id)
AttributeError: 'MultiAgentEnvRunner' object has no attribute 'get_policy'
Before calling this method, I also printed the relevant info, and this part looks normal:
saved_algorithm type: <class 'ray.rllib.algorithms.ppo.ppo.PPO'>
saved_algorithm.get_config() <ray.rllib.algorithms.ppo.ppo.PPOConfig object at 0x0000014B0E4D4370>
through this code:
print("saved_algorithm type:", type(saved_algorithm))
print("saved_algorithm.get_config()", saved_algorithm.get_config())
Versions / Dependencies
Ray 2.10.0
Python 3.8.18
Windows 11
Reproduction script
The code used for training is as follows:
import os

from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.multi_agent_env_runner import MultiAgentEnvRunner
from ray.train import RunConfig
from ray.tune.registry import register_env

register_env("ray_dict_highway_env", create_env)

config = (
    PPOConfig()
    .environment(env="ray_dict_highway_env")
    .experimental(_enable_new_api_stack=True)
    .rollouts(env_runner_cls=MultiAgentEnvRunner)
    .resources(
        num_learner_workers=0,
        num_gpus_per_learner_worker=1,
        num_cpus_for_local_worker=1,
    )
    .training(model={"uses_new_env_runners": True})
    .multi_agent(
        policies={
            "controlled_vehicle_0",
            "controlled_vehicle_1",
        },
        policy_mapping_fn=lambda agent_id, episode, **kwargs: f"controlled_vehicle_{agent_id}",
    )
    .framework("torch")
)

current_script_directory = os.path.dirname(os.path.abspath(__file__))
ray_result_path = os.path.join(current_script_directory, folder_path, "ray_results")

tuner = tune.Tuner(
    "PPO",
    run_config=RunConfig(
        storage_path=ray_result_path,
        name="2-agent-PPO",
        stop={"timesteps_total": 5e5},
    ),
    param_space=config.to_dict(),
)
results = tuner.fit()
And the code for loading the checkpoint:
from ray.rllib.algorithms.algorithm import Algorithm

algorithm_path = r"D:\DRL_Project\DRL_highway\experiments\hw-fast-ma-dict-v0_rllib-mappo\2024-04-01_01-28\ray_results\2-agent-PPO\PPO_ray_dict_highway_env_1c7ab_00000_0_2024-04-01_01-28-38\checkpoint_000000"
saved_algorithm = Algorithm.from_checkpoint(
    checkpoint=algorithm_path,
    policy_ids={"controlled_vehicle_0", "controlled_vehicle_1"},
    policy_mapping_fn=lambda agent_id, episode, **kwargs: f"controlled_vehicle_{agent_id}",
)
Issue Severity
High: It blocks me from completing my task.