Description
Describe the bug
Grid sensor isn't working: it leads to a division by zero somewhere
To Reproduce
Use the default grid sensor component with basic parameters, nothing more; I was able to create a minimalist env with the same issue. If the grid sensor GameObject is disabled, training starts normally; when I enable the grid sensor, it crashes.
One weird thing is the warning "[WARNING] Trainer has no policies, not saving anything." that appears when the grid sensor is enabled.
EDIT: I've been able to reproduce the bug in the ML-Agents examples, and to make my minimalist env work, by tweaking the grid size parameters. Changing the FoodCollector agent's grid size to 8,0,8 or 16,0,16 gives different errors. Moving the grid size from 16,0,16 to 20,0,20 in my minimalist env makes training run fine.
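If I'm reading the "simple" visual encoder right (my assumption from the stack trace below is that it stacks an 8x8/stride-4 conv followed by a 4x4/stride-2 conv; I haven't verified this in the source), those grid sizes would explain the behaviour: the spatial output collapses to zero for 16x16 and goes negative for 8x8, while 20x20 stays positive. A rough sanity check, under that assumption:

```python
def conv_out(size, kernel, stride):
    # Standard no-padding conv output size: floor((size - kernel) / stride) + 1
    return (size - kernel) // stride + 1

for grid in (8, 16, 20):
    after_conv1 = conv_out(grid, kernel=8, stride=4)          # assumed first conv: 8x8, stride 4
    after_conv2 = conv_out(after_conv1, kernel=4, stride=2)   # assumed second conv: 4x4, stride 2
    print(f"grid {grid}x{grid}: conv1 -> {after_conv1}, conv2 -> {after_conv2}")

# grid 8x8:   conv1 -> 1, conv2 -> -1  (negative size -> a different error)
# grid 16x16: conv1 -> 3, conv2 -> 0   (zero features -> the division by zero below)
# grid 20x20: conv1 -> 4, conv2 -> 1   (trains fine)
```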
Console logs / stack traces
Version information:
ml-agents: 0.26.0,
ml-agents-envs: 0.26.0,
Communicator API: 1.5.0,
PyTorch: 1.7.1+cu110
[INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
[INFO] Connected to Unity environment with package version 2.0.0-exp.1 and communication version 1.5.0
[INFO] Connected new brain: HeroBehaviorAlexisMonoHero?team=0
[INFO] Hyperparameters for behavior name HeroBehaviorAlexisMonoHero:
trainer_type: ppo
hyperparameters:
  batch_size: 256
  buffer_size: 2048
  learning_rate: 0.0003
  beta: 0.01
  epsilon: 0.2
  lambd: 0.95
  num_epoch: 3
  learning_rate_schedule: linear
network_settings:
  normalize: False
  hidden_units: 256
  num_layers: 2
  vis_encode_type: simple
  memory: None
  goal_conditioning_type: hyper
reward_signals:
  extrinsic:
    gamma: 0.99
    strength: 1.0
    network_settings:
      normalize: False
      hidden_units: 128
      num_layers: 2
      vis_encode_type: simple
      memory: None
      goal_conditioning_type: hyper
init_path: None
keep_checkpoints: 5
checkpoint_interval: 500000
max_steps: 20000000
time_horizon: 64
summary_freq: 10000
threaded: False
self_play: None
behavioral_cloning: None
[WARNING] Trainer has no policies, not saving anything.
Traceback (most recent call last):
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\alexi\anaconda3\envs\mlagentconda37\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\learn.py", line 250, in main
run_cli(parse_command_line())
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\learn.py", line 246, in run_cli
run_training(run_seed, options)
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\learn.py", line 125, in run_training
tc.start_learning(env_manager)
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\trainer_controller.py", line 173, in start_learning
self._reset_env(env_manager)
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\trainer_controller.py", line 107, in _reset_env
self._register_new_behaviors(env_manager, env_manager.first_step_infos)
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\trainer_controller.py", line 268, in _register_new_behaviors
self._create_trainers_and_managers(env_manager, new_behavior_ids)
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\trainer_controller.py", line 166, in _create_trainers_and_managers
self._create_trainer_and_manager(env_manager, behavior_id)
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\trainer_controller.py", line 140, in _create_trainer_and_manager
create_graph=True,
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\trainer\rl_trainer.py", line 119, in create_policy
return self.create_torch_policy(parsed_behavior_id, behavior_spec)
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\ppo\trainer.py", line 231, in create_torch_policy
separate_critic=True, # Match network architecture with TF
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\policy\torch_policy.py", line 70, in __init__
tanh_squash=tanh_squash,
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\torch\networks.py", line 592, in __init__
self.network_body = NetworkBody(observation_specs, network_settings)
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\torch\networks.py", line 194, in __init__
self.normalize,
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\torch\networks.py", line 54, in __init__
normalize=normalize,
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\torch\utils.py", line 207, in create_input_processors
obs_spec, normalize, h_size, attention_embedding_size, vis_encode_type
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\torch\utils.py", line 162, in get_encoder_for_obs
return (visual_encoder_class(shape[0], shape[1], shape[2], h_size), h_size)
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\torch\encoders.py", line 174, in __init__
kernel_gain=1.41, # Use ReLU gain
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\torch\layers.py", line 49, in linear_layer
layer = torch.nn.Linear(input_size, output_size)
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\torch\nn\modules\linear.py", line 83, in __init__
self.reset_parameters()
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\torch\nn\modules\linear.py", line 86, in reset_parameters
init.kaiming_uniform_(self.weight, a=math.sqrt(5))
File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\torch\nn\init.py", line 381, in kaiming_uniform_
std = gain / math.sqrt(fan)
ZeroDivisionError: float division by zero
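For what it's worth, the last frame seems reproducible in isolation: constructing a torch.nn.Linear with zero input features raises the same error on PyTorch 1.7.1 (newer PyTorch versions may guard against empty tensors, so this is specific to the version listed above):

```python
import torch

# Hypothetical minimal repro (not ML-Agents code): a Linear layer whose input
# size has collapsed to 0 fails inside kaiming_uniform_, because fan_in == 0
# and std = gain / math.sqrt(fan) divides by zero.
try:
    torch.nn.Linear(0, 256)
except ZeroDivisionError as e:
    print("ZeroDivisionError:", e)  # "float division by zero" on torch 1.7.1
```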
Environment (please complete the following information):
- Unity Version: Unity 2019.4.12f1 (same issue with Unity 2020)
- OS + version: Windows 10
- ML-Agents version: release 17 / 2.0.0-exp.1
- Torch version: 1.7.1+cu110 (I use conda)