
Grid sensor: Division by zero #5342

Closed

Description

@TheTrope

Describe the bug
The grid sensor isn't working: it leads to a division by zero somewhere.

To Reproduce
Use the default grid sensor component with basic parameters, nothing more; I was able to create a minimalist env with the same issue. If the GridSensor GameObject is disabled, training starts normally; when I enable the GridSensor, it crashes.
One weird thing is the warning: "[WARNING] Trainer has no policies, not saving anything." when the GridSensor is enabled.

EDIT: I've been able to reproduce the bug in the ml-agents examples, and to make my minimalist env work, by tweaking the grid size parameters. Changing the FoodCollector agent's grid size to 8,0,8 or 16,0,16 gives different errors. Moving the grid size from 16,0,16 to 20,0,20 in my minimalist env makes the training run fine.
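
For reference, here is my rough reading of why the grid size changes the error. This is only a back-of-the-envelope sketch: I'm assuming the "simple" visual encoder in mlagents/trainers/torch/encoders.py stacks two conv layers with roughly kernel 8 / stride 4 and kernel 4 / stride 2, which I haven't verified against the source.

    def conv_out(size, kernel, stride):
        # standard conv output size with no padding
        return (size - kernel) // stride + 1

    for grid in (8, 16, 20):
        h = conv_out(conv_out(grid, 8, 4), 4, 2)  # two conv layers back to back
        print(grid, "->", h)

    # 8  -> -1   negative size: a different conv error, matching the 8,0,8 case
    # 16 ->  0   flattened size 0 -> Linear layer with in_features=0 -> ZeroDivisionError
    # 20 ->  1   flattened size 1*1*32 = 32 -> training runs fine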

Console logs / stack traces

Version information:
  ml-agents: 0.26.0,
  ml-agents-envs: 0.26.0,
  Communicator API: 1.5.0,
  PyTorch: 1.7.1+cu110
[INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
[INFO] Connected to Unity environment with package version 2.0.0-exp.1 and communication version 1.5.0
[INFO] Connected new brain: HeroBehaviorAlexisMonoHero?team=0
[INFO] Hyperparameters for behavior name HeroBehaviorAlexisMonoHero:
        trainer_type:   ppo
        hyperparameters:
          batch_size:   256
          buffer_size:  2048
          learning_rate:        0.0003
          beta: 0.01
          epsilon:      0.2
          lambd:        0.95
          num_epoch:    3
          learning_rate_schedule:       linear
        network_settings:
          normalize:    False
          hidden_units: 256
          num_layers:   2
          vis_encode_type:      simple
          memory:       None
          goal_conditioning_type:       hyper
        reward_signals:
          extrinsic:
            gamma:      0.99
            strength:   1.0
            network_settings:
              normalize:        False
              hidden_units:     128
              num_layers:       2
              vis_encode_type:  simple
              memory:   None
              goal_conditioning_type:   hyper
        init_path:      None
        keep_checkpoints:       5
        checkpoint_interval:    500000
        max_steps:      20000000
        time_horizon:   64
        summary_freq:   10000
        threaded:       False
        self_play:      None
        behavioral_cloning:     None
[WARNING] Trainer has no policies, not saving anything.
Traceback (most recent call last):
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\alexi\anaconda3\envs\mlagentconda37\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\learn.py", line 250, in main
    run_cli(parse_command_line())
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\learn.py", line 246, in run_cli
    run_training(run_seed, options)
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\learn.py", line 125, in run_training
    tc.start_learning(env_manager)
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\trainer_controller.py", line 173, in start_learning
    self._reset_env(env_manager)
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\trainer_controller.py", line 107, in _reset_env
    self._register_new_behaviors(env_manager, env_manager.first_step_infos)
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\trainer_controller.py", line 268, in _register_new_behaviors
    self._create_trainers_and_managers(env_manager, new_behavior_ids)
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\trainer_controller.py", line 166, in _create_trainers_and_managers
    self._create_trainer_and_manager(env_manager, behavior_id)
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\trainer_controller.py", line 140, in _create_trainer_and_manager
    create_graph=True,
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\trainer\rl_trainer.py", line 119, in create_policy
    return self.create_torch_policy(parsed_behavior_id, behavior_spec)
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\ppo\trainer.py", line 231, in create_torch_policy
    separate_critic=True,  # Match network architecture with TF
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\policy\torch_policy.py", line 70, in __init__
    tanh_squash=tanh_squash,
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\torch\networks.py", line 592, in __init__
    self.network_body = NetworkBody(observation_specs, network_settings)
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\torch\networks.py", line 194, in __init__
    self.normalize,
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\torch\networks.py", line 54, in __init__
    normalize=normalize,
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\torch\utils.py", line 207, in create_input_processors
    obs_spec, normalize, h_size, attention_embedding_size, vis_encode_type
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\torch\utils.py", line 162, in get_encoder_for_obs
    return (visual_encoder_class(shape[0], shape[1], shape[2], h_size), h_size)
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\torch\encoders.py", line 174, in __init__
    kernel_gain=1.41,  # Use ReLU gain
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\mlagents\trainers\torch\layers.py", line 49, in linear_layer
    layer = torch.nn.Linear(input_size, output_size)
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\torch\nn\modules\linear.py", line 83, in __init__
    self.reset_parameters()
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\torch\nn\modules\linear.py", line 86, in reset_parameters
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
  File "c:\users\alexi\anaconda3\envs\mlagentconda37\lib\site-packages\torch\nn\init.py", line 381, in kaiming_uniform_
    std = gain / math.sqrt(fan)
ZeroDivisionError: float division by zero
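
If that reading is right, the last frame of the trace can be reproduced on its own. Hypothetical repro only, on torch 1.7.1 as above; newer PyTorch versions may handle an empty layer differently:

    import torch

    # A Linear layer with in_features=0 has fan_in == 0, so reset_parameters()
    # calls kaiming_uniform_, which computes gain / math.sqrt(0).
    torch.nn.Linear(0, 256)
    # ZeroDivisionError: float division by zero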

Environment (please complete the following information):

  • Unity Version: Unity 2019.4.12f1 (same issue with Unity 2020)
  • OS + version: Windows 10
  • ML-Agents version: release 17 / 2.0.0-exp.1
  • Torch version: 1.7.1+cu110 (I use conda)

Labels

bug (Issue describes a potential bug in ml-agents.)
