[feature] Add experimental PyTorch support #4335

Merged
132 commits merged on Aug 20, 2020
Commits (132)
017c3cb
Begin porting work
awjuliani Apr 14, 2020
d99fc74
Add ResNet and distributions
awjuliani Apr 16, 2020
c981a81
Merge remote-tracking branch 'origin/master' into develop-add-fire
awjuliani Apr 17, 2020
6dfb8fa
Merge remote-tracking branch 'origin/master' into develop-add-fire
awjuliani Apr 20, 2020
a492a9f
Dynamically construct actor and critic
awjuliani Apr 20, 2020
5e6f4ae
Initial optimizer port
awjuliani Apr 21, 2020
7e46bc5
Refactoring policy and optimizer
awjuliani Apr 22, 2020
a3a1c0f
Resolving a few bugs
awjuliani Apr 22, 2020
652b399
Share more code between tf and torch policies
awjuliani Apr 22, 2020
da49aaa
Slightly closer to running model
awjuliani Apr 22, 2020
1ae28be
Training runs, but doesn’t actually work
awjuliani Apr 23, 2020
b68eb20
Fix a couple additional bugs
awjuliani Apr 23, 2020
5e39d84
Add conditional sigma for distribution
awjuliani Apr 23, 2020
a0d6823
Fix normalization
Apr 24, 2020
c807190
Merge remote-tracking branch 'origin/develop-add-fire-debug' into dev…
awjuliani Apr 27, 2020
e2d7fee
Support discrete actions as well
awjuliani Apr 27, 2020
f5b28d3
Continuous and discrete now train
awjuliani Apr 28, 2020
50f5cc1
Multi-discrete now working
awjuliani Apr 28, 2020
8c10cd3
Visual observations now train as well
awjuliani Apr 28, 2020
8445661
Merge remote-tracking branch 'origin/master' into develop-add-fire
awjuliani Apr 29, 2020
deb6e92
GRU in-progress and dynamic cnns
awjuliani Apr 30, 2020
57486ab
Fix for memories
awjuliani Apr 30, 2020
f6d5df5
Remove unused arg
awjuliani Apr 30, 2020
5521670
Combine actor and critic classes. Initial export.
awjuliani May 4, 2020
9b9e783
Support tf and pytorch alongside one another
awjuliani May 4, 2020
e98def6
Prepare model for onnx export
awjuliani May 5, 2020
d6d69ad
Merge remote-tracking branch 'origin/master' into develop-add-fire
awjuliani May 5, 2020
2c6daac
Use LSTM and fix a few merge errors
awjuliani May 6, 2020
411d0c4
Merge remote-tracking branch 'origin/master' into develop-add-fire
awjuliani May 11, 2020
8b36db0
Fix bug in probs calculation
awjuliani May 11, 2020
ff72b3e
Optimize np -> tensor operations
awjuliani May 11, 2020
ee9fbd1
Time action sample function
awjuliani May 11, 2020
8f92145
Small performance improvement during inference
awjuliani May 13, 2020
4eead36
Merge remote-tracking branch 'origin/master' into develop-add-fire
awjuliani May 19, 2020
b3d1201
Merge master
awjuliani Jun 15, 2020
892f385
ONNX exporting
awjuliani Jun 18, 2020
d12c053
Fix some issues with pdf
awjuliani Jun 26, 2020
742d322
Fix bug in pdf function
awjuliani Jun 29, 2020
509a858
Fix ResNet
awjuliani Jun 29, 2020
2a22e17
Remove double setting
awjuliani Jul 1, 2020
3442de5
Fix for discrete actions (#4181)
Jul 1, 2020
aadaca9
Fix discrete actions and GridWorld
Jul 2, 2020
a303586
Remove print statement
Jul 2, 2020
b3ca0c9
Convert List[np.ndarray] to np.ndarray before using torch.as_tensor (…
Jul 2, 2020
088cbe9
Develop add fire exp framework (#4213)
vincentpierre Jul 10, 2020
da3a7f8
reformatting experiment_torch.py
vincentpierre Jul 10, 2020
5d5c4ea
Pytorch port of SAC (#4219)
Jul 22, 2020
4214ec8
Update add-fire to latest master, including Policy refactor (#4263)
Jul 24, 2020
38c3dd1
[refactor] Refactor normalizers and encoders (#4275)
Jul 29, 2020
13b78e7
fix onnx save path and output_name
Jul 31, 2020
254f83b
add Saver class (only TF working)
Aug 3, 2020
43e32f6
Merge branch 'develop-add-fire-checkpoint' of https://github.com/Unit…
Aug 3, 2020
7756a87
fix pytorch checkpointing. add tensors in Normalizer as parameter
Aug 3, 2020
dbf2daf
remove print
Aug 3, 2020
02f1916
move tf and add torch model serialization
Aug 4, 2020
d57b830
remove
Aug 4, 2020
b62a1cd
remove unused
Aug 4, 2020
8bb30b1
add sac checkpoint
Aug 4, 2020
cce8227
small improvements
Aug 4, 2020
2da4d88
small improvements
Aug 4, 2020
6e8ed26
remove print
Aug 4, 2020
76ef088
move checkpoint_path logic to saver
Aug 4, 2020
17bacbb
[refactor] Refactor Actor and Critic classes (#4287)
Aug 4, 2020
ea93224
fix onnx input
Aug 5, 2020
1ff782a
fix formatting and test
Aug 5, 2020
949aa1f
[bug-fix] Fix non-LSTM SeparateActorCritic (#4306)
Aug 5, 2020
08b810a
small improvements
Aug 6, 2020
560f937
small improvement
Aug 6, 2020
02e35fd
[bug-fix] Fix error with discrete probs (#4309)
Aug 6, 2020
9d0fad2
[tests] Add tests for core PyTorch files (#4292)
Aug 6, 2020
6f9bd88
Merge branch 'develop-add-fire' into develop-add-fire-checkpoint
Aug 6, 2020
19c9ff0
[feature] Fix TF tests, add --torch CLI option, allow run TF without …
Aug 6, 2020
749acff
Test fixes on add-fire (#4317)
vincentpierre Aug 6, 2020
d33ad07
fix tests
Aug 7, 2020
ace4394
Add components directory and init (#4320)
andrewcoh Aug 7, 2020
4759d1f
[add-fire] Halve Gaussian entropy (#4319)
Aug 7, 2020
c2b0074
[add-fire] Add learning rate and beta/epsilon decay to PyTorch (#4318)
Aug 7, 2020
143876b
Added Reward Providers for Torch (#4280)
vincentpierre Aug 7, 2020
7b2c2f9
Fix discrete export (#4322)
dongruoping Aug 8, 2020
9430fb3
[add-fire] Fix CategoricalDistInstance test and replace `range` with …
Aug 10, 2020
7c3ff1d
Develop add fire layers (#4321)
vincentpierre Aug 10, 2020
f54bf42
Merge branch 'master' into develop-add-fire-mm
Aug 10, 2020
e1dce72
fixing typo
vincentpierre Aug 10, 2020
9913e71
[add-fire] Merge post-0.19.0 master into add-fire (#4328)
Aug 11, 2020
d9e6198
Revert "[add-fire] Merge post-0.19.0 master into add-fire (#4328)" (#…
Aug 11, 2020
1bae38e
More comments and Made ResNetBlock (#4329)
vincentpierre Aug 11, 2020
ff667e7
Merge pull request #4331 from Unity-Technologies/develop-add-fire-mm2
Aug 11, 2020
680c823
Merge branch 'develop-add-fire' into develop-add-fire-checkpoint
Aug 11, 2020
b6bc80d
update saver interface and add tests
Aug 11, 2020
42f24b3
update
Aug 13, 2020
9874a35
Fixed the reporting of the discriminator loss (#4348)
vincentpierre Aug 13, 2020
a23669d
Fix ONNX import for continuous
Aug 13, 2020
e51db51
fix export input names
Aug 13, 2020
83e17bb
Behavioral Cloning Pytorch (#4293)
andrewcoh Aug 13, 2020
b706bfe
Merge branch 'develop-add-fire-checkpoint' of https://github.com/Unit…
Aug 13, 2020
6d19f58
fix export input name
Aug 13, 2020
5ce6272
[add-fire] Add LSTM to SAC, LSTM fixes and initializations (#4324)
Aug 13, 2020
9d95298
add comments
Aug 14, 2020
003f4a6
Merge branch 'develop-add-fire' into develop-add-fire-checkpoint
Aug 14, 2020
cb87d78
Merge branch 'master' into develop-add-fire-mm3
Aug 14, 2020
06b2106
fix bc tests
Aug 14, 2020
61f3aca
Merge branch 'develop-add-fire-mm3' into develop-add-fire-checkpoint
Aug 14, 2020
4d7d118
change brain_name to behavior_name
Aug 14, 2020
de0265e
Merge master and add Saver class for save/load checkpoints
dongruoping Aug 14, 2020
291091a
reverting Project settings
vincentpierre Aug 14, 2020
d37960c
[add-fire] Fix masked mean for 2d tensors (#4364)
Aug 14, 2020
c3fae3a
Removing the experiment script from add fire (#4373)
vincentpierre Aug 18, 2020
71e7b17
[add-fire] Add tests and fix issues with Policy (#4372)
Aug 18, 2020
f9273bb
Pytorch ghost trainer (#4370)
andrewcoh Aug 18, 2020
23e8d72
add test_simple_rl tests to torch
andrewcoh Aug 18, 2020
6635413
revert tests
andrewcoh Aug 18, 2020
1d89489
Fix of the test for multi visual input
vincentpierre Aug 18, 2020
48e77c6
Make reset block submodule
Aug 18, 2020
6e75dd1
fix export input_name
Aug 19, 2020
7660a90
[add-fire] Memory class abstraction (#4375)
Aug 19, 2020
4db512b
make visual input channel first for export
Aug 19, 2020
bd41761
Merge branch 'develop-add-fire' into develop-add-fire-export
Aug 19, 2020
47212e5
Don't use torch.split in LSTM
Aug 19, 2020
09c2dc3
Add fire to test_simple_rl.py (#4378)
andrewcoh Aug 19, 2020
b22f412
Merge branch 'develop-add-fire' of github.com:Unity-Technologies/ml-a…
Aug 19, 2020
269a4c8
reverting unity_to_external_pb2_grpc.py
vincentpierre Aug 19, 2020
3d7b809
remove duplicate of curr documentation
andrewcoh Aug 19, 2020
1940d96
Revert "remove duplicate of curr documentation"
andrewcoh Aug 19, 2020
9406624
remove duplicated curriculum doc (#4386)
andrewcoh Aug 19, 2020
0a8b5e0
Fixed discrete models
Aug 19, 2020
e6eb502
Always export one Action tensor (#4388)
Aug 19, 2020
6f46b30
[add-fire] Revert unneeded changes back to master (#4389)
Aug 20, 2020
435d226
add comment
Aug 20, 2020
1a15577
fix test
Aug 20, 2020
38c1007
Fix export
dongruoping Aug 20, 2020
ddcf078
add fire clean up docstrings in create policies (#4391)
andrewcoh Aug 20, 2020
e93c746
[add-fire] Update changelog (#4397)
Aug 20, 2020
4 changes: 4 additions & 0 deletions com.unity.ml-agents/CHANGELOG.md
@@ -29,6 +29,10 @@ and this project adheres to
- The interaction between EnvManager and TrainerController was changed; EnvManager.advance() was split into two stages,
and TrainerController now uses the results from the first stage to handle new behavior names. This change speeds up
Python training by approximately 5-10%. (#4259)
- Experimental PyTorch support has been added. Use `--torch` when running `mlagents-learn`, or add
`framework: pytorch` to your trainer configuration (under the behavior name) to enable it.
Note that PyTorch 1.6.0 or greater should be installed to use this feature; see
[the PyTorch website](https://pytorch.org/) for installation instructions. (#4335)

### Minor Changes
#### com.unity.ml-agents (C#)
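As a usage illustration of the changelog entry above, a trainer configuration enabling the PyTorch backend for a single behavior might look like the sketch below; the behavior name `3DBall` and the `trainer_type` key are placeholders drawn from the example environments, not part of this diff:

    behaviors:
      3DBall:
        trainer_type: ppo
        framework: pytorch

Equivalently, pass the new CLI flag described above: `mlagents-learn config.yaml --torch`.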
2 changes: 1 addition & 1 deletion docs/Learning-Environment-Examples.md
@@ -460,7 +460,7 @@ you would like to contribute environments, please see our
head, thighs, shins, feet, arms, forearms and hands.
- Goal: The agent must move its body toward the goal direction without falling.
- `WalkerDynamic`- Goal direction is randomized.
- `WalkerDynamicVariableSpeed`- Goal direction and walking speed are randomized.
- `WalkerDynamicVariableSpeed`- Goal direction and walking speed are randomized.
Comment from @ervteng (Contributor, Author), Aug 20, 2020:

I've tried to remove this; this delta isn't picked up by git 👿

- `WalkerStatic` - Goal direction is always forward.
- `WalkerStaticVariableSpeed` - Goal direction is always forward. Walking
speed is randomized.
2 changes: 1 addition & 1 deletion ml-agents/mlagents/trainers/buffer.py
@@ -48,7 +48,7 @@ def extend(self, data: np.ndarray) -> None:
Adds a list of np.arrays to the end of the list of np.arrays.
:param data: The np.array list to append.
"""
self += list(np.array(data))
self += list(np.array(data, dtype=np.float32))

def set(self, data):
"""
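The explicit `dtype=np.float32` matters for the PyTorch path (see commit b3ca0c9 above): converting a Python list of arrays to a tensor element by element is slow, whereas a single contiguous float32 array converts in one copy and matches the models' float32 weights. A rough sketch of the difference, with made-up names:

    import numpy as np
    import torch

    steps = [np.random.rand(84) for _ in range(1000)]  # e.g. per-step observations

    # Slow path: PyTorch walks the Python list element by element, and the
    # result inherits float64 from np.random.rand.
    slow = torch.as_tensor(steps)

    # Fast path: one contiguous float32 array, converted in a single copy.
    fast = torch.as_tensor(np.array(steps, dtype=np.float32))
    assert fast.dtype == torch.float32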
7 changes: 7 additions & 0 deletions ml-agents/mlagents/trainers/cli_utils.py
@@ -168,6 +168,13 @@ def _create_parser() -> argparse.ArgumentParser:
action=DetectDefaultStoreTrue,
help="Forces training using CPU only",
)
argparser.add_argument(
"--torch",
default=False,
action=DetectDefaultStoreTrue,
help="(Experimental) Use the PyTorch framework instead of TensorFlow. Install PyTorch "
"before using this option",
)

eng_conf = argparser.add_argument_group(title="Engine Configuration")
eng_conf.add_argument(
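`DetectDefaultStoreTrue` is ml-agents' own argparse action. A minimal sketch of the idea behind it, using a hypothetical stand-in rather than the real class: a `store_true`-style action that also records which options were set explicitly, so later configuration code can tell a default apart from a deliberate user choice.

    import argparse

    class StoreTrueAndRecord(argparse.Action):
        """Hypothetical stand-in: store_true that remembers explicit flags."""

        explicitly_set: set = set()

        def __init__(self, option_strings, dest, **kwargs):
            super().__init__(option_strings, dest, nargs=0, **kwargs)

        def __call__(self, parser, namespace, values, option_string=None):
            setattr(namespace, self.dest, True)
            StoreTrueAndRecord.explicitly_set.add(self.dest)

    parser = argparse.ArgumentParser()
    parser.add_argument("--torch", default=False, action=StoreTrueAndRecord)
    args = parser.parse_args(["--torch"])
    assert args.torch and "torch" in StoreTrueAndRecord.explicitly_set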
12 changes: 7 additions & 5 deletions ml-agents/mlagents/trainers/ghost/trainer.py
@@ -304,7 +304,10 @@ def save_model(self) -> None:
self.trainer.save_model()

def create_policy(
self, parsed_behavior_id: BehaviorIdentifiers, behavior_spec: BehaviorSpec
self,
parsed_behavior_id: BehaviorIdentifiers,
behavior_spec: BehaviorSpec,
create_graph: bool = False,
) -> Policy:
"""
Creates policy with the wrapped trainer's create_policy function
@@ -313,10 +316,10 @@ def create_policy(
team are grouped. All policies associated with this team are added to the
wrapped trainer to be trained.
"""
policy = self.trainer.create_policy(parsed_behavior_id, behavior_spec)
policy.create_tf_graph()
policy = self.trainer.create_policy(
parsed_behavior_id, behavior_spec, create_graph=True
)
self.trainer.saver.initialize_or_load(policy)
policy.init_load_weights()
team_id = parsed_behavior_id.team_id
self.controller.subscribe_team_id(team_id, self)

@@ -326,7 +329,6 @@ def create_policy(
parsed_behavior_id, behavior_spec
)
self.trainer.add_policy(parsed_behavior_id, internal_trainer_policy)
internal_trainer_policy.init_load_weights()
self.current_policy_snapshot[
parsed_behavior_id.brain_name
] = internal_trainer_policy.get_weights()
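The new `create_graph` argument moves graph construction into the wrapped trainer's `create_policy`, so the ghost trainer no longer calls `create_tf_graph()` and `init_load_weights()` itself. A sketch of the resulting contract, with a simplified class and a hypothetical `_construct_policy` helper, not the real trainer code:

    class WrappedTrainerSketch:
        def create_policy(self, parsed_behavior_id, behavior_spec, create_graph=False):
            policy = self._construct_policy(parsed_behavior_id, behavior_spec)
            if create_graph:
                # TF policies build their graph here (and, per the tf_policy.py
                # change below, their weight-assignment ops); Torch policies
                # have no graph to build.
                policy.create_tf_graph()
            return policy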
94 changes: 94 additions & 0 deletions ml-agents/mlagents/trainers/optimizer/torch_optimizer.py
@@ -0,0 +1,94 @@
from typing import Dict, Optional, Tuple, List
import torch
import numpy as np

from mlagents.trainers.buffer import AgentBuffer
from mlagents.trainers.trajectory import SplitObservations
from mlagents.trainers.torch.components.bc.module import BCModule
from mlagents.trainers.torch.components.reward_providers import create_reward_provider

from mlagents.trainers.policy.torch_policy import TorchPolicy
from mlagents.trainers.optimizer import Optimizer
from mlagents.trainers.settings import TrainerSettings
from mlagents.trainers.torch.utils import ModelUtils


class TorchOptimizer(Optimizer): # pylint: disable=W0223
def __init__(self, policy: TorchPolicy, trainer_settings: TrainerSettings):
super().__init__()
self.policy = policy
self.trainer_settings = trainer_settings
self.update_dict: Dict[str, torch.Tensor] = {}
self.value_heads: Dict[str, torch.Tensor] = {}
self.memory_in: torch.Tensor = None
self.memory_out: torch.Tensor = None
self.m_size: int = 0
self.global_step = torch.tensor(0)
self.bc_module: Optional[BCModule] = None
self.create_reward_signals(trainer_settings.reward_signals)
if trainer_settings.behavioral_cloning is not None:
self.bc_module = BCModule(
self.policy,
trainer_settings.behavioral_cloning,
policy_learning_rate=trainer_settings.hyperparameters.learning_rate,
default_batch_size=trainer_settings.hyperparameters.batch_size,
default_num_epoch=3,
)

def update(self, batch: AgentBuffer, num_sequences: int) -> Dict[str, float]:
pass

def create_reward_signals(self, reward_signal_configs):
"""
Create reward signals
:param reward_signal_configs: Reward signal config.
"""
for reward_signal, settings in reward_signal_configs.items():
# Name reward signals by string in case we have duplicates later
self.reward_signals[reward_signal.value] = create_reward_provider(
reward_signal, self.policy.behavior_spec, settings
)

def get_trajectory_value_estimates(
self, batch: AgentBuffer, next_obs: List[np.ndarray], done: bool
) -> Tuple[Dict[str, np.ndarray], Dict[str, float]]:
vector_obs = [ModelUtils.list_to_tensor(batch["vector_obs"])]
if self.policy.use_vis_obs:
visual_obs = []
for idx, _ in enumerate(
self.policy.actor_critic.network_body.visual_encoders
):
visual_ob = ModelUtils.list_to_tensor(batch["visual_obs%d" % idx])
visual_obs.append(visual_ob)
else:
visual_obs = []

memory = torch.zeros([1, 1, self.policy.m_size])

vec_vis_obs = SplitObservations.from_observations(next_obs)
next_vec_obs = [
ModelUtils.list_to_tensor(vec_vis_obs.vector_observations).unsqueeze(0)
]
next_vis_obs = [
ModelUtils.list_to_tensor(_vis_ob).unsqueeze(0)
for _vis_ob in vec_vis_obs.visual_observations
]

value_estimates, next_memory = self.policy.actor_critic.critic_pass(
vector_obs, visual_obs, memory, sequence_length=batch.num_experiences
)

next_value_estimate, _ = self.policy.actor_critic.critic_pass(
next_vec_obs, next_vis_obs, next_memory, sequence_length=1
)

for name, estimate in value_estimates.items():
value_estimates[name] = estimate.detach().cpu().numpy()
next_value_estimate[name] = next_value_estimate[name].detach().cpu().numpy()

if done:
for k in next_value_estimate:
if not self.reward_signals[k].ignore_done:
next_value_estimate[k] = 0.0

return value_estimates, next_value_estimate
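`get_trajectory_value_estimates` returns per-step value estimates plus a bootstrap value for the state after the last step, zeroed on terminal steps unless the reward provider's `ignore_done` is set. A hedged sketch of how such outputs typically feed a bootstrapped discounted return, not the trainer's actual advantage code:

    import numpy as np

    def discounted_returns(rewards: np.ndarray, bootstrap: float, gamma: float = 0.99) -> np.ndarray:
        # Work backwards from the value estimate of the post-trajectory state.
        returns = np.zeros_like(rewards, dtype=np.float32)
        running = bootstrap
        for t in reversed(range(len(rewards))):
            running = rewards[t] + gamma * running
            returns[t] = running
        return returns

    # e.g., with hypothetical trainer plumbing:
    # values, next_values = optimizer.get_trajectory_value_estimates(batch, next_obs, done)
    # returns = discounted_returns(extrinsic_rewards, next_values["extrinsic"])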
2 changes: 2 additions & 0 deletions ml-agents/mlagents/trainers/policy/tf_policy.py
@@ -152,6 +152,8 @@ def create_tf_graph(self) -> None:
# We do an initialize to make the Policy usable out of the box. If an optimizer is needed,
# it will re-load the full graph
self.initialize()
# Create assignment ops for Ghost Trainer
self.init_load_weights()

def _create_encoder(
self,
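For context on `init_load_weights`: in TF1-style code, swapping weights without rebuilding the graph is usually done by pre-creating one placeholder and one assign op per variable, which is what the ghost trainer's snapshot swaps rely on. A minimal sketch of that pattern, assuming TF 1.x and illustrative names rather than the actual ml-agents implementation:

    import tensorflow as tf

    def make_assign_ops(variables):
        # Created once; reused for every snapshot swap.
        placeholders, ops = [], []
        for var in variables:
            ph = tf.placeholder(var.dtype.base_dtype, shape=var.shape)
            placeholders.append(ph)
            ops.append(tf.assign(var, ph))
        return placeholders, ops

    def set_weights(sess, placeholders, ops, weights):
        sess.run(ops, feed_dict=dict(zip(placeholders, weights)))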