
Remove TrainerMetrics and add CSVWriter using new StatsWriter API #3108


Merged
86 commits merged on Dec 20, 2019

Changes from all commits

Commits (86)
9a16838
Split buffer into two buffers (PPO works)
Nov 22, 2019
55b2918
buffer split for SAC
Nov 25, 2019
38f5795
Fix buffer tests and truncate
Nov 25, 2019
453dd4c
Fix RL tests
Nov 25, 2019
b00f779
Fix demo loader and remaining tests
Nov 25, 2019
3b7191b
Remove MANIFEST file
Nov 25, 2019
9c47678
Add type hints to Buffer
Nov 25, 2019
a57a220
Rename append_update_buffer to append_to_update_buffer
Nov 25, 2019
efe29c8
Merge branch 'develop' into develop-splitbuffer
Nov 26, 2019
f5f9598
Non-working commit
Nov 26, 2019
f3459eb
Revert buffer for now
Nov 26, 2019
0b603c7
Another nonworking commit
Nov 27, 2019
ea6e79d
Runs but doesn't do anything yet
Nov 27, 2019
a264b48
Merge branch 'develop' into develop-agentprocessor
Dec 4, 2019
5e4f1bc
Use ProcessingBuffer in AgentProcessor
Dec 4, 2019
a5ac988
Convert to trajectory
Dec 4, 2019
a2e33e8
Looks like it's training
Dec 5, 2019
7004db8
Fix memory leak
Dec 5, 2019
0863ff5
Attempt reward reporting
Dec 5, 2019
88feb1b
Stats reporting is working
Dec 5, 2019
d6fe367
Clean up some stuff
Dec 5, 2019
8e43ecd
No longer using ProcessingBuffer for PPO
Dec 5, 2019
2b32d61
Move trajectory and related functions to trajectory.py
Dec 5, 2019
991be2c
Add back max_step logic
Dec 6, 2019
9b7969b
Merge branch 'master' of github.com:Unity-Technologies/ml-agents into…
Dec 6, 2019
5efd4e9
Remove epsilon
Dec 6, 2019
3bfe3df
Migrate SAC
Dec 6, 2019
f7649ae
Remove dead code
Dec 6, 2019
6b40d00
Move some common logic to buffer class
Dec 6, 2019
bf59521
Kill the ProcessingBuffer
Dec 6, 2019
68984df
Convert BC (warning) might be broken
Dec 6, 2019
12d4467
Fix some bugs for visual obs
Dec 6, 2019
2322150
Fixes for recurrent
Dec 7, 2019
2d084ed
Better decoupling for agent processor
Dec 7, 2019
295e3a0
Fix some of the tests
Dec 7, 2019
9334bb6
Add test for trajectory
Dec 9, 2019
93060b5
Fix BC and tests
Dec 9, 2019
3a3eb5b
Lots of test fixes
Dec 9, 2019
4c5bd73
Remove BootstrapExperience
Dec 9, 2019
1c95992
Move agent_id to Trajectory
Dec 9, 2019
a48e7f7
Add back next_obs
Dec 10, 2019
0053517
Fix test again
Dec 10, 2019
29797b1
Fix PPO value tests
Dec 10, 2019
e9dcdd9
Properly report value estimates and episode length
Dec 10, 2019
68a3b3d
Fix np float32 errors
Dec 10, 2019
6298731
Fix one more np float32 issue
Dec 10, 2019
cd4c09c
Merge branch 'master' into develop-agentprocessor
Dec 10, 2019
1a545c1
Fix some import errors
Dec 10, 2019
9452806
Make conversion methods part of NamedTuples
Dec 11, 2019
1052ad5
Add way to check if trajectory is done or max_reached
Dec 11, 2019
94c5f8c
Add docstring
Dec 11, 2019
866bf9c
Address AgentProcessor comments
Dec 11, 2019
03bd3e4
Allow None max steps
Dec 12, 2019
153368c
Merge branch 'master' into develop-agentprocessor
Dec 12, 2019
fd1312b
Fix tests
Dec 12, 2019
1a7fffd
Fix some mypy issues and remove unused code
Dec 12, 2019
d1b30b3
Fix numpy import
Dec 12, 2019
d9abe26
Remove defaultdict that didn't make sense
Dec 12, 2019
f090033
Fixed value estimate bug
Dec 12, 2019
6a1f275
Fix mypy issue
Dec 12, 2019
0f08718
Add stats reporter class and re-enable missing stats (#3076)
Dec 13, 2019
80a3359
Revert gitignore
Dec 13, 2019
a938d61
Normalize based on number of elements
Dec 14, 2019
63d6dd0
Add comment
Dec 16, 2019
82e8191
Merge branch 'master' into develop-agentprocessor
Dec 16, 2019
9a83b66
New way to update mean and var
Dec 17, 2019
c827581
Merge branch 'master' into develop-agentprocessor
Dec 18, 2019
89f9375
Fix tests
Dec 18, 2019
212cc3b
Add comments for normalization
Dec 18, 2019
10dcc1b
Remove dead code
Dec 18, 2019
2d72b06
Add type hints to rl_trainer
Dec 19, 2019
a0c76c7
Cleanup agent_processor
Dec 19, 2019
b1060e5
Make file creation safer
Dec 19, 2019
70f91af
Fix error message
Dec 19, 2019
8a44fc5
Clean up trajectory and splitobs
Dec 19, 2019
919a00b
Use .get for trainer_parameters
Dec 19, 2019
7122d39
Add test for normalization
Dec 19, 2019
cb1ec87
Float32 array in test
Dec 19, 2019
9d554bb
Fix comment in test
Dec 19, 2019
bf1ba10
Remove TrainerMetrics
Dec 19, 2019
9b89eff
Add CSVWriter StatsWriter
Dec 19, 2019
0359052
Add comment
Dec 19, 2019
d33c752
Merge branch 'master' into develop-csvwriter
Dec 19, 2019
14e7df7
Add required fields to CSVWriter
Dec 20, 2019
622cd7b
Fix test comments
Dec 20, 2019
40ebd20
Clean up if else
Dec 20, 2019
10 changes: 8 additions & 2 deletions ml-agents/mlagents/trainers/learn.py
@@ -17,7 +17,7 @@
from mlagents.trainers.exception import TrainerError
from mlagents.trainers.meta_curriculum import MetaCurriculum
from mlagents.trainers.trainer_util import load_config, TrainerFactory
from mlagents.trainers.stats import TensorboardWriter, StatsReporter
from mlagents.trainers.stats import TensorboardWriter, CSVWriter, StatsReporter
from mlagents_envs.environment import UnityEnvironment
from mlagents.trainers.sampler_class import SamplerManager
from mlagents.trainers.exception import SamplerException
@@ -250,9 +250,15 @@ def run_training(
trainer_config = load_config(trainer_config_path)
port = options.base_port + (sub_id * options.num_envs)

# Configure Tensorboard Writers and StatsReporter
# Configure CSV, Tensorboard Writers and StatsReporter
# We assume reward and episode length are needed in the CSV.
csv_writer = CSVWriter(
summaries_dir,
required_fields=["Environment/Cumulative Reward", "Environment/Episode Length"],
)
tb_writer = TensorboardWriter(summaries_dir)
StatsReporter.add_writer(tb_writer)
StatsReporter.add_writer(csv_writer)

if options.env_path is None:
port = 5004 # This is the in-Editor training port
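For context, a minimal sketch of how the writer registration above is intended to be used end to end. This is illustrative only; the directory name, category name, and stat values are assumptions, not taken from the diff.

# Sketch: register writers once, then report stats per category (assumed values).
from mlagents.trainers.stats import StatsReporter, TensorboardWriter, CSVWriter

summaries_dir = "./summaries"  # hypothetical output directory
csv_writer = CSVWriter(
    summaries_dir,
    required_fields=["Environment/Cumulative Reward", "Environment/Episode Length"],
)
StatsReporter.add_writer(TensorboardWriter(summaries_dir))
StatsReporter.add_writer(csv_writer)

# Each trainer reports through its own category; write_stats fans out to every registered writer.
reporter = StatsReporter("3DBallLearning")  # hypothetical category name
reporter.add_stat("Environment/Cumulative Reward", 1.5)
reporter.add_stat("Environment/Episode Length", 100.0)
reporter.write_stats(step=1000)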
5 changes: 0 additions & 5 deletions ml-agents/mlagents/trainers/ppo/trainer.py
@@ -177,10 +177,6 @@ def update_policy(self):
The reward signal generators must be updated in this method at their own pace.
"""
buffer_length = self.update_buffer.num_experiences
self.trainer_metrics.start_policy_update_timer(
number_experiences=buffer_length,
mean_return=float(np.mean(self.cumulative_returns_since_policy_update)),
)
self.cumulative_returns_since_policy_update.clear()

# Make sure batch_size is a multiple of sequence length. During training, we
@@ -221,7 +217,6 @@
for stat, val in update_stats.items():
self.stats_reporter.add_stat(stat, val)
self.clear_update_buffer()
self.trainer_metrics.end_policy_update()


def discount_rewards(r, gamma=0.99, value_next=0.0):
5 changes: 0 additions & 5 deletions ml-agents/mlagents/trainers/sac/trainer.py
@@ -207,13 +207,8 @@ def update_policy(self) -> None:
If reward_signal_train_interval is met, update the reward signals from the buffer.
"""
if self.step % self.train_interval == 0:
self.trainer_metrics.start_policy_update_timer(
number_experiences=self.update_buffer.num_experiences,
mean_return=float(np.mean(self.cumulative_returns_since_policy_update)),
)
self.update_sac_policy()
self.update_reward_signals()
self.trainer_metrics.end_policy_update()

def update_sac_policy(self) -> None:
"""
94 changes: 81 additions & 13 deletions ml-agents/mlagents/trainers/stats.py
@@ -2,19 +2,28 @@
from typing import List, Dict, NamedTuple
import numpy as np
import abc
import csv
import os

from mlagents.tf_utils import tf


class StatsSummary(NamedTuple):
mean: float
std: float
num: int


class StatsWriter(abc.ABC):
"""
A StatsWriter abstract class. A StatsWriter takes in a category, a dictionary of stat keys
to StatsSummary values, and a step, and writes them out by some method.
"""

@abc.abstractmethod
def write_stats(self, category: str, key: str, value: float, step: int) -> None:
def write_stats(
self, category: str, values: Dict[str, StatsSummary], step: int
) -> None:
pass

@abc.abstractmethod
@@ -24,15 +33,23 @@ def write_text(self, category: str, text: str, step: int) -> None:

class TensorboardWriter(StatsWriter):
def __init__(self, base_dir: str):
"""
A StatsWriter that writes to a Tensorboard summary.
:param base_dir: The directory within which to place all the summaries. Tensorboard files will be written to a
{base_dir}/{category} directory.
"""
self.summary_writers: Dict[str, tf.summary.FileWriter] = {}
self.base_dir: str = base_dir

def write_stats(self, category: str, key: str, value: float, step: int) -> None:
def write_stats(
self, category: str, values: Dict[str, StatsSummary], step: int
) -> None:
self._maybe_create_summary_writer(category)
summary = tf.Summary()
summary.value.add(tag="{}".format(key), simple_value=value)
self.summary_writers[category].add_summary(summary, step)
self.summary_writers[category].flush()
for key, value in values.items():
summary = tf.Summary()
summary.value.add(tag="{}".format(key), simple_value=value.mean)
self.summary_writers[category].add_summary(summary, step)
self.summary_writers[category].flush()

def _maybe_create_summary_writer(self, category: str) -> None:
if category not in self.summary_writers:
@@ -47,10 +64,59 @@ def write_text(self, category: str, text: str, step: int) -> None:
self.summary_writers[category].add_summary(text, step)


class StatsSummary(NamedTuple):
mean: float
std: float
num: int
class CSVWriter(StatsWriter):
def __init__(self, base_dir: str, required_fields: List[str] = None):
"""
A StatsWriter that writes stats to a CSV file.
:param base_dir: The directory within which to place the CSV file, which will be {base_dir}/{category}.csv.
:param required_fields: If provided, the CSV writer won't write until these fields have statistics to write for
them.
"""
# We need to keep track of the fields in the CSV, as all rows need the same fields.
self.csv_fields: Dict[str, List[str]] = {}
self.required_fields = required_fields if required_fields else []
self.base_dir: str = base_dir

def write_stats(
self, category: str, values: Dict[str, StatsSummary], step: int
) -> None:
if self._maybe_create_csv_file(category, list(values.keys())):
row = [str(step)]
# Only record the stats that showed up in the first valid row
for key in self.csv_fields[category]:
_val = values.get(key, None)
row.append(str(_val.mean) if _val else "None")
with open(self._get_filepath(category), "a") as file:
writer = csv.writer(file)
writer.writerow(row)

def _maybe_create_csv_file(self, category: str, keys: List[str]) -> bool:
"""
If no CSV file exists and the keys contain all of the required fields,
create the CSV file and write the title row.
Returns True if there is now (or already is) a valid CSV file.
"""
if category not in self.csv_fields:
summary_dir = self.base_dir
os.makedirs(summary_dir, exist_ok=True)
# Only store if the row contains the required fields
if all(item in keys for item in self.required_fields):
self.csv_fields[category] = keys

Review comment (Contributor): nit: redundant? You do the same before the with...
Reply (Contributor Author): Good catch - removed

with open(self._get_filepath(category), "w") as file:
title_row = ["Steps"]
title_row.extend(keys)
writer = csv.writer(file)
writer.writerow(title_row)
return True
return False
return True

def _get_filepath(self, category: str) -> str:
file_dir = os.path.join(self.base_dir, category + ".csv")
return file_dir

def write_text(self, category: str, text: str, step: int) -> None:
pass


class StatsReporter:
@@ -87,11 +153,13 @@ def write_stats(self, step: int) -> None:
:param category: The category which to write out the stats.
:param step: Training step which to write these stats as.
"""
values: Dict[str, StatsSummary] = {}
for key in StatsReporter.stats_dict[self.category]:
if len(StatsReporter.stats_dict[self.category][key]) > 0:
stat_mean = float(np.mean(StatsReporter.stats_dict[self.category][key]))
for writer in StatsReporter.writers:
writer.write_stats(self.category, key, stat_mean, step)
stat_summary = self.get_stats_summaries(key)
values[key] = stat_summary
for writer in StatsReporter.writers:
writer.write_stats(self.category, values, step)
del StatsReporter.stats_dict[self.category]

def write_text(self, text: str, step: int) -> None:
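The key API change above is that StatsWriter.write_stats now receives the whole dictionary of StatsSummary values for a category at once, instead of one key/scalar pair per call. As an illustration of that contract, here is a hypothetical ConsoleWriter sketch (not part of this PR); any such writer only needs to implement the two abstract methods:

# Hypothetical writer built against the StatsWriter interface shown above.
from typing import Dict

from mlagents.trainers.stats import StatsSummary, StatsWriter


class ConsoleWriter(StatsWriter):
    def write_stats(
        self, category: str, values: Dict[str, StatsSummary], step: int
    ) -> None:
        # Print the mean of every stat gathered for this category at this step.
        for key, summary in values.items():
            print("[{}] step {}: {} = {:.3f}".format(category, step, key, summary.mean))

    def write_text(self, category: str, text: str, step: int) -> None:
        # This sketch ignores free-form text, as CSVWriter does.
        pass

CSVWriter itself writes one {base_dir}/{category}.csv file per category: a "Steps" column followed by whichever stat keys appeared in the first row that contained all of the required_fields, with later rows appended at each reported step.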
59 changes: 55 additions & 4 deletions ml-agents/mlagents/trainers/tests/test_stats.py
@@ -2,8 +2,14 @@
import os
import pytest
import tempfile
import csv

from mlagents.trainers.stats import StatsReporter, TensorboardWriter
from mlagents.trainers.stats import (
StatsReporter,
TensorboardWriter,
CSVWriter,
StatsSummary,
)


def test_stat_reporter_add_summary_write():
@@ -35,8 +41,12 @@ def test_stat_reporter_add_summary_write():
# Test write_stats
step = 10
statsreporter1.write_stats(step)
mock_writer1.write_stats.assert_called_once_with("category1", "key1", 4.5, step)
mock_writer2.write_stats.assert_called_once_with("category1", "key1", 4.5, step)
mock_writer1.write_stats.assert_called_once_with(
"category1", {"key1": statssummary1}, step
)
mock_writer2.write_stats.assert_called_once_with(
"category1", {"key1": statssummary1}, step
)


def test_stat_reporter_text():
@@ -61,7 +71,8 @@ def test_tensorboard_writer(mock_filewriter, mock_summary):
category = "category1"
with tempfile.TemporaryDirectory(prefix="unittest-") as base_dir:
tb_writer = TensorboardWriter(base_dir)
tb_writer.write_stats("category1", "key1", 1.0, 10)
statssummary1 = StatsSummary(mean=1.0, std=1.0, num=1)
tb_writer.write_stats("category1", {"key1": statssummary1}, 10)

# Test that the filewriter has been created and the directory has been created.
filewriter_dir = "{basedir}/{category}".format(
@@ -78,3 +89,43 @@ def test_tensorboard_writer(mock_filewriter, mock_summary):
mock_summary.return_value, 10
)
mock_filewriter.return_value.flush.assert_called_once()


def test_csv_writer():
# Test write_stats
category = "category1"
with tempfile.TemporaryDirectory(prefix="unittest-") as base_dir:
csv_writer = CSVWriter(base_dir, required_fields=["key1", "key2"])
statssummary1 = StatsSummary(mean=1.0, std=1.0, num=1)
csv_writer.write_stats("category1", {"key1": statssummary1}, 10)

# Test that the filewriter has been created and the directory has been created.
filewriter_dir = "{basedir}/{category}.csv".format(
basedir=base_dir, category=category
)
# The required keys weren't in the stats
assert not os.path.exists(filewriter_dir)

csv_writer.write_stats(
"category1", {"key1": statssummary1, "key2": statssummary1}, 10
)
csv_writer.write_stats(
"category1", {"key1": statssummary1, "key2": statssummary1}, 20
)

# The required keys were in the stats
assert os.path.exists(filewriter_dir)

with open(filewriter_dir) as csv_file:
csv_reader = csv.reader(csv_file, delimiter=",")
line_count = 0
for row in csv_reader:
if line_count == 0:
assert "key1" in row
assert "key2" in row
assert "Steps" in row
line_count += 1
else:
assert len(row) == 3
line_count += 1
assert line_count == 3
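For reference, given the writes in this test, category1.csv would end up containing roughly the following three rows (the very first write is skipped because "key2", a required field, is missing; column order assumes the dictionary key order shown above):

Steps,key1,key2
10,1.0,1.0
20,1.0,1.0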
46 changes: 0 additions & 46 deletions ml-agents/mlagents/trainers/tests/test_trainer_metrics.py

This file was deleted.

3 changes: 0 additions & 3 deletions ml-agents/mlagents/trainers/tests/test_trainer_util.py
@@ -5,7 +5,6 @@

import mlagents.trainers.trainer_util as trainer_util
from mlagents.trainers.trainer_util import load_config, _load_config
from mlagents.trainers.trainer_metrics import TrainerMetrics
from mlagents.trainers.ppo.trainer import PPOTrainer
from mlagents.trainers.exception import TrainerConfigError
from mlagents.trainers.brain import BrainParameters
@@ -119,7 +118,6 @@ def mock_constructor(
run_id,
multi_gpu,
):
self.trainer_metrics = TrainerMetrics("", "")
assert brain == brain_params_mock
assert trainer_parameters == expected_config
assert reward_buff_cap == expected_reward_buff_cap
@@ -178,7 +176,6 @@ def mock_constructor(
run_id,
multi_gpu,
):
self.trainer_metrics = TrainerMetrics("", "")
assert brain == brain_params_mock
assert trainer_parameters == expected_config
assert reward_buff_cap == expected_reward_buff_cap
11 changes: 0 additions & 11 deletions ml-agents/mlagents/trainers/trainer.py
@@ -8,7 +8,6 @@

from mlagents_envs.exception import UnityException
from mlagents_envs.timers import set_gauge
from mlagents.trainers.trainer_metrics import TrainerMetrics
from mlagents.trainers.tf_policy import TFPolicy
from mlagents.trainers.stats import StatsReporter
from mlagents.trainers.trajectory import Trajectory
@@ -52,9 +51,6 @@ def __init__(
self.stats_reporter = StatsReporter(self.summary_path)
self.cumulative_returns_since_policy_update: List[float] = []
self.is_training = training
self.trainer_metrics = TrainerMetrics(
path=self.summary_path + ".csv", brain_name=self.brain_name
)
self._reward_buffer: Deque[float] = deque(maxlen=reward_buff_cap)
self.policy: TFPolicy = None # type: ignore # this will always get set
self.step: int = 0
@@ -170,13 +166,6 @@ def export_model(self) -> None:
"""
self.policy.export_model()

def write_training_metrics(self) -> None:
"""
Write training metrics to a CSV file
:return:
"""
self.trainer_metrics.write_training_metrics()

def write_summary(self, global_step: int, delta_train_start: float) -> None:
"""
Saves training statistics to Tensorboard.