Skip to content

Stats SideChannel (for custom TensorBoard metrics) #3660

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Mar 25, 2020

Conversation

chriselion
Copy link
Contributor

@chriselion chriselion commented Mar 19, 2020

Proposed change(s)

Add a SideChannel that can be used to sent stats from C# to python, and later add those to TensorBoard.
Also adds methods to the Academy to find an instance (or instances) of specific SideChannel subclass (inspired by MonoBehaviour.GetComponent(s))

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

https://jira.unity3d.com/browse/MLA-675
https://forum.unity.com/threads/custom-graphs-in-tensorboard.843865/
#2984
https://github.com/Unity-Technologies/ml-agents/pull/3594/files

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

Sample graph of the number of time the 3DBallAgent drops the ball
image

/// </summary>
/// <typeparam name="T"></typeparam>
/// <returns></returns>
public T GetSideChannel<T>() where T: SideChannel
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vincentpierre @surfnerd I think we had talked about this before. It seems like a clean way to get access to sidechannels without adding a custom method to Academy for each one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can replace Academy.FloatProperties with this later if we're happy with it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's replace Academy.Instance.FloatProperty then. Do you want to do it in this PR or make another one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do it in another PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@vincentpierre vincentpierre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how you got the graph for the number of times the ball got dropped. Are you keeping trak of the total number in C#?
I think the PR looks good but we need to make a decision regarding GetSideChannel before moving forward. I think it is a good idea but I think having 2 methods (one that returns a list and one that returns an instance) is a bit confusing. There is no real notion of "first" with side channel and the GetSideChannel method introduces one.

using System;
namespace MLAgents.SideChannels
{
public class StatsSideChannel : SideChannel
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would TensorBoardMetricsSideChannel be a more appropriate name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't just be for TB - with this change it'll show up anywhere Python stats are written (e.g. CSV, timers, and perhaps later the Console).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I kept the name generic but called out TensorBoard in the comments. I'll also add an example of this in the "Using TensorBoard" docs.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's then document the different ways that adding a stat here will bubble up on the Python side. I find the term stats odd since TB refers to metrics, but I also can't see a better alternative (esp that it does align with timer stats)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the comments below up here and clarified a bit.

/// </summary>
/// <param name="key"></param>
/// <param name="value"></param>
public void AddStat(string key, float value)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we rename the class based on the method above - this ought to be renamed to AddMetricValue

@@ -108,4 +109,11 @@ def _process_step_infos(self, step_infos: List[EnvironmentStep]) -> int:
name_behavior_id, ActionInfo.empty()
),
)

# In order to prevent conflicts between multiple environments,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be helpful documentation in the C# class too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am rather hesitant on this. Are the stats only usable when training? Is it not possible for users to get the C# stats when using the LL-API? If it is possible to use it with LL-API, then C# should not assume how the stats are consumed (at least not without a warning of type : "When using ml-agents training ™️, the stats will...")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it to a comment on StatAggregationMethod.MostRecent:

To avoid conflicts between multiple environments, the ML Agents environment will only
keep stats from worker index 0.

@@ -537,6 +537,43 @@ public void UnregisterSideChannel(SideChannel sideChannel)
}
}

/// <summary>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect the implementation of these two methods to be different for different ICommunicator implementations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation will depend on how the SideChannels are stored by the ICommunicator implementation. There's no guarantee that another implementation would use a UUID->SideChannel dictionary.

We could make this an internal static method in SideChannel.cs if you think it's likely to be reusable.

@@ -147,7 +147,7 @@ def test_advance(self, mock_create_worker, external_brains_mock, step_mock):
env_manager.set_agent_manager(brain_name, agent_manager_mock)

step_info_dict = {brain_name: Mock()}
step_info = EnvironmentStep(step_info_dict, 0, action_info_dict)
step_info = EnvironmentStep(step_info_dict, 0, action_info_dict, {})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should put something in the dictionary so that the method gets tested, and similarly run set_environment_stats in the test_agent_processor.py tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@chriselion
Copy link
Contributor Author

I wonder how you got the graph for the number of times the ball got dropped. Are you keeping trak of the total number in C#?

I hacked that as a static for testing :)

@chriselion
Copy link
Contributor Author

Mind taking another look at this? It changed a bit, mostly around the multiple aggregation types.



# Determines the behavior of how multiple stats within the same summary period are combined.
class StatsAggregationMethod(Enum):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better place for this? It needs to live in ml-agents-envs (unless we move the SideChannel?) but it feels strange importing mlagents_envs.side_channel.stats_side_channel in a lot of places just to get the enum. @ervteng thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this actually seems the least obtrusive since the StatsReporter lives in mlagents. The other option would be in the stats.py, but then we'd have to move that to mlagents_envs.

Copy link
Contributor

@vincentpierre vincentpierre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Should we add stats to one of our environments? For example in FoodCollector, amount of food collected or in crawler, total distance travelled?
I think this would help us make sure it is behaving as expected.

/// </summary>
/// <typeparam name="T"></typeparam>
/// <returns></returns>
public T GetSideChannel<T>() where T: SideChannel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's replace Academy.Instance.FloatProperty then. Do you want to do it in this PR or make another one?

@@ -44,5 +45,6 @@ void ClearObjects(GameObject[] objects)
public void Update()
{
scoreText.text = $"Score: {totalScore}";
Academy.Instance.GetSideChannel<StatsSideChannel>().AddStat("TotalScore", totalScore);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vincentpierre Added to FoodCollector here. If we want to add to other examples, let's do so in a followup PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we store the StatsSideChannel in a private property to avoid calling GetSideChannel every update?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, probably - I'm not to worried about the performance, but we should set a good example...

@chriselion
Copy link
Contributor Author

FWIW here's FoodCollector with the contribute flag set to true everywhere:
image

@chriselion chriselion merged commit 399ad3c into master Mar 25, 2020
@delete-merged-branch delete-merged-branch bot deleted the MLA-675-stats-from-csharp branch March 25, 2020 00:47
@chriselion chriselion mentioned this pull request Mar 25, 2020
1 task
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 15, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants