Skip to content

1 to 1 Brain to Agent #2729

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 26 commits into from
Oct 23, 2019
Merged

1 to 1 Brain to Agent #2729

merged 26 commits into from
Oct 23, 2019

Conversation

vincentpierre
Copy link
Contributor

@vincentpierre vincentpierre commented Oct 14, 2019

This is a work in progress
In this PR :

  • Deleted all Brain Objects
  • Moved the BrainParameters into the Agent
  • Gave the Agent a Heuristic method (see Balance Ball for example)
  • Modified the Communicator and ModelRunner : Put can only take one agent at a time
  • Made the IBrain Interface with RequestDecision and DecideAction method
  • Renaming Brain to Policy

No changes made to Python
Design Doc

To do before ready :

This is a work in progess
In this PR :
 - Deleted all Brain Objects
 - Moved the BrainParameters into the Agent
 - Gave the Agent a Heuristic method (see Balance Ball for example)
 - Modified the Communicator and ModelRunner : Put can only take one agent at a time
 - Made the IBrain Interface with RequestDecision and DecideAction method

No changes made to Python
[Design Doc](https://docs.google.com/document/d/1hBhBxZ9lepGF4H6fc6Hu6AW7UwOmnyX3trmgI3HpOmo/edit#)
@vincentpierre vincentpierre self-assigned this Oct 14, 2019
@@ -68,6 +65,15 @@ public override void AgentReset()
SetResetParameters();
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what would replace the Heuristic and Player Brains

m_BrainFactoryParameters.brainParameters,
model,
m_BrainFactoryParameters.inferenceDevice,
behaviorName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels strange to me that the agent knows about and can construct all of the concrete IBrain implementations. Seems like an information leak between these two objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is an interesting point. Should I move this logic into a Factory or is the problem deeper than that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a Factory is sufficient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the instantiation of brains should happen external to the Agent class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is necessary for now until we get to the end of the refactor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made factory now but we should discuss this point further.

@@ -621,7 +640,7 @@ void SendInfoToBrain()
m_Info.maxStepReached = m_MaxStepReached;
m_Info.id = m_Id;

brain.SubscribeAgentForDecision(this);
m_Brain.RequestDecision(this);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be great if the agent only had to call RequestDecision (much better name than SubscribeAgentForDecision 😂) and DecideAction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that what is going on in this PR?
Is there something I am missing to get there ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, my comment lacked context. I like the fact that this is happening here. I feel like this is the only interaction Agent should have with a brain IMHO. I don't think Agent should be instantiating Brains itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I don't see how this would happen, since Agent and brains would be tightly coupled in this PR, I was thinking we should have the UI to create the brain live on the Agent inspector. The creation of the factory was to enable the creation of a brain anywhere so I assumed creating in the Agent. If something must pass the Brain to the Agent, we need to think of what that something is and how the user will parameterize it.

Copy link
Contributor

@surfnerd surfnerd Oct 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not saying this is a requirement for this PR to get approved, but there are a couple of red flags that pop out to me. Even though the factory 'hides' different classes of the IPolicy implementation, the remnants of their inner workings still remain.
Agents still know about:

  • NNModels
  • Heuristic functions
  • Inference Device
  • Whether or not the communicator is connected
  • Whether or not to use a Heuristic brain (which is like a meta leak)

This is all information being leaked into the agent, when in theory, it could just care about the IPolicy it is interacting with and nothing else.

In terms of ways to get this done, it may be controversial to say, but if IPolicy's were components (MonoBehaviours) you could easily parameterize them and instantiate them without having a factory or any of this information in Agent. Agent could then call GetComponent<IPolicy>(); to retrieve the concrete Policy without knowing any of the details of its implementation. I know we've talked about moving away from Unity Dependencies, but to be honest, deriving from MonoBehaviour and being able to add it as a component feels better to me.

@chriselion
Copy link
Contributor

Yeah, I think the overall approach makes sense. It's slightly counterintuitive that the ModelRunner or Communicator 's DecideBatch() get called (# of agents) times and only the first one does any work, but I don't have a better way to do it.

@vincentpierre
Copy link
Contributor Author

Yeah, I think the overall approach makes sense. It's slightly counterintuitive that the ModelRunner or Communicator 's DecideBatch() get called (# of agents) times and only the first one does any work, but I don't have a better way to do it.

Feels weird too and I realize I did not explicitly call this out in the PR comment. Hopefully, this will be temporary and after the Observation and Action refactor, we will be able to return actual action values rather than void in DecideAction.

@vincentpierre
Copy link
Contributor Author

Will put this on hold until #2731 is good to go

@vincentpierre vincentpierre marked this pull request as ready for review October 16, 2019 21:14
* Delting all brains, setting Behavior Parameters

* Removing learning from all the configs

* Forgot one agent

* Removing leftover brains

* Dead meta file

* Adding Heuristics
@surfnerd
Copy link
Contributor

Overall it looks good. I'd like to discuss the instantiation of Policies more, but that shouldn't block this from going in.

@surfnerd
Copy link
Contributor

nit: Could we move all of the Concrete Policies to a 'Policy' folder.
Screen Shot 2019-10-17 at 10 55 50 AM

* Made Policy Factory a component

* Broke the Heuristic

* fix bug

* BugFix

* Reimplemented Heuristic

* Changing all of the prefabs

* forgotten file

* Fixing under which conditions the Heuristic policy is used

* Dispose of brain
@@ -193,7 +194,7 @@ bool IsCommunicatorOn

// Signals to all the Brains at each environment step so they can decide
// actions for their agents.
public event System.Action BrainDecideAction;
public event System.Action DecideAction;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: update "Brains " in comment above

@@ -524,7 +534,7 @@ void EnvironmentStep()

using (TimerStack.Instance.Scoped("BrainDecideAction"))
{
BrainDecideAction?.Invoke();
DecideAction?.Invoke();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Update timer name ("BrainDecideAction")

Copy link
Contributor

@chriselion chriselion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Hopefully it doesn't conflict too much with my changes...

* initial changes to the documentation

* More documentation changes, not done.

* More documentation changes

* More docs

* Changed the images

* addressing comments

* Adding one line to the migrating doc
@vincentpierre
Copy link
Contributor Author

Would this be causing problems on develop now?

I think this was broken by this change. The checks for visual obs resolution was ignored no?

@surfnerd
Copy link
Contributor

can you merge the latest develop in to get the Yamato pipelines to run?

@vincentpierre vincentpierre merged commit 22afeef into develop Oct 23, 2019
@vincentpierre vincentpierre deleted the develop-one-to-one branch October 23, 2019 00:52
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 17, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants