Skip to content

Trim some public fields on the Agent #3269

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jan 24, 2020
Merged

Trim some public fields on the Agent #3269

merged 6 commits into from
Jan 24, 2020

Conversation

vincentpierre
Copy link
Contributor

@vincentpierre vincentpierre commented Jan 22, 2020

Removed the following fields from the Agent :

  • SetRewards()
  • agentParameters.maxStep --> maxStep
  • IsDone()
  • IsMaxStepReached()
  • Info
  • ResetReward()
  • GetReward()
  • GetValueEstimates()
  • UpdateValueAction()
  • VectorAction.Value

@vincentpierre vincentpierre self-assigned this Jan 22, 2020
@surfnerd
Copy link
Contributor

surfnerd commented Jan 22, 2020

Before approving this, I'd like to see some of the training results with these changes. I'm not sure how modifying the reward function in this way will change things.

/// <returns>
/// <c>true</c>, if max step reached was reached, <c>false</c> otherwise.
/// </returns>
public bool IsMaxStepReached()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These seem like they would be useful for users to have access to (although maybe reworking them to properties).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a use case where the user knowing when the Agent is Done or reached max step would be useful. Do you have an example ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have an example, but I'm sure some users will after we remove it, and it seems like very little overhead to keep.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsMaxStepReached can be derived from GetStepCount and maxStep, so I that might be OK to remove. But I think keeping it as a method hides the internals better (i.e. you have to know about maxStep <= 0 ==> IsMaxStepReached is never true)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Done(), it's generally really frustrating to be able to set something but not read it's value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a use case where the user knowing when the Agent is Done or reached max step would be useful. Do you have an example?

@vincentpierre I don't think the argument "I can't think of a reason to have it" is sufficient to remove the accessor of a state that is directly modifiable by the user. We use it internally in Agent for a number of reasons. One of the thousands of users that we have may use the IsDone accessor for reasons we haven't thought of. Whether it's custom UI, an event trigger for something else. I'm not sure how they might use it, but it feels really strange to me to let users modify the state of Agent without being to ask what that state is.

I would argue that, in general, if we are allowing a user to modify a state, we should also allow them to access that state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a concrete of why we should be able to read it is right in our own tests:

public bool IsDone()
{
return (bool)typeof(Agent).GetField("m_Done", BindingFlags.Instance | BindingFlags.NonPublic).GetValue(this);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we are testing the inner working of the Agent in the tests. I think it is okay, I suppose m_Done can be marked as an internal field on the Agent rather than private.
I think what we need to have access to for testing and for using the tool are very different things.

@surfnerd
Copy link
Contributor

surfnerd commented Jan 22, 2020

@andrewcoh @ervteng can you guys look and leave feedback. Arguing for/against this is out of my area of expertise for the removal of SetReward

@surfnerd surfnerd requested review from ervteng and andrewcoh January 22, 2020 18:55
@ervteng
Copy link
Contributor

ervteng commented Jan 22, 2020

I'm not in support of removing SetReward(). It makes things less confusing a little bit but introduces many cases which are harder to implement. For instance, if you have an incremental reward for velocity/survival but if the Agent hits a wall or falls off the platform, you want to kill it and give -1 reward.

So you put the SetReward(-1) and Done() in the collider OnTriggerEnter. Without SetReward or any way to read the current reward, you'll have no idea how much incremental reward was gathered by the time you hit the callback, and the Agent gets some unknown reward > -1.

@awjuliani
Copy link
Contributor

Agree with @ervteng. SetReward() serves a very specific purpose that is of value in a number of different kinds of games and environments.

@vincentpierre
Copy link
Contributor Author

I reverted my changes to SetReward.

@andrewcoh
Copy link
Contributor

Seems like it could be a little late but I also agree with @ervteng . The SetReward enables implementation of reward functions that would not be possible otherwise. Should we name it something more explicit like SetFullStepReward so that it's less confusing?

@ervteng
Copy link
Contributor

ervteng commented Jan 22, 2020

Seems like it could be a little late but I also agree with @ervteng . The SetReward enables implementation of reward functions that would not be possible otherwise. Should we name it something more explicit like SetFullStepReward so that it's less confusing?

Some other possibilities:

  • Change name to something like SetIncrementalReward()
  • Remove SetReward, add method to reset reward to 0 (ResetIncrementalReward()?)
  • Add method to get current reward (GetIncrementalReward?)
  • Remove SetReward, allow passing a final reward to Done()

/// <summary>
/// Resets the step reward and possibly the episode reward for the agent.
/// </summary>
public void ResetReward()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The places were this was removed don't account for the case where m_Done is true

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reset logic is really confusing. We have:

  • ResetIfDone
  • _AgentReset
  • AgentReset
  • ResetData
  • ForceReset

Just by looking, I have no idea which is done when or why.

This issue may not be in the scope of PR, but it feels like something that should be addressed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They do, I reset the Cumulative reward in ResetData

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and ResetReward

@vincentpierre
Copy link
Contributor Author

I will not rename SetReward in this PR. I think we can do that later

@andrewcoh
Copy link
Contributor

Remove SetReward, allow passing a final reward to Done()

@ervteng I think there are situations other than termination states where we might want the functionality of SetReward

@vincentpierre
Copy link
Contributor Author

On the question of the IsDone(), I think of it more like an event than a change of state. I think it is okay to prevent the user to seeing if the Agent is Done since it is an internal state that is only relevant for the Policy, not to the game mechanic.
I am in general more in favor of removing APIs that are not used most of the time. If there is really a use case for knowing if an Agent is done and has yet to reset, I am curious...

On the SetReward topic, I think you are very unanimous that it is useful (so I won't fight too long) but I think it introduces a lot of confusion (that I don't think renaming would entirely solve). I personally think adding a reward should be an irrevocable action.

@surfnerd
Copy link
Contributor

I think it is okay to prevent the user to seeing if the Agent is Done since it is an internal state that is only relevant for the Policy, not to the game mechanic.

AgentInfo has a public done member on it. It's not really removing the API, just hiding it in one place arbitrarily and passing it to another as part of our public API.

@vincentpierre
Copy link
Contributor Author

AgentInfo has a public done member on it. It's not really removing the API, just hiding it in one place arbitrarily and passing it to another as part of our public API.

I made the AgentInfo property of the Agent private. So yes I am hiding it. What part of the Public API are we passing it to? AgentReset()? I think it is okay, it is more of a delayed event.

@surfnerd
Copy link
Contributor

I made the AgentInfo property of the Agent private. So yes I am hiding it. What part of the Public API are we passing it to? AgentReset()? I think it is okay, it is more of a delayed event.

AgentInfo is passed to RequestDecision

void RequestDecision(AgentInfo info, List<ISensor> sensors, Action<AgentAction> action);

AgentInfo has a public done property.

Copy link
Contributor

@chriselion chriselion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Don't forget to update the migration guide (but that can be another PR).

@vincentpierre vincentpierre merged commit b8ebd41 into master Jan 24, 2020
@delete-merged-branch delete-merged-branch bot deleted the develop-trim01 branch January 24, 2020 18:23
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants