Skip to content

Allow usage with tensorflow 2.0.0 (via tf.compat.v1) #2665

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 29 commits into from
Nov 14, 2019

Conversation

chriselion
Copy link
Contributor

@chriselion chriselion commented Oct 3, 2019

Still hacky but most unit tests are passing locally. Haven't tried any "real" training yet.

HUGE thank to @SetoKaiba for figuring out most of the issues in the branch mentioned in #1850.

Main issues:

  • Better version checking
  • cleanup some of the imports (I don't like tf_flatten, not sure mlagents.trainers is the right spot, etc).
  • Figure out what's up with tf2bc
  • Ban raw "import tensorflow" except in one location
  • Set up CI to work on 1.7, 1.14, and 2.0
  • Measure inference quality - quick test with 3dball didn't look good

@chriselion chriselion changed the base branch from master to develop October 3, 2019 04:56
kernel_initializer=c_layers.variance_scaling_initializer(
factor=0.01
),
kernel_initializer=tf_variance_scaling(0.01),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the PR, I don't love the name. Would like to find a better way to handle this.

Also, the parameter name changed (factor vs scale) to I just left it as positional.

import tensorflow as tf

# TODO better version check, this will do for now though
is_tf2 = tf.__version__ == "2.0.0"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's definitely better ways to do this, just wanted something up and running initially.

@shihzy
Copy link
Contributor

shihzy commented Oct 3, 2019

Hi @chriselion, had a question. I believe 2.0 does offer some backward compatibility to 1.x versions. However, my understanding is, without making the traininer code 2.0 native, you don't get the major advantages of 2.0. With that, when we say "2.0.0 support", does that mean we are going to (or need to) also make our code 2.0 native?

@chriselion
Copy link
Contributor Author

@unityjeffrey - correct, the goal of this PR is not to take advantage of any 2.0 features. Currently if someone has tensorflow==2.0.0 installed, ml-agents will break (and it will complain at pip install time first). After this PR, we'll be compatible with 2.0.0

As for "2.0 native", I'll have to let other tf experts chime in. I doubt that there's a way to do that without breaking 1.x compat, so we'll have to decide if/when the new features are compelling enough to drop support for users on the old version.

@ervteng
Copy link
Contributor

ervteng commented Oct 3, 2019

@unityjeffrey IMO TF 2.0 will make our trainer codebase a lot cleaner and more modular. However, as @chriselion says, it will completely break 1.0 compatibility, and it will take a substantial effort to change over. It's probably worth the effort, but there's no immediate need right now.

@shihzy
Copy link
Contributor

shihzy commented Oct 3, 2019

ah i see. makes sense. would it be suffice to say that we upgrade to TF 2.0 now (but do not make it 2.0 ready). But this will enable us later in the future to take on more work to make things cleaner and more modular (when we are ready)?

@ervteng
Copy link
Contributor

ervteng commented Oct 3, 2019

ah i see. makes sense. would it be suffice to say that we upgrade to TF 2.0 now (but do not make it 2.0 ready). But this will enable us later in the future to take on more work to make things cleaner and more modular (when we are ready)?

Yes

@chriselion chriselion changed the title [WIP] Add tensorflow 2.0.0 support [WIP] Allow usage with tensorflow 2.0.0 Oct 8, 2019
@chriselion chriselion changed the title [WIP] Allow usage with tensorflow 2.0.0 [WIP] Allow usage with tensorflow 2.0.0 (via tf.compat.v1) Oct 8, 2019
@chriselion
Copy link
Contributor Author

@unityjeffrey Updated the PR title to "Allow usage with tensorflow 2.0.0 (via tf.compat.v1)", so hopefully that's less ambiguous about the goals.

@@ -1,4 +1,5 @@
import tensorflow as tf
from mlagents.trainers import tf
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ervteng @awjuliani @andrewcoh (research pod folks) How do you feel about importing this way instead (instead of import tensorflow as tf)? Would something like import mlagents.tensorflow as tf be nicer?

Note that either way, we should prevent directly importing "raw" tensorflow except for one central spot; I added some checks to flake8 to prevent this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either should be OK. I kind of like the current way rather than the other one since it seems less like we have a "special" version of TensorFlow built into mlagents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "current way", do you mean the one in the PR now, or on develop?

I feel like mlagents.trainers is the wrong place for it; maybe mlagents.utils or something like that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one in the PR.

There could definitely be an mlagents.tf_utils file - there are actually some other generic trainer functions that could go in there (e.g. get_variable in the SAC model, any checkpointing functionality, etc.).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the import wrapper to mlagents.tf_utils

@chriselion chriselion marked this pull request as ready for review November 8, 2019 23:45
@chriselion
Copy link
Contributor Author

I did a full training run last week using tf 2.0, and ran some of the models for inference; behavior looked good.

@chriselion chriselion requested review from ervteng and harperj November 8, 2019 23:48
@chriselion chriselion changed the title [WIP] Allow usage with tensorflow 2.0.0 (via tf.compat.v1) Allow usage with tensorflow 2.0.0 (via tf.compat.v1) Nov 8, 2019

tf_variance_scaling = c_layers.variance_scaling_initializer
tf_flatten = c_layers.flatten
tf_rnn = tf.contrib.rnn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think we might not need these contrib layers. The RNN and flatten even in TF 1.7 exist as tf.layers.flatten and tf.nn.rnn which is aliased to tf.contrib equivalents... I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right. Not sure why I thought I needed those originally. I think we still need the "import tensorflow.compat.v1 as tf" though.

Copy link
Contributor

@harperj harperj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me -- I don't know too much about the recommended transition path but I think it all makes sense. Agree that having to set our own versions of different APIs within the different libraries and rename them (e.g. tf_variance_scaling) isn't ideal, but seems OK as long as it doesn't grow to be too many.

Copy link
Contributor

@ervteng ervteng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good :shipit:

@chriselion chriselion merged commit 182a40e into develop Nov 14, 2019
@delete-merged-branch delete-merged-branch bot deleted the try-tf2-support branch November 14, 2019 02:24
@chriselion chriselion mentioned this pull request Nov 14, 2019
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 17, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants