Allow usage with tensorflow 2.0.0 (via tf.compat.v1) #2665
Conversation
- kernel_initializer=c_layers.variance_scaling_initializer(
-     factor=0.01
- ),
+ kernel_initializer=tf_variance_scaling(0.01),
As mentioned in the PR, I don't love the name. Would like to find a better way to handle this.
Also, the parameter name changed (`factor` vs. `scale`), so I just left it as positional.
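Not code from this PR, but one generic way the `factor` → `scale` keyword rename could be papered over, so legacy call sites keep working regardless of which name the underlying initializer expects. The `fake_initializer` below is a hypothetical stand-in for the real TF initializer, just to keep the sketch self-contained:

```python
import functools

def rename_kwarg(old: str, new: str):
    """Adapter: let callers keep passing `old` while the wrapped function expects `new`."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if old in kwargs:
                kwargs[new] = kwargs.pop(old)
            return fn(*args, **kwargs)
        return wrapper
    return deco

# Hypothetical stand-in for variance_scaling_initializer: the TF 2.x-style
# function takes `scale`, where the TF 1.x contrib version took `factor`.
def fake_initializer(scale=1.0):
    return scale

tf_variance_scaling = rename_kwarg("factor", "scale")(fake_initializer)
```

With this shim, `tf_variance_scaling(0.01)`, `tf_variance_scaling(scale=0.01)`, and legacy `tf_variance_scaling(factor=0.01)` all resolve to the same call.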
ml-agents/mlagents/trainers/tf.py (Outdated)
import tensorflow as tf

# TODO better version check, this will do for now though
is_tf2 = tf.__version__ == "2.0.0"
There's definitely better ways to do this, just wanted something up and running initially.
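For the TODO above, a slightly more robust check than exact string equality would be to compare the parsed major version, so that 2.1.x and later releases also count. This is only a sketch, assuming `tf.__version__` strings like `"1.14.0"` or `"2.0.0"`; real code might prefer a proper version-parsing library:

```python
def is_tf2(version: str) -> bool:
    """True for any 2.x-or-later release, not just the exact "2.0.0" string."""
    return int(version.split(".")[0]) >= 2
```

Usage would be `is_tf2(tf.__version__)` in place of the equality check.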
Hi @chriselion, had a question. I believe 2.0 does offer some backward compatibility with 1.x versions. However, my understanding is that without making the trainer code 2.0 native, you don't get the major advantages of 2.0. With that, when we say "2.0.0 support", does that mean we are going to (or need to) also make our code 2.0 native?
@unityjeffrey - correct, the goal of this PR is not to take advantage of any 2.0 features. Currently, if someone has tensorflow==2.0.0 installed, ml-agents will break. As for "2.0 native", I'll have to let other tf experts chime in. I doubt that there's a way to do that without breaking 1.x compat, so we'll have to decide if/when the new features are compelling enough to drop support for users on the old version.
@unityjeffrey IMO TF 2.0 will make our trainer codebase a lot cleaner and more modular. However, as @chriselion says, it will completely break 1.x compatibility, and it will take a substantial effort to change over. It's probably worth the effort, but there's no immediate need right now.
ah i see. makes sense. would it suffice to say that we upgrade to TF 2.0 now (but do not make the code 2.0 native yet)? This would then enable us, later in the future, to take on the work of making things cleaner and more modular (when we are ready)?
Yes |
@unityjeffrey Updated the PR title to "Allow usage with tensorflow 2.0.0 (via tf.compat.v1)", so hopefully that's less ambiguous about the goals. |
@@ -1,4 +1,5 @@
-import tensorflow as tf
+from mlagents.trainers import tf
@ervteng @awjuliani @andrewcoh (research pod folks) How do you feel about importing this way instead (instead of `import tensorflow as tf`)? Would something like `import mlagents.tensorflow as tf` be nicer?
Note that either way, we should prevent directly importing "raw" tensorflow except for one central spot; I added some checks to flake8 to prevent this.
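The flake8 checks themselves aren't shown in this conversation. As an illustration only (not the actual rule from the PR), the core of such a lint check is just flagging direct tensorflow imports, which can be sketched with the standard `ast` module:

```python
import ast

def find_raw_tf_imports(source: str) -> list:
    """Return the tensorflow module paths imported directly in `source`
    (a sketch of the idea behind the lint check, not the PR's flake8 rule)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            hits += [a.name for a in node.names if a.name.split(".")[0] == "tensorflow"]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports like "from . import x"
            if (node.module or "").split(".")[0] == "tensorflow":
                hits.append(node.module)
    return hits
```

Here `find_raw_tf_imports("import tensorflow as tf")` would flag the import, while `from mlagents.trainers import tf` passes cleanly.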
Either should be OK. I kind of like the current way rather than the other one since it seems less like we have a "special" version of TensorFlow built into mlagents.
By "current way", do you mean the one in the PR now, or on develop?
I feel like `mlagents.trainers` is the wrong place for it; maybe `mlagents.utils` or something like that?
The one in the PR.
There could definitely be an `mlagents.tf_utils` file - there are actually some other generic trainer functions that could go in there (e.g. `get_variable` in the SAC model, any checkpointing functionality, etc.).
Moved the import wrapper to mlagents.tf_utils
This reverts commit 844384e.
I did a full training run last week using tf 2.0, and ran some of the models for inference; behavior looked good.
ml-agents/mlagents/tf_utils/tf.py (Outdated)
tf_variance_scaling = c_layers.variance_scaling_initializer
tf_flatten = c_layers.flatten
tf_rnn = tf.contrib.rnn
I actually think we might not need these contrib layers. The RNN and flatten exist even in TF 1.7 as `tf.layers.flatten` and `tf.nn.rnn`, which are aliased to the `tf.contrib` equivalents... I think
Yeah, you're right. Not sure why I thought I needed those originally. I think we still need the "import tensorflow.compat.v1 as tf" though.
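Summarizing the contrib → core renames suggested in this thread (taken from the comments above; not verified against a particular TF release):

```python
# Non-contrib equivalents proposed in the review for the aliases in tf_utils/tf.py.
CONTRIB_TO_CORE = {
    "tf.contrib.layers.flatten": "tf.layers.flatten",
    "tf.contrib.rnn": "tf.nn.rnn",
}
```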
This looks good to me -- I don't know too much about the recommended transition path, but I think it all makes sense. Agree that having to set our own versions of different APIs within the different libraries and rename them (e.g. `tf_variance_scaling`) isn't ideal, but seems OK as long as it doesn't grow to be too many.
Looks good
Still hacky but most unit tests are passing locally. Haven't tried any "real" training yet.
HUGE thanks to @SetoKaiba for figuring out most of the issues in the branch mentioned in #1850.
Main issues: