Design doc: Batch Normalization Operator #3748
Conversation
```python
def batch_norm_layer(...,  # preceding parameters elided in the diff
                     use_global_est=False,
                     epsilon=1e-6,
                     momentum=0.99):
    mean_cache = scope.new_var(name='estimated_mean', trainable=False)
```
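For context on these parameters: `momentum` and `mean_cache` typically implement an exponential moving average of the batch statistics. This is the standard batch-norm convention; the update rule below is illustrative, not quoted from the design doc.

```python
def update_running_mean(mean_cache, batch_mean, momentum=0.99):
    # Exponential moving average -- the usual role of `momentum` in batch norm:
    # new_cache = momentum * old_cache + (1 - momentum) * batch_mean
    return momentum * mean_cache + (1 - momentum) * batch_mean

print(update_running_mean(0.0, 1.0))  # 0.01
```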
There might be more than one `batch_norm_op` in the same topology. Should we make sure that variables such as `mean_cache` have unique names?
We should also make sure that variables such as `mean_cache` are invisible to other operators; otherwise, two operators may write to the same variable.
- Yes, we need to make sure `mean_cache` has a unique name (a sketch of one way to do this follows below).
- `mean_cache` is defined inside `def batch_norm_layer`; could this make sure it is invisible to other operators?
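One straightforward way to guarantee uniqueness is a name generator backed by a global counter. A minimal sketch follows; the `unique_name` helper is an assumption for illustration, not part of the design doc. Inside `batch_norm_layer`, one would then call `scope.new_var(name=unique_name('estimated_mean'), trainable=False)`.

```python
import itertools

# Hypothetical helper (not in the design doc): a process-wide counter that
# appends an increasing suffix, so each layer instance gets fresh names.
_uid = itertools.count()

def unique_name(prefix):
    """Return names like 'estimated_mean_0', 'estimated_mean_1', ..."""
    return '%s_%d' % (prefix, next(_uid))

print(unique_name('estimated_mean'))  # estimated_mean_0
print(unique_name('estimated_mean'))  # estimated_mean_1
```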
paddle/operators/batch_norm_op.md (Outdated)
`is_infer` is an attribute. Once an operator is created, its attributes cannot be changed. This suggests that we maintain two `batch_norm_op`s in the model: one whose `is_infer` is `True` (we call it `infer_batch_norm_op`) and one whose `is_infer` is `False` (we call it `train_batch_norm_op`). They share all parameters and variables. How to organize them is related to the Python API design, so I leave it here for further discussion.
Currently, different Blocks share the same operator objects. Does this require that different blocks have their own operator objects? Could you add some details here?
We do need two distinct `batch_norm_op`s, and it seems this really requires that blocks hold their own operator objects... We shall have more discussion on it.
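For illustration, here is a toy model of "two operators sharing all parameters and variables," under an assumed `create_op`/`Scope` API. The names are illustrative only, not the real Paddle interface.

```python
class Scope(object):
    def __init__(self):
        self.vars = {}

    def new_var(self, name, trainable=True):
        self.vars[name] = {'name': name, 'trainable': trainable}
        return self.vars[name]

def create_op(op_type, inputs, outputs, attrs):
    # The real framework would build an OpDesc; a plain dict stands in here.
    return {'type': op_type, 'inputs': inputs,
            'outputs': outputs, 'attrs': attrs}

scope = Scope()
scale = scope.new_var('scale')
bias = scope.new_var('bias')
mean_cache = scope.new_var('estimated_mean', trainable=False)
var_cache = scope.new_var('estimated_var', trainable=False)

shared = dict(inputs={'X': 'x', 'Scale': scale, 'Bias': bias},
              outputs={'MeanOut': mean_cache, 'VarianceOut': var_cache})

# Only the is_infer attribute differs; every parameter and variable is shared.
train_batch_norm_op = create_op('batch_norm', attrs={'is_infer': False}, **shared)
infer_batch_norm_op = create_op('batch_norm', attrs={'is_infer': True}, **shared)
```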
This design is temporarily placed on hold because it is strongly related to the Python API design.
jacquesqiao left a comment:
LGTM!
Related: #3684; the version there may be easier to read.
Some parts of the `batch_norm_op` design are strongly related to the Python API and need further discussion.