Remove 'model_.' prefix from onnx model initializers in training #3881

Merged — 8 commits merged into master on May 20, 2020

Conversation

BowenBao (Contributor) commented on May 8, 2020

After a previous change that introduced a wrapper module for ONNX export, the ONNX model initializer names are all prefixed with 'model_.', diverging from the parameter names in the original PyTorch model. This makes it hard for users to figure out weight names, especially when using frozen_weights.

This PR removes the prefix and adds a warning message when weight names in frozen_weights are not found in the model.
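For illustration, a minimal sketch of both changes, assuming hypothetical helper names and a plain onnx.ModelProto (this is not the actual ort_trainer.py implementation):

```python
# Sketch only: strip a wrapper-introduced prefix from ONNX initializer names
# and warn about frozen_weights entries that match no initializer.
# Helper names and call sites are assumptions, not the PR's code.
import warnings
import onnx


def strip_initializer_prefix(model: onnx.ModelProto, prefix: str = "model_.") -> onnx.ModelProto:
    renamed = {}
    for initializer in model.graph.initializer:
        if initializer.name.startswith(prefix):
            new_name = initializer.name[len(prefix):]
            renamed[initializer.name] = new_name
            initializer.name = new_name
    # Graph inputs can mirror initializer names, so rename them too.
    for graph_input in model.graph.input:
        if graph_input.name in renamed:
            graph_input.name = renamed[graph_input.name]
    # Patch every node input that referenced an old name.
    for node in model.graph.node:
        for i, input_name in enumerate(node.input):
            if input_name in renamed:
                node.input[i] = renamed[input_name]
    return model


def warn_missing_frozen_weights(model: onnx.ModelProto, frozen_weights) -> None:
    initializer_names = {init.name for init in model.graph.initializer}
    for name in frozen_weights:
        if name not in initializer_names:
            warnings.warn(f"Weight '{name}' in frozen_weights not found among model initializers.")
```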

BowenBao added the training label (issues related to ONNX Runtime training; typically submitted using template) on May 8, 2020
BowenBao requested a review from a team as a code owner on May 8, 2020 23:07
BowenBao changed the title from "Remove 'model_.' prefix for onnx model initializers in training" to "Remove 'model_.' prefix from onnx model initializers in training" on May 8, 2020
Review comments on orttraining/orttraining/python/ort_trainer.py (outdated, resolved) and onnxruntime/test/python/onnxruntime_test_ort_trainer.py (outdated, resolved)
model = FuseSofmaxNLLToSoftmaxCE(model)
onnx_model = onnx.load_model_from_string(f.getvalue())

# Remove 'model_.' prefix introduced by model wrapper for initializers.
Reviewer (Contributor):

Is this necessary because of the WrapModel class? Is there a way to prevent this 'model_.' from being added in the first place?

BowenBao (Author):

'model_' is added because the original model is stored as the 'model_' attribute of WrapModel. Preventing it would mean avoiding this extra layer of module, e.g. by directly overwriting the forward method of the original model, but that seems more hacky since the original forward might be called elsewhere.

Reviewer (Contributor):

Good to know that initializer names change according to their position in the final model. So when a loss_fn is present, will the initializer names become model_.model_.xxx? Should we also handle that case, or merge the two wrappers into one?

BowenBao (Author):

Good point. We should handle the case where both WrapModel and loss_fn are used. Currently that case is not valid due to some issues with WrapModel; it will be fixed in a separate PR.

BowenBao force-pushed the bowbao/remove_initializer_prefix branch from 7243b80 to fd688ad on May 12, 2020 20:38
BowenBao force-pushed the bowbao/remove_initializer_prefix branch from d820c62 to b12a081 on May 14, 2020 22:21
Review comment on orttraining/orttraining/python/pt_patch.py (outdated, resolved)
BowenBao force-pushed the bowbao/remove_initializer_prefix branch from ae2ef5f to 32beb88 on May 18, 2020 21:06
thiagocrepaldi (Contributor) left a comment:

onnxruntime/test/testdata/ckpt_mnist.pt is being added to the repo, but it is not used in the PR.

Review comments on orttraining/orttraining/python/pt_patch.py (outdated, resolved) and orttraining/orttraining/python/ort_trainer.py (resolved)
BowenBao (Author):

onnxruntime/test/testdata/ckpt_mnist.pt is being updated, not added. It was used in an existing MNIST checkpoint test case; since this PR changes the names under which initializers are saved, the previous ckpt file needs to be updated.
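As a hedged sketch of why the checkpoint has to be regenerated (the exact layout of ckpt_mnist.pt is an assumption here), keys saved under the old prefixed names would no longer match the renamed initializers and could be migrated like this:

```python
# Sketch only: migrate a checkpoint whose keys still carry the 'model_.' prefix.
# Assumes the checkpoint is a flat name-to-tensor dict, which may not match
# the real ckpt_mnist.pt format.
import torch

old_state = torch.load("ckpt_mnist.pt", map_location="cpu")
new_state = {
    (key[len("model_."):] if key.startswith("model_.") else key): value
    for key, value in old_state.items()
}
torch.save(new_state, "ckpt_mnist.pt")
```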

BowenBao merged commit 0a5395b into master on May 20, 2020
BowenBao deleted the bowbao/remove_initializer_prefix branch on May 20, 2020 17:06
Labels: training (issues related to ONNX Runtime training; typically submitted using template)
4 participants