hold references to storages during TorchScript serialization #59642
Conversation
[ghstack-poisoned]
💊 CI failures summary and remediations. As of commit 7f61ba2 (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns. The following CI failures do not appear to be due to upstream breakages:
@Lilyjjo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Differential Revision: [D28968947](https://our.internmc.facebook.com/intern/diff/D28968947) [ghstack-poisoned]
@Lilyjjo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Can this PR be cherry-picked? Otherwise the heavy save/load process could be flaky.
| ".storage"); | ||
| tensor.storage().unsafeGetStorageImpl())); | ||
| tensor_names.push_back(string_id + ".storage"); | ||
| storage_context_.addStorage(string_id, tensor.storage()); |
I feel it's a great idea to use StorageContext. The storage_context_ lives for the life span of the serializer, so it should be safe for any tensor serialization. @cccclai could you test whether it works for bytecode v5 serialization with quantized models? It looks different from #59488, which you've already tested.
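For illustration, here is a minimal sketch of the idea, not the actual PyTorch class: only `addStorage(string_id, tensor.storage())` appears in this diff, and the other names are assumptions. A context owned by the serializer keeps an owning reference to every storage it has seen, so the underlying StorageImpl cannot be freed (and its address reused) while serialization is in progress.

```cpp
// Sketch only: a storage context owned by the serializer. Holding
// c10::Storage by value keeps the refcounted StorageImpl alive for the
// whole serialization, so a pointer-derived id cannot be reused.
#include <string>
#include <unordered_map>
#include <utility>
#include <c10/core/Storage.h>

class StorageContextSketch {
 public:
  // Record a storage under its string id; the owning copy pins the StorageImpl.
  void addStorage(const std::string& id, c10::Storage storage) {
    storages_.emplace(id, std::move(storage));
  }
  // True if a storage with this id has already been recorded.
  bool hasStorage(const std::string& id) const {
    return storages_.count(id) != 0;
  }

 private:
  std::unordered_map<std::string, c10::Storage> storages_;
};
```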
Test it by re-introducing #58629 and using the following code:

    import torch
    from torch.jit.mobile import _load_for_lite_interpreter

    # model_path / model_resave_path: the .ptl model path and a re-save destination
    for i in range(20):
        print(i)
        m = torch.jit.load(model_path)
        m._save_for_lite_interpreter(model_resave_path)
        mm = _load_for_lite_interpreter(model_resave_path)
    std::to_string(reinterpret_cast<std::intptr_t>(
        tensor.storage().unsafeGetStorageImpl())) +
        ".storage");
    tensor.storage().unsafeGetStorageImpl()));
I guess the only caveat of using StorageImpl is that the id may change from run to run. It should not be a real issue, but for metadata comparison on data.pkl or bytecode.pkl, the tensor metadata may differ literally even though the tensor content does not change.
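To make the caveat concrete, here is a hedged sketch of the naming scheme visible in this diff: the id is the decimal value of the StorageImpl pointer, so serializing the same model in two separate runs can produce textually different data.pkl / bytecode.pkl even though the tensor contents are identical. The helper name below is illustrative, not from the PR.

```cpp
// Illustrative helper (name is an assumption): derive the storage id from the
// StorageImpl address, as done inline in this diff. The address, and hence the
// id, generally differs between runs of the same program.
#include <cstdint>
#include <string>
#include <ATen/core/Tensor.h>

std::string pointer_based_storage_name(const at::Tensor& tensor) {
  return std::to_string(reinterpret_cast<std::intptr_t>(
             tensor.storage().unsafeGetStorageImpl())) +
      ".storage";
}
```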
Yes, the loss of determinism in the naming is not ideal. Others have wanted this fixed as well; I can address it in a follow-up PR.
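Purely as a sketch of one way determinism could be restored (this is an assumption, not what the follow-up PR necessarily does): assign ids in the order storages are first encountered instead of using their addresses, so repeated serializations of the same model produce identical names.

```cpp
// Hypothetical deterministic namer (illustrative only): ids are handed out in
// first-seen order, so re-serializing the same model yields the same names.
#include <cstdint>
#include <string>
#include <unordered_map>
#include <c10/core/Storage.h>

class DeterministicNamerSketch {
 public:
  std::string nameFor(const c10::Storage& storage) {
    void* key = storage.unsafeGetStorageImpl();
    auto it = names_.find(key);
    if (it == names_.end()) {
      // First time this storage is seen: give it the next sequential id.
      it = names_.emplace(key, std::to_string(next_id_++)).first;
    }
    return it->second;
  }

 private:
  std::unordered_map<void*, std::string> names_;
  std::uint64_t next_id_ = 0;
};
```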
cccclai
left a comment
LGTM from mobile side.
Do we want to try to get this PR into the 1.9 release? I saw that people were still adding to the branch earlier today (#58518). I think it would fix the serialization issues that are currently present in the release.
Let's try. Can you comment under the tracker? I will comment there as well.
iseeyuan
left a comment
LGTM. Thanks!
@Lilyjjo merged this pull request in 3271853.
…#59642)

Summary: Pull Request resolved: pytorch#59642

Test Plan: Imported from OSS

Reviewed By: jbschlosser, cccclai

Differential Revision: D28968947

Pulled By: Lilyjjo

fbshipit-source-id: 0046da8adb3a29fb108965a1d2201749fe2d0b41
Reintroduce sharing constant between bytecode and torchscript (same as #58629) after the fix #59642

Test it by:
```
model_path = "/Users/chenlai/Documents/pytorch/reuse_constant/tmp/other_models/model_GKzegAqmfRoYTygDACAlZhsBGFcRbmQwAAAA.ptl"
# model_path = "/Users/chenlai/Documents/pytorch/reuse_constant/tmp/other_models/model_GICWmAARE1pJ3yQEAFk58lqFfzVGbgdIAAAi.ptl"
model_resave_path = "/Users/chenlai/Documents/pytorch/reuse_constant/tmp/other_models/model_GKzegAqmfRoYTygDACAlZhsBGFcRbmQwAAAA_resave.ptl"
for i in range(20):
    print(i)
    m = torch.jit.load(model_path)
    m._save_for_lite_interpreter(model_resave_path)
    mm = _load_for_lite_interpreter(model_resave_path)
```

Differential Revision: [D29002345](https://our.internmc.facebook.com/intern/diff/D29002345)

[ghstack-poisoned]
Summary: Pull Request resolved: #59722

Reintroduce sharing constant between bytecode and torchscript (same as #58629) after the fix #59642

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29002345

Pulled By: cccclai

fbshipit-source-id: d9c8e474ff57d0509580183206df038a24ad27e3
Fixes a serialization problem caused by using the memory address of storages as identifiers for mobile and torch.package models.
- #59642 hold references to storages during TorchScript serialization

Uses StorageContext to hold a reference to all storages seen during TorchScript serialization, so that tensors can be created and destroyed during the serialization process. Tracking the storages solves the ABA memory problem.

Stack from ghstack:
Uses StorageContext to hold a reference to all storages seen during TorchScript serialization, so that tensors can be created and destroyed during the serialization process. Tracking the storages solves the ABA memory problem.
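As an illustration of the ABA hazard being solved (a sketch under assumptions; only the pointer-derived naming is taken from the diff, the rest is hypothetical): without an owning reference, a storage written early in serialization could be freed and a later allocation could land at the same address, so the address-derived id would collide and a brand-new storage would wrongly be treated as already serialized. Pinning each storage for the serializer's lifetime rules that out.

```cpp
// Sketch of the ABA hazard (hypothetical serializer-side code): pinning each
// storage in a map owned by the serializer guarantees that an address-derived
// id refers to exactly one StorageImpl for the whole serialization.
#include <cstdint>
#include <string>
#include <unordered_map>
#include <ATen/core/Tensor.h>
#include <c10/core/Storage.h>

using PinnedStorages = std::unordered_map<std::string, c10::Storage>;

void maybe_write_tensor(PinnedStorages& pinned, const at::Tensor& tensor) {
  const std::string id = std::to_string(reinterpret_cast<std::intptr_t>(
      tensor.storage().unsafeGetStorageImpl()));
  if (pinned.count(id) == 0) {
    // Owning copy: this StorageImpl cannot be freed (and its address reused
    // for a different storage) before the serializer finishes.
    pinned.emplace(id, tensor.storage());
    // ... write the tensor bytes under id + ".storage" ...
  }
}
```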
Differential Revision: D28968947