
Conversation

lobstergrindset commented Jun 4, 2021

Stack from ghstack:

Changes made:

  • Adds a unique_id field to StorageImpl and adds access to the field from Python
  • Changes torch.package's and mobile's tensor_cdata_naming_scheme to tensor_unique_id_naming_scheme so that storage serialization uses this unique identifier instead of the cptr to the StorageImpl

These changes are needed to allow quantization to create/delete tensors during serialization (which leads to the StorageImpl cptrs being reused). The only functional change for the torch.package/mobile code is that storages now have a different naming scheme.

Differential Revision: D28911924
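
To make the motivation concrete, here is a tiny standalone C++ illustration (not PyTorch code) of the address-reuse hazard described above; everything in it is a generic example, not part of the PR diff:

```cpp
// Once an object is freed, the allocator may hand the same address to a new
// object, so naming storages by pointer can conflate two distinct storages.
#include <iostream>
#include <memory>

int main() {
  auto first = std::make_unique<int>(1);
  const void* first_addr = first.get();
  first.reset();  // freed mid-serialization, e.g. when quantization rewrites tensors

  auto second = std::make_unique<int>(2);
  if (second.get() == first_addr) {  // allowed to happen, and often does
    std::cout << "address reused: two distinct objects would share one name\n";
  }
  return 0;
}
```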

facebook-github-bot (Contributor) commented Jun 4, 2021

💊 CI failures summary and remediations

As of commit 7565e82 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test2 (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jun 05 04:46:27 ----------------------------------------------------------------------
Jun 05 04:46:27 Traceback (most recent call last):
Jun 05 04:46:27   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 398, in wrapper
Jun 05 04:46:27     self._join_processes(fn)
Jun 05 04:46:27   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 590, in _join_processes
Jun 05 04:46:27     self._check_return_codes(elapsed_time)
Jun 05 04:46:27   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 643, in _check_return_codes
Jun 05 04:46:27     i, first_process.exitcode, p.exitcode
Jun 05 04:46:27   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1403, in assertEqual
Jun 05 04:46:27     super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Jun 05 04:46:27 AssertionError: False is not true : Scalars failed to compare as equal! Comparing 1 and -11 gives a difference of 12, but the allowed difference with rtol=0 and atol=0 is only 0!
Jun 05 04:46:27 Expect process 1 exit code to match Process 0 exit code of -11, but got 1
Jun 05 04:46:27 
Jun 05 04:46:27 ----------------------------------------------------------------------
Jun 05 04:46:27 Ran 213 tests in 396.897s
Jun 05 04:46:27 
Jun 05 04:46:27 FAILED (failures=1, skipped=118)
Jun 05 04:46:27 
Jun 05 04:46:27 Generating XML reports...
Jun 05 04:46:27 Generated XML report: test-reports/dist-nccl/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20210605043950.xml
Jun 05 04:46:27 Traceback (most recent call last):

This comment was automatically generated by Dr. CI.

facebook-github-bot added the oncall: jit label Jun 4, 2021
lobstergrindset pushed a commit that referenced this pull request Jun 4, 2021

lobstergrindset (Author) commented:
@Lilyjjo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

dreiss (Contributor) left a comment:

This looks good to me. I'll give others a chance to comment before accepting. In particular, I like that this approach makes the storage files have shorter names that are deterministic (assuming you have a deterministic program that you restart every time you want to export your model).

If others feel strongly that we'd be better off by continuing to use the storage addresses and just keep references during serialization to prevent deallocation, I'm fine with that as well.

// local to process cuda memory allocation
bool received_cuda_;
Allocator* allocator_;
uint64_t unique_id;

Contributor:
Should have a trailing underscore to match the other members.


namespace c10 {

static std::atomic<std::uint64_t> next_unique_storage_id{0};

Contributor:
Maybe drop the std:: for uint64_t. I don't think that's common in our codebase.

Note to self: Profilo uses a std::atomic<int64_t>, so I assume this is safe even on 32-bit apps that are built with very old toolchains.
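
For context, here is a minimal self-contained sketch of the counter-based id scheme under discussion, with illustrative member names rather than the exact diff:

```cpp
// Illustrative sketch only: an atomic counter assigning a per-StorageImpl id.
#include <atomic>
#include <cstdint>

namespace c10 {

static std::atomic<uint64_t> next_unique_storage_id{0};

struct StorageImplSketch {
  // Each new storage grabs the next id; relaxed ordering is enough because
  // ids only need to be distinct, not ordered across threads.
  uint64_t unique_id_ =
      next_unique_storage_id.fetch_add(1, std::memory_order_relaxed);

  // Used by serialization to name storages independently of their address.
  uint64_t get_unique_id() const {
    return unique_id_;
  }
};

} // namespace c10
```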

kimishpatel (Contributor) commented:
Should just holding onto references to the tensors being pickled, for the duration of serialization, be sufficient and perhaps cleaner?

lobstergrindset (Author) commented:
> Should just holding onto references to the tensors being pickled, for the duration of serialization, be sufficient and perhaps cleaner?

@kimishpatel Yeah, that was the other implementation idea that was discussed. It's unclear which solution is better. The torch.package pathway could get a little ugly with holding onto the StorageImpl references, because users can place as many models into the same archive as they want, so we'd have to write some small infra to track the StorageImpl pointers on both the Python and C++ sides across the lifetime of the PackageExporter. It would be doable, but it would add more complexity to the torch.package code, which is undesirable from the torch.package perspective.

kimishpatel (Contributor) commented:
> > Should just holding onto references to the tensors being pickled, for the duration of serialization, be sufficient and perhaps cleaner?
>
> @kimishpatel Yeah, that was the other implementation idea that was discussed. It's unclear which solution is better. The torch.package pathway could get a little ugly with holding onto the StorageImpl references, because users can place as many models into the same archive as they want, so we'd have to write some small infra to track the StorageImpl pointers on both the Python and C++ sides across the lifetime of the PackageExporter. It would be doable, but it would add more complexity to the torch.package code, which is undesirable from the torch.package perspective.

Did not know about torch.package interaction. Makes sense now.

return received_cuda_;
}

uint64_t get_unique_id() {

Contributor:
Maybe add a comment about how this is used.

dhruvbird (Contributor) commented:
I think I understand how this works. cc @swolchok in case this may be a perf issue, since there's now a std::atomic being incremented for every new memory allocation (I think it should be okay, since the allocation itself is probably the dominant cost anyway). @Lilyjjo is there a benchmark that shows the impact of this change (curious)?

dhruvbird (Contributor) commented:
There are probably some edge cases that may not be handled by this code, and I'm wondering if they can be documented somewhere, since I find that extremely useful (and I assume others will too). My understanding is that the main motivation behind this change is to prevent the serializer from getting confused when it sees the same memory address for what are actually two different tensors (due to memory address reuse). Should we document this somewhere (maybe where @raziel mentioned)?

  1. The from_blob() case doesn't seem to be handled, i.e. if you create two tensors from the same (unchanged) memory using from_blob(), they will get two different IDs (I think) and hence be serialized as separate blobs (could you please confirm or deny this)?
  2. Something else??

dhruvbird (Contributor) commented:
I can't seem to figure out how to comment on code that isn't being changed on GitHub, so I'm pasting a Diffusion link here.

https://www.internalfb.com/intern/diffusion/FBS/browsefile/master/fbcode/caffe2/c10/core/StorageImpl.h?commit=05fa0747d70fbb47c1e4cd89b5c7947f80ee5ff5&lines=175-185

  /**
   * Can only be called when use_count is 1
   */
  void UniqueStorageShareExternalPointer(
      at::DataPtr&& data_ptr,
      size_t size_bytes) {
    data_ptr_ = std::move(data_ptr);
    size_bytes_ = size_bytes;
    allocator_ = nullptr;
    resizable_ = false;
  }

Q. Should the ID be copied over in this case too?

Q. Also, should set_data_ptr() also update the unique ID?

kimishpatel (Contributor) commented:
> There are probably some edge cases that may not be handled by this code, and I'm wondering if they can be documented somewhere, since I find that extremely useful (and I assume others will too). My understanding is that the main motivation behind this change is to prevent the serializer from getting confused when it sees the same memory address for what are actually two different tensors (due to memory address reuse). Should we document this somewhere (maybe where @raziel mentioned)?
>
> 1. The from_blob() case doesn't seem to be handled, i.e. if you create two tensors from the same (unchanged) memory using from_blob(), they will get two different IDs (I think) and hence be serialized as separate blobs (could you please confirm or deny this)?

I think this is a general case that exists outside of this diff also?

kimishpatel (Contributor) commented:
> > > Should just holding onto references to the tensors being pickled, for the duration of serialization, be sufficient and perhaps cleaner?
> >
> > @kimishpatel Yeah, that was the other implementation idea that was discussed. It's unclear which solution is better. The torch.package pathway could get a little ugly with holding onto the StorageImpl references, because users can place as many models into the same archive as they want, so we'd have to write some small infra to track the StorageImpl pointers on both the Python and C++ sides across the lifetime of the PackageExporter. It would be doable, but it would add more complexity to the torch.package code, which is undesirable from the torch.package perspective.
>
> Did not know about torch.package interaction. Makes sense now.

@Lilyjjo although I wonder if tensor creation introduces perf issues.

dhruvbird (Contributor) commented:
> I think this is a general case that exists outside of this diff also?

Yes! My mental model is that this change solves some problems (not all); the question is whether we should more rigorously document the handled and unhandled cases so that the next time we need to iterate on this, it's super clear what needs to be changed.


namespace c10 {

static std::atomic<std::uint64_t> next_unique_storage_id{0};

Member:
I am wary of using a global to track this information and of adding an additional 8 bytes to StorageImpl. It also seems to be a local answer to a more fundamental question, which is: how do we handle identity for torch objects?

So far, we have mostly followed Python's convention and used the pointer. This is also how we do id()/hash() for Python tensors and their storages and thus for their serialization.

This obviously has serious issues if we are re-using memory, but has the advantage that it's consistent across all objects and matches what Python does. @ezyang any thoughts here?

dhruvbird (Contributor) commented:
> I think this is a general case that exists outside of this diff also?

On second thought, I think the from_blob() case may be handled differently when using the memory (pointer) address, since in that case one would consider the tensor storage to be the same. It of course has other issues that we are trying to solve with this diff/PR.

kimishpatel (Contributor) commented:
> > I think this is a general case that exists outside of this diff also?
>
> On second thought, I think the from_blob() case may be handled differently when using the memory (pointer) address, since in that case one would consider the tensor storage to be the same. It of course has other issues that we are trying to solve with this diff/PR.

My understanding is that from_blob will give you a non-owning Tensor, since the void* the Tensor is created from is not owned by the Tensor. In that case you cannot make any assumptions. And if such a tensor (very unlikely) gets serialized, then yes, you can create two tensors via from_blob using the same void* that get serialized separately. And I think that is OK.
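
For readers skimming the thread, here is a hedged illustration of that from_blob() scenario; the assertions sketch the expected distinction and are assumptions, not tests taken from this PR:

```cpp
// Two non-owning tensors built over the same unchanged buffer end up with
// distinct StorageImpls, so an id-per-StorageImpl scheme would serialize them
// as two blobs, while an address-based scheme would see one storage.
#include <torch/torch.h>

int main() {
  float buffer[4] = {1.f, 2.f, 3.f, 4.f};

  // Neither tensor owns `buffer`; each call creates its own StorageImpl
  // wrapping the same data pointer.
  auto a = torch::from_blob(buffer, {4});
  auto b = torch::from_blob(buffer, {4});

  TORCH_CHECK(a.data_ptr() == b.data_ptr());  // same underlying memory
  TORCH_CHECK(a.storage().unsafeGetStorageImpl() !=
              b.storage().unsafeGetStorageImpl());  // distinct storage objects
  return 0;
}
```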

lobstergrindset (Author) commented:
I can get the reference-tracking version of this diff out as well; after discussion with @suo, it seems more reasonable for torch.package than I originally thought. @dreiss we can also discuss ways to shorten the names of the tensors in the serialization file (possibly by removing the .storage tag as a start).

dreiss (Contributor) commented Jun 8, 2021

> discuss ways to shorten the names of the tensors in the serialization file

I was thinking about this over the weekend as well. What if, instead of holding on to a vector or set of seen StorageImpl objects, we hold onto a map from StorageImpl to integer, where we just increment the number for each added StorageImpl (basically, if storage not in m: m[storage] = len(m))? That way, we'd get short and deterministic names; even more deterministic than the current diff, you could say.
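
A minimal sketch of that map-based naming idea (class and method names here are illustrative, not the actual torch.package/JIT serializer API):

```cpp
// Deterministic short names: the first storage seen gets "0", the next "1",
// and so on, independent of the storage's memory address.
#include <cstddef>
#include <string>
#include <unordered_map>

class StorageNamer {
 public:
  std::string nameFor(const void* storage_impl) {
    auto it = seen_.find(storage_impl);
    if (it == seen_.end()) {
      // size() is evaluated before insertion, so ids increase 0, 1, 2, ...
      it = seen_.emplace(storage_impl, seen_.size()).first;
    }
    return std::to_string(it->second);
  }

 private:
  std::unordered_map<const void*, std::size_t> seen_;
};
```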

ezyang (Contributor) commented Jun 8, 2021

Chiming in to say that I am still pro the solution that retains StorageImpls so you don't have the address-reuse problem.

suo (Member) commented Jun 8, 2021

Yeah, I agree. We already have a StorageContext for deserialization; we should have one for serialization. It can hold refs to the storages and map them to shorter names as @dreiss suggests. That way, names can be handed out local to a given serialization context and we can ensure they're all good, without having to deal with difficult identity issues globally.
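
A rough sketch of what such a serialization-side context could look like, assuming hypothetical names (FakeStorage stands in for c10::StorageImpl; this is not the actual PyTorch StorageContext API):

```cpp
// Holds owning references for the duration of one write, so addresses cannot
// be reused while the context is alive, and hands out short local names.
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

struct FakeStorage {};  // stand-in for c10::StorageImpl

class SerializationStorageContext {
 public:
  std::string getOrAddStorage(std::shared_ptr<FakeStorage> storage) {
    auto it = names_.find(storage.get());
    if (it == names_.end()) {
      keep_alive_.push_back(storage);  // pin the storage until the write ends
      it = names_.emplace(storage.get(), std::to_string(names_.size())).first;
    }
    return it->second;
  }

 private:
  std::vector<std::shared_ptr<FakeStorage>> keep_alive_;
  std::unordered_map<const FakeStorage*, std::string> names_;
};
```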

dreiss (Contributor) commented Jun 8, 2021

> map them to shorter names as @dreiss suggests

This can be done as a separate diff, since the reader doesn't care what the file names are, so if this feature is holding up progress in any way, let's punt on it and move forward with the fix.

lobstergrindset (Author) commented:
Alternative reference-tracking PR: #59642

> map them to shorter names as @dreiss suggests

I can look into the naming fixes in a follow-up! This PR solves storage identity problems for TorchScript/Mobile but not for torch.package. This is because torch.package won't retain references to storages that are serialized just from the Python view (fixing this will require some tweaking to other parts of the code base, but no users are known to be running into this particular problem yet, so I think it's fine for now).

kimishpatel (Contributor) commented:
won't retain references to storages that are serialized just from the python view (fixing this will requires some tweaking to other parts of the code base, but no users are known to be running into issues with this particular problem yet, so I think it's fine for now).

Isn't that a bit dangerous? The signature of the problem was that the lite interpreter module was occasionally unloadable, but I suspect the worst case could be that you get similar but not identical tensors, which leads to incorrect results. And that becomes very hard to debug. Is there something minimal that can be done here, such as throwing an error if this is about to happen?

lobstergrindset (Author) commented:
> Isn't that a bit dangerous? The signature of the problem was that the lite interpreter module was occasionally unloadable, but I suspect the worst case could be that you get similar but not identical tensors, which leads to incorrect results. And that becomes very hard to debug. Is there something minimal that can be done here, such as throwing an error if this is about to happen?

Closing this, as we took the storage-reference-retaining route, implemented for mobile in #59642, with a fix for the same problem in torch.package in #59735.

Thanks y'all for the input and for helping solve this problem!

facebook-github-bot deleted the gh/Lilyjjo/83/head branch July 31, 2021 14:17