Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pytorch/cuda] apply 16-bit mask to the index for device guard registry #45485

Closed
wants to merge 1 commit into from

Conversation

xw285cornell
Copy link
Contributor

@xw285cornell xw285cornell commented Sep 29, 2020

Summary:
Essentially this is the problem reported by ezyang: https://fb.workplace.com/groups/llvm.gcc/permalink/4053565044692080. There are two proposed fixes:

caffe2/c10/core/TensorOptions.h:553:1: error: static_assert failed due to requirement 'sizeof(c10::TensorOptions) <= sizeof(long) * 2' "TensorOptions must fit in 128-bits"
static_assert( sizeof(TensorOptions) <= sizeof(int64_t) * 2,
^

This diff is a temp hack to work around the problem. W/o this patch:

  volatile size_t device_type = static_cast<size_t>(type);
  auto p = device_guard_impl_registry[device_type].load();
  C10_LOG_FIRST_N(WARNING, 10) << "XDW-fail: " << cntr << ", Device type: " << type << ", type cast: " << device_type  << ", guard: " << p;

// output
XDW-fail: 1129, Device type: cuda, type cast: 65537, guard: 0

Another workaround is D23788441, which changes -O3 to -O2. So this seems to be a miscompilation for nvcc or the host compiler.

Test Plan:

Differential Revision: D23972356

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D23972356

@dr-ci
Copy link

dr-ci bot commented Sep 29, 2020

💊 CI failures summary and remediations

As of commit b01e9a3 (more details on the Dr. CI page):


None of the CI failures appear to be your fault 💚



❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_ios_11_2_1_x86_64_build (1/1)

Step: "Update Homebrew" (full log | diagnosis details | 🔁 rerun) ❄️

fatal: Could not read from remote repository.
Receiving objects:  98% (175/178) Receiving objects:  99% (177/178) Receiving objects: 100% (178/178) Receiving objects: 100% (178/178), 63.90 KiB | 10.65 MiB/s, done. 
Resolving deltas:  96% (89/92) Resolving deltas:  97% (90/92) Resolving deltas: 100% (92/92) Resolving deltas: 100% (92/92), completed with 85 local objects. 
From ssh://github.com/Homebrew/homebrew-cask-versions 
 + 15f6b44...90ed6b8 master     -> origin/master  (forced update) 
+ git reset --hard origin/master 
HEAD is now at 90ed6b8 Update microsoft-edge-beta from 86.0.622.19 to 86.0.622.28 (#9686) 
+ for path in '$(find /usr/local/Homebrew -type d -name .git)' 
+ cd /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/.git/.. 
+ git fetch --depth=1 origin 
Connection to github.com closed by remote host.  
fatal: Could not read from remote repository. 
 
Please make sure you have the correct access rights 
and the repository exists. 

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 7 times.

@xw285cornell xw285cornell changed the title [hack][do not land] apply 16-bit mask to the index for device guard registry [pytorch/cuda] apply 16-bit mask to the index for device guard registry Sep 29, 2020
…ry (pytorch#45485)

Summary:
Pull Request resolved: pytorch#45485

Essentially this is the problem reported by ezyang: https://fb.workplace.com/groups/llvm.gcc/permalink/4053565044692080. There are two proposed fixes:
* pytorch#44883: this doesn't work because it fails some static assert at runtime
```
caffe2/c10/core/TensorOptions.h:553:1: error: static_assert failed due to requirement 'sizeof(c10::TensorOptions) <= sizeof(long) * 2' "TensorOptions must fit in 128-bits"
static_assert( sizeof(TensorOptions) <= sizeof(int64_t) * 2,
^
```
* pytorch#44885: to be tested

This diff is a temp hack to work around the problem. W/o this patch:

```
  volatile size_t device_type = static_cast<size_t>(type);
  auto p = device_guard_impl_registry[device_type].load();
  C10_LOG_FIRST_N(WARNING, 10) << "XDW-fail: " << cntr << ", Device type: " << type << ", type cast: " << device_type  << ", guard: " << p;

// output
XDW-fail: 1129, Device type: cuda, type cast: 65537, guard: 0

```

Another workaround is D23788441, which changes -O3 to -O2. So this seems to be a miscompilation for nvcc or the host compiler.

Differential Revision: D23972356

fbshipit-source-id: afabd0d37fbfc1ce685bdf07b320cb204f421a3d
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D23972356

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 2fbe597.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants