[pytorch/cuda] apply 16-bit mask to the index for device guard registry #45485

xw285cornell · 2020-09-29T04:49:58Z

Summary:
Essentially this is the problem reported by ezyang: https://fb.workplace.com/groups/llvm.gcc/permalink/4053565044692080. There are two proposed fixes:

[DO NOT MERGE] Nuclear option for nvcc Device miscompilation #44883: this doesn't work because it fails some static assert at runtime

caffe2/c10/core/TensorOptions.h:553:1: error: static_assert failed due to requirement 'sizeof(c10::TensorOptions) <= sizeof(long) * 2' "TensorOptions must fit in 128-bits"
static_assert( sizeof(TensorOptions) <= sizeof(int64_t) * 2,
^

Workaround nvcc miscompilation of two 16-bit fields together. #44885: to be tested

This diff is a temp hack to work around the problem. W/o this patch:

  volatile size_t device_type = static_cast<size_t>(type);
  auto p = device_guard_impl_registry[device_type].load();
  C10_LOG_FIRST_N(WARNING, 10) << "XDW-fail: " << cntr << ", Device type: " << type << ", type cast: " << device_type  << ", guard: " << p;

// output
XDW-fail: 1129, Device type: cuda, type cast: 65537, guard: 0

Another workaround is D23788441, which changes -O3 to -O2. So this seems to be a miscompilation for nvcc or the host compiler.

Test Plan:

Differential Revision: D23972356

facebook-github-bot · 2020-09-29T04:50:16Z

This pull request was exported from Phabricator. Differential Revision: D23972356

dr-ci · 2020-09-29T04:56:46Z

💊 CI failures summary and remediations

As of commit b01e9a3 (more details on the Dr. CI page):

✅ None of the CI failures appear to be your fault 💚

1/1 tentatively recognized as flaky ❄️
- Click here to rerun these jobs

❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

pytorch_ios_11_2_1_x86_64_build (1/1)

Step: "Update Homebrew" (full log | diagnosis details | 🔁 rerun) ❄️

fatal: Could not read from remote repository.

Receiving objects:  98% (175/178) Receiving objects:  99% (177/178) Receiving objects: 100% (178/178) Receiving objects: 100% (178/178), 63.90 KiB | 10.65 MiB/s, done. 
Resolving deltas:  96% (89/92) Resolving deltas:  97% (90/92) Resolving deltas: 100% (92/92) Resolving deltas: 100% (92/92), completed with 85 local objects. 
From ssh://github.com/Homebrew/homebrew-cask-versions 
 + 15f6b44...90ed6b8 master     -> origin/master  (forced update) 
+ git reset --hard origin/master 
HEAD is now at 90ed6b8 Update microsoft-edge-beta from 86.0.622.19 to 86.0.622.28 (#9686) 
+ for path in '$(find /usr/local/Homebrew -type d -name .git)' 
+ cd /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/.git/.. 
+ git fetch --depth=1 origin 
Connection to github.com closed by remote host.  
fatal: Could not read from remote repository. 
 
Please make sure you have the correct access rights 
and the repository exists.

This comment was automatically generated by Dr. CI (expand for details).

Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 7 times.

…ry (pytorch#45485) Summary: Pull Request resolved: pytorch#45485 Essentially this is the problem reported by ezyang: https://fb.workplace.com/groups/llvm.gcc/permalink/4053565044692080. There are two proposed fixes: * pytorch#44883: this doesn't work because it fails some static assert at runtime ``` caffe2/c10/core/TensorOptions.h:553:1: error: static_assert failed due to requirement 'sizeof(c10::TensorOptions) <= sizeof(long) * 2' "TensorOptions must fit in 128-bits" static_assert( sizeof(TensorOptions) <= sizeof(int64_t) * 2, ^ ``` * pytorch#44885: to be tested This diff is a temp hack to work around the problem. W/o this patch: ``` volatile size_t device_type = static_cast<size_t>(type); auto p = device_guard_impl_registry[device_type].load(); C10_LOG_FIRST_N(WARNING, 10) << "XDW-fail: " << cntr << ", Device type: " << type << ", type cast: " << device_type << ", guard: " << p; // output XDW-fail: 1129, Device type: cuda, type cast: 65537, guard: 0 ``` Another workaround is D23788441, which changes -O3 to -O2. So this seems to be a miscompilation for nvcc or the host compiler. Differential Revision: D23972356 fbshipit-source-id: afabd0d37fbfc1ce685bdf07b320cb204f421a3d

facebook-github-bot · 2020-09-30T05:00:53Z

This pull request was exported from Phabricator. Differential Revision: D23972356

facebook-github-bot · 2020-10-06T06:11:24Z

This pull request has been merged in 2fbe597.

facebook-github-bot added the fb-exported label Sep 29, 2020

xw285cornell changed the title ~~[hack][do not land] apply 16-bit mask to the index for device guard registry~~ [pytorch/cuda] apply 16-bit mask to the index for device guard registry Sep 29, 2020

xw285cornell force-pushed the export-D23972356 branch from 49c1b78 to b01e9a3 Compare September 30, 2020 05:00

facebook-github-bot closed this in 2fbe597 Oct 6, 2020

facebook-github-bot added the merged label Oct 6, 2020

mruberry added the Merged label Oct 28, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[pytorch/cuda] apply 16-bit mask to the index for device guard registry #45485

[pytorch/cuda] apply 16-bit mask to the index for device guard registry #45485

xw285cornell commented Sep 29, 2020 •

edited

Loading

facebook-github-bot commented Sep 29, 2020

dr-ci bot commented Sep 29, 2020 •

edited

Loading

facebook-github-bot commented Sep 30, 2020

facebook-github-bot commented Oct 6, 2020

[pytorch/cuda] apply 16-bit mask to the index for device guard registry #45485

[pytorch/cuda] apply 16-bit mask to the index for device guard registry #45485

Conversation

xw285cornell commented Sep 29, 2020 • edited Loading

facebook-github-bot commented Sep 29, 2020

dr-ci bot commented Sep 29, 2020 • edited Loading

💊 CI failures summary and remediations

❄️ 1 failure tentatively classified as flaky

pytorch_ios_11_2_1_x86_64_build (1/1)

facebook-github-bot commented Sep 30, 2020

facebook-github-bot commented Oct 6, 2020

xw285cornell commented Sep 29, 2020 •

edited

Loading

dr-ci bot commented Sep 29, 2020 •

edited

Loading