[pytorch/cuda] apply 16-bit mask to the index for device guard registry #45485
This pull request was exported from Phabricator. Differential Revision: D23972356
…ry (pytorch#45485)

Summary:
Pull Request resolved: pytorch#45485

Essentially this is the problem reported by ezyang: https://fb.workplace.com/groups/llvm.gcc/permalink/4053565044692080. There are two proposed fixes:

* pytorch#44883: this doesn't work because it fails a static assert at compile time:
```
caffe2/c10/core/TensorOptions.h:553:1: error: static_assert failed due to requirement 'sizeof(c10::TensorOptions) <= sizeof(long) * 2' "TensorOptions must fit in 128-bits"
static_assert( sizeof(TensorOptions) <= sizeof(int64_t) * 2,
^
```
* pytorch#44885: to be tested

This diff is a temporary hack to work around the problem. Without this patch:
```
volatile size_t device_type = static_cast<size_t>(type);
auto p = device_guard_impl_registry[device_type].load();
C10_LOG_FIRST_N(WARNING, 10) << "XDW-fail: " << cntr << ", Device type: " << type << ", type cast: " << device_type << ", guard: " << p;

// output
XDW-fail: 1129, Device type: cuda, type cast: 65537, guard: 0
```

Another workaround is D23788441, which changes -O3 to -O2. So this seems to be a miscompilation by nvcc or the host compiler.

Differential Revision: D23972356

fbshipit-source-id: afabd0d37fbfc1ce685bdf07b320cb204f421a3d
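For readers without the diff at hand, here is a minimal sketch of what applying a 16-bit mask to the registry index could look like. It is not the actual patch. The names `GuardImpl`, `guard_registry`, `kRegistrySize`, `mask_index`, `register_guard` and `lookup_guard_raw` are hypothetical stand-ins, and the 16-bit underlying type and the `CUDA = 1` value are assumptions made for illustration. The point is only that the logged index 65537 (0x10001) has stray high bits, and keeping the low 16 bits yields 1 again.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the c10 types; names, values and sizes are assumptions.
enum class DeviceType : std::int16_t { CPU = 0, CUDA = 1 };
struct GuardImpl {};

constexpr std::size_t kRegistrySize = 16;
// Static storage, so every slot starts out as a null pointer.
std::atomic<const GuardImpl*> guard_registry[kRegistrySize];

// Keep only the low 16 bits of the index, discarding any garbage in the
// upper bits left behind by the suspected miscompiled widening cast.
constexpr std::size_t mask_index(std::size_t raw) {
  return raw & 0xFFFF;
}

void register_guard(DeviceType type, const GuardImpl* impl) {
  const std::size_t idx = mask_index(static_cast<std::size_t>(type));
  assert(idx < kRegistrySize);
  guard_registry[idx].store(impl);
}

// Takes the raw (possibly corrupted) index so the effect of the mask
// can be exercised directly.
const GuardImpl* lookup_guard_raw(std::size_t raw_index) {
  const std::size_t idx = mask_index(raw_index);
  assert(idx < kRegistrySize);
  return guard_registry[idx].load();
}
```

With a guard registered for `DeviceType::CUDA`, `lookup_guard_raw(65537)` resolves to the same slot as index 1 instead of reading past the registry. The mask hides the symptom only; the suspected compiler issue remains, which is consistent with the description calling this a temporary hack.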
49c1b78 to b01e9a3
This pull request has been merged in 2fbe597.