-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL][InvokeSIMD] Add basic numerics test for simd_mask #8976
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
group<1> g = ndi.get_group(); | ||
uint32_t i = | ||
sg.get_group_linear_id() * VL + g.get_group_linear_id() * GroupSize; | ||
uint32_t wi_id = i + sg.get_local_id(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the same as ndi.get_global_linear_id
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah sorry, old test i based this off
float *A = | ||
static_cast<float *>(malloc_shared(Size * sizeof(float), dev, ctxt)); | ||
float *C = | ||
static_cast<float *>(malloc_shared(Size * sizeof(float), dev, ctxt)); | ||
|
||
bool *M = static_cast<bool *>(malloc_shared(Size * sizeof(bool), dev, ctxt)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason for USM over buffers? I think malloc_shared
is an optional feature and requires us to check for its sycl::aspect
.
And even if we are to use USM, there are overloads of these APIs that accept sycl::queue
and a templated type, something like
auto *M = malloc_shared<bool>(Size, q);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry i just copied simd_mask.cpp, i'll fix this
} | ||
|
||
int main() { | ||
sycl::queue q{gpu_selector_v}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sycl::queue q{gpu_selector_v}; | |
sycl::queue q; |
because we already have GPU_RUN_PLACEHOLDER
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix, thanks
<< "\n"; | ||
} | ||
} | ||
if((i % 2 == 1) && C[i] > 1/1e8) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this condition come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from the for loop initializing the buffers, just checking that C[i] is 0.
M[i] = i % 2;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was unclear. I'm interested in the second part of it. I'd think we have a zero-init somewhere, but then why aren't we comparing with it? I believe 0.0f is an exact value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yeah sorry that's clearer, i thought there might be some fp precision issues but since it's initted to 0 and isn't touched there isn't, thanks will fix
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
// | ||
// Check that full compilation works: | ||
// RUN: %clangxx -fsycl -fno-sycl-device-code-split-esimd -Xclang -fsycl-allow-func-ptr %s -o %t.out | ||
// RUN: env IGC_VCSaveStackCallLinkage=1 IGC_VCDirectCallsOnly=1 %GPU_RUN_PLACEHOLDER %t.out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my education, can you please elaborate, why these env vars are needed?
What is VISALTO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so VCSaveStackCallLinkage=1 IGC_VCDirectCallsOnly=1
are actually needed to get IGC to run with invoke_simd
, all the tests have them. i'm not sure why it doens't work automatically but we do need to pass them
for VISALTO
, i just copied it from another test but it looks like that's testing with some optimization flags for link time optimization IGC, so that seems to be more testing an IGC feature and not super relevant to us, so i'll remove this, thanks
@intel/dpcpp-devops-reviewers This never ran on hardware in precommit, and had an problem that would cause it to always fail which was missed so it failed in postcommit. Can someone see why it didn't run on hardware? Thanks |
I don't have any ideas... @bader, do you? |
Because of "bug" (or feature?) in CI system, introduced with ee781e4 (@apstasen, FYI). CI uses pre-commit workflow from 41e5efa, but "generate test matrix" reusable workflow from 39d1c65. Udit recently committed 19d023b, which changes the interface of reusable workflow, so outdated pre-commit can't use new "gen test matrix" workflow. @v-klochkov, @intel/llvm-gatekeepers, please, verify that all tests were run and pass before merging PRs. |
does this mean it should start running on hardware again in newer PRs? |
CI works as expected if your branch has 19d023b commit. |
ok cool thanks, bad luck for me i guess |
Yes, but IMHO, CI should not behave like this. Anyway, I expect gatekeepers to request merge with the sycl branch in such cases. |
No description provided.