Skip to content

[SYCL][InvokeSIMD] Add basic numerics test for simd_mask #8976

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 7, 2023

Conversation

sarnex
Copy link
Contributor

@sarnex sarnex commented Apr 6, 2023

No description provided.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
@sarnex sarnex temporarily deployed to aws April 6, 2023 17:42 — with GitHub Actions Inactive
@sarnex sarnex temporarily deployed to aws April 6, 2023 17:45 — with GitHub Actions Inactive
@sarnex sarnex marked this pull request as ready for review April 6, 2023 17:56
@sarnex sarnex requested a review from a team as a code owner April 6, 2023 17:56
@sarnex sarnex requested review from aelovikov-intel and a team April 6, 2023 17:56
group<1> g = ndi.get_group();
uint32_t i =
sg.get_group_linear_id() * VL + g.get_group_linear_id() * GroupSize;
uint32_t wi_id = i + sg.get_local_id();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the same as ndi.get_global_linear_id?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah sorry, old test i based this off

Comment on lines 49 to 54
float *A =
static_cast<float *>(malloc_shared(Size * sizeof(float), dev, ctxt));
float *C =
static_cast<float *>(malloc_shared(Size * sizeof(float), dev, ctxt));

bool *M = static_cast<bool *>(malloc_shared(Size * sizeof(bool), dev, ctxt));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason for USM over buffers? I think malloc_shared is an optional feature and requires us to check for its sycl::aspect.

And even if we are to use USM, there are overloads of these APIs that accept sycl::queue and a templated type, something like

  auto *M = malloc_shared<bool>(Size, q);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry i just copied simd_mask.cpp, i'll fix this

}

int main() {
sycl::queue q{gpu_selector_v};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sycl::queue q{gpu_selector_v};
sycl::queue q;

because we already have GPU_RUN_PLACEHOLDER.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix, thanks

<< "\n";
}
}
if((i % 2 == 1) && C[i] > 1/1e8) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this condition come from?

Copy link
Contributor Author

@sarnex sarnex Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the for loop initializing the buffers, just checking that C[i] is 0.

M[i] = i % 2;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was unclear. I'm interested in the second part of it. I'd think we have a zero-init somewhere, but then why aren't we comparing with it? I believe 0.0f is an exact value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yeah sorry that's clearer, i thought there might be some fp precision issues but since it's initted to 0 and isn't touched there isn't, thanks will fix

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
@sarnex sarnex requested a review from aelovikov-intel April 6, 2023 19:36
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
@sarnex sarnex temporarily deployed to aws April 6, 2023 20:04 — with GitHub Actions Inactive
@sarnex sarnex temporarily deployed to aws April 6, 2023 20:04 — with GitHub Actions Inactive
//
// Check that full compilation works:
// RUN: %clangxx -fsycl -fno-sycl-device-code-split-esimd -Xclang -fsycl-allow-func-ptr %s -o %t.out
// RUN: env IGC_VCSaveStackCallLinkage=1 IGC_VCDirectCallsOnly=1 %GPU_RUN_PLACEHOLDER %t.out
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my education, can you please elaborate, why these env vars are needed?
What is VISALTO?

Copy link
Contributor Author

@sarnex sarnex Apr 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so VCSaveStackCallLinkage=1 IGC_VCDirectCallsOnly=1 are actually needed to get IGC to run with invoke_simd, all the tests have them. i'm not sure why it doens't work automatically but we do need to pass them

for VISALTO, i just copied it from another test but it looks like that's testing with some optimization flags for link time optimization IGC, so that seems to be more testing an IGC feature and not super relevant to us, so i'll remove this, thanks

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
@sarnex sarnex requested a review from v-klochkov April 7, 2023 14:45
@sarnex sarnex temporarily deployed to aws April 7, 2023 15:10 — with GitHub Actions Inactive
@sarnex sarnex temporarily deployed to aws April 7, 2023 15:10 — with GitHub Actions Inactive
@v-klochkov v-klochkov merged commit 3a60c85 into intel:sycl Apr 7, 2023
@sarnex
Copy link
Contributor Author

sarnex commented Apr 7, 2023

@intel/dpcpp-devops-reviewers This never ran on hardware in precommit, and had an problem that would cause it to always fail which was missed so it failed in postcommit. Can someone see why it didn't run on hardware? Thanks

@aelovikov-intel
Copy link
Contributor

I don't have any ideas... @bader, do you?

@bader
Copy link
Contributor

bader commented Apr 7, 2023

Because of "bug" (or feature?) in CI system, introduced with ee781e4 (@apstasen, FYI). CI uses pre-commit workflow from 41e5efa, but "generate test matrix" reusable workflow from 39d1c65. Udit recently committed 19d023b, which changes the interface of reusable workflow, so outdated pre-commit can't use new "gen test matrix" workflow.

@v-klochkov, @intel/llvm-gatekeepers, please, verify that all tests were run and pass before merging PRs.

@sarnex
Copy link
Contributor Author

sarnex commented Apr 7, 2023

@bader

so outdated pre-commit can't use new "gen test matrix" workflow

does this mean it should start running on hardware again in newer PRs?

@bader
Copy link
Contributor

bader commented Apr 7, 2023

CI works as expected if your branch has 19d023b commit.

@sarnex
Copy link
Contributor Author

sarnex commented Apr 7, 2023

CI works as expected if your branch has 19d023b commit.

ok cool thanks, bad luck for me i guess

@bader
Copy link
Contributor

bader commented Apr 7, 2023

Yes, but IMHO, CI should not behave like this. Anyway, I expect gatekeepers to request merge with the sycl branch in such cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants