Skip to content

[SYCL][E2E] Use lit env in subprocess calls #17200

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Mar 3, 2025
Merged

Conversation

ayylol
Copy link
Contributor

@ayylol ayylol commented Feb 26, 2025

Fixes #16982

subprocess calls for checking if certain features were available used the external shell environment instead of the internal one used for running lit tests, leading to failures in those checks for certain PRs. This changes those subprocess calls to use the internal lit environment instead.

@ayylol ayylol changed the title [Do not merge ~ testing][CI] Always set path to include sycl toolchain [CI] Always set path to include sycl toolchain Feb 27, 2025
@ayylol ayylol marked this pull request as ready for review February 27, 2025 17:44
@ayylol ayylol requested a review from a team as a code owner February 27, 2025 17:44
run: |
cp -r $GITHUB_WORKSPACE/build/install $GITHUB_WORKSPACE/toolchain
echo PATH=$GITHUB_WORKSPACE/toolchain/bin/:$PATH >> $GITHUB_ENV
echo LD_LIBRARY_PATH=$GITHUB_WORKSPACE/toolchain/lib/:$LD_LIBRARY_PATH >> $GITHUB_ENV
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must not be necessary for compilation, only for execution. I'm not convinced the error is here still.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ayylol Can you link a workflow run with the error we're trying to fix? I couldn't find it in the workflow run in the linked issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ayylol Can you link a workflow run with the error we're trying to fix? I couldn't find it in the workflow run in the linked issue.

The error is a bit opaque, since its based on a compilation we do as a check to see if dev_kits/preview breaking change features should be available. We don't normally report anything there since in theory if those compilations fail it is because those features are not available.

https://github.com/intel/llvm/actions/runs/13310780044/job/37172558738?pr=16992#step:25:2642
Here was a run with extra diagnostics that shows an example of an error

/usr/bin/ld: /__w/llvm/llvm/toolchain/bin/../lib/libsycl.so: undefined reference to `urEnqueueCommandBufferExp@LIBUR_LOADER_0.12'

urEnqueueCommandBufferExp is related to the pr that was having issues.

The main thing to note is the reported global features list in the build stage, as opposed to the prebuilt execution stages. In the build stage we are missing level_zero_dev_kit, cuda_dev_kit, and preview-breaking-changes-mode due to compilation failures like the one above in the checks to add those features. In the prebuilt execution stages however these features are available, since the compilations dont fail due to setting the PATH/LD_LIBRARY_PATH prior to running the tests, which leads to those tests being supported and failing due to them not being built.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is LD_LIBRARY_PATH affecting linking? Shouldn't it be LIB/LIBRARY_PATH or something like that? I'd also half expect compiler driver to throw in proper -L<...> on its own...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aelovikov-intel @sarnex FYI, changed the approach to fix this at the lit level instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry can you explain the new approach? Is it that we lose envvars set in the shell calling lit when doing subprocess calls within lit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When running lit tests we use an environment that is not just that of the external shell, which is stored in config.environment. We add the toolchain bin/lib folders to PATH/LD_LIBRARY_PATH to that variable inside of lit.cfg.py.

When we make the subprocess calls to check if we can compile the l0/cuda/preview breaking changes files we are actually using the external environment rather than the one stored in that variable, which in the case for the build stage on CI, is missing the toolchain/lib folder from LD_LIBRARY_PATH.

With these changes we would also be using that internal environment for the subprocess calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok that makes sense to me, thx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slight change, doing the same as described but with a helper that sets and restores the environment rather than doing string manipulation to create an env ... command

@ayylol ayylol requested a review from a team as a code owner February 28, 2025 14:46
@ayylol ayylol requested a review from steffenlarsen February 28, 2025 14:46
@ayylol ayylol changed the title [CI] Always set path to include sycl toolchain [SYCL][E2E] Use lit env in subprocess calls Feb 28, 2025
@ayylol
Copy link
Contributor Author

ayylol commented Feb 28, 2025

Further investigated this, turns out the issue stems from the subprocess calls in lit.cfg.py not using the same environment as when the lit tests are ran. I figure it would be preferable to fix this there, rather than at the CI level, so I've changed this pr to do that instead.

@Seanst98 I tested this locally and it seems to also resolve the issue, but I think we definitely need to check on CI with a pr that had the issues, would you mind using these changes and removing the previous CI changes from your draft pr to check? I briefly tried on my own but I ran into some merge conflicts on UR related files...

@Seanst98
Copy link
Contributor

@Seanst98 I tested this locally and it seems to also resolve the issue, but I think we definitely need to check on CI with a pr that had the issues, would you mind using these changes and removing the previous CI changes from your draft pr to check? I briefly tried on my own but I ran into some merge conflicts on UR related files...

I've rebased on your latest changes and CI is running: #17221

Let me know if you need me to re-run or try newer changes.

@Seanst98
Copy link
Contributor

I've rebased on your latest changes and CI is running: #17221

Things are looking positive: https://github.com/intel/llvm/actions/runs/13590791091/job/37997492624?pr=17221

Thanks a lot!

@ayylol
Copy link
Contributor Author

ayylol commented Mar 3, 2025

@intel/llvm-gatekeepers this is ready to merge.

First run had an unrelated flaky fail on AMD, made an issue here: #17276

Also see https://github.com/intel/llvm/actions/runs/13595920829/job/38012683050?pr=17254 (these changes on top of a pr that exhibited the issues this is trying to fix)

@sarnex sarnex merged commit 106668a into intel:sycl Mar 3, 2025
21 checks passed
@ayylol ayylol deleted the ci-toolchain branch March 7, 2025 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

run_prebuilt_e2e_tests CI jobs fail in cases with UR API changes
4 participants