-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL][E2E] Use lit env in subprocess calls #17200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
run: | | ||
cp -r $GITHUB_WORKSPACE/build/install $GITHUB_WORKSPACE/toolchain | ||
echo PATH=$GITHUB_WORKSPACE/toolchain/bin/:$PATH >> $GITHUB_ENV | ||
echo LD_LIBRARY_PATH=$GITHUB_WORKSPACE/toolchain/lib/:$LD_LIBRARY_PATH >> $GITHUB_ENV |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This must not be necessary for compilation, only for execution. I'm not convinced the error is here still.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ayylol Can you link a workflow run with the error we're trying to fix? I couldn't find it in the workflow run in the linked issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ayylol Can you link a workflow run with the error we're trying to fix? I couldn't find it in the workflow run in the linked issue.
The error is a bit opaque, since its based on a compilation we do as a check to see if dev_kits/preview breaking change features should be available. We don't normally report anything there since in theory if those compilations fail it is because those features are not available.
https://github.com/intel/llvm/actions/runs/13310780044/job/37172558738?pr=16992#step:25:2642
Here was a run with extra diagnostics that shows an example of an error
/usr/bin/ld: /__w/llvm/llvm/toolchain/bin/../lib/libsycl.so: undefined reference to `urEnqueueCommandBufferExp@LIBUR_LOADER_0.12'
urEnqueueCommandBufferExp
is related to the pr that was having issues.
The main thing to note is the reported global features list in the build stage, as opposed to the prebuilt execution stages. In the build stage we are missing level_zero_dev_kit
, cuda_dev_kit
, and preview-breaking-changes-mode
due to compilation failures like the one above in the checks to add those features. In the prebuilt execution stages however these features are available, since the compilations dont fail due to setting the PATH/LD_LIBRARY_PATH prior to running the tests, which leads to those tests being supported and failing due to them not being built.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is LD_LIBRARY_PATH affecting linking? Shouldn't it be LIB/LIBRARY_PATH or something like that? I'd also half expect compiler driver to throw in proper -L<...>
on its own...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aelovikov-intel @sarnex FYI, changed the approach to fix this at the lit level instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry can you explain the new approach? Is it that we lose envvars set in the shell calling lit when doing subprocess calls within lit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When running lit tests we use an environment that is not just that of the external shell, which is stored in config.environment
. We add the toolchain bin/lib folders to PATH/LD_LIBRARY_PATH to that variable inside of lit.cfg.py
.
When we make the subprocess calls to check if we can compile the l0/cuda/preview breaking changes files we are actually using the external environment rather than the one stored in that variable, which in the case for the build stage on CI, is missing the toolchain/lib
folder from LD_LIBRARY_PATH
.
With these changes we would also be using that internal environment for the subprocess calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok that makes sense to me, thx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slight change, doing the same as described but with a helper that sets and restores the environment rather than doing string manipulation to create an env ...
command
Further investigated this, turns out the issue stems from the subprocess calls in @Seanst98 I tested this locally and it seems to also resolve the issue, but I think we definitely need to check on CI with a pr that had the issues, would you mind using these changes and removing the previous CI changes from your draft pr to check? I briefly tried on my own but I ran into some merge conflicts on UR related files... |
I've rebased on your latest changes and CI is running: #17221 Let me know if you need me to re-run or try newer changes. |
Things are looking positive: https://github.com/intel/llvm/actions/runs/13590791091/job/37997492624?pr=17221 Thanks a lot! |
@intel/llvm-gatekeepers this is ready to merge. First run had an unrelated flaky fail on AMD, made an issue here: #17276 Also see https://github.com/intel/llvm/actions/runs/13595920829/job/38012683050?pr=17254 (these changes on top of a pr that exhibited the issues this is trying to fix) |
Fixes #16982
subprocess calls for checking if certain features were available used the external shell environment instead of the internal one used for running lit tests, leading to failures in those checks for certain PRs. This changes those subprocess calls to use the internal lit environment instead.