
[GHA] Uplift Linux GPU RT version to 23.17.26241.22 #10087

Merged: 2 commits, Jun 29, 2023

Conversation

@bb-sycl (Contributor) commented Jun 27, 2023

Scheduled drivers uplift

@bb-sycl bb-sycl requested a review from a team as a code owner June 27, 2023 03:17
@bb-sycl bb-sycl temporarily deployed to aws June 27, 2023 03:25 — with GitHub Actions Inactive
@bb-sycl bb-sycl temporarily deployed to aws June 27, 2023 04:00 — with GitHub Actions Inactive
@bader (Contributor) commented Jun 27, 2023

@aelovikov-intel, FYI.

@aelovikov-intel (Contributor):

> @aelovikov-intel, FYI.

I somewhat doubt it's the CI because the testing in #10054 finished successfully. Is it possible the drivers are bad/incompatible?

@bader (Contributor) commented Jun 27, 2023

> @aelovikov-intel, FYI.
>
> I somewhat doubt it's the CI because the testing in #10054 finished successfully. Is it possible the drivers are bad/incompatible?

From the logs:

Traceback (most recent call last):
  File "/opt/get_release.py", line 19, in <module>
    release = get_release_by_tag(repo, tag)
  File "/opt/get_release.py", line 6, in get_release_by_tag
    release = urlopen("https://api.github.com/repos/" + repo + "/releases/tags/" + tag).read()
  File "/usr/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/usr/lib/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.10/urllib/request.py", line 563, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: rate limit exceeded
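The 403 above is GitHub's API rate limit for unauthenticated requests, not a permissions problem. A minimal sketch (relying on GitHub's documented X-RateLimit-* response headers) of telling the two apart when catching the error:

```python
from urllib.error import HTTPError

def is_rate_limited(err: HTTPError) -> bool:
    """True if a 403 came from GitHub's rate limiter rather than a real
    permission error (rate-limited responses carry X-RateLimit-Remaining: 0)."""
    remaining = err.headers.get("X-RateLimit-Remaining") if err.headers else None
    return err.code == 403 and remaining == "0"
```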

@bb-sycl bb-sycl temporarily deployed to aws June 27, 2023 16:21 — with GitHub Actions Inactive
@bb-sycl bb-sycl temporarily deployed to aws June 27, 2023 17:08 — with GitHub Actions Inactive
@aelovikov-intel (Contributor):

The re-run doesn't have that issue.

@bader (Contributor) commented Jun 27, 2023

> Re-run doesn't have that issue.

Okay, but we should think about a preventive cure for the urllib.error.HTTPError: HTTP Error 403: rate limit exceeded error. Just letting it happen, waiting, and re-running doesn't sound like a good solution.
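One preventive option, as a sketch rather than the repo's actual fix: authenticate the request (a token raises the limit from 60 to 5,000 requests/hour; GITHUB_TOKEN being exported into the script's environment is an assumption, though Actions does provide one) and back off on 403 instead of failing the job outright:

```python
import os
import time
from urllib.error import HTTPError
from urllib.request import Request, urlopen

def build_request(url: str) -> Request:
    """Attach an auth header when a token is available; authenticated
    calls get a much higher GitHub API rate limit than anonymous ones."""
    req = Request(url)
    token = os.environ.get("GITHUB_TOKEN")  # assumed to be set by the CI job
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req

def get_release_by_tag(repo: str, tag: str, retries: int = 3) -> bytes:
    url = f"https://api.github.com/repos/{repo}/releases/tags/{tag}"
    for attempt in range(retries):
        try:
            return urlopen(build_request(url)).read()
        except HTTPError as err:
            if err.code == 403 and attempt < retries - 1:
                time.sleep(30 * 2 ** attempt)  # back off before retrying
                continue
            raise
```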

@bader (Contributor) commented Jun 28, 2023

> Re-run doesn't have that issue.
>
> Okay, but we should think about preventive cure for urllib.error.HTTPError: HTTP Error 403: rate limit exceeded error. Just let it happen and wait + re-run, doesn't sound like a good solution.

FYI: I found a bug report for that issue - #8462.

@bader bader merged commit 46046bd into sycl Jun 29, 2023
@bader bader deleted the ci/update_gpu_driver-linux-23.17.26241.22 branch June 29, 2023 05:18
@dm-vodopyanov (Contributor):
Post-commit failure:

Unexpectedly Passed Tests (1):
  SYCL :: ESIMD/hardware_dispatch.cpp

For these automated Linux GPU driver updates, we could automatically run a broader scope of testing in the pre-commit step.

@dm-vodopyanov (Contributor):

Fix: #10132

@aelovikov-intel (Contributor):

> For these automated Linux GPU driver updates we could run broader scope of testing on pre-commit step automatically.

I think that could be as easy as making

lts_config: "hip_amdgpu;lin_intel;esimd_emu;cuda_aws;win_l0_gen12"

conditional on the files changed (i.e., don't use the "fused" lin_intel task; prefer the separate l0_gen9, ocl_gen9, and ocl_x64 tasks).

The condition could probably be based on contains(needs.detect_changes.outputs.filters, 'drivers_and_configs') - see:

ci:
  - .github/workflows/**
  - devops/*/**
# devops/* contains config files, including drivers versions.
# Allow them to be tested in pre-commit.
drivers_and_configs: &drivers_and_configs
  - devops/*
test_build:
  - *sycl
  - *drivers_and_configs

dm-vodopyanov added a commit that referenced this pull request Jun 29, 2023
@bader (Contributor) commented Jun 29, 2023

@dm-vodopyanov, @aelovikov-intel, could you clarify what testing is missing in this PR pre-commit? Don't we run ESIMD tests on GPU in all pre-commits?

@aelovikov-intel (Contributor):

https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/README.md?plain=1#L34-L44

We don't test for XFAIL in pre-commit.

@bader (Contributor) commented Jun 29, 2023

> https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/README.md?plain=1#L34-L44
>
> We don't test for XFAIL in pre-commit.

This text doesn't say whether there is any difference in XFAIL semantics between post- and pre-commit. Why does the test fail in post-commit?

@aelovikov-intel (Contributor):

Because in pre-commit we run with

"targets": "ext_oneapi_level_zero:gpu;opencl:gpu;opencl:cpu",

and in post-commit with

"targets": "ext_oneapi_level_zero:gpu",

@bader (Contributor) commented Jun 29, 2023

So just having additional devices in the system changes the status of the test? Do you understand how broken this system is? How are developers supposed to debug issues like this?

@aelovikov-intel (Contributor):

> So just having additional devices in the system changes the status of the test?

No.

aelovikov-intel added a commit to aelovikov-intel/llvm that referenced this pull request Jul 10, 2023