Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More flexible test affinity setting #44677

Merged
merged 4 commits into from
Mar 21, 2022
Merged

More flexible test affinity setting #44677

merged 4 commits into from
Mar 21, 2022

Conversation

staticfloat
Copy link
Member

When running on a machine with cpusets applied, we are unable to
assign CPU affinity to CPUs 1 and 2; we may be locked to CPUs 9-16, for
example. So we must inspect what our current cpumask is, and from that
select CPUs that we can safely assign affinity to in our tests.

@staticfloat staticfloat requested a review from tkf March 19, 2022 03:42
@staticfloat staticfloat added the backport 1.8 Change should be backported to release-1.8 label Mar 19, 2022
@DilumAluthge DilumAluthge added test This change adds or pertains to unit tests bugfix This change fixes an existing bug ci Continuous integration multithreading Base.Threads and related functionality labels Mar 19, 2022
@DilumAluthge DilumAluthge requested a review from Keno March 19, 2022 03:52
@DilumAluthge DilumAluthge changed the title More flexibly test affinity setting More flexible test affinity setting Mar 19, 2022
@DilumAluthge DilumAluthge added merge me PR is reviewed. Merge when all tests are passing backport 1.8 Change should be backported to release-1.8 and removed backport 1.8 Change should be backported to release-1.8 labels Mar 19, 2022
Copy link
Member

@tkf tkf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The direction LGTM. I think we want to dedup get_cpumask.

test/threads.jl Outdated Show resolved Hide resolved
test/threads.jl Outdated Show resolved Hide resolved
test/threads.jl Outdated Show resolved Hide resolved
@tkf tkf removed the merge me PR is reviewed. Merge when all tests are passing label Mar 19, 2022
@DilumAluthge DilumAluthge added the merge me PR is reviewed. Merge when all tests are passing label Mar 20, 2022
staticfloat and others added 2 commits March 19, 2022 22:50
When running on a machine with `cpusets` applied, we are unable to
assign CPU affinity to CPUs 1 and 2; we may be locked to CPUs 9-16, for
example.  So we must inspect what our current cpumask is, and from that
select CPUs that we can safely assign affinity to in our tests.
@DilumAluthge DilumAluthge removed the merge me PR is reviewed. Merge when all tests are passing label Mar 20, 2022
@DilumAluthge
Copy link
Member

@staticfloat @tkf Any idea why the tests are failing?

@DilumAluthge DilumAluthge requested a review from tkf March 20, 2022 04:33
@tkf
Copy link
Member

tkf commented Mar 20, 2022

I think it fails on macos because libuv does not support affinity on macos (so it makes sense). linux32 fails with OOM so maybe it's an accident? The failure on win32 looks like a bug? (https://build.julialang.org/#/builders/42/builds/3572/steps/5/logs/stdio)

Error During Test at C:\buildbot\worker-tabularasa\tester_win32\build\share\julia\test\subarray.jl:291
  Got exception outside of a @test
  ArgumentError: colons must be converted by to_indices(...)
  Stacktrace:
    [1] to_index(#unused#::Colon)
      @ Base .\indices.jl:299
    [2] to_index(A::SubArray{Int32, 3, Array{Int32, 3}, Tuple{StepRange{Int32, Int32}, Base.Slice{Base.OneTo{Int32}}, Base.Slice{Base.OneTo{Int32}}}, false}, i::Function)
      @ Base .\indices.jl:277
    [3] to_indices
      @ .\indices.jl:333 [inlined]
    [4] to_indices
      @ .\indices.jl:324 [inlined]
    [5] view(::SubArray{Int32, 3, Array{Int32, 3}, Tuple{StepRange{Int32, Int32}, Base.Slice{Base.OneTo{Int32}}, Base.Slice{Base.OneTo{Int32}}}, false}, ::Function, ::SubArray{Int32, 2, UnitRange{Int32}, Tuple{Matrix{Int32}}, false}, ::SubArray{Int32, 2, UnitRange{Int32}, Tuple{Matrix{Int32}}, false})
      @ Base .\subarray.jl:176
    [6] runsubarraytests(::Any, ::Function, ::Vararg{Any})
      @ Main.Test13Main_subarray C:\buildbot\worker-tabularasa\tester_win32\build\share\julia\test\subarray.jl:207
    [7] runviews(SB::SubArray{Int32, 3, Array{Int32, 3}, Tuple{StepRange{Int32, Int32}, Base.Slice{Base.OneTo{Int32}}, Base.Slice{Base.OneTo{Int32}}}, false}, indexN::Tuple{Int32, Colon, UnitRange{Int32}, Vector{Int32}, Array{Int32, 0}, SubArray{Int32, 2, UnitRange{Int32}, Tuple{Matrix{Int32}}, false}}, indexNN::Tuple{Int32, Colon, UnitRange{Int32}, Vector{Int32}, Array{Int32, 0}, SubArray{Int32, 2, UnitRange{Int32}, Tuple{Matrix{Int32}}, false}}, indexNNN::Tuple{Int32, Colon, UnitRange{Int32}, Vector{Int32}, Array{Int32, 0}, SubArray{Int32, 3, UnitRange{Int32}, Tuple{Array{Int32, 3}}, false}})
      @ Main.Test13Main_subarray C:\buildbot\worker-tabularasa\tester_win32\build\share\julia\test\subarray.jl:230
    [8] macro expansion
      @ C:\buildbot\worker-tabularasa\tester_win32\build\share\julia\test\subarray.jl:313 [inlined]
    [9] macro expansion
      @ C:\buildbot\worker-tabularasa\tester_win32\build\share\julia\stdlib\v1.9\Test\src\Test.jl:1433 [inlined]
   [10] top-level scope
      @ C:\buildbot\worker-tabularasa\tester_win32\build\share\julia\test\subarray.jl:291
   [11] include
      @ .\Base.jl:429 [inlined]
   [12] macro expansion
      @ C:\buildbot\worker-tabularasa\tester_win32\build\share\julia\test\testdefs.jl:24 [inlined]
   [13] macro expansion
      @ C:\buildbot\worker-tabularasa\tester_win32\build\share\julia\stdlib\v1.9\Test\src\Test.jl:1357 [inlined]
   [14] macro expansion
      @ C:\buildbot\worker-tabularasa\tester_win32\build\share\julia\test\testdefs.jl:23 [inlined]
...

@DilumAluthge
Copy link
Member

The colons must be converted by error is #44635. It's sporadic, so let's try rerunning those jobs.

@DilumAluthge
Copy link
Member

What's the best way to deal with macOS?

@tkf
Copy link
Member

tkf commented Mar 20, 2022

I just pushed my shot at it (3b666df)

Copy link
Member

@tkf tkf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good to go once CI is happy

@DilumAluthge DilumAluthge added the merge me PR is reviewed. Merge when all tests are passing label Mar 20, 2022
@@ -96,9 +98,10 @@ end
const AFFINITY_SUPPORTED = (Sys.islinux() || Sys.iswindows()) && !running_under_rr()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can potentially drop the rr check, since it will be implied (it sets an affinity mask to 1 cpu)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see how it goes: #44693

@staticfloat staticfloat merged commit 32b1305 into master Mar 21, 2022
@staticfloat staticfloat deleted the sf/cpusets_resilience branch March 21, 2022 20:33
@giordano giordano removed the merge me PR is reviewed. Merge when all tests are passing label Mar 21, 2022
@KristofferC KristofferC mentioned this pull request Mar 23, 2022
22 tasks
KristofferC pushed a commit that referenced this pull request Mar 23, 2022
* More flexibly test affinity setting

When running on a machine with `cpusets` applied, we are unable to
assign CPU affinity to CPUs 1 and 2; we may be locked to CPUs 9-16, for
example.  So we must inspect what our current cpumask is, and from that
select CPUs that we can safely assign affinity to in our tests.

* Import `uv_thread_getaffinity` from `print_process_affinity.jl`

* Call `uv_thread_getaffinity` only if `AFFINITY_SUPPORTED`

* Fix a syntax error

Co-authored-by: Takafumi Arakaki <aka.tkf@gmail.com>
(cherry picked from commit 32b1305)
@KristofferC KristofferC removed the backport 1.8 Change should be backported to release-1.8 label Mar 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This change fixes an existing bug ci Continuous integration multithreading Base.Threads and related functionality test This change adds or pertains to unit tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants