Skip to content

test Profile fails to complete from 1.10.0-DEV.1147 onwards (source build on macOS ARM) #49572

@George9000

Description

@George9000

Starting at abeecee onwards, after a successful build, make testall gets stuck at test set Profile. This test set has run for up to an hour without completing. A Ctrl-C interrupt exits from make testall. If the Profile test set is run by itself, make test-Profile, the console become unresponsive, even to Ctrl-C, requiring a kill <pid> from a separate terminal. The issue was bisected from a known good make testall at 3db036e. The first bad commit and test failure was at abeecee. The latest HEAD, 219dc10, continues to have this issue.

test summary
julia> versioninfo()
Julia Version 1.10.0-DEV.1147
Commit abeecee71c (2023-04-28 20:59 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.6.0)
  CPU: 10 × Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
  Threads: 5 on 8 virtual cores
Environment:
  JULIA_NUM_THREADS = 4
  JULIA_DEPOT_PATH = /Users/foo/applications/juliabuild

% gmake testall                                                                                                                                                                                                                                        19:25
Whitespace check found no issues.
cp /Users/foo/applications/julia/usr/lib/julia/sys.dylib /Users/foo/applications/julia/local.dylib
/Users/foo/applications/julia/usr/bin/julia -J /Users/foo/applications/julia/local.dylib -e 'true'
rm /Users/foo/applications/julia/local.dylib
    JULIA test/all
Running parallel tests with:
  nworkers() = 8
  nthreads() = 4
  Sys.CPU_THREADS = 8
  Sys.total_memory() = 64.000 GiB
  Sys.free_memory() = 41.985 GiB

Test                                        (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
.
.
.
Profile                                          (4) |        started at 2023-04-28T19:36:28.703
.
.
.
    Statistics                                    |      810                      810  30.2s
    FileWatching                                  |      562                      562  1m46.3s
    Pkg                                           |     3232              1      3233  12m59.7s
    Profile                                       |               1                 1      
    precompile                                    |               1                 1      
    SharedArrays                                  |               1                 1      
    threads                                       |               1                 1      
    Distributed                                   |               1                 1      
    stress                                        |               1                 1      
    FAILURE

The global RNG seed was 0x7510a9e4d9ccb5a01256c78602917ca8.

Error in testset Profile:
Interrupted
Error in testset precompile:
Interrupted
Error in testset SharedArrays:
Interrupted
Error in testset threads:
Interrupted
Error in testset Distributed:
Interrupted
Error in testset stress:
Interrupted
ERROR: LoadError: Test run finished with errors
in expression starting at /Users/foo/applications/julia/test/runtests.jl:93
┌ Warning: Forcibly interrupting busy workers
│   exception = rmprocs: pids [4] not terminated after 5.0 seconds.
└ @ Distributed ~/applications/julia/usr/share/julia/stdlib/v1.10/Distributed/src/cluster.jl:1253
┌ Warning: rmprocs: process 1 not removed
└ @ Distributed ~/applications/julia/usr/share/julia/stdlib/v1.10/Distributed/src/cluster.jl:1049
gmake[1]: *** [Makefile:25: all] Error 1
gmake: *** [Makefile:611: testall] Error 2



% gmake test-Profile                                                                                                                                                                                                                                   20:35
Whitespace check found no issues.
    JULIA test/Profile
Running parallel tests with:
  nworkers() = 1
  nthreads() = 4
  Sys.CPU_THREADS = 8
  Sys.total_memory() = 64.000 GiB
  Sys.free_memory() = 40.115 GiB

Test  (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
Profile    (1) |        started at 2023-04-28T20:40:16.513
gmake: *** [Makefile:618: test-Profile] Killed: 9



% gmake test-precompile                                                                                                                                                                                                                                20:57
Whitespace check found no issues.
    JULIA test/precompile
Running parallel tests with:
  nworkers() = 1
  nthreads() = 4
  Sys.CPU_THREADS = 8
  Sys.total_memory() = 64.000 GiB
  Sys.free_memory() = 39.769 GiB

Test  (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
precompile (1) |        started at 2023-04-28T20:58:16.664
precompile (1) |    28.84 |   0.06 |  0.2 |     679.60 |   425.94

Test Summary: | Pass  Total  Time
  Overall     |  239    239  0.4s
    SUCCESS


% gmake test-SharedArrays                                                                                                                                                                                                                              20:58
Whitespace check found no issues.
    JULIA test/SharedArrays
Running parallel tests with:
  nworkers() = 1
  nthreads() = 4
  Sys.CPU_THREADS = 8
  Sys.total_memory() = 64.000 GiB
  Sys.free_memory() = 39.765 GiB

Test  (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
SharedArrays(1) |        started at 2023-04-28T20:59:27.009
SharedArrays(1) |    12.06 |   0.26 |  2.2 |    1240.84 |   443.19

Test Summary: | Pass  Total  Time
  Overall     |  114    114  0.4s
    SUCCESS



% gmake test-threads                                                                                                                                                                                                                                   20:59
Whitespace check found no issues.
    JULIA test/threads
Running parallel tests with:
  nworkers() = 1
  nthreads() = 4
  Sys.CPU_THREADS = 8
  Sys.total_memory() = 64.000 GiB
  Sys.free_memory() = 39.756 GiB

Test  (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
threads    (1) |        started at 2023-04-28T21:00:03.588
Test Summary:                               |  Pass  Total   Time
threads_exec.jl with JULIA_NUM_THREADS == 1 | 40347  40347  18.3s
Test Summary:                               |  Pass  Total   Time
threads_exec.jl with JULIA_NUM_THREADS == 2 | 50249  50249  14.2s
Test Summary:                               |  Pass  Total   Time
threads_exec.jl with JULIA_NUM_THREADS == 4 | 50290  50290  14.0s
Test Summary:                               |  Pass  Total   Time
threads_exec.jl with JULIA_NUM_THREADS == 4 | 49700  49700  14.0s
threads    (1) |   100.38 |   0.00 |  0.0 |      31.94 |   310.11

Test Summary: | Pass  Total  Time
  Overall     |   40     40  0.4s
    SUCCESS



% gmake test-Distributed                                                                                                                                                                                                                               21:01
Whitespace check found no issues.
    JULIA test/Distributed
Running parallel tests with:
  nworkers() = 1
  nthreads() = 4
  Sys.CPU_THREADS = 8
  Sys.total_memory() = 64.000 GiB
  Sys.free_memory() = 39.577 GiB

Test  (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
Distributed(1) |        started at 2023-04-28T21:13:24.643
┌ Warning: Skipping a thread-unsafe test because `Threads.maxthreadid() > 1`
│   Threads.maxthreadid() = 5
└ @ Main ~/applications/julia/usr/share/julia/stdlib/v1.10/Distributed/test/distributed_exec.jl:161
┌ Warning: Skipping a thread-unsafe test because `Threads.maxthreadid() > 1`
│   Threads.maxthreadid() = 5
└ @ Main ~/applications/julia/usr/share/julia/stdlib/v1.10/Distributed/test/distributed_exec.jl:161
[ Info: SO_REUSEPORT is unsupported, skipping reuseport tests
      From worker 28:	
      From worker 28:	[69000] signal (15): Terminated: 15
      From worker 28:	in expression starting at none:0
      From worker 28:	__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
      From worker 28:	unknown function (ip: 0x0)
      From worker 28:	__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
      From worker 28:	unknown function (ip: 0x0)
      From worker 28:	__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
      From worker 28:	unknown function (ip: 0x0)
      From worker 28:	kevent at /usr/lib/system/libsystem_kernel.dylib (unknown line)
      From worker 28:	unknown function (ip: 0x0)
      From worker 28:	Allocations: 2548794 (Pool: 2546673; Big: 2121); GC: 4
┌ Warning: rmprocs: process 1 not removed
└ @ Distributed ~/applications/julia/usr/share/julia/stdlib/v1.10/Distributed/src/cluster.jl:1049
Distributed(1) |   120.69 |   0.00 |  0.0 |      27.86 |   317.78

Test Summary: | Pass  Total  Time
  Overall     |   15     15  0.4s
    SUCCESS


% gmake test-stress                                                                                                                                                                                                                                    21:15
Whitespace check found no issues.
    JULIA test/stress
Running parallel tests with:
  nworkers() = 1
  nthreads() = 4
  Sys.CPU_THREADS = 8
  Sys.total_memory() = 64.000 GiB
  Sys.free_memory() = 39.331 GiB

Test  (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
stress     (1) |        started at 2023-04-28T21:22:00.424
(libuv) kqueue(): Too many open files
stress     (1) |     8.26 |   0.00 |  0.0 |      31.85 |   307.91

Test Summary: | Pass  Total  Time
  Overall     |  118    118  0.4s
    SUCCESS

bisect summary
abeecee71c202f06c454963890e41c1f73522a3e is the first bad commit
commit abeecee71c202f06c454963890e41c1f73522a3e
Author: Diogo Netto <61364108+d-netto@users.noreply.github.com>
Date:   Fri Apr 28 17:59:33 2023 -0300

    Implement parallel marking (#48600)
    
    Using a work-stealing queue after Chase and Lev, optimized for
    weak memory models by Le et al.
    
    Default number of GC threads is half the number of compute threads.
    
    Co-authored-by: Gabriel Baraldi <baraldigabriel@gmail.com>
    Co-authored-by: Valentin Churavy <v.churavy@gmail.com>

 NEWS.md                                  |   4 +
 base/options.jl                          |   1 +
 base/threadingconstructs.jl              |   7 +
 doc/man/julia.1                          |   5 +
 doc/src/base/multi-threading.md          |   1 +
 doc/src/manual/command-line-interface.md |   1 +
 doc/src/manual/environment-variables.md  |   8 +
 doc/src/manual/multi-threading.md        |   9 +
 src/Makefile                             |   2 +-
 src/gc-debug.c                           |  64 +++---
 src/gc.c                                 | 381 +++++++++++++++++++++++--------
 src/gc.h                                 |  37 +--
 src/init.c                               |   1 +
 src/jl_exported_data.inc                 |   1 +
 src/jloptions.c                          |  11 +
 src/jloptions.h                          |   1 +
 src/julia.h                              |   1 +
 src/julia_threads.h                      |  10 +-
 src/options.h                            |   3 +
 src/partr.c                              |  33 ++-
 src/threading.c                          |  42 +++-
 src/threading.h                          |   1 +
 src/work-stealing-queue.h                | 102 +++++++++
 stdlib/Distributed/src/cluster.jl        |   5 +-
 test/choosetests.jl                      |   2 +-
 test/cmdlineargs.jl                      |  18 ++
 test/gc.jl                               |  18 ++
 test/gc/binarytree.jl                    |  53 +++++
 test/gc/linkedlist.jl                    |  21 ++
 test/gc/objarray.jl                      |  35 +++
 30 files changed, 722 insertions(+), 156 deletions(-)
 create mode 100644 src/work-stealing-queue.h
 create mode 100644 test/gc.jl
 create mode 100644 test/gc/binarytree.jl
 create mode 100644 test/gc/linkedlist.jl
 create mode 100644 test/gc/objarray.jl




% gmake test-Profile                                                                                                                                                 12:56
Whitespace check found no issues.
    JULIA test/Profile
Running parallel tests with:
  nworkers() = 1
  nthreads() = 4
  Sys.CPU_THREADS = 8
  Sys.total_memory() = 64.000 GiB
  Sys.free_memory() = 44.274 GiB

Test  (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
Profile    (1) |        started at 2023-04-29T12:56:52.860
gmake: *** [Makefile:618: test-Profile] Killed: 9

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions