Swift concurrency fails consistently under load #65537

Open
@brianm998

Description
Even very simple Swift concurrency tasks crash repeatedly under load.

Steps to reproduce
Download and run this simple task-group-based looping example:
https://github.com/brianm998/SwiftConcurrentBug/blob/develop/Sources/main.swift
I've tested this on macOS from the command line; the repo above is a barely modified stock Swift CLI app.
Failure is not 100% reproducible, but it fails about half the time, and more often when the system is also under load from other processes.
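
To put a number on "about half the time", a small harness along these lines could run the built binary repeatedly and count abnormal exits. This is only a sketch: the function name `countFailures` and the run count are made up for illustration, and the binary path is the debug build location that appears in the lldb session further down.

```swift
import Foundation

/// Runs the executable at `path` `runs` times and returns how many runs
/// ended abnormally (killed by a signal, or exited with a non-zero status).
/// Hypothetical helper, not part of the SwiftConcurrentBug repo.
func countFailures(path: String, runs: Int) throws -> Int {
    var failures = 0
    for _ in 0..<runs {
        let process = Process()
        process.executableURL = URL(fileURLWithPath: path)
        try process.run()          // throws if the executable cannot be launched
        process.waitUntilExit()
        if process.terminationReason == .uncaughtSignal || process.terminationStatus != 0 {
            failures += 1
        }
    }
    return failures
}

// Assumed location of the debug build, relative to the package root.
let binaryPath = ".build/debug/SwiftConcurrentBug"
let runCount = 20
do {
    let failures = try countFailures(path: binaryPath, runs: runCount)
    print("\(failures) of \(runCount) runs failed")
} catch {
    print("could not launch \(binaryPath): \(error)")
}
```

Running it once on an idle machine and once while other CPU-heavy processes are active would give a rough comparison of the two failure rates.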

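The code is really simple (slightly condensed from the repo above):

import Foundation  // needed for ProcessInfo and DispatchGroup

// Ten child tasks per active core; each one just counts up to maxNumber.
let maxNumberOfTasks = ProcessInfo.processInfo.activeProcessorCount * 10
let maxNumber = 10_000_000

// The DispatchGroup blocks the main thread until the async work has finished.
let dispatchGroup = DispatchGroup()
dispatchGroup.enter()
Task {
    await withTaskGroup(of: Void.self) { taskGroup in
        for _ in 0..<maxNumberOfTasks {
            taskGroup.addTask {
                // 'base' is a local variable, private to this child task.
                var base = 0
                for _ in 0..<maxNumber {
                    base += 1
                }
            }
        }

        await taskGroup.waitForAll()
    }
    dispatchGroup.leave()
}
dispatchGroup.wait()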

All this code is doing is incrementing a variable 'base' which should be local to each Task that is running it. No other memory access should be taking place from within each Task.
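
One way to check the claim that `base` is private to each task is to have every child task return its counter and compare the sum against the expected total. The following is a hypothetical variant, not the code from the repo: the function name `runCountingTasks` and its parameters are invented here, and it uses top-level `await` (Swift 5.7 or later in `main.swift`) instead of the `DispatchGroup`. Whether this variant also crashes under load is not something this report establishes.

```swift
/// Spawns `taskCount` child tasks that each count to `iterations` in a
/// local variable, then returns the sum of all the counters.
/// Hypothetical verification helper, assumed for illustration only.
func runCountingTasks(taskCount: Int, iterations: Int) async -> Int {
    await withTaskGroup(of: Int.self) { group in
        for _ in 0..<taskCount {
            group.addTask {
                var base = 0            // local to this child task
                for _ in 0..<iterations {
                    base += 1
                }
                return base
            }
        }
        // Collect every child's result; nothing is shared between tasks.
        var total = 0
        for await value in group {
            total += value
        }
        return total
    }
}

let taskCount = 8
let iterations = 1_000
let total = await runCountingTasks(taskCount: taskCount, iterations: iterations)
print(total == taskCount * iterations ? "counters intact" : "unexpected total: \(total)")
```

If the tasks really are independent, the total must equal `taskCount * iterations`; any other value, or a crash, points at something outside the loop body.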

I ran into this problem while developing https://github.com/brianm998/nighttime_timelapse_airplane_remover, and used the thread sanitizer to figure out why the crash was happening.
My workaround has been to implement a 'limited' task group https://github.com/brianm998/nighttime_timelapse_airplane_remover/blob/develop/NtarCore/Sources/NtarCore/LimitedTaskGroup.swift
which mostly works, as long as the system isn't busy with something else too.
With this limited task group I can cap the number of concurrently executing tasks at a given number, which is better than nothing.
However, this bug still pops up frequently during my video processing workflow, as my machine is often at 0.1% idle due to many processes competing for CPU time.
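
For readers who don't want to open the linked file, the general idea of capping in-flight child tasks can be sketched as follows. This is not the `LimitedTaskGroup` implementation from the repo above; it is a minimal illustration of one common pattern (wait for a child to finish before adding the next), and the names `runLimited`, `limit`, and `jobs` are assumptions made for the example.

```swift
/// Runs all `jobs`, keeping at most `limit` child tasks in flight at once.
/// Hypothetical sketch, not the author's LimitedTaskGroup.
func runLimited(limit: Int, jobs: [@Sendable () async -> Void]) async {
    precondition(limit > 0, "limit must be positive")
    await withTaskGroup(of: Void.self) { group in
        var inFlight = 0
        for job in jobs {
            if inFlight >= limit {
                // Wait for one running child to finish before adding another.
                _ = await group.next()
                inFlight -= 1
            }
            group.addTask { await job() }
            inFlight += 1
        }
        // Drain whatever is still running.
        await group.waitForAll()
    }
}

// Usage: 20 counting jobs, never more than 4 running at the same time.
var countingJobs: [@Sendable () async -> Void] = []
for _ in 0..<20 {
    countingJobs.append {
        var base = 0
        for _ in 0..<100_000 {
            base += 1
        }
    }
}
await runLimited(limit: 4, jobs: countingJobs)
print("all jobs finished")
```

A cap like this only bounds how many child tasks exist at once; it does not change how the system schedules them against other processes, which matches the observation that the workaround helps less when the machine is busy.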

Expected behavior
I expect this code to always run to completion and never crash with memory corruption.

Environment

  • Swift compiler version info
    Apple Swift version 5.8 (swiftlang-5.8.0.124.2 clang-1403.0.22.11.100)
    Target: x86_64-apple-macosx13.0

  • Xcode version info
    Xcode 14.3
    Build version 14E222b

  • Deployment target: macOS 10.15

Darwin Kernel Version 22.3.0: Mon Jan 30 20:42:11 PST 2023; root:xnu-8792.81.3~2/RELEASE_X86_64
swift-driver version: 1.75.2 Apple Swift version 5.8 (swiftlang-5.8.0.124.2 clang-1403.0.22.11.100)

I was able to catch it in lldb with a breakpoint set on malloc_error_break, as suggested by the malloc output of a previous crash.

It was not hard to reproduce.

Sun Apr 30 13:36:07 PDT 2023
Garth:SwiftConcurrentBug brian$ lldb .build/debug/SwiftConcurrentBug
(lldb) target create ".build/debug/SwiftConcurrentBug"
Current executable set to '/Users/brian/git/SwiftConcurrentBug/.build/debug/SwiftConcurrentBug' (x86_64).
(lldb) b malloc_error_break
Breakpoint 1: where = libsystem_malloc.dylib`malloc_error_break, address = 0x00007ff80023b78d
(lldb) run
Process 33949 launched: '/Users/brian/git/SwiftConcurrentBug/.build/debug/SwiftConcurrentBug' (x86_64)
Hello, cruel world!
SwiftConcurrentBug(33949,0x70000b6a4000) malloc: *** error for object 0x600000c00210: pointer being freed was not allocated
SwiftConcurrentBug(33949,0x70000bc45000) malloc: Heap corruption detected, free list is damaged at 0x600000bfffd0
*** Incorrect guard value: 0
SwiftConcurrentBug(33949,0x70000b6a4000) malloc: *** set a breakpoint in malloc_error_break to debug
SwiftConcurrentBug(33949,0x70000c475000) malloc: Heap corruption detected, free list is damaged at 0x600000bfffd0
*** Incorrect guard value: 0
SwiftConcurrentBug(33949,0x70000bc45000) malloc: *** set a breakpoint in malloc_error_break to debug
SwiftConcurrentBug(33949,0x70000b392000) malloc: Heap corruption detected, free list is damaged at 0x600000bfffd0
*** Incorrect guard value: 0
Process 33949 stopped
* thread #9, queue = 'com.apple.root.user-initiated-qos.cooperative', stop reason = breakpoint 1.1
    frame #0: 0x00007ff805d6a78d libsystem_malloc.dylib`malloc_error_break
libsystem_malloc.dylib`malloc_error_break:
->  0x7ff805d6a78d <+0>: pushq  %rbp
    0x7ff805d6a78e <+1>: movq   %rsp, %rbp
    0x7ff805d6a791 <+4>: nop    
    0x7ff805d6a792 <+5>: nopl   (%rax)
  thread #20, queue = 'com.apple.root.user-initiated-qos.cooperative', stop reason = breakpoint 1.1
    frame #0: 0x00007ff805d6a78d libsystem_malloc.dylib`malloc_error_break
libsystem_malloc.dylib`malloc_error_break:
->  0x7ff805d6a78d <+0>: pushq  %rbp
    0x7ff805d6a78e <+1>: movq   %rsp, %rbp
    0x7ff805d6a791 <+4>: nop    
    0x7ff805d6a792 <+5>: nopl   (%rax)
Target 0: (SwiftConcurrentBug) stopped.
(lldb) bt
* thread #9, queue = 'com.apple.root.user-initiated-qos.cooperative', stop reason = breakpoint 1.1
  * frame #0: 0x00007ff805d6a78d libsystem_malloc.dylib`malloc_error_break
    frame #1: 0x00007ff805d5b5b8 libsystem_malloc.dylib`malloc_vreport + 761
    frame #2: 0x00007ff805d5e951 libsystem_malloc.dylib`malloc_report + 151
    frame #3: 0x00007ff8143dabef libswiftCore.dylib`protocol witness for Swift.Collection.subscript.read : (τ_0_0.Index) -> τ_0_0.Element in conformance Swift.Dictionary<τ_0_0, τ_0_1>.Keys : Swift.Collection in Swift with unmangled suffix ".resume.0" + 15
    frame #4: 0x00007ff81434a1ff libswiftCore.dylib`Swift.IndexingIterator.next() -> Swift.Optional<τ_0_0.Element> + 399
    frame #5: 0x0000000100006526 SwiftConcurrentBug`closure #1 in closure #1 in closure #1 in  at main.swift:18:21
    frame #6: 0x0000000100006f10 SwiftConcurrentBug`thunk for @escaping @callee_guaranteed @Sendable @async () -> (@out A) at <compiler-generated>:0
    frame #7: 0x0000000100007040 SwiftConcurrentBug`partial apply for thunk for @escaping @callee_guaranteed @Sendable @async () -> (@out A) at <compiler-generated>:0
(lldb) quit
Quitting LLDB will kill one or more processes. Do you really want to proceed: [Y/n] y
Garth:SwiftConcurrentBug brian$ date
Sun Apr 30 13:39:18 PDT 2023
Garth:SwiftConcurrentBug brian$ swift --version
swift-driver version: 1.75.2 Apple Swift version 5.8 (swiftlang-5.8.0.124.2 clang-1403.0.22.11.100)
Target: x86_64-apple-macosx13.0
Garth:SwiftConcurrentBug brian$ uname -v
Darwin Kernel Version 22.3.0: Mon Jan 30 20:42:11 PST 2023; root:xnu-8792.81.3~2/RELEASE_X86_64

Labels

  • Concurrency (Area → standard library: The `Concurrency` module under the standard library umbrella)
  • TaskGroup (Area → standard library → Concurrency: The `TaskGroup` type)
  • bug (A deviation from expected or documented behavior. Also: expected but undesirable behavior.)
  • crash (Bug: A crash, i.e., an abnormal termination of software)
  • memory corruption
  • standard library (Area: Standard library umbrella)
  • swift 5.8