Skip to content

Async git repository opening #8721

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jun 9, 2025

Conversation

plemarquand
Copy link
Contributor

@plemarquand plemarquand commented May 24, 2025

The RepositoryProvider protocol's open method is defined as synchronous. Convert this to async to avoid concurrency workarounds in clients that might be prone to deadlocks.

This involved porting RepositoryManager.lookup to structured concurrency. The method has some concurrency constraints that are preserved:

  • There is a limit (maxConcurrentOperations) on how many simultaneous lookups can be performed concurrently.
  • If a lookup is requested and one is in flight for the same RepositorySpecifier, the in flight request completes before the new one is started.

The `RepositoryProvider` protocol's `open` method is defined as
synchronous. Convert this to `async` to avoid concurrency workarounds in
clients that might be prone to deadlocks.

This involved porting `RepositoryManager.lookup` to structured
concurrency. The method has some concurrency constraints that are
preserved:

- There is a limit (`maxConcurrentOperations`) on how many simultaneous
  lookups can be performed concurrently.
- If a lookup is requested and one is in flight for the same
  `RepositorySpecifier`, the in flight request completes before the new
  one is started.

This PR also moves away from using a `DispatchQueue` to make delegate
calls, instead opting to use a `Task` that calls through an `actor`
based proxy to the underlying delegate. Gating these calls through an
`actor` allows us to ensure the calls are processed sequentially as they
were with the `DispatchQueue`.

private func waitIfNeeded() async {
if activeTasks >= concurrentTasks {
await withCheckedContinuation { continuation in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a withTaskCancellationHandler here and potentially a withTaskPriorityEscalationHandler

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FranzBusch would it be sufficient to propagate cancellation by wrapping the waitingTasks.append in a Task.isCancelled check?

if !Task.isCancelled {
    waitingTasks.append(continuation)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No that's not enough since the task can be cancelled after the continuation has been appended. In general, to be structured concurrency and cancellation compliant whenever you create a continuation you need to wrap it into a withTaskCancellationHandler. In this case you probably want to remove that continuation from the waitingTasks and resume it with a CancellationError.

Since you are currently using an actor this won't work easily without an unstructured task. So I would recommend you to switch over to a class and protect the state with a Mutex.

Copy link
Contributor Author

@plemarquand plemarquand May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, makes sense. Unfortunately SwiftPM's supported platforms is set to .macOS(.v13) and Mutex is only available on >= .v15, so I'll have to use an NSLock as we do elsewhere in SwiftPM.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we can't bump to macOS 15.0?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We cannot require macOS 15 at this time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the AsyncOperationQueue to respect cancellation of the parent task. Incorporating a Mutex polyfill can come later and bulk replace the state: T/stateLock: NSLock pattern used throughout SwiftPM.

self.waitingTasksLock.withLock {
if let taskIndex = self.waitingTasks.firstIndex(where: { $0.0 == taskId }) {
let task = self.waitingTasks.remove(at: taskIndex)
task.1.resume(throwing: CancellationError())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resuming continuations while a lock is held is dangerous and can lead to deadlocks. Note withTaskCancellationHandler's documentation:

Cancellation handlers which acquire locks must take care to avoid deadlock. The cancellation handler may be invoked while holding internal locks associated with the task or other tasks. Other operations on the task, such as resuming a continuation, may acquire these same internal locks. Therefore, if a cancellation handler must acquire a lock, other code should not cancel tasks or resume continuations while holding that lock.

I think here you can return the continuation out of the lock and then resume it after releasing the lock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I've fixed this up in two places

Comment on lines 132 to 151
try await withTaskCancellationHandler {
try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<Void, any Error>) in
if !Task.isCancelled {
waitingTasksLock.withLock {
waitingTasks.append((taskId, continuation))
}
} else {
continuation.resume(throwing: CancellationError())
}
}
} onCancel: {
// If the parent task is cancelled then we need to manually handle resuming the
// continuation for the waiting task with a `CancellationError`.
self.waitingTasksLock.withLock {
if let taskIndex = self.waitingTasks.firstIndex(where: { $0.0 == taskId }) {
let task = self.waitingTasks.remove(at: taskIndex)
task.1.resume(throwing: CancellationError())
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakepetroules raised a good point with resuming while a lock is held. In general, never do any outcalls while a lock is held. FWIW, we have run into exactly this deadlock before.

Now the first part of this code is also not 100% correct. There is subtle race that can happen in this code in this scenario:

  1. Task is running not cancelled
  2. You are checking Task.isCancelled returns false
  3. Something is canceling your task
  4. The task cancellation handler runs and acquires the lock. Tries to remove the continuation but it hasn't been stored yet so resumes nothing
  5. The code after the Task.isCancelled runs and stores the continuation

In this case you now stored a continuation of a cancelled task and the task cancellation handler already ran. What you have to do instead is to model a tri-state for a waiter:

enum Waiter {
  case creating
  case waiting(Continuation)
  case cancelled
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, thanks. I've adopted a tri-state and made sure that the continuation always resumes with a cancellation error when appropriate.

@plemarquand plemarquand requested a review from jakepetroules May 27, 2025 15:44
@plemarquand
Copy link
Contributor Author

@FranzBusch @jakepetroules I've addressed comments and added some tests for AsyncOperationQueue. Could you take another look?

@plemarquand plemarquand requested a review from FranzBusch May 29, 2025 17:55

deinit {
waitingTasksLock.withLock {
if !waitingTasks.filter({ $0.continuation != nil }).isEmpty {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it unexpected if there is anything at all in waitingTasks, not just those with a non-nil continuation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you're right, I've updated to revert this check.

Comment on lines 152 to 155
let taskId = ID()
waitingTasksLock.withLock {
waitingTasks.append(.creating(taskId))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could move this into the above lock and return an optional UUID. This way we don't need to acquire the lock more than once for the initial setup

fileprivate typealias WaitingContinuation = CheckedContinuation<Void, any Error>

private let concurrentTasks: Int
private var activeTasks: Int = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this counter at all or can we just use waitingTasks.count? Keeping the two in sync can be tricky and I had to triple check that the code does it correctly. That requires us to change the WaitingTask to be just WorkTask with a new case but IMO that would make this code a lot clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can, but it that would mean the waitingTasks would no longer be able to be modelled as a queue since the actively running tasks would be contained within it.

I think an OrderedDictionary could work instead, but runs in to the swift-collections in swift-build question.


// If the task was cancelled in between creating the task cancellation handler and aquiring the lock,
// we should resume the continuation with a `CancellationError`.
if case .cancelled = waitingTasks[index] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of writing an if case I would prefer if we exhaustively switch over the returned task here similar to how you did it in the onCancel case. I always recommend doing that since it makes sure every case is properly handled.

case .waiting:
activeTasks -= 1
// Begin the next waiting task
return waitingTasks.remove(at: 0).continuation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are operating under FIFO here it is worth using a Deque instead of an array since this remove(at:) is going to reallocate the entire array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to keep in mind this will make its way back in to swift-build, which has no dependency on swift-collections and looks to be trying to keep its dependencies minimal. @jakepetroules is that accurate, or would swift-build accept a swift-collections dependency for this?

try await withCheckedThrowingContinuation { (continuation: WaitingContinuation) -> Void in
let action: TaskAction? = waitingTasksLock.withLock {
guard let index = waitingTasks.firstIndex(where: { $0.id == taskId }) else {
// If the task was cancelled in onCancelled it will have been removed from the waiting tasks list.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not true right? We are always going to get an index by just looking at the code in onCancel. Either we find a .cancelled case or we have run before. There is no way for onCancel and this check to interleave in the way you describe here. Now, what can interleave is the signalCompletion that also appears to remove cancelled() tasks. Can we update this comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've tightend up this comment to reflect how this guard code would actually get called.

@plemarquand
Copy link
Contributor Author

plemarquand commented Jun 2, 2025

@FranzBusch I've tried to simplify the code as suggested by removing the counter state. It did complicate the code in signalCompletion a bit because we can't start processing immediately from the head of the collection.

Copy link
Member

@FranzBusch FranzBusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just one small nit

@plemarquand
Copy link
Contributor Author

Thank you for your tireless reviews @FranzBusch!

@plemarquand plemarquand enabled auto-merge (squash) June 3, 2025 19:14
@plemarquand plemarquand merged commit 5489d55 into swiftlang:main Jun 9, 2025
6 checks passed
@plemarquand plemarquand deleted the async-repository-open branch June 9, 2025 16:12
plemarquand added a commit to plemarquand/swift-package-manager that referenced this pull request Jun 20, 2025
The `RepositoryProvider` protocol's `open` method is defined as
synchronous. Convert this to `async` to avoid concurrency workarounds in
clients that might be prone to deadlocks.

This involved porting `RepositoryManager.lookup` to structured
concurrency. The method has some concurrency constraints that are
preserved:

- There is a limit (`maxConcurrentOperations`) on how many simultaneous
lookups can be performed concurrently.
- If a lookup is requested and one is in flight for the same
`RepositorySpecifier`, the in flight request completes before the new
one is started.
plemarquand added a commit that referenced this pull request Jun 20, 2025
The `RepositoryProvider` protocol's `open` method is defined as
synchronous. Convert this to `async` to avoid concurrency workarounds in
clients that might be prone to deadlocks.

This involved porting `RepositoryManager.lookup` to structured
concurrency. The method has some concurrency constraints that are
preserved:

- There is a limit (`maxConcurrentOperations`) on how many simultaneous
lookups can be performed concurrently.
- If a lookup is requested and one is in flight for the same
`RepositorySpecifier`, the in flight request completes before the new
one is started.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants