Skip to content

AsyncBufferSequence.Buffer Improvements #48

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jun 10, 2025

Conversation

iCharlesHu
Copy link
Contributor

@iCharlesHu iCharlesHu commented May 19, 2025

This PR enhances the usability of AsyncBufferSequence.Buffer and the closure-based run() in several ways:

  • Introduced AsyncBufferSequence.LineSequence, which lets developers stream standard output and error line by line as String. This is the recommended approach for converting AsyncBufferSequence.Buffer to String because converting each buffer individually might not always work due to potential breaks in the String.
  • Updated the implementation of AsyncBufferSequence.Buffer.bytes by basing it on DispatchData.Slice instead of DispatchData. This change ensures that AsyncBufferSequence.Buffer holds a contiguous chunk of bytes, which is necessary for the .bytes API.
  • Removed the unnecessary availability tags for the closure-based run().
  • Made several improvements to the documentation.

@iCharlesHu iCharlesHu requested a review from parkera May 19, 2025 18:55
@iCharlesHu iCharlesHu marked this pull request as ready for review May 19, 2025 18:55
@iCharlesHu iCharlesHu force-pushed the charles/buffer-improvements branch from 36e0e21 to f261182 Compare May 19, 2025 21:29
@iCharlesHu iCharlesHu changed the title Add .stringValue(), .arrayValue(), and .dataValue() to Buffer Introduce custom initializers to initialize String, Data, and Array from AsyncBufferSequence.Buffer May 19, 2025
@iCharlesHu iCharlesHu force-pushed the charles/buffer-improvements branch from f261182 to 75b804c Compare May 19, 2025 22:38
@@ -267,7 +267,7 @@ public func run<Result, Error: OutputProtocol>(
environment: Environment = .inherit,
workingDirectory: FilePath? = nil,
platformOptions: PlatformOptions = PlatformOptions(),
error: Error,
error: Error = .discarded,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this part of the same change?

let span = RawSpan(_unsafeElements: ptr)
return _overrideLifetime(of: span, to: self)
case .array(let array):
let span = array.span.bytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swift 6.2 on Windows: error: value of type '[UInt8]' has no member 'span'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm that's strange... should be available on 6.2. Let me check

@iCharlesHu iCharlesHu marked this pull request as draft May 21, 2025 04:09
@iCharlesHu
Copy link
Contributor Author

In stead of directly expose String initializer, we decided to introduce AsyncBufferSequence.LineSequence which transforms AsyncBufferSequence to AsyncSequence<String> to allow line-by-line streaming

@iCharlesHu iCharlesHu changed the title Introduce custom initializers to initialize String, Data, and Array from AsyncBufferSequence.Buffer AsyncBufferSequence.Buffer Improvements May 30, 2025
@iCharlesHu iCharlesHu marked this pull request as ready for review May 30, 2025 19:27
@iCharlesHu iCharlesHu requested a review from itingliu May 30, 2025 19:28
public func run<Result, Input: InputProtocol, Error: OutputProtocol>(
_ executable: Executable,
arguments: Arguments = [],
environment: Environment = .inherit,
workingDirectory: FilePath? = nil,
platformOptions: PlatformOptions = PlatformOptions(),
input: Input = .none,
error: Error,
error: Error = .discarded,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was accidentally left out from the ~Copyable PR. I discovered it as I was testing closure based runs

return data
let createdBuffers = Buffer.createFrom(data)
// Most (all?) cases there should be only one buffer
// because DispatchData are motsly contiguous
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// because DispatchData are motsly contiguous
// because DispatchData are mostly contiguous

internal var startIndex: Int { self.bytes.startIndex }
internal var endIndex: Int { self.bytes.endIndex }

internal init(bytes: UnsafeBufferPointer<UInt8>) {
Copy link
Contributor

@itingliu itingliu Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this missing a startIndex argument? without that,

internal var startIndex: Int { self.bytes.startIndex }

This will always return 0 since UnsafeBufferPointer.startIndex is always 0.

And without that it would not match the semantics of the other Slice-like types, where the index of the containing collection is preserved in the subsequence (like ArraySlice and Array) right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are intentionally keeping a 1 to 1 Buffer to Slice mapping. We actually don't really care the relative position of the Slices within the source DispatchData, as long as the slices are contiguous. This is the same conceptual design with the original Buffer to DispatchData mapping (as in we only care about individual chunk read). We only switched to slice instead of DispatchData because DispatchData is not contiguous and we need a contiguous storage in order to vend the bytes property.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshed: As we talked previously, since this is not really representing a sub-sequence like a Substring or ArraySlice, can we not call it Slice? It sounds like we're really interested in ensuring that it's contiguous, so perhaps ContiguousRegion or something like that?

@iCharlesHu iCharlesHu force-pushed the charles/buffer-improvements branch from be5121b to 8fa05f4 Compare June 5, 2025 17:29
@iCharlesHu iCharlesHu force-pushed the charles/buffer-improvements branch from 8fa05f4 to 00bcd99 Compare June 5, 2025 18:30

public func lines<Encoding: _UnicodeEncoding>(
encoding: Encoding.Type = UTF8.self,
bufferingPolicy: LineSequence<Encoding>.BufferingPolicy = .unbounded
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should label all 'new' API since 0.0.1 (including the ones I worked on) with some kind of comment so we know which ones we need to have further discussion about in review. Here I would like to talk about the tradeoff when defaulting to an unbounded buffer, with no arguments...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently I track them by tagging the PRs with API Change label. I'll add some comments and when we are ready for 0.0.1 I'll list out all the changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose .unbounded as the default argument because that's what AsyncStream do today.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to throw an error for #45
can we please have an issue tracking updating this to the same behavior?

@iCharlesHu
Copy link
Contributor Author

Addressed Tony's comments. Most noticeably: throw an error when LineSequence buffered enough content that exceeds the maxLineLength limit.

@iCharlesHu
Copy link
Contributor Author

Resolves #15

if self.buffer.isEmpty {
return nil
}
return String(decoding: self.buffer[0 ..< endIndex], as: Encoding.self)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a little unclear on why we need to copy this into an Array, and then copy it into a String. Why can't String read from the buffer data directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to keep an [UInt8] buffer in order to scan through and find new line characters. We can't use String directly as the buffer type because every time when we receive a chunk of data, we can't directly convert it to String since the chunk might end in between byte encodings.

@@ -61,7 +59,15 @@ public struct AsyncBufferSequence: AsyncSequence, Sendable {
#endif
return nil
}
return data
let createdBuffers = Buffer.createFrom(data)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nitpick, but this is just init.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, in this case it sort of goes the other way since it vends buffers from data... I guess we can clean it up later.

Copy link
Contributor Author

@iCharlesHu iCharlesHu Jun 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And also it creates multiple. I didn't use initializer because it's DispatchData -> [Buffer].

internal var startIndex: Int { self.bytes.startIndex }
internal var endIndex: Int { self.bytes.endIndex }

internal init(bytes: UnsafeBufferPointer<UInt8>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshed: As we talked previously, since this is not really representing a sub-sequence like a Substring or ArraySlice, can we not call it Slice? It sounds like we're really interested in ensuring that it's contiguous, so perhaps ContiguousRegion or something like that?

#if canImport(Darwin) || canImport(Glibc) || canImport(Android) || canImport(Musl)
extension DispatchData {
/// Unfortunately `DispatchData.Region` is not available on Linux, hence our own wrapper
internal struct Slice: @unchecked Sendable, RandomAccessCollection {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this @unchecked Sendable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it holds a let bytes: UnsafeBufferPointer<UInt8> property. DispatchData.Region's backing storage is also @unchecked Sendable, I suspect for the same reason.

@iCharlesHu
Copy link
Contributor Author

Bikeshed: As we talked previously, since this is not really representing a sub-sequence like a Substring or ArraySlice, can we not call it Slice? It sounds like we're really interested in ensuring that it's contiguous, so perhaps ContiguousRegion or something like that?

How about _ContiguousBufferView? Which follows Swift's *View naming convention that it's a "view" into a larger data. I'll also add underscore to make it clear it's internal and only used by Subprocess.

…turning partial (and potentially incorrect) data
@iCharlesHu iCharlesHu force-pushed the charles/buffer-improvements branch from ddeb6d0 to 195fcee Compare June 10, 2025 00:21
@iCharlesHu iCharlesHu merged commit 8a8c6b7 into swiftlang:main Jun 10, 2025
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants