Skip to content

Conversation

@eiriktsarpalis
Copy link
Member

From #70107 (comment). The current implementation of the internal ReadFromStreamAsync method will keep reading from the stream until either the internal buffer is filled or the stream EOF's, which is intended as a performance optimization to avoid redundant deserialization calls. While this makes sense in the case of DeserializeAsync, it can impact liveness when deserializing IAsyncEnumerable sequences in examples such as the one seen in #70107.

This PR changes the behavior of stream reading specifically in DeserializeAsyncEnumerable such that the deserializer is invoked as soon as the first Stream.ReadAsync operation returns. Moreover it makes a few more changes:

  • Encapsulate the buffer management logic behind the ReadBufferState struct.
  • Avoid allocating a new JsonTypeInfo instance on every IAsyncEnumerator initialization.

Fix #70107.

* Change DeserializeAsyncEnumerable so that reading from the underlying stream does not wait until the underlying buffer is full.
* Encapsulate the buffer management logic behind ReadBufferState.
* Avoid allocating a new JsonTypeInfo instance on every IAsyncEnumerator initialization.
@ghost
Copy link

ghost commented Jun 8, 2022

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

Issue Details

From #70107 (comment). The current implementation of the internal ReadFromStreamAsync method will keep reading from the stream until either the internal buffer is filled or the stream EOF's, which is intended as a performance optimization to avoid redundant deserialization calls. While this makes sense in the case of DeserializeAsync, it can impact liveness when deserializing IAsyncEnumerable sequences in examples such as the one seen in #70107.

This PR changes the behavior of stream reading specifically in DeserializeAsyncEnumerable such that the deserializer is invoked as soon as the first Stream.ReadAsync operation returns. Moreover it makes a few more changes:

  • Encapsulate the buffer management logic behind the ReadBufferState struct.
  • Avoid allocating a new JsonTypeInfo instance on every IAsyncEnumerator initialization.

Fix #70107.

Author: eiriktsarpalis
Assignees: -
Labels:

area-System.Text.Json

Milestone: -

@eiriktsarpalis eiriktsarpalis requested a review from krwq June 8, 2022 12:26
@eiriktsarpalis eiriktsarpalis added this to the 7.0.0 milestone Jun 8, 2022
Comment on lines +163 to +166
new Span<byte>(_buffer, 0, _maxCount).Clear();

byte[] toReturn = Buffer;
Buffer = null!;
byte[] toReturn = _buffer;
_buffer = null!;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

considering you return the copy of this buffer state in some of the methods is it safe to do it like this? Should you add some guarding against double free?

Copy link
Member

@krwq krwq Jun 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing you always assign back the copy of the struct, please make sure to at least add comment on the method returning copy and the class itself. Alternative perhaps some #if DEBUG with extra code protecting us against this pattern (i.e. wrap buffer in ref type, set it to null on Dispose and check with Debug.Assert)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

considering you return the copy of this buffer state in some of the methods is it safe to do it like this? Should you add some guarding against double free?

Yes, assuming you somehow end up disposing two copies of the same struct then that is a possibility, but we're not doing this in the handful of methods that creates the struct.

perhaps some #if DEBUG extra code protecting us against this pattern (i.e. wrap buffer in ref type, set it to null on Dispose and check with Debug.Assert)

Can you give an example? Not sure I understand. FWIW this is not new code, I'm just moving existing logic to the struct.

Copy link
Member

@krwq krwq Jun 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReadBufferState a = ...;
ReadBufferState b = a;
a.Dispose();
b.Dispose(); // double free or usage after dispose if we don't dispose and continue using

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the condition, my question is what assert could guard against such a potential scenario.

Copy link
Member

@krwq krwq Jun 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the struct you could potentially add nested class like this (under #if DEBUG):

private class BufferRef
{
  public byte[]? Buffer;
}

and then add field in the struct:

BufferRef _myBuffer = new BufferRef(); // assign _buffer to _myBuffer.Buffer;

then if struct gets copied then ref gets copied as well but underlying field will be shared. Then in the Dispose you could clear that field and check if it's not null.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as per discussion offline, better to do this for all similar structs in separate PR in order to potentially find potential memory corruption bug

@eiriktsarpalis eiriktsarpalis merged commit fc538c2 into dotnet:main Jun 10, 2022
@eiriktsarpalis eiriktsarpalis deleted the readstream-improvements branch June 10, 2022 15:47
@ghost ghost locked as resolved and limited conversation to collaborators Jul 10, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[System.Text.Json] DeserializeAsyncEnumerable is delayed emitting objects when writing asynchronously to stream

3 participants