
Utf8JsonWriter API Proposal #27938

Closed
@ahsonkhan

A JSON writer API that writes UTF-8 encoded data natively, with an emphasis on high performance and low allocations.

namespace System.Text.Json {

    public sealed class JsonWriterException : Exception {
        public JsonWriterException(string message);
    }

    public struct JsonWriterOptions {
        // Alternative names: Indent(ed), Minify(ied), Minimize(d), PrettyPrint(ed)
        public bool Formatted { get; set; }
        public bool SkipValidation { get; set; }
    }

    public struct JsonWriterState {
        public JsonWriterState(int maxDepth = DefaultMaxDepth, JsonWriterOptions options = default);
        public JsonWriterOptions Options { get; }
        public int MaxDepth { get; }
        public long BytesWritten { get; } // long?
    }

    public static class Utf8JsonWriter {
        public static OperationStatus EscapeString(ReadOnlySpan<byte> value, Span<byte> destination, out int consumed, out int bytesWritten);
        public static OperationStatus EscapeString(ReadOnlySpan<char> value, Span<byte> destination, out int consumed, out int bytesWritten);
        public static OperationStatus EscapeString(string value, Span<byte> destination, out int consumed, out int bytesWritten);
        public static Utf8JsonWriter<IBufferWriter<byte>> CreateFromStream(Stream stream, JsonWriterState state);
        public static Utf8JsonWriter<IBufferWriter<byte>> CreateFromMemory(Memory<byte> memory, JsonWriterState state);
    }

    public ref struct Utf8JsonWriter<TBufferWriter> where TBufferWriter : IBufferWriter<byte> {
        public Utf8JsonWriter(TBufferWriter bufferWriter, JsonWriterState state);
        public int BytesWritten { get; }
        public int CurrentDepth { get; }
        public JsonWriterState CurrentState { get; }
        public JsonTokenType TokenType { get; } // set?
        public void Flush(bool isFinalBlock = true); // should it validate before advancing the IBufferWriter?

        // These APIs exist mainly for performance, hence optional.
        public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<int> values);
        public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<long> values);
        public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<uint> values);
        public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<ulong> values);
        public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<decimal> values);
        public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<double> values);
        public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<float> values);
        public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<Guid> values);
        public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<DateTime> values);
        public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<DateTimeOffset> values);
        // Can't support Span of Spans. Is there value in adding support for string (or byte) arrays?
        //public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<string> values);

        // All permutations of start/end array/object with various property name formats.
        public void WriteEndArray();
        public void WriteEndObject();

        public void WriteStartArray();
        public void WriteStartArray(ReadOnlySpan<byte> propertyName);
        public void WriteStartArray(ReadOnlySpan<char> propertyName);
        public void WriteStartArray(string propertyName);

        public void WriteStartObject();
        public void WriteStartObject(ReadOnlySpan<byte> propertyName);
        public void WriteStartObject(ReadOnlySpan<char> propertyName);
        public void WriteStartObject(string propertyName);

        // Writing property name & values (for writing tokens within JSON objects)
        public void WriteBoolean(ReadOnlySpan<byte> propertyName, bool value);
        public void WriteBoolean(ReadOnlySpan<char> propertyName, bool value);
        public void WriteBoolean(string propertyName, bool value);

        public void WriteBytesUnchecked(ReadOnlySpan<byte> propertyName, ReadOnlySpan<byte> utf8Bytes);
        public void WriteBytesUnchecked(ReadOnlySpan<char> propertyName, ReadOnlySpan<byte> utf8Bytes);
        public void WriteBytesUnchecked(string propertyName, ReadOnlySpan<byte> utf8Bytes);

        public void WriteNull(ReadOnlySpan<byte> propertyName);
        public void WriteNull(ReadOnlySpan<char> propertyName);
        public void WriteNull(string propertyName);

        public void WriteNumber(ReadOnlySpan<byte> propertyName, decimal value);
        public void WriteNumber(ReadOnlySpan<byte> propertyName, double value);
        public void WriteNumber(ReadOnlySpan<byte> propertyName, int value);
        public void WriteNumber(ReadOnlySpan<byte> propertyName, long value);
        public void WriteNumber(ReadOnlySpan<byte> propertyName, float value);
        public void WriteNumber(ReadOnlySpan<byte> propertyName, uint value);
        public void WriteNumber(ReadOnlySpan<byte> propertyName, ulong value);

        public void WriteNumber(ReadOnlySpan<char> propertyName, decimal value);
        public void WriteNumber(ReadOnlySpan<char> propertyName, double value);
        public void WriteNumber(ReadOnlySpan<char> propertyName, int value);
        public void WriteNumber(ReadOnlySpan<char> propertyName, long value);
        public void WriteNumber(ReadOnlySpan<char> propertyName, float value);
        public void WriteNumber(ReadOnlySpan<char> propertyName, uint value);
        public void WriteNumber(ReadOnlySpan<char> propertyName, ulong value);

        public void WriteNumber(string propertyName, decimal value);
        public void WriteNumber(string propertyName, double value);
        public void WriteNumber(string propertyName, int value);
        public void WriteNumber(string propertyName, long value);
        public void WriteNumber(string propertyName, float value);
        public void WriteNumber(string propertyName, uint value);
        public void WriteNumber(string propertyName, ulong value);

        public void WriteString(ReadOnlySpan<byte> propertyName, DateTime value);
        public void WriteString(ReadOnlySpan<byte> propertyName, DateTimeOffset value);
        public void WriteString(ReadOnlySpan<byte> propertyName, Guid value);
        public void WriteString(ReadOnlySpan<byte> propertyName, ReadOnlySpan<byte> value);
        public void WriteString(ReadOnlySpan<byte> propertyName, ReadOnlySpan<char> value);
        public void WriteString(ReadOnlySpan<byte> propertyName, string value);

        public void WriteString(ReadOnlySpan<char> propertyName, DateTime value);
        public void WriteString(ReadOnlySpan<char> propertyName, DateTimeOffset value);
        public void WriteString(ReadOnlySpan<char> propertyName, Guid value);
        public void WriteString(ReadOnlySpan<char> propertyName, ReadOnlySpan<byte> value);
        public void WriteString(ReadOnlySpan<char> propertyName, ReadOnlySpan<char> value);
        public void WriteString(ReadOnlySpan<char> propertyName, string value);

        public void WriteString(string propertyName, DateTime value);
        public void WriteString(string propertyName, DateTimeOffset value);
        public void WriteString(string propertyName, Guid value);
        public void WriteString(string propertyName, ReadOnlySpan<byte> value);
        public void WriteString(string propertyName, ReadOnlySpan<char> value);
        public void WriteString(string propertyName, string value);

        // Writing values (for single-value JSON or for writing tokens within JSON arrays)
        public void WriteNull();
        public void WriteValue(bool value);
        public void WriteValue(DateTime value);
        public void WriteValue(DateTimeOffset value);
        public void WriteValue(decimal value);
        public void WriteValue(double value);
        public void WriteValue(Guid value);
        public void WriteValue(int value);
        public void WriteValue(long value);
        public void WriteValue(float value);
        public void WriteValue(uint value);
        public void WriteValue(ulong value);

        public void WriteValue(ReadOnlySpan<byte> utf8Text);
        public void WriteValue(ReadOnlySpan<char> utf16Text);
        public void WriteValue(string utf16Text);

        public void WriteBytesUnchecked(ReadOnlySpan<byte> utf8Bytes);

        public void WriteComments(string comment);
        public void WriteComments(ReadOnlySpan<char> comment);
        public void WriteComments(ReadOnlySpan<byte> comment);
    }
}
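The static EscapeString overloads follow the OperationStatus pattern from System.Buffers: the caller learns how many input bytes were consumed and how many output bytes were written, and a DestinationTooSmall result means it can retry with more space. A minimal sketch of that contract for the UTF-8 overload, assuming a hypothetical MiniEscaper.Escape helper that only escapes quotes, backslashes, and control characters. The proposal's default escaping is white-list based and broader, so this is not its actual behavior.

```csharp
using System;
using System.Buffers;

public static class MiniEscaper
{
    // Hypothetical helper (not part of the proposal). Mirrors the shape of
    // EscapeString(ReadOnlySpan<byte>, Span<byte>, out int, out int):
    // escapes '"', '\\' and control characters below 0x20 in UTF-8 input.
    public static OperationStatus Escape(ReadOnlySpan<byte> value, Span<byte> destination,
                                         out int consumed, out int bytesWritten)
    {
        consumed = 0;
        bytesWritten = 0;
        while (consumed < value.Length)
        {
            byte b = value[consumed];
            if (b == (byte)'"' || b == (byte)'\\')
            {
                // Two-byte escape: backslash followed by the character itself.
                if (destination.Length - bytesWritten < 2) return OperationStatus.DestinationTooSmall;
                destination[bytesWritten++] = (byte)'\\';
                destination[bytesWritten++] = b;
            }
            else if (b < 0x20)
            {
                // Six-byte escape: \u00XX with uppercase hex digits.
                if (destination.Length - bytesWritten < 6) return OperationStatus.DestinationTooSmall;
                const string hex = "0123456789ABCDEF";
                destination[bytesWritten++] = (byte)'\\';
                destination[bytesWritten++] = (byte)'u';
                destination[bytesWritten++] = (byte)'0';
                destination[bytesWritten++] = (byte)'0';
                destination[bytesWritten++] = (byte)hex[b >> 4];
                destination[bytesWritten++] = (byte)hex[b & 0xF];
            }
            else
            {
                // Everything else is copied through unchanged.
                if (destination.Length - bytesWritten < 1) return OperationStatus.DestinationTooSmall;
                destination[bytesWritten++] = b;
            }
            // Only count an input byte once its full escape sequence was written.
            consumed++;
        }
        return OperationStatus.Done;
    }
}
```

After a DestinationTooSmall result, the first bytesWritten output bytes are valid, and escaping can resume from value.Slice(consumed) into a fresh destination.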
Previous iteration of the APIs
namespace System.Text.Json {
    // Factory
    public static class Utf8JsonWriter {
        public static Utf8JsonWriter<TBufferWriter> Create<TBufferWriter>(TBufferWriter bufferWriter, bool prettyPrint = false) where TBufferWriter : IBufferWriter<byte>;
        // Other potential factory methods:
        public static Utf8JsonWriter<TBufferWriter> Create<TBufferWriter>(Stream stream, bool prettyPrint = false);
        // Can't support writing to a span directly.
    }
    public ref struct Utf8JsonWriter<TBufferWriter> where TBufferWriter : IBufferWriter<byte> {
        // ctors
        public Utf8JsonWriter(TBufferWriter bufferWriter, bool prettyPrint = false);

        // Advances the IBufferWriter
        public void Flush();
        
        public void WriteStartArray();
        public void WriteEndArray();
        public void WriteStartArray(ReadOnlySpan<byte> name);
        // public void WriteStartArray(string name);
        
        public void WriteStartObject();
        public void WriteEndObject(); // Throws FormatException - should it throw JsonWriterException?
        public void WriteStartObject(ReadOnlySpan<byte> name);
        // public void WriteStartObject(string name);

        // Writing all supported .NET types - which ones should we support?
        // These APIs exist mainly for performance.
        public void WriteArray(ReadOnlySpan<byte> name, ReadOnlySpan<int> values);
        public void WriteArray(ReadOnlySpan<byte> name, ReadOnlySpan<bool> values);
        public void WriteArray(ReadOnlySpan<byte> name, ReadOnlySpan<string> values);
        public void WriteArray(ReadOnlySpan<byte> name, ReadOnlySpan<ReadOnlyMemory<byte>> utf8StringBytes);
        // ... etc.

        // string-based overloads that transcode to UTF-8?
        public void WriteArray(string name, ReadOnlySpan<int> values);
        public void WriteArray(string name, ReadOnlySpan<bool> values);
        public void WriteArray(string name, ReadOnlySpan<string> values);
        public void WriteArray(string name, ReadOnlySpan<ReadOnlyMemory<byte>> utf8StringBytes);
        // ... etc.

        // Writing all supported .NET types - which ones should we support?
        public void WriteNameValue(ReadOnlySpan<byte> name, bool value);
        public void WriteNameValue(ReadOnlySpan<byte> name, DateTime value);
        public void WriteNameValue(ReadOnlySpan<byte> name, DateTimeOffset value);
        public void WriteNameValue(ReadOnlySpan<byte> name, Guid value);
        public void WriteNameValue(ReadOnlySpan<byte> name, long value);
        public void WriteNameValue(ReadOnlySpan<byte> name, ulong value);
        public void WriteNameValue(ReadOnlySpan<byte> name, string value);
        public void WriteNameValue(ReadOnlySpan<byte> name, ReadOnlySpan<byte> utf8Bytes);
        public void WriteNameValueAsBase64(ReadOnlySpan<byte> name, ReadOnlySpan<byte> utf8Bytes);
        // Provide escape overloads that allow skipping escaping the name and values where relevant?
        // ... etc.

        // string-based overloads that transcode to UTF-8?
        public void WriteNameValue(string name, bool value);
        public void WriteNameValue(string name, DateTime value);
        public void WriteNameValue(string name, DateTimeOffset value);
        public void WriteNameValue(string name, Guid value);
        public void WriteNameValue(string name, long value);
        public void WriteNameValue(string name, ulong value);
        public void WriteNameValue(string name, string value);
        public void WriteNameValue(string name, ReadOnlySpan<byte> utf8StringBytes);
        public void WriteNameValueRaw(string name, ReadOnlySpan<byte> utf8Bytes);
        public void WriteNameValueAsBase64(string name, ReadOnlySpan<byte> utf8Bytes);
        // ... etc.

        public void WriteNull(ReadOnlySpan<byte> name);
        public void WriteNull();
     
        // Matching the capabilities of WriteNameValue - which ones should we support?
        public void WriteValue(bool value);
        public void WriteValue(DateTime value);
        public void WriteValue(DateTimeOffset value);
        public void WriteValue(Guid value);
        public void WriteValue(long value);
        public void WriteValue(ulong value);
        public void WriteValue(string value);
        // public void WriteValue(string value, bool escape);
        public void WriteValue(ReadOnlySpan<byte> utf8StringBytes);
        // public void WriteValue(ReadOnlySpan<byte> utf8Bytes, bool escape);
        public void WriteValueRaw(ReadOnlySpan<byte> utf8Bytes);
        public void WriteValueAsBase64(ReadOnlySpan<byte> utf8Bytes);
        // ... etc.

        public void Write(Utf8JsonReader reader, bool writeChildren = false);
    }
}
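The AsBase64 variants exist because a ReadOnlySpan<byte> value is otherwise ambiguous: it could be UTF-8 text (as in WriteNameValue) or arbitrary bytes that must be Base64-encoded. A minimal sketch of the bytes a WriteNameValueAsBase64 call could produce, using the real System.Buffers.Text.Base64 API and a hypothetical Base64Sketch helper that skips name escaping, separators, and validation:

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;

public static class Base64Sketch
{
    // Hypothetical helper (not part of the proposal): writes "name":"<base64>"
    // into the buffer writer. Assumes utf8Name needs no escaping and that the
    // caller handles commas and object/array structure.
    public static void WriteNameValueAsBase64(IBufferWriter<byte> output, ReadOnlySpan<byte> utf8Name, ReadOnlySpan<byte> bytes)
    {
        int encodedLength = Base64.GetMaxEncodedToUtf8Length(bytes.Length);
        // Two quotes around the name, the colon, and two quotes around the value.
        Span<byte> span = output.GetSpan(utf8Name.Length + encodedLength + 5);
        int pos = 0;
        span[pos++] = (byte)'"';
        utf8Name.CopyTo(span.Slice(pos));
        pos += utf8Name.Length;
        span[pos++] = (byte)'"';
        span[pos++] = (byte)':';
        span[pos++] = (byte)'"';
        // Encode directly into the output buffer; no intermediate string is allocated.
        OperationStatus status = Base64.EncodeToUtf8(bytes, span.Slice(pos), out _, out int written);
        if (status != OperationStatus.Done)
            throw new InvalidOperationException($"Base64 encoding failed: {status}");
        pos += written;
        span[pos++] = (byte)'"';
        output.Advance(pos);
    }
}
```

For example, encoding the bytes { 1, 2, 3 } under the name data yields "data":"AQID".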

Sample usage (written against the previous iteration of the APIs):

private static void WriteHelloWorld(bool formatted, ArrayFormatterWrapper output)
{
    var json = new Utf8JsonWriter<ArrayFormatterWrapper>(output, prettyPrint: formatted);

    json.WriteStartObject();
    json.WriteNameValue(Message, HelloWorld);
    json.WriteEndObject();
    json.Flush();
}

private static void WriteBasicJson(bool formatted, ArrayFormatterWrapper output, ReadOnlySpan<int> data)
{
    Utf8JsonWriter<ArrayFormatterWrapper> json = Utf8JsonWriter.Create(output, prettyPrint: formatted);

    json.WriteStartObject();
    json.WriteNameValue("age", 42);
    json.WriteNameValue("first", "John");
    json.WriteNameValue("last", "Smith");
    json.WriteStartArray("phoneNumbers");
    json.WriteValue("425-000-1212");
    json.WriteValue("425-000-1213");
    json.WriteEndArray();
    json.WriteStartObject("address");
    json.WriteNameValue("street", "1 Microsoft Way");
    json.WriteNameValue("city", "Redmond");
    json.WriteNameValue("zip", 98052);
    json.WriteEndObject();
    json.WriteArray(ExtraArray, data);
    json.WriteEndObject();
    json.Flush();
}
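For comparison, System.Text.Json as shipped in .NET Core 3.0 exposes Utf8JsonWriter as a sealed class (writing to an IBufferWriter<byte> or a Stream) rather than the generic ref struct proposed here, uses WriteNumber/WriteString for properties and WriteStringValue/WriteNumberValue for array elements, and names the formatting flag JsonWriterOptions.Indented. A sketch of WriteBasicJson against that shipped API; BasicJsonSample is a hypothetical wrapper, and "extraArray" is a hypothetical stand-in for the sample's ExtraArray property name, whose value is not shown:

```csharp
using System;
using System.Buffers;
using System.Text;
using System.Text.Json;

public static class BasicJsonSample
{
    // Mirrors WriteBasicJson above using the shipped class-based writer.
    public static string WriteBasicJson(bool formatted, ReadOnlySpan<int> data)
    {
        var output = new ArrayBufferWriter<byte>();
        // Indented plays the role of the proposal's prettyPrint/Formatted option.
        using (var json = new Utf8JsonWriter(output, new JsonWriterOptions { Indented = formatted }))
        {
            json.WriteStartObject();
            json.WriteNumber("age", 42);
            json.WriteString("first", "John");
            json.WriteString("last", "Smith");
            json.WriteStartArray("phoneNumbers");
            json.WriteStringValue("425-000-1212");
            json.WriteStringValue("425-000-1213");
            json.WriteEndArray();
            json.WriteStartObject("address");
            json.WriteString("street", "1 Microsoft Way");
            json.WriteString("city", "Redmond");
            json.WriteNumber("zip", 98052);
            json.WriteEndObject();
            // Hypothetical property name standing in for ExtraArray.
            json.WriteStartArray("extraArray");
            foreach (int value in data)
                json.WriteNumberValue(value);
            json.WriteEndArray();
            json.WriteEndObject();
            json.Flush();
        }
        return Encoding.UTF8.GetString(output.WrittenSpan);
    }
}
```

With formatted: false and data { 1, 2, 3 }, this returns {"age":42,"first":"John","last":"Smith","phoneNumbers":["425-000-1212","425-000-1213"],"address":{"street":"1 Microsoft Way","city":"Redmond","zip":98052},"extraArray":[1,2,3]}.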

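From SignalR (this excerpt uses method names such as WriteObjectStart, WriteAttribute, and WriteArrayStart, which differ from both iterations above): https://github.com/ahsonkhan/SignalR/blob/9d4a51d6c1eb7cb2a68b154e107e4265fc804b7d/src/Microsoft.AspNetCore.Http.Connections.Common/NegotiateProtocol.cs#L31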

        public static void WriteResponse(NegotiationResponse response, IBufferWriter<byte> output)
        {
            Utf8JsonWriter<IBufferWriter<byte>> jsonWriter = Utf8JsonWriter.Create(output);

            jsonWriter.WriteObjectStart();

            if (!string.IsNullOrEmpty(response.Url))
            {
                jsonWriter.WriteAttribute(UrlPropertyName, response.Url);
            }

            if (!string.IsNullOrEmpty(response.AccessToken))
            {
                jsonWriter.WriteAttribute(AccessTokenPropertyName, response.AccessToken);
            }

            if (!string.IsNullOrEmpty(response.ConnectionId))
            {
                jsonWriter.WriteAttribute(ConnectionIdPropertyName, response.ConnectionId);
            }

            jsonWriter.WriteArrayStart(AvailableTransportsPropertyName);

            if (response.AvailableTransports != null)
            {
                foreach (var availableTransport in response.AvailableTransports)
                {
                    jsonWriter.WriteObjectStart();
                    jsonWriter.WriteAttribute(TransportPropertyName, availableTransport.Transport);
                    jsonWriter.WriteArrayStart(TransferFormatsPropertyName);

                    if (availableTransport.TransferFormats != null)
                    {
                        foreach (var transferFormat in availableTransport.TransferFormats)
                        {
                            jsonWriter.WriteValue(transferFormat);
                        }
                    }

                    jsonWriter.WriteArrayEnd();
                    jsonWriter.WriteObjectEnd();
                }
            }

            jsonWriter.WriteArrayEnd();
            jsonWriter.WriteObjectEnd();

            jsonWriter.Flush();
        }

For Reference:

Notes:

  • Why is the type generic? This type relies on BufferWriter as an implementation detail (ref struct BufferWriter<T> where T : IBufferWriter<byte>). From @benaadams:
  1. If you pass a struct formatter to BufferWriter<IBufferWriter<byte>>, it is boxed to the IBufferWriter<byte> interface type, which is both an allocation and a performance issue. Using the generic parameter avoids the boxing and the allocation.
  2. The methods that BufferWriter<T> calls internally (e.g. GetSpan and Advance) go through generic interface dispatch in shared generic code, which cannot be inlined and is also the slowest calling convention in the CLR. With the generic parameter, passing a struct type produces a non-shared generic instantiation with direct, inlinable calls.

See dotnet/corefxlab#2358 (comment) for more details.
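The boxing described in item 1 can be observed directly. The sketch below uses a hypothetical struct FixedBufferWriter and two hypothetical helpers (none of these names are part of the proposal). It does not measure allocations; instead it makes boxing visible through copy semantics: the interface-typed parameter receives a boxed copy, so Advance updates only that copy, while the constrained generic parameter works on the caller's struct without boxing. The generic helper takes the writer by ref purely so the effect on the caller's struct is observable.

```csharp
using System;
using System.Buffers;

// Hypothetical fixed-capacity struct writer, for illustration only.
public struct FixedBufferWriter : IBufferWriter<byte>
{
    private readonly byte[] _buffer;
    private int _position;

    public FixedBufferWriter(int capacity)
    {
        _buffer = new byte[capacity];
        _position = 0;
    }

    public int Position => _position;

    public void Advance(int count) => _position += count;

    public Memory<byte> GetMemory(int sizeHint = 0)
    {
        if (_buffer.Length - _position < Math.Max(sizeHint, 1))
            throw new InvalidOperationException("Buffer is full.");
        return _buffer.AsMemory(_position);
    }

    public Span<byte> GetSpan(int sizeHint = 0) => GetMemory(sizeHint).Span;
}

public static class Writers
{
    // Interface-typed parameter: a struct argument is boxed into a new heap
    // object, and GetSpan/Advance are interface calls on that box.
    public static void WriteByteViaInterface(IBufferWriter<byte> writer, byte value)
    {
        writer.GetSpan(1)[0] = value;
        writer.Advance(1);
    }

    // Constrained generic parameter: for a struct T the JIT compiles a
    // specialized instantiation, so there is no boxing and the calls are direct.
    public static void WriteByteViaGeneric<T>(ref T writer, byte value) where T : IBufferWriter<byte>
    {
        writer.GetSpan(1)[0] = value;
        writer.Advance(1);
    }
}
```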

  • Some of the APIs exist for performance reasons: they reduce chattiness and the number of interface calls. Some are also convenient (like WriteNameValue, or the WriteStartArray overload that accepts a name).
  • Since this is a ref struct (like the reader), it cannot be held across an await, so async support would need to be built on top.
  • We cannot support writing directly to a span: a buffer writer wrapping a span would itself have to be a ref struct, and ref structs cannot implement interfaces (here, IBufferWriter<byte>). If we get language support to enable that, we could add a span-based factory method.
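The same limitation explains why a Memory<byte>-based factory such as CreateFromMemory is feasible while a span-based one is not: Memory<byte> can be stored in a field of an ordinary class, so it can back an IBufferWriter<byte>. A minimal sketch, assuming a hypothetical MemoryBufferWriter (not part of the proposal) that fails fast when the fixed destination runs out of space:

```csharp
using System;
using System.Buffers;

// Hypothetical IBufferWriter<byte> over caller-provided memory. A class can
// hold Memory<byte> in a field; holding a Span<byte> would require a ref struct.
public sealed class MemoryBufferWriter : IBufferWriter<byte>
{
    private readonly Memory<byte> _destination;
    private int _written;

    public MemoryBufferWriter(Memory<byte> destination) => _destination = destination;

    public int WrittenCount => _written;

    public void Advance(int count)
    {
        if (count < 0 || _written + count > _destination.Length)
            throw new ArgumentOutOfRangeException(nameof(count));
        _written += count;
    }

    public Memory<byte> GetMemory(int sizeHint = 0)
    {
        Memory<byte> free = _destination.Slice(_written);
        // A fixed buffer cannot grow, so fail when the hint cannot be satisfied.
        if (free.Length < Math.Max(sizeHint, 1))
            throw new InvalidOperationException("Destination buffer is too small.");
        return free;
    }

    public Span<byte> GetSpan(int sizeHint = 0) => GetMemory(sizeHint).Span;
}
```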

Questions:

  1. What .NET Types should we support? Decimal? TimeSpan? All numeric types? Uri? BigInteger? Byte/Char?
  2. Should we provide string-based overloads at all (that transcode before writing)?
  3. Allow writing comments? What should the semantics of such an API be?
  4. Do we need an extensible JsonWriterOption type that enables features like MaxDepth, a custom indentation character, a custom new-line character, or custom string escape handling? Currently, we only have two options, selected by a bool parameter on the constructor:
    • minimized (default, i.e. no extra white space)
    • pretty printed (indented by 2 spaces per depth level, using Environment.NewLine for line breaks)
    • By default, we escape based on a white-list (any character not on it is escaped). One alternative could be to provide overloads that accept a bool escape parameter.
  5. General API and argument names:
    • WriteAttribute, WritePropertyNameValue, WriteKeyValue, etc.
    • PrettyPrint or Formatted?
    • Argument name - name or propertyName?
  6. How do we want to deal with API overloads that accept System.String, UTF-8 string as ReadOnlySpan<byte>, and raw bytes as ReadOnlySpan<byte> that need to be Base64-encoded? Should the argument name remain value or be explicitly different? The current proposal is to add AsBase64 to the API name to help with overload resolution.
  7. What should the API name be for writing null values? WriteNull or WriteValueNull/WriteNameValueNull?
  8. Should we provide support for writing raw JSON data as string/UTF-8 bytes?
  9. Should we have static factory methods, or are constructors good enough? For example:
    Utf8JsonWriter<ArrayFormatterWrapper> json = Utf8JsonWriter.Create(output, prettyPrint: true); // factory method
    var json = new Utf8JsonWriter<ArrayFormatterWrapper>(output, prettyPrint: true); // constructor
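For reference on the escaping questions in item 4, the System.Text.Json that shipped in .NET Core 3.0 exposes escaping control through the Encoder property of JsonWriterOptions, which accepts a System.Text.Encodings.Web.JavaScriptEncoder. A sketch contrasting the default behavior, which escapes HTML-sensitive and non-ASCII characters, with JavaScriptEncoder.UnsafeRelaxedJsonEscaping; EscapingSample is a hypothetical wrapper:

```csharp
using System;
using System.Buffers;
using System.Text;
using System.Text.Encodings.Web;
using System.Text.Json;

public static class EscapingSample
{
    // Writes a single JSON string value; a null encoder selects the default escaping.
    public static string WriteString(string value, JavaScriptEncoder encoder)
    {
        var output = new ArrayBufferWriter<byte>();
        using (var writer = new Utf8JsonWriter(output, new JsonWriterOptions { Encoder = encoder }))
        {
            writer.WriteStringValue(value);
        } // Dispose flushes any pending output.
        return Encoding.UTF8.GetString(output.WrittenSpan);
    }
}
```

With the default encoder, "<café>" comes out with both the angle brackets and the é escaped as \uXXXX sequences; with UnsafeRelaxedJsonEscaping it is written as "<café>" unchanged.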

cc @KrzysztofCwalina, @terrajobst, @davidfowl, @steveharter, @joshfree, @benaadams
