
Writing raw JSON values when using Utf8JsonWriter #1784

Closed
@nahk-ivanov

Description


Edited by @layomia

Original proposal by @nahk-ivanov

According to the image in the roadmap, Utf8JsonWriter will be the lowest-level API we are going to get, but it is missing the WriteRawValue functionality from JSON.NET. This seems to be by design, as the roadmap states:

Does not allow writing invalid JSON according to the JSON RFC

In that case, it would be nice to have direct access to the output stream in a custom JsonConverter, because there are still scenarios where we want to write raw JSON for performance. Quite often the application layer doesn't care about particular branches of the JSON document and just wants to pass them through to/from the data store (especially when using NoSQL). What we used to do with JSON.NET in such cases was to keep the value as a string property and define a custom converter that called WriteRawValue, which takes a string argument.

One example is Azure Table Storage, where we want to extract a few top-level properties as PartitionKey/RowKey and store the other nested properties in their original JSON form. I don't see any reason why the application should deserialize and re-serialize these parts of the tree; all we need is a forward-only reader that returns the value of a given property in its raw form, which we later write out as-is.

I looked at the current System.Text.Json APIs and can't find a good way of doing this. It seems that, for performance reasons, we would now want to keep the raw data in binary form (rather than as a string, as before), which is fine, but there is no way to read or write the next value as bytes. The best I could find is to use JsonDocument.ParseValue + document.RootElement.GetRawText() when reading, and, when writing, to parse the text into a JsonDocument again just to call WriteTo.
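The workaround described above can be sketched as two helpers that a custom JsonConverter<string> could call from its Read and Write methods. The helper names ReadRawJson and WriteRawJson and the sample payload are hypothetical; only JsonDocument.ParseValue, JsonElement.GetRawText, JsonDocument.Parse, and JsonElement.WriteTo are existing System.Text.Json APIs. Note that the nested value is parsed into a JsonDocument twice, once on each side.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical helper: captures the next JSON value as raw text.
// JsonDocument.ParseValue builds a full document just to hand the text back.
static string ReadRawJson(ref Utf8JsonReader jsonReader)
{
    using JsonDocument document = JsonDocument.ParseValue(ref jsonReader);
    return document.RootElement.GetRawText();
}

// Hypothetical helper: with no raw-write API, the text is parsed a second time
// and then re-emitted token by token through the writer.
static void WriteRawJson(Utf8JsonWriter jsonWriter, string rawJson)
{
    using JsonDocument document = JsonDocument.Parse(rawJson);
    document.RootElement.WriteTo(jsonWriter);
}

// Sample payload: one top-level key plus a nested branch we only want to pass through.
byte[] input = Encoding.UTF8.GetBytes("{\"PartitionKey\":\"p1\",\"Nested\":{\"a\":[1,2]}}");
var reader = new Utf8JsonReader(input);
string? nested = null;
while (reader.Read())
{
    if (reader.TokenType == JsonTokenType.PropertyName && reader.ValueTextEquals("Nested"))
    {
        // ParseValue advances from the property name to its value.
        nested = ReadRawJson(ref reader);
        break;
    }
}

using var stream = new MemoryStream();
using (var writer = new Utf8JsonWriter(stream))
{
    writer.WriteStartObject();
    writer.WritePropertyName("Nested");
    WriteRawJson(writer, nested!);
    writer.WriteEndObject();
}

string output = Encoding.UTF8.GetString(stream.ToArray());
Console.WriteLine(nested); // {"a":[1,2]}
Console.WriteLine(output); // {"Nested":{"a":[1,2]}}
```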

Am I missing some APIs?


Introduction

There are scenarios where customers want to integrate existing, "raw" JSON payloads when writing new JSON payloads with Utf8JsonWriter. There's no first-class API for providing this functionality today, but there's a workaround using JsonDocument:

// UTF8 representation of {"Hello":"World"}
byte[] rawJson = GetRawPayload();

writer.WriteStartObject();
writer.WritePropertyName("Payload");

// No easy way to write this raw JSON payload. Here's a workaround:
using (JsonDocument document = JsonDocument.Parse(rawJson))
{
    document.RootElement.WriteTo(writer);
}

writer.WriteEndObject();

This workaround builds a metadata database for navigating the JSON document, which is unnecessary for the desired functionality and adds performance overhead.

The goal of this feature is to provide performant and safe APIs to write raw JSON values.

API Proposal

namespace System.Text.Json
{
  public sealed partial class Utf8JsonWriter
  {
    // Writes the span of bytes directly as JSON content.
    public void WriteRawValue(ReadOnlySpan<byte> utf8Json, bool skipValidation = false) { }

    // Writes the span of characters directly as JSON content (transcoded to UTF-8).
    public void WriteRawValue(ReadOnlySpan<char> json, bool skipValidation = false) { }

    // Writes the string directly as JSON content (transcoded to UTF-8).
    public void WriteRawValue(string json, bool skipValidation = false) { }
  }
}
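A minimal sketch of the introductory example rewritten against the proposed byte-span overload, assuming WriteRawValue is available as proposed (it shipped in .NET 6, where the flag is named skipInputValidation). The call passes only the payload, so it relies on the default of validating the input and does not depend on the flag's name. No JsonDocument is built.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

// UTF-8 representation of {"Hello":"World"}; in practice this comes from elsewhere.
byte[] rawJson = Encoding.UTF8.GetBytes("{\"Hello\":\"World\"}");

using var stream = new MemoryStream();
using (var writer = new Utf8JsonWriter(stream))
{
    writer.WriteStartObject();
    writer.WritePropertyName("Payload");
    // The bytes are validated (default) and copied straight into the output buffer.
    writer.WriteRawValue(rawJson);
    writer.WriteEndObject();
}

string output = Encoding.UTF8.GetString(stream.ToArray());
Console.WriteLine(output); // {"Payload":{"Hello":"World"}}
```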
Alternate Design
namespace System.Text.Json
{
  // Specifies processing options when writing raw JSON values.
  [Flags]
  public enum JsonWriteRawOptions
  {
    // Raw JSON values will be validated for structural correctness and encoded if required.
    Default = 0,

    // Whether to skip validation. Independent of JsonWriterOptions.SkipValidation.
    SkipValidation = 1,

    // Whether to skip encoding specified via JsonWriterOptions.Encoder.
    SkipEncoding = 2,
  }

  public sealed partial class Utf8JsonWriter
  {
    // Writes the span of bytes directly as JSON content, according to the specified options.
    public void WriteRawValue(ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions options = default) { }
  }
}

Potential additions

namespace System.Text.Json
{
  [Flags]
  public enum JsonWriteRawOptions
  {
    // Existing
    // Default = 0,
    // SkipValidation = 1,
    // SkipEncoding = 2,

    // Whether to reformat the raw payload according to the JsonWriterOptions, or write it as-is.
    // Reformatting is influenced by JsonWriterOptions.Indented, and may result in whitespace changes.
    Reformat = 4
  }

  public sealed partial class Utf8JsonWriter
  {
    // Existing
    // public void WriteRawValue(ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions options = default) { }

    // WriteRawValue overload that takes a ReadOnlySpan<char>.
    public void WriteRawValue(ReadOnlySpan<char> json, JsonWriteRawOptions options = default) { }

    // WriteRawValue overload that takes a string.
    public void WriteRawValue(string json, JsonWriteRawOptions options = default) { }
  }
}

Scenarios

The major considerations for this feature are:

  • Should we validate raw payloads for structural correctness?
  • Should we encode the raw JSON payload according to the writer options?
  • Should we maintain the formatting of the raw payload when writing, or should we re-format it according to the writer options?

The answers depend on the scenario. Is the raw JSON payload trusted or arbitrary? Is the enveloping of the raw payload done on behalf of users? This proposal seeks to make all of these post-processing options fully configurable by users, allowing them to tweak the behavior in a way that satisfies their performance, correctness, and security requirements.

In this proposal, raw JSON values will not be reformatted to match the whitespace and indentation settings specified in JsonWriterOptions; they are written as-is. A future API could reformat raw JSON values. Raw values will also not be encoded, although an API for that could be added later. Users who need all three kinds of processing today can use the JsonDocument workaround shown above.
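The difference between writing a raw value as-is and re-formatting it through the JsonDocument workaround shows up with an indented writer. This is a sketch assuming WriteRawValue is available as proposed; the Envelope helper and the sample payload are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

const string rawJson = "{\"a\":1}";
var indented = new JsonWriterOptions { Indented = true };

// Hypothetical helper: writes {"data": <raw>} with the given options, either via
// WriteRawValue (payload kept as-is) or via JsonDocument (payload re-formatted).
static string Envelope(string raw, JsonWriterOptions options, bool useDocument)
{
    using var stream = new MemoryStream();
    using (var writer = new Utf8JsonWriter(stream, options))
    {
        writer.WriteStartObject();
        writer.WritePropertyName("data");
        if (useDocument)
        {
            using JsonDocument document = JsonDocument.Parse(raw);
            document.RootElement.WriteTo(writer);
        }
        else
        {
            writer.WriteRawValue(raw);
        }
        writer.WriteEndObject();
    }
    return Encoding.UTF8.GetString(stream.ToArray());
}

string asIs = Envelope(rawJson, indented, useDocument: false);
string reformatted = Envelope(rawJson, indented, useDocument: true);
Console.WriteLine(asIs);
Console.WriteLine(reformatted);
```

With WriteRawValue the inner object keeps its compact form, while the JsonDocument path re-emits it with the writer's indentation.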

Let's go over two expected common scenarios and how the API would be configured to enable them.

I have a blob which I think represents JSON content and which I want to envelope, and I need to make sure the envelope & its inner contents remain well-formed

Consider the Azure SDK team, which provides a service protocol where the payload content is JSON. Part of the JSON is internal protocol information, and a nested part is user-provided data. When serializing protocol information, Azure cares that the raw user JSON is structurally valid, and ultimately that the overall JSON payload is valid. They don't wish to alter the formatting of the raw JSON or preemptively encode it. Given these considerations, they might write their serialization logic as follows:

JsonWriterOptions writerOptions = new() { Indented = true };

using MemoryStream ms = new();
using Utf8JsonWriter writer = new(ms, writerOptions);

writer.WriteStartObject();
writer.WriteString("id", protocol.Id);
writer.WriteString("eventType", protocol.EventType);
writer.WriteString("eventTime", DateTimeOffset.UtcNow);
// Write user-provided data.
writer.WritePropertyName("data");
writer.WriteRawValue(protocol.UserBlob);
writer.WriteEndObject();

The resulting JSON payload might look like this:

{
    "id": "a20299ee-f239-42ce-8eee-2da562848fbe",
    "eventType": "Microsoft.Resources.ResourceWriteSuccess",
    "eventTime": "2021-06-10T14:29:10.3363984+00:00",
    "data": {"auth":"{az_resource_mgr_auth}","correlationId":"76d8b695-e98b-4fb6-81c4-1edc1aa22dfc","tenantId":"76d8b695-e98b-4fb6-81c4-1edc1aa22dfc"}
}

Notice that no extra arguments had to be specified: by default the raw payload is validated, but it is neither reformatted nor encoded.
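A sketch of what default validation buys the Azure scenario: a truncated user blob is rejected instead of producing a malformed envelope. It assumes WriteRawValue is available as proposed and that a validation failure surfaces as a JsonException (the exception type is an assumption, chosen because Utf8JsonReader reports malformed input that way); the blob itself is made up.

```csharp
using System;
using System.IO;
using System.Text.Json;

using var stream = new MemoryStream();
using var writer = new Utf8JsonWriter(stream);

writer.WriteStartObject();
writer.WritePropertyName("data");

bool rejected;
try
{
    // Truncated user blob; without skipValidation it is checked before being copied.
    writer.WriteRawValue("{\"auth\":");
    rejected = false;
}
catch (JsonException)
{
    rejected = true;
}

Console.WriteLine(rejected ? "invalid blob rejected" : "invalid blob written");
```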

I have a deliberate sequence of bytes I want to write out on the wire, and I know what I'm doing

Consider a user who needs to format double values differently from the Utf8JsonWriter.WriteNumber[X] methods. Those methods write the shortest output that still round-trips on deserialization, so a float or double value that can be expressed simply as 89 is written as 89, not 89.0. However, our user needs to send numeric JSON content to a service that treats values without decimal points as integers and values with decimal points as doubles, so this distinction matters to them. The WriteRawValue APIs would let our customer format their numeric values as they wish and write them directly to the destination buffer. In this scenario the numeric values are trusted, so our user would not want the writer to perform any encoding or structural validation on the raw JSON payloads, to get the best possible performance. To satisfy this trusted scenario, our user might write serialization logic as follows:

JsonWriterOptions writerOptions = new() { Indented = true };

using MemoryStream ms = new();
using Utf8JsonWriter writer = new(ms, writerOptions);

writer.WriteStartObject();
writer.WriteString("dataType", "CalculationResults");

writer.WriteStartArray("data");

foreach (CalculationResult result in results)
{
    writer.WriteStartObject();
    writer.WriteString("measurement", result.Measurement);

    writer.WritePropertyName("value");
    // Write raw JSON numeric value; FormatNumberValue is a user-defined helper.
    byte[] formattedValue = FormatNumberValue(result.Value);
    writer.WriteRawValue(formattedValue, skipValidation: true);

    writer.WriteEndObject();
}

writer.WriteEndArray();
writer.WriteEndObject();

The resulting JSON payload might look like this:

{
    "dataType": "CalculationResults",
    "data": [
        {
            "measurement": "Min",
            "value": 50.4
        },
        {
            "measurement": "Max",
            "value": 2.0
        }
    ]
}
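FormatNumberValue is not defined in the snippet above. A minimal, hypothetical sketch of such a helper appends ".0" whenever .NET's shortest round-trippable formatting leaves no decimal point or exponent, so the receiving service always sees a double. It assumes .NET Core 3.0 or later, where double.ToString produces the shortest round-trippable string and double.IsFinite exists.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper: formats a double as a JSON number that always reads as a double.
static byte[] FormatNumberValue(double value)
{
    // NaN and infinities have no JSON number representation.
    if (!double.IsFinite(value))
        throw new ArgumentOutOfRangeException(nameof(value), "NaN and infinities are not valid JSON numbers.");

    // Shortest round-trippable form, e.g. 2 -> "2", 50.4 -> "50.4", 1e20 -> "1E+20".
    string text = value.ToString(CultureInfo.InvariantCulture);

    // Force a decimal point when there is neither a fraction nor an exponent.
    if (text.IndexOfAny(new[] { '.', 'E' }) < 0)
        text += ".0";

    return Encoding.UTF8.GetBytes(text);
}

Console.WriteLine(Encoding.UTF8.GetString(FormatNumberValue(50.4))); // 50.4
Console.WriteLine(Encoding.UTF8.GetString(FormatNumberValue(2)));    // 2.0
```

Because the helper already guarantees well-formed output, passing skipValidation: true in the loop above is reasonable for this trusted input.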

Notes

  • The WriteRawValue APIs would work not just for writing raw values as JSON property values (i.e. as POCO properties and dictionary values), but also for root-level JSON objects, arrays, and primitives, and for JSON array elements. In this respect they behave like the existing Utf8JsonWriter.WriteXValue methods.
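A short sketch of the root-level and array-element cases, assuming WriteRawValue is available as proposed; the Render helper is hypothetical. Like the WriteXValue methods, the writer inserts list separators between raw elements and ordinary ones.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical helper: runs some writer calls against a fresh compact writer
// and returns the resulting JSON text.
static string Render(Action<Utf8JsonWriter> write)
{
    using var stream = new MemoryStream();
    using (var writer = new Utf8JsonWriter(stream))
    {
        write(writer);
    }
    return Encoding.UTF8.GetString(stream.ToArray());
}

// A raw value as the entire (root-level) document.
string root = Render(w => w.WriteRawValue("{\"a\":1}"));

// Raw values as array elements, interleaved with an ordinary WriteNumberValue call.
string array = Render(w =>
{
    w.WriteStartArray();
    w.WriteRawValue("1.0");
    w.WriteNumberValue(2);
    w.WriteRawValue("[3]");
    w.WriteEndArray();
});

Console.WriteLine(root);  // {"a":1}
Console.WriteLine(array); // [1.0,2,[3]]
```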
