[Internal] JSON Binary Encoding: Adds support for encoding uniform arrays (#4866)

## Description

Added full end-to-end support for writing and reading binary-encoded
uniform number arrays, as well as nested arrays of uniform number
arrays.

**Uniform Number Arrays**
A uniform number array is a JSON array where all items share the same
numeric type. The encoding supports the following numeric types:
- **Int8**: Signed 1-byte integer (-128 to 127)
- **UInt8**: Unsigned 1-byte integer (0 to 255)
- **Int16**: Signed 2-byte integer
- **Int32**: Signed 4-byte integer
- **Int64**: Signed 8-byte integer
- **Float16**: 2-byte floating-point value (currently unsupported)
- **Float32**: 4-byte floating-point value
- **Float64**: 8-byte floating-point value

Uniform number arrays are represented by these new type markers:
- **ArrNumC1**: Uniform number array with a 1-byte item count
- **ArrNumC2**: Uniform number array with a 2-byte item count

Both type markers are encoded as follows:
`| Type marker | Item type marker | Item count |`
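
As a rough illustration of this layout, the sketch below builds the bytes for a small UInt8 array with a 1-byte item count. The marker values (`0xF0` for ArrNumC1, `0xD7` for UInt8) are inferred from the slots these markers take in the `NodeTypes` table changed by this commit, and it assumes the raw items follow the prefix directly. The class and method names are hypothetical and are not part of the SDK.

```csharp
using System;
using System.Collections.Generic;

public static class UniformArrayLayoutSketch
{
    // Assumed marker values, inferred from their positions in the NodeTypes table.
    private const byte ArrNumC1 = 0xF0;
    private const byte UInt8 = 0xD7;

    // Builds | Type marker | Item type marker | Item count | followed by the raw items.
    public static byte[] EncodeUInt8Array(IReadOnlyList<byte> values)
    {
        if (values.Count > byte.MaxValue)
        {
            throw new ArgumentException("ArrNumC1 only has a 1-byte item count.", nameof(values));
        }

        byte[] encoded = new byte[3 + values.Count];
        encoded[0] = ArrNumC1;            // type marker
        encoded[1] = UInt8;               // item type marker
        encoded[2] = (byte)values.Count;  // 1-byte item count
        for (int i = 0; i < values.Count; i++)
        {
            encoded[3 + i] = values[i];   // items, one byte each
        }

        return encoded;
    }
}
```

Under these assumptions, `EncodeUInt8Array(new byte[] { 1, 2, 3 })` produces the six bytes `F0 D7 03 01 02 03`.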

To maintain backward compatibility, writing uniform number arrays is
controlled via the `EnableNumberArrays` write option. When it is enabled,
the writer checks at the end of writing an array whether all values are
numeric. If so, it identifies the smallest numeric type that fits all
values and compares the encoded length of the uniform number array with
that of the regular array. If the new length is less than or equal to
the old one, the array is converted to a uniform number array.
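
The "smallest numeric type" step can be sketched for integer values as follows. This is only an illustration: the enum, the class, and the tie-break that prefers UInt8 over Int8 for values in 0 to 127 are assumptions, and the sketch does not cover floating-point values or the final length comparison against the regular array encoding.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum; the SDK uses type markers rather than this type.
public enum IntItemType { UInt8, Int8, Int16, Int32, Int64 }

public static class SmallestTypeSketch
{
    // Returns the narrowest integer type whose range covers every value.
    public static IntItemType SelectSmallestIntegerType(IReadOnlyList<long> values)
    {
        if (values.Count == 0)
        {
            throw new ArgumentException("At least one value is required.", nameof(values));
        }

        long min = long.MaxValue;
        long max = long.MinValue;
        foreach (long value in values)
        {
            min = Math.Min(min, value);
            max = Math.Max(max, value);
        }

        // Assumed preference order: unsigned byte first, then signed types by width.
        if (min >= byte.MinValue && max <= byte.MaxValue) return IntItemType.UInt8;
        if (min >= sbyte.MinValue && max <= sbyte.MaxValue) return IntItemType.Int8;
        if (min >= short.MinValue && max <= short.MaxValue) return IntItemType.Int16;
        if (min >= int.MinValue && max <= int.MaxValue) return IntItemType.Int32;
        return IntItemType.Int64;
    }
}
```

For example, `{ 0, 200 }` fits UInt8, `{ -1, 100 }` fits Int8, and `{ -1, 200 }` needs Int16 because no 1-byte type covers both values.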

**Arrays of Uniform Number Arrays**
This encoding enhancement allows for encoding multiple uniform number
arrays with the same underlying numeric type and item count into a
single contiguous array of numbers. The items in all arrays are preceded
by a prefix indicating the common array encoding and the number of
encoded arrays.

Arrays of uniform number arrays are represented by these two new type
markers:

- **ArrArrNumC1C1**: Array with a 1-byte item count whose items are
uniform number arrays sharing a common numeric type and a 1-byte item
count.
- **ArrArrNumC2C2**: Array with a 2-byte item count whose items are
uniform number arrays sharing a common numeric type and a 2-byte item
count.

Both new type markers are encoded as follows:
`| Type-marker | Array type-marker | Number type-marker | Number item count | Array item count |`

Similar to uniform number arrays, the writing of arrays of uniform
number arrays is conditional on the `EnableNumberArrays` write option
being specified. This ensures backward compatibility with readers and
navigators that do not yet support this encoding.
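
To make the prefix layout concrete, the sketch below computes the prefix size and the total encoded length of an array of uniform number arrays. It assumes each marker occupies one byte, that both count fields use the same width (1 byte for ArrArrNumC1C1, 2 bytes for ArrArrNumC2C2), and that all items follow the prefix contiguously. The class and method names are hypothetical.

```csharp
using System;

public static class NestedUniformArraySizeSketch
{
    // Prefix: type marker (1) + array type marker (1) + number type marker (1)
    // + number item count + array item count (each countFieldSize bytes).
    public static int GetPrefixSize(int countFieldSize)
    {
        if (countFieldSize != 1 && countFieldSize != 2)
        {
            throw new ArgumentOutOfRangeException(nameof(countFieldSize));
        }

        return 3 + (2 * countFieldSize);
    }

    // Total length: the prefix plus every item of every nested array.
    public static int GetEncodedLength(int countFieldSize, int itemSize, int itemCount, int arrayCount)
    {
        return GetPrefixSize(countFieldSize) + (arrayCount * itemCount * itemSize);
    }
}
```

Under these assumptions, three arrays of four Int16 items encoded with ArrArrNumC1C1 take 5 prefix bytes plus 3 × 4 × 2 = 24 item bytes, 29 bytes in total.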

**JSON Serialization Testing**

- Introduced a new set of tests for both uniform number arrays and
nested arrays of uniform number arrays.
- Enhanced the `JsonToken` class to support representation of uniform
number array tokens.
- Updated `JsonWriterTest` to include additional validation. This now
not only checks the expected output but also verifies round-trip
consistency across different formats and write options for all three
rewrite scenarios: JSON Navigator, JSON Reader - Write All, and JSON
Reader - Write Current Token.


## Type of change

- [ ] New feature (non-breaking change which adds functionality)

## Closing issues
sboshra authored Nov 5, 2024
1 parent f8032b4 commit 0018845
Showing 22 changed files with 9,550 additions and 1,776 deletions.
@@ -10,8 +10,8 @@ internal interface IJsonBinaryWriterExtensions : IJsonWriter
{
void WriteRawJsonValue(
ReadOnlyMemory<byte> rootBuffer,
ReadOnlyMemory<byte> rawJsonValue,
bool isRootNode,
int valueOffset,
JsonBinaryEncoding.UniformArrayInfo externalArrayInfo,
bool isFieldName);
}
}
53 changes: 52 additions & 1 deletion Microsoft.Azure.Cosmos/src/Json/IJsonWriter.cs
@@ -4,10 +4,11 @@
namespace Microsoft.Azure.Cosmos.Json
{
using System;
using System.Collections.Generic;
using Microsoft.Azure.Cosmos.Core.Utf8;

/// <summary>
/// Interface for all JsonWriters that know how to write jsons of a specific serialization format.
/// Common interface for all JSON writers that can write JSON in a specific serialization format.
/// </summary>
#if INTERNAL
public
@@ -87,6 +88,54 @@ interface IJsonWriter
/// </summary>
void WriteNullValue();

#region Number Arrays

/// <summary>
/// Writes an array of 8-bit unsigned integer values.
/// </summary>
/// <param name="values">The array of 8-bit unsigned integer values to write.</param>
void WriteNumberArray(IReadOnlyList<byte> values);

/// <summary>
/// Writes an array of 8-bit signed integer values.
/// </summary>
/// <param name="values">The array of 8-bit signed integer values to write.</param>
void WriteNumberArray(IReadOnlyList<sbyte> values);

/// <summary>
/// Writes an array of 16-bit signed integer values.
/// </summary>
/// <param name="values">The array of 16-bit signed integer values to write.</param>
void WriteNumberArray(IReadOnlyList<short> values);

/// <summary>
/// Writes an array of 32-bit signed integer values.
/// </summary>
/// <param name="values">The array of 32-bit signed integer values to write.</param>
void WriteNumberArray(IReadOnlyList<int> values);

/// <summary>
/// Writes an array of 64-bit signed integer values.
/// </summary>
/// <param name="values">The array of 64-bit signed integer values to write.</param>
void WriteNumberArray(IReadOnlyList<long> values);

/// <summary>
/// Writes an array of single-precision floating-point numbers.
/// </summary>
/// <param name="values">The array of single-precision floating-point numbers to write.</param>
void WriteNumberArray(IReadOnlyList<float> values);

/// <summary>
/// Writes an array of double-precision floating-point numbers.
/// </summary>
/// <param name="values">The array of double-precision floating-point numbers to write.</param>
void WriteNumberArray(IReadOnlyList<double> values);

#endregion

#region Extended Types

/// <summary>
/// Writes a single signed byte integer to the internal buffer.
/// </summary>
@@ -141,6 +190,8 @@ interface IJsonWriter
/// <param name="value">The value of the bytes to write.</param>
void WriteBinaryValue(ReadOnlySpan<byte> value);

#endregion

/// <summary>
/// Gets the result of the JsonWriter.
/// </summary>
136 changes: 66 additions & 70 deletions Microsoft.Azure.Cosmos/src/Json/JsonBinaryEncoding.Enumerator.cs
@@ -13,57 +13,73 @@ internal static partial class JsonBinaryEncoding
{
public static class Enumerator
{
public static IEnumerable<ReadOnlyMemory<byte>> GetArrayItems(ReadOnlyMemory<byte> buffer)
public static IEnumerable<ArrayItem> GetArrayItems(
ReadOnlyMemory<byte> rootBuffer,
int arrayOffset,
UniformArrayInfo externalArrayInfo)
{
ReadOnlyMemory<byte> buffer = rootBuffer.Slice(arrayOffset);
byte typeMarker = buffer.Span[0];
if (!JsonBinaryEncoding.TypeMarker.IsArray(typeMarker))

UniformArrayInfo uniformArrayInfo;
if (externalArrayInfo != null)
{
throw new JsonInvalidTokenException();
uniformArrayInfo = externalArrayInfo.NestedArrayInfo;
}
else
{
uniformArrayInfo = IsUniformArrayTypeMarker(typeMarker) ? GetUniformArrayInfo(buffer.Span) : null;
}

int firstArrayItemOffset = JsonBinaryEncoding.GetFirstValueOffset(typeMarker);
int arrayLength = JsonBinaryEncoding.GetValueLength(buffer.Span);

// Scope to just the array
buffer = buffer.Slice(0, arrayLength);

// Seek to the first array item
buffer = buffer.Slice(firstArrayItemOffset);

while (buffer.Length != 0)
if (uniformArrayInfo != null)
{
int arrayItemLength = JsonBinaryEncoding.GetValueLength(buffer.Span);
if (arrayItemLength > buffer.Length)
int itemStartOffset = arrayOffset + uniformArrayInfo.PrefixSize;
int itemEndOffset = itemStartOffset + (uniformArrayInfo.ItemSize * uniformArrayInfo.ItemCount);
for (int offset = itemStartOffset; offset < itemEndOffset; offset += uniformArrayInfo.ItemSize)
{
yield return new ArrayItem(offset, uniformArrayInfo);
}
}
else
{
if (!TypeMarker.IsArray(typeMarker))
{
// Array Item got cut off.
throw new JsonInvalidTokenException();
}

// Create a buffer for that array item
ReadOnlyMemory<byte> arrayItem = buffer.Slice(0, arrayItemLength);
yield return arrayItem;
int firstArrayItemOffset = JsonBinaryEncoding.GetFirstValueOffset(typeMarker);
int arrayLength = JsonBinaryEncoding.GetValueLength(buffer.Span);

// Slice off the array item
buffer = buffer.Slice(arrayItemLength);
}
}
// Scope to just the array
buffer = buffer.Slice(0, arrayLength);

public static IEnumerable<Memory<byte>> GetMutableArrayItems(Memory<byte> buffer)
{
foreach (ReadOnlyMemory<byte> readOnlyArrayItem in Enumerator.GetArrayItems(buffer))
{
if (!MemoryMarshal.TryGetArray(readOnlyArrayItem, out ArraySegment<byte> segment))
// Seek to the first array item
buffer = buffer.Slice(firstArrayItemOffset);

while (buffer.Length != 0)
{
throw new InvalidOperationException("failed to get array segment.");
}
int arrayItemLength = JsonBinaryEncoding.GetValueLength(buffer.Span);
if (arrayItemLength > buffer.Length)
{
// Array Item got cut off.
throw new JsonInvalidTokenException();
}

yield return segment;
yield return new ArrayItem(arrayOffset + (arrayLength - buffer.Length), null);

// Slice off the array item
buffer = buffer.Slice(arrayItemLength);
}
}
}

public static IEnumerable<ObjectProperty> GetObjectProperties(ReadOnlyMemory<byte> buffer)
public static IEnumerable<ObjectProperty> GetObjectProperties(
ReadOnlyMemory<byte> rootBuffer,
int objectOffset)
{
ReadOnlyMemory<byte> buffer = rootBuffer.Slice(objectOffset);
byte typeMarker = buffer.Span[0];

if (!JsonBinaryEncoding.TypeMarker.IsObject(typeMarker))
{
throw new JsonInvalidTokenException();
@@ -73,7 +89,7 @@ public static IEnumerable<ObjectProperty> GetObjectProperties(ReadOnlyMemory<byt
int objectLength = JsonBinaryEncoding.GetValueLength(buffer.Span);

// Scope to just the object
buffer = buffer.Slice(0, (int)objectLength);
buffer = buffer.Slice(0, objectLength);

// Seek to the first object property
buffer = buffer.Slice(firstValueOffset);
@@ -85,7 +101,8 @@ public static IEnumerable<ObjectProperty> GetObjectProperties(ReadOnlyMemory<byt
throw new JsonInvalidTokenException();
}

ReadOnlyMemory<byte> name = buffer.Slice(0, nameNodeLength);
int nameOffset = objectOffset + (objectLength - buffer.Length);

buffer = buffer.Slice(nameNodeLength);

int valueNodeLength = JsonBinaryEncoding.GetValueLength(buffer.Span);
@@ -94,57 +111,36 @@ public static IEnumerable<ObjectProperty> GetObjectProperties(ReadOnlyMemory<byt
throw new JsonInvalidTokenException();
}

ReadOnlyMemory<byte> value = buffer.Slice(0, valueNodeLength);
buffer = buffer.Slice(valueNodeLength);

yield return new ObjectProperty(name, value);
}
}

public static IEnumerable<MutableObjectProperty> GetMutableObjectProperties(Memory<byte> buffer)
{
foreach (ObjectProperty objectProperty in GetObjectProperties(buffer))
{
if (!MemoryMarshal.TryGetArray(objectProperty.Name, out ArraySegment<byte> nameSegment))
{
throw new InvalidOperationException("failed to get array segment.");
}
int valueOffset = objectOffset + (objectLength - buffer.Length);

if (!MemoryMarshal.TryGetArray(objectProperty.Value, out ArraySegment<byte> valueSegment))
{
throw new InvalidOperationException("failed to get array segment.");
}
buffer = buffer.Slice(valueNodeLength);

yield return new MutableObjectProperty(nameSegment, valueSegment);
yield return new ObjectProperty(nameOffset, valueOffset);
}
}

public readonly struct ObjectProperty
public readonly struct ArrayItem
{
public ObjectProperty(
ReadOnlyMemory<byte> name,
ReadOnlyMemory<byte> value)
public ArrayItem(int offset, UniformArrayInfo externalArrayInfo)
{
this.Name = name;
this.Value = value;
this.Offset = offset;
this.ExternalArrayInfo = externalArrayInfo;
}

public ReadOnlyMemory<byte> Name { get; }
public ReadOnlyMemory<byte> Value { get; }
public int Offset { get; }
public UniformArrayInfo ExternalArrayInfo { get; }
}

public readonly struct MutableObjectProperty
public readonly struct ObjectProperty
{
public MutableObjectProperty(
Memory<byte> name,
Memory<byte> value)
public ObjectProperty(int nameOffset, int valueOffset)
{
this.Name = name;
this.Value = value;
this.NameOffset = nameOffset;
this.ValueOffset = valueOffset;
}

public Memory<byte> Name { get; }
public Memory<byte> Value { get; }
public int NameOffset { get; }
public int ValueOffset { get; }
}
}
}
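
The offset arithmetic that the new uniform-array branch of `GetArrayItems` relies on can be sketched in isolation. The method below mirrors the loop bounds from the diff but takes plain integers instead of a `UniformArrayInfo`; the class and method names are hypothetical, and no bytes are read, so it makes no assumption about how item values are stored.

```csharp
using System;
using System.Collections.Generic;

public static class UniformItemOffsetSketch
{
    // Yields the absolute offset of each fixed-size item that follows the array prefix.
    public static IEnumerable<int> GetItemOffsets(int arrayOffset, int prefixSize, int itemSize, int itemCount)
    {
        int itemStartOffset = arrayOffset + prefixSize;
        int itemEndOffset = itemStartOffset + (itemSize * itemCount);
        for (int offset = itemStartOffset; offset < itemEndOffset; offset += itemSize)
        {
            yield return offset;
        }
    }
}
```

For an array starting at offset 10 with a 3-byte prefix and three 2-byte items, this yields the offsets 13, 15, and 17. Because every item has the same size, no per-item length lookup is needed, unlike the regular array branch.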
16 changes: 8 additions & 8 deletions Microsoft.Azure.Cosmos/src/Json/JsonBinaryEncoding.NodeTypes.cs
@@ -99,7 +99,7 @@ public static class NodeTypes
String, // StrR2 (Reference string of 2-byte offset)
String, // StrR3 (Reference string of 3-byte offset)
String, // StrR4 (Reference string of 4-byte offset)
Unknown, // <empty> 0xC7
Number, // NumUI64

// Number Values
Number, // NumUI8
@@ -109,7 +109,7 @@ public static class NodeTypes
Number, // NumDbl,
Float32, // Float32
Float64, // Float64
Unknown, // <empty> 0xCF
Unknown, // Float16 (No corresponding JsonNodeType at the moment)

// Other Value Types
Null, // Null
@@ -119,7 +119,7 @@ public static class NodeTypes
Unknown, // <empty> 0xD4
Unknown, // <empty> 0xD5
Unknown, // <empty> 0xD6
Unknown, // <empty> 0xD7
Unknown, // UInt8 (No corresponding JsonNodeType at the moment)

Int8, // Int8
Int16, // Int16
@@ -150,11 +150,11 @@ public static class NodeTypes
Object, // ObjLC2 (2-byte length and count)
Object, // ObjLC4 (4-byte length and count)

// Empty Range
Unknown, // <empty> 0xF0
Unknown, // <empty> 0xF1
Unknown, // <empty> 0xF2
Unknown, // <empty> 0xF3
// Array and Object Special Type Markers
Array, // ArrNumC1 Uniform number array of 1-byte item count
Array, // ArrNumC2 Uniform number array of 2-byte item count
Array, // ArrArrNumC1C1 Array of 1-byte item count of Uniform number array of 1-byte item count
Array, // ArrArrNumC2C2 Array of 2-byte item count of Uniform number array of 2-byte item count
Unknown, // <empty> 0xF4
Unknown, // <empty> 0xF5
Unknown, // <empty> 0xF6