IUtf8SpanFormattable and IUtf8SpanParsable #81500

Background and Motivation

We currently support UTF-16 based formatting and parsing, and we even expose common interfaces (IFormattable, ISpanFormattable, IParsable<TSelf>, and ISpanParsable<TSelf>) through which any developer can declare that their types support the same. However, we have no equivalent support for UTF-8.

With UTF-8 becoming ever more prevalent across scenarios, it would be ideal to expose similar interfaces so that users can express that their own types support this functionality.

As such, I propose we expose two new interfaces that support parsing and formatting types using UTF-8. These interfaces would only support spans for now, because we do not have a corresponding Utf8String type that would make exposing IUtf8Formattable or IUtf8Parsable viable today. We could express those over byte[], but that is less than ideal and would block us from supporting any future UTF-8 string type.

Proposed API

```csharp
namespace System;

public interface IUtf8SpanFormattable
{
    bool TryFormat(Span<byte> destination, out int bytesWritten, ReadOnlySpan<byte> format, IFormatProvider? provider);
}

public interface IUtf8SpanParsable<TSelf>
    where TSelf : IUtf8SpanParsable<TSelf>?
{
    static abstract TSelf Parse(ReadOnlySpan<byte> s, IFormatProvider? provider);

    static abstract bool TryParse(ReadOnlySpan<byte> s, IFormatProvider? provider, [MaybeNullWhen(returnValue: false)] out TSelf result);
}
```
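
To make the shape concrete, here is a minimal sketch of a user-defined type opting into both interfaces, assuming C# 11 (static abstract interface members) and .NET 7 or later. The two interfaces are redeclared verbatim in a hypothetical `Proposal` namespace so the sketch compiles on its own; `Celsius`, its `<digits>C` text format, and `Utf8Parsing.ParseOrDefault` are illustrative names, not part of the proposal. It reuses the existing `System.Buffers.Text.Utf8Formatter` and `Utf8Parser` for the numeric part, but rejects trailing data so that it follows the fail-on-unrecognized-input semantics described under Additional Considerations.

```csharp
#nullable enable
using System;
using System.Buffers.Text;
using System.Diagnostics.CodeAnalysis;

namespace Proposal // hypothetical namespace so these copies stay separate from System
{
    // Local copies of the two proposed interfaces.
    public interface IUtf8SpanFormattable
    {
        bool TryFormat(Span<byte> destination, out int bytesWritten, ReadOnlySpan<byte> format, IFormatProvider? provider);
    }

    public interface IUtf8SpanParsable<TSelf>
        where TSelf : IUtf8SpanParsable<TSelf>?
    {
        static abstract TSelf Parse(ReadOnlySpan<byte> s, IFormatProvider? provider);
        static abstract bool TryParse(ReadOnlySpan<byte> s, IFormatProvider? provider, [MaybeNullWhen(false)] out TSelf result);
    }

    // Hypothetical user type: a whole-degree temperature written as e.g. "21C".
    public readonly record struct Celsius(int Degrees) : IUtf8SpanFormattable, IUtf8SpanParsable<Celsius>
    {
        public bool TryFormat(Span<byte> destination, out int bytesWritten, ReadOnlySpan<byte> format, IFormatProvider? provider)
        {
            // This sketch ignores format and provider: it always writes invariant digits, then 'C'.
            if (Utf8Formatter.TryFormat(Degrees, destination, out int digits) && digits < destination.Length)
            {
                destination[digits] = (byte)'C';
                bytesWritten = digits + 1;
                return true;
            }
            bytesWritten = 0; // destination too small
            return false;
        }

        public static bool TryParse(ReadOnlySpan<byte> s, IFormatProvider? provider, [MaybeNullWhen(false)] out Celsius result)
        {
            // Utf8Parser stops at the first byte it does not recognize, so we also require
            // that it consumed every byte before the trailing 'C'.
            if (s.Length > 1 && s[^1] == (byte)'C' &&
                Utf8Parser.TryParse(s[..^1], out int degrees, out int consumed) &&
                consumed == s.Length - 1)
            {
                result = new Celsius(degrees);
                return true;
            }
            result = default;
            return false;
        }

        public static Celsius Parse(ReadOnlySpan<byte> s, IFormatProvider? provider) =>
            TryParse(s, provider, out Celsius result)
                ? result
                : throw new FormatException("Expected an integer followed by 'C'.");
    }

    public static class Utf8Parsing
    {
        // Generic code can now target any type that opts into the interface.
        public static T ParseOrDefault<T>(ReadOnlySpan<byte> utf8, T fallback)
            where T : IUtf8SpanParsable<T>
            => T.TryParse(utf8, null, out var value) ? value : fallback;
    }
}
```

With this in place, `Celsius.Parse("21C"u8, null)` yields 21 degrees, while `TryParse` rejects inputs such as `"21Cx"u8` or `"2 1C"u8` instead of silently stopping early.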

Initial types that will implement the interfaces

```csharp
namespace System
{
    public partial struct Byte : IUtf8SpanFormattable, IUtf8SpanParsable<byte>;
    public partial struct Char : IUtf8SpanFormattable, IUtf8SpanParsable<char>;
    public partial struct Decimal : IUtf8SpanFormattable, IUtf8SpanParsable<decimal>;
    public partial struct Double : IUtf8SpanFormattable, IUtf8SpanParsable<double>;
    public partial struct Half : IUtf8SpanFormattable, IUtf8SpanParsable<Half>;
    public partial struct Int16 : IUtf8SpanFormattable, IUtf8SpanParsable<short>;
    public partial struct Int32 : IUtf8SpanFormattable, IUtf8SpanParsable<int>;
    public partial struct Int64 : IUtf8SpanFormattable, IUtf8SpanParsable<long>;
    public partial struct Int128 : IUtf8SpanFormattable, IUtf8SpanParsable<Int128>;
    public partial struct IntPtr : IUtf8SpanFormattable, IUtf8SpanParsable<nint>;
    public partial struct SByte : IUtf8SpanFormattable, IUtf8SpanParsable<sbyte>;
    public partial struct Single : IUtf8SpanFormattable, IUtf8SpanParsable<float>;
    public partial struct UInt16 : IUtf8SpanFormattable, IUtf8SpanParsable<ushort>;
    public partial struct UInt32 : IUtf8SpanFormattable, IUtf8SpanParsable<uint>;
    public partial struct UInt64 : IUtf8SpanFormattable, IUtf8SpanParsable<ulong>;
    public partial struct UInt128 : IUtf8SpanFormattable, IUtf8SpanParsable<UInt128>;
    public partial struct UIntPtr : IUtf8SpanFormattable, IUtf8SpanParsable<nuint>;

    public partial struct DateOnly : IUtf8SpanFormattable, IUtf8SpanParsable<DateOnly>;
    public partial struct DateTime : IUtf8SpanFormattable, IUtf8SpanParsable<DateTime>;
    public partial struct DateTimeOffset : IUtf8SpanFormattable, IUtf8SpanParsable<DateTimeOffset>;
    public partial struct Guid : IUtf8SpanFormattable, IUtf8SpanParsable<Guid>;
    public partial struct TimeOnly : IUtf8SpanFormattable, IUtf8SpanParsable<TimeOnly>;
    public partial struct TimeSpan : IUtf8SpanFormattable, IUtf8SpanParsable<TimeSpan>;
}

namespace System.Numerics
{
    public partial struct Complex : IUtf8SpanFormattable, IUtf8SpanParsable<Complex>;
    public partial struct BigInteger : IUtf8SpanFormattable, IUtf8SpanParsable<BigInteger>;
}

namespace System.Runtime.InteropServices
{
    public partial struct NFloat : IUtf8SpanFormattable, IUtf8SpanParsable<NFloat>;
}
```

System.Enum, System.Text.Rune, and System.Version all implement ISpanFormattable today. They could optionally implement IUtf8SpanFormattable as well.

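We should ideally have System.Numerics.INumberBase<TSelf> implement both IUtf8SpanFormattable and IUtf8SpanParsable<TSelf>. Doing this would require a default interface method (DIM) implementation that defers to the existing UTF-16 variant, so that existing implementers of INumberBase<TSelf> are not broken.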
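
A DIM along these lines could look like the following minimal sketch, assuming C# 8+ default interface methods and .NET Core 3.0+ for `System.Text.Unicode.Utf8`. `INumberBaseSketch<TSelf>` and `Meters` are hypothetical stand-ins for `INumberBase<TSelf>` and an implementing type. The sketch transcodes the UTF-8 format to UTF-16, defers to the type's UTF-16 formatting, and transcodes the result back to UTF-8; it calls `ToString` for brevity, where a real implementation would more likely format into a pooled `char` buffer.

```csharp
#nullable enable
using System;
using System.Buffers;
using System.Text;
using System.Text.Unicode;

// Hypothetical stand-in for INumberBase<TSelf>: the UTF-16 members already exist
// (through ISpanFormattable) and the UTF-8 TryFormat is supplied as a DIM, so
// existing implementers do not have to change.
public interface INumberBaseSketch<TSelf> : ISpanFormattable
    where TSelf : INumberBaseSketch<TSelf>?
{
    bool TryFormat(Span<byte> destination, out int bytesWritten, ReadOnlySpan<byte> format, IFormatProvider? provider)
    {
        // Transcode the UTF-8 format string to UTF-16 (empty means "default format").
        string? utf16Format = format.IsEmpty ? null : Encoding.UTF8.GetString(format);

        // Defer to the existing UTF-16 implementation (IFormattable.ToString).
        string utf16Text = ToString(utf16Format, provider);

        // Transcode the result to UTF-8; anything but Done (e.g. DestinationTooSmall) is a failure.
        OperationStatus status = Utf8.FromUtf16(utf16Text, destination, out _, out bytesWritten);
        if (status != OperationStatus.Done)
        {
            bytesWritten = 0;
            return false;
        }
        return true;
    }
}

// Hypothetical implementer: it writes only the UTF-16 members and inherits the UTF-8 one.
public readonly record struct Meters(int Value) : INumberBaseSketch<Meters>
{
    public string ToString(string? format, IFormatProvider? formatProvider) =>
        Value.ToString(format, formatProvider);

    public bool TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format, IFormatProvider? provider) =>
        Value.TryFormat(destination, out charsWritten, format, provider);
}
```

Through an `INumberBaseSketch<Meters>` reference, formatting `new Meters(1234)` with the UTF-8 format `"N0"u8` and the invariant culture writes the bytes of `1,234`; a destination that is too small yields `false`.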

Additional Considerations

It may be desirable to provide some API that tells users the longest possible formatted output (in bytes) for a value, so they have a "fail safe" way of sizing the destination when formatting. For many types this is a well-defined upper bound or can be trivially computed.

These APIs operate like ISpanFormattable and ISpanParsable<TSelf>, not like Utf8Formatter or Utf8Parser. That is, they fail if they encounter unrecognized or unsupported data, whereas the latter instead treat it as effectively the "end of data to parse". There are both pros and cons to this approach, but I believe the latter's functionality is better expressed via a different API, one that could also apply to UTF-16.
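
The difference shows up with input that has trailing garbage. This sketch, assuming .NET Core 2.1 or later (`ParseSemantics` and its methods are illustrative names), puts the existing `Utf8Parser.TryParse`, which reports how many bytes it consumed, next to `int.TryParse` over a span, which is the ISpanParsable-style behavior the proposed interfaces would follow.

```csharp
using System;
using System.Buffers.Text;

public static class ParseSemantics
{
    // Utf8Parser-style: parse as much of a leading integer as possible and report
    // how many bytes were consumed; unrecognized trailing data simply ends parsing.
    public static (bool Success, int Value, int BytesConsumed) Utf8ParserStyle(ReadOnlySpan<byte> utf8)
    {
        bool success = Utf8Parser.TryParse(utf8, out int value, out int bytesConsumed);
        return (success, value, bytesConsumed);
    }

    // ISpanParsable-style: the entire input must be a valid integer or parsing fails.
    // The proposed IUtf8SpanParsable<TSelf> would behave this way, only over UTF-8 bytes.
    public static (bool Success, int Value) SpanParsableStyle(ReadOnlySpan<char> utf16)
    {
        bool success = int.TryParse(utf16, out int value);
        return (success, value);
    }
}
```

For `123abc`, `Utf8ParserStyle` returns `(true, 123, 3)` because it stops at `a`, whereas `SpanParsableStyle` returns `(false, 0)` because the whole input must be a valid number.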

This doesn't account for number parsing with NumberStyles, which would likely entail extending INumberBase<TSelf> with new UTF-8 APIs as well. If we expose such APIs, we would also extend INumberBase<TSelf> with the following methods (which would be DIMs that defer to the UTF-16 variants):

```csharp
static virtual TSelf Parse(ReadOnlySpan<byte> s, NumberStyles style, IFormatProvider? provider);
static virtual bool TryParse(ReadOnlySpan<byte> s, NumberStyles style, IFormatProvider? provider, [MaybeNullWhen(false)] out TSelf result);
```
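
Such DIMs could look roughly like the sketch below, assuming C# 11 static virtual interface members and .NET Core 3.0+ for `System.Text.Unicode.Utf8`. `INumberParseSketch<TSelf>`, `Count`, and `Utf8NumberParsing` are hypothetical stand-ins; only the two UTF-8 signatures mirror the proposal. Treating invalid UTF-8 as a parse failure is one possible policy, not a decided one.

```csharp
#nullable enable
using System;
using System.Buffers;
using System.Diagnostics.CodeAnalysis;
using System.Globalization;
using System.Text.Unicode;

// Hypothetical stand-in for INumberBase<TSelf>, reduced to the parsing members discussed here.
public interface INumberParseSketch<TSelf>
    where TSelf : INumberParseSketch<TSelf>?
{
    // Existing-style UTF-16 member that every implementer must provide.
    static abstract bool TryParse(ReadOnlySpan<char> s, NumberStyles style, IFormatProvider? provider, [MaybeNullWhen(false)] out TSelf result);

    // Proposed UTF-8 members, supplied as DIMs that defer to the UTF-16 variant.
    static virtual bool TryParse(ReadOnlySpan<byte> s, NumberStyles style, IFormatProvider? provider, [MaybeNullWhen(false)] out TSelf result)
    {
        // UTF-8 never decodes to more UTF-16 code units than it has bytes.
        char[] chars = new char[s.Length];
        if (Utf8.ToUtf16(s, chars, out _, out int charsWritten, replaceInvalidSequences: false) != OperationStatus.Done)
        {
            result = default;
            return false; // invalid UTF-8 is reported as a parse failure rather than replaced
        }
        return TSelf.TryParse(chars.AsSpan(0, charsWritten), style, provider, out result);
    }

    static virtual TSelf Parse(ReadOnlySpan<byte> s, NumberStyles style, IFormatProvider? provider) =>
        TSelf.TryParse(s, style, provider, out var result)
            ? result
            : throw new FormatException("Input was not in a correct format.");
}

// Hypothetical implementer: only the UTF-16 member is written by hand.
public readonly record struct Count(int Value) : INumberParseSketch<Count>
{
    public static bool TryParse(ReadOnlySpan<char> s, NumberStyles style, IFormatProvider? provider, out Count result)
    {
        bool success = int.TryParse(s, style, provider, out int value);
        result = new Count(value);
        return success;
    }
}

public static class Utf8NumberParsing
{
    // Static virtual defaults are reachable only through a constrained type parameter.
    public static bool TryParseUtf8<T>(ReadOnlySpan<byte> utf8, NumberStyles style, IFormatProvider? provider, [MaybeNullWhen(false)] out T result)
        where T : INumberParseSketch<T>
        => T.TryParse(utf8, style, provider, out result);
}
```

Because `Count` itself does not declare the UTF-8 overload, callers go through a generic method: `TryParseUtf8<Count>("1,234"u8, NumberStyles.AllowThousands, CultureInfo.InvariantCulture, out var c)` succeeds with `c.Value == 1234`.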

Should TryFormat take a ReadOnlySpan<byte> format or a string format? There are pros and cons to each approach.
