Description
Background and Motivation
We currently support UTF-16 based formatting and parsing and even expose common interfaces through which any developer can declare that their types support the same. However, we have no equivalent support for UTF-8.
With UTF-8 being ever more prevalent for various scenarios, it would be ideal if similar interfaces could be exposed so users can express that their own types support the functionality.
As such, I propose we expose two new interfaces that support parsing and formatting types using UTF-8. These interfaces would only support Span today, as we do not have a corresponding Utf8String type that would make exposing IUtf8Formattable or IUtf8Parsable viable. We could express those using byte[], but that is "less ideal" and would block us from supporting any future UTF-8 string type.
Proposed API
namespace System;

public interface IUtf8SpanFormattable
{
    bool TryFormat(Span<byte> destination, out int bytesWritten, ReadOnlySpan<byte> format, IFormatProvider? provider);
}

public interface IUtf8SpanParsable<TSelf>
    where TSelf : IUtf8SpanParsable<TSelf>?
{
    static abstract TSelf Parse(ReadOnlySpan<byte> s, IFormatProvider? provider);
    static abstract bool TryParse(ReadOnlySpan<byte> s, IFormatProvider? provider, [MaybeNullWhen(returnValue: false)] out TSelf result);
}
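To illustrate how a user-defined type might opt in, here is a minimal sketch. It declares local copies of the two proposed interfaces in a hypothetical `Utf8Sketch` namespace so that it compiles on its own (it assumes C# 11 / .NET 7 or later for `static abstract` members). The `Percent` type is invented for illustration, and the sketch ignores the `format` argument and round-trips through `string` for brevity; a real implementation would likely avoid that allocation.

```csharp
#nullable enable
using System;
using System.Diagnostics.CodeAnalysis;
using System.Globalization;
using System.Text;

namespace Utf8Sketch
{
    // Local copies of the proposed interfaces, so the sketch is self-contained.
    public interface IUtf8SpanFormattable
    {
        bool TryFormat(Span<byte> destination, out int bytesWritten, ReadOnlySpan<byte> format, IFormatProvider? provider);
    }

    public interface IUtf8SpanParsable<TSelf>
        where TSelf : IUtf8SpanParsable<TSelf>?
    {
        static abstract TSelf Parse(ReadOnlySpan<byte> s, IFormatProvider? provider);
        static abstract bool TryParse(ReadOnlySpan<byte> s, IFormatProvider? provider, [MaybeNullWhen(returnValue: false)] out TSelf result);
    }

    // Hypothetical user type: a whole-number percentage written as e.g. "42%".
    public readonly struct Percent : IUtf8SpanFormattable, IUtf8SpanParsable<Percent>
    {
        public int Value { get; }

        public Percent(int value) => Value = value;

        public bool TryFormat(Span<byte> destination, out int bytesWritten, ReadOnlySpan<byte> format, IFormatProvider? provider)
        {
            // The format specifier is ignored in this sketch.
            string text = Value.ToString(provider) + "%";

            // Report failure without writing if the destination is too small.
            if (Encoding.UTF8.GetByteCount(text) > destination.Length)
            {
                bytesWritten = 0;
                return false;
            }

            bytesWritten = Encoding.UTF8.GetBytes(text.AsSpan(), destination);
            return true;
        }

        public static bool TryParse(ReadOnlySpan<byte> s, IFormatProvider? provider, out Percent result)
        {
            result = default;

            // Strict parsing: the whole input must be "<integer>%".
            if (s.Length < 2 || s[^1] != (byte)'%')
            {
                return false;
            }

            string digits = Encoding.UTF8.GetString(s[..^1]);
            if (!int.TryParse(digits, NumberStyles.Integer, provider, out int value))
            {
                return false;
            }

            result = new Percent(value);
            return true;
        }

        public static Percent Parse(ReadOnlySpan<byte> s, IFormatProvider? provider)
            => TryParse(s, provider, out Percent result)
                ? result
                : throw new FormatException("Input is not a valid percentage.");
    }
}
```

A caller would typically format into a stack-allocated or pooled `Span<byte>` and retry with a larger buffer when `TryFormat` returns `false`.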
Initial types that will implement the interface
namespace System
{
    public partial struct Byte : IUtf8SpanFormattable, IUtf8SpanParsable<byte>;
    public partial struct Char : IUtf8SpanFormattable, IUtf8SpanParsable<char>;
    public partial struct Decimal : IUtf8SpanFormattable, IUtf8SpanParsable<decimal>;
    public partial struct Double : IUtf8SpanFormattable, IUtf8SpanParsable<double>;
    public partial struct Half : IUtf8SpanFormattable, IUtf8SpanParsable<Half>;
    public partial struct Int16 : IUtf8SpanFormattable, IUtf8SpanParsable<short>;
    public partial struct Int32 : IUtf8SpanFormattable, IUtf8SpanParsable<int>;
    public partial struct Int64 : IUtf8SpanFormattable, IUtf8SpanParsable<long>;
    public partial struct Int128 : IUtf8SpanFormattable, IUtf8SpanParsable<Int128>;
    public partial struct IntPtr : IUtf8SpanFormattable, IUtf8SpanParsable<nint>;
    public partial struct SByte : IUtf8SpanFormattable, IUtf8SpanParsable<sbyte>;
    public partial struct Single : IUtf8SpanFormattable, IUtf8SpanParsable<float>;
    public partial struct UInt16 : IUtf8SpanFormattable, IUtf8SpanParsable<ushort>;
    public partial struct UInt32 : IUtf8SpanFormattable, IUtf8SpanParsable<uint>;
    public partial struct UInt64 : IUtf8SpanFormattable, IUtf8SpanParsable<ulong>;
    public partial struct UInt128 : IUtf8SpanFormattable, IUtf8SpanParsable<UInt128>;
    public partial struct UIntPtr : IUtf8SpanFormattable, IUtf8SpanParsable<nuint>;

    public partial struct DateOnly : IUtf8SpanFormattable, IUtf8SpanParsable<DateOnly>;
    public partial struct DateTime : IUtf8SpanFormattable, IUtf8SpanParsable<DateTime>;
    public partial struct DateTimeOffset : IUtf8SpanFormattable, IUtf8SpanParsable<DateTimeOffset>;
    public partial struct Guid : IUtf8SpanFormattable, IUtf8SpanParsable<Guid>;
    public partial struct TimeOnly : IUtf8SpanFormattable, IUtf8SpanParsable<TimeOnly>;
    public partial struct TimeSpan : IUtf8SpanFormattable, IUtf8SpanParsable<TimeSpan>;
}

namespace System.Numerics
{
    public partial struct Complex : IUtf8SpanFormattable, IUtf8SpanParsable<Complex>;
    public partial struct BigInteger : IUtf8SpanFormattable, IUtf8SpanParsable<BigInteger>;
}

namespace System.Runtime.InteropServices
{
    public partial struct NFloat : IUtf8SpanFormattable, IUtf8SpanParsable<NFloat>;
}
System.Enum, System.Text.Rune, and System.Version all implement ISpanFormattable today. They could optionally implement IUtf8SpanFormattable as well.
We should ideally have System.Numerics.INumberBase<TSelf> implement both IUtf8SpanFormattable and IUtf8SpanParsable<TSelf>. Doing this would require a default interface method (DIM) that defers to the UTF-16 variant.
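As a rough sketch of what "a DIM that defers to the UTF-16 variant" could look like, the example below uses a hypothetical `IParsableBoth<TSelf>` interface as a stand-in for INumberBase<TSelf>. The interface, the `Meters` type, and the `Utf8` helper are all invented for illustration. The default body simply transcodes the UTF-8 input to UTF-16 and calls the existing overload; it allocates a temporary array for simplicity, which a production implementation would presumably avoid.

```csharp
#nullable enable
using System;
using System.Globalization;
using System.Text;

namespace Utf8DimSketch
{
    // Hypothetical stand-in for an interface that already exposes a UTF-16 Parse.
    public interface IParsableBoth<TSelf>
        where TSelf : IParsableBoth<TSelf>
    {
        // Existing UTF-16 entry point that every implementer must provide.
        static abstract TSelf Parse(ReadOnlySpan<char> s, IFormatProvider? provider);

        // UTF-8 entry point with a default body: transcode, then defer to UTF-16.
        static virtual TSelf Parse(ReadOnlySpan<byte> s, IFormatProvider? provider)
        {
            // Ask the encoding for a worst-case buffer size, then decode into it.
            char[] buffer = new char[Encoding.UTF8.GetMaxCharCount(s.Length)];
            int charsWritten = Encoding.UTF8.GetChars(s, buffer);

            ReadOnlySpan<char> utf16 = buffer.AsSpan(0, charsWritten);
            return TSelf.Parse(utf16, provider);
        }
    }

    // Hypothetical implementer that only writes the UTF-16 overload.
    public readonly record struct Meters(int Value) : IParsableBoth<Meters>
    {
        public static Meters Parse(ReadOnlySpan<char> s, IFormatProvider? provider)
            => new Meters(int.Parse(s, NumberStyles.Integer, provider));
    }

    public static class Utf8
    {
        // Static interface members are reached through a type parameter,
        // so a small generic helper is used to invoke the default UTF-8 overload.
        public static T Parse<T>(ReadOnlySpan<byte> s, IFormatProvider? provider = null)
            where T : IParsableBoth<T>
            => T.Parse(s, provider);
    }
}
```

Existing implementers keep compiling unchanged under this approach, and a type that cares about performance can override the UTF-8 overload with a direct implementation.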
Additional Considerations
It may be desirable to provide some API that lets users know the longest potential format string so they can have a "fail safe" way of formatting their value. For many types this is a well-defined upper bound or can be trivially computed.
These APIs operate like ISpanFormattable and ISpanParsable and not like Utf8Formatter or Utf8Parser. That is, they fail if they encounter unrecognized or unsupported data, whereas the latter instead treat it as effectively "end of data to parse". There are both pros and cons to this approach, but I believe that the latter's functionality is better expressed via a different API, one that could also apply to UTF-16.
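The behavioral difference can be seen with APIs that exist today. The sketch below feeds the same input, "123abc", to `Utf8Parser.TryParse` and to the UTF-16 `int.TryParse`; the latter is used here only as a stand-in for the strict behavior the proposed UTF-8 interfaces are meant to share. The `TrailingDataDemo` class name is invented for illustration.

```csharp
using System;
using System.Buffers.Text;
using System.Globalization;
using System.Text;

public static class TrailingDataDemo
{
    public static (bool Lenient, int Value, int Consumed, bool Strict) Run()
    {
        byte[] utf8 = Encoding.UTF8.GetBytes("123abc");

        // Utf8Parser stops at the first byte it cannot use and reports how far it got.
        bool lenient = Utf8Parser.TryParse(utf8, out int value, out int consumed);

        // int.TryParse requires the whole input to be valid, so the trailing "abc" makes it fail.
        bool strict = int.TryParse("123abc", NumberStyles.Integer, CultureInfo.InvariantCulture, out _);

        return (lenient, value, consumed, strict);
    }
}
```

Here `Utf8Parser` succeeds with a value of 123 and reports 3 bytes consumed, while the strict parse returns `false`.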
This doesn't account for number parsing, which would likely entail extending INumberBase<TSelf> with new UTF-8 APIs as well. If we expose such APIs, we'd also extend INumberBase<TSelf> with the following methods (which would be DIMs that defer to the UTF-16 variants):
static virtual TSelf Parse(ReadOnlySpan<byte> s, NumberStyles style, IFormatProvider? provider);
static virtual bool TryParse(ReadOnlySpan<byte> s, NumberStyles style, IFormatProvider? provider, [MaybeNullWhen(false)] out TSelf result);
Should we take ReadOnlySpan<byte> format or string format? There are pros and cons to each approach.