Half.ToString fails for highest precision #98841

Closed
@huoyaoyuan

Description

While implementing BFloat16, I examined the parsing/formatting traits of the other floating-point types and found that HalfNumberBufferLength is incorrect.

internal const int DecimalNumberBufferLength = 29 + 1 + 1; // 29 for the longest input + 1 for rounding
internal const int DoubleNumberBufferLength = 767 + 1 + 1; // 767 for the longest input + 1 for rounding: 4.9406564584124654E-324
internal const int Int32NumberBufferLength = 10 + 1; // 10 for the longest input: 2,147,483,647
internal const int Int64NumberBufferLength = 19 + 1; // 19 for the longest input: 9,223,372,036,854,775,807
internal const int Int128NumberBufferLength = 39 + 1; // 39 for the longest input: 170,141,183,460,469,231,731,687,303,715,884,105,727
internal const int SingleNumberBufferLength = 112 + 1 + 1; // 112 for the longest input + 1 for rounding: 1.40129846E-45
internal const int HalfNumberBufferLength = 21; // 19 for the longest input + 1 for rounding (+1 for the null terminator)
internal const int UInt32NumberBufferLength = 10 + 1; // 10 for the longest input: 4,294,967,295
internal const int UInt64NumberBufferLength = 20 + 1; // 20 for the longest input: 18,446,744,073,709,551,615
internal const int UInt128NumberBufferLength = 39 + 1; // 39 for the longest input: 340,282,366,920,938,463,463,374,607,431,768,211,455

Calculation

The buffer length is determined by the largest number of significant digits that any value of the type can have. That maximum occurs when BiasedExponent is 1 and TrailingSignificand has all bits set. For Half, that is 0x07FF.
Let e = Abs(MinExponent) = ExponentBias - 1 and m = TrailingSignificandLength.
The significand of the value is (2 - 2^-m) and the exponent is -e, so the value is (2 - 2^-m) * 2^-e.
Written as a fraction: (2^(m+1) - 1) / (2^(e+m))
Multiply both numerator and denominator by 5^(e+m) to get a decimal fraction: (2^(m+1) - 1)*5^(e+m) / 10^(e+m)
The numerator is odd, so it has no trailing zeros, and the number of significant digits of the fraction equals the number of digits of its numerator:
(2^(m+1) - 1)*5^(e+m) ≈ 2^(m+1) * 5^(e+m) = 10^(m+1) * 5^(e-1)
So the total number of significant digits is m+1+Log10(5^(e-1)) = m+1+(e-1)*Log10(5), rounded up.

I'm going to add a comment documenting this calculation together with the BFloat16 work.

For double, e = 1022, m = 52, total digits = 766.6483744270753
For float, e = 126, m = 23, total digits = 111.37125054200236
For Half, e = 14, m = 10, total digits = 20.086610056368244
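
The figures above can be cross-checked with a short sketch. It evaluates the estimate m+1+(e-1)*Log10(5) and also counts the digits of the exact numerator (2^(m+1) - 1)*5^(e+m) using BigInteger. The helper names EstimatedDigits and ExactDigits are hypothetical and are not part of the runtime.

```csharp
using System;
using System.Numerics;

// Estimate from the derivation: m + 1 + (e - 1) * log10(5), rounded up.
static int EstimatedDigits(int e, int m) =>
    (int)Math.Ceiling(m + 1 + (e - 1) * Math.Log10(5));

// Exact count: number of decimal digits in (2^(m+1) - 1) * 5^(e+m).
static int ExactDigits(int e, int m)
{
    BigInteger numerator = (BigInteger.Pow(2, m + 1) - 1) * BigInteger.Pow(5, e + m);
    return numerator.ToString().Length;
}

// (name, e, m) triples taken from the figures listed in this issue.
foreach (var (name, e, m) in new[] { ("Half", 14, 10), ("float", 126, 23), ("double", 1022, 52) })
{
    Console.WriteLine($"{name}: estimate {EstimatedDigits(e, m)}, exact {ExactDigits(e, m)}");
}
```

For the three types the estimate and the exact count should agree: 21 for Half, 112 for float, and 767 for double.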

So the longest Half value has 21 significant digits, not the 19 assumed by HalfNumberBufferLength.
Converting the value to double confirms this: it is 0.000122010707855224609375, which has 21 significant digits.
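
The same value can also be derived directly from the bit pattern without going through double. The following is a minimal sketch, assuming the IEEE 754 binary16 layout (10 trailing significand bits, exponent bias 15). ExactDecimal is a hypothetical helper that handles only positive normal values.

```csharp
using System;
using System.Numerics;

// Hypothetical helper: exact decimal expansion of a positive normal binary16 bit pattern.
static string ExactDecimal(ushort bits)
{
    const int m = 10, bias = 15; // trailing significand length and exponent bias of Half
    int biasedExponent = (bits >> m) & 0x1F;
    int trailing = bits & 0x3FF;
    if ((bits & 0x8000) != 0 || biasedExponent == 0 || biasedExponent == 0x1F)
        throw new ArgumentException("Only positive normal values are handled in this sketch.");

    BigInteger significand = (1 << m) | trailing;   // implicit leading 1 plus trailing bits
    int k = m - (biasedExponent - bias);            // value = significand / 2^k
    if (k <= 0) return (significand << -k).ToString();

    BigInteger scaled = significand * BigInteger.Pow(5, k); // value = scaled / 10^k
    BigInteger pow10 = BigInteger.Pow(10, k);
    string integer = (scaled / pow10).ToString();
    string fraction = (scaled % pow10).ToString().PadLeft(k, '0').TrimEnd('0');
    return fraction.Length == 0 ? integer : integer + "." + fraction;
}

string text = ExactDecimal(0x07FF);
Console.WriteLine(text); // 0.000122010707855224609375
// Significant digits: drop the "0." prefix and the leading zeros of the fraction.
Console.WriteLine(text.Substring(2).TrimStart('0').Length); // 21
```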

Observation

BitConverter.UInt16BitsToHalf(0x07FF).ToString(99) throws IndexOutOfRangeException.

This also reproduces on .NET 6, so the bug has probably been present since Half was introduced. We should fix it and backport the fix, at least to 8.0.
The file has been refactored since then, so the backport may have to be done manually.
