Description
Issue
.NET currently depends on the underlying platform to perform floating-point to integer
conversions. For most scenarios, this behavior is correct and IEEE 754 compliant.
However, for cases of overflow or cases where the underlying platform does not directly support a given conversion, each platform can return different results.
This differing behavior can cause downstream issues when dealing with such conversions and lead to hard-to-diagnose bugs.
Proposal
We should follow in the footsteps of other languages and standardize our conversion behavior here to be consistent on all platforms.
In particular, we should standardize our behavior to be consistent with ARM64, Rust, and WASM and saturate the conversions rather than use some other behavior such as returning a sentinel value (x86/x64). NaN will likewise be specially handled and convert to 0.
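As a rough illustration of the saturating semantics proposed above, the following is a minimal software sketch for double to int32. The class and method names are hypothetical and are not part of the proposed API surface; the real implementation would live in the JIT rather than in managed helper code.

```csharp
using System;

public static class SaturatingConversionSketch
{
    // Hypothetical helper (not a .NET API): converts with saturation on overflow
    // and maps NaN to 0, matching the behavior described in the proposal.
    public static int ToInt32Saturating(double value)
    {
        if (double.IsNaN(value))
        {
            return 0; // NaN converts to 0
        }
        if (value <= int.MinValue)
        {
            return int.MinValue; // saturate at the lower bound (includes -Infinity)
        }
        if (value >= int.MaxValue)
        {
            return int.MaxValue; // saturate at the upper bound (includes +Infinity)
        }
        return (int)value; // in range: truncates toward zero on every platform
    }
}
```

With this sketch, `ToInt32Saturating(1e20)` yields `int.MaxValue` and `ToInt32Saturating(double.NaN)` yields `0`, regardless of the underlying hardware.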
What's Required
For ARM32 and ARM64, there will be effectively no impact and the codegen will remain identical to what it is today as the underlying conversions already perform saturation on overflow. The exception is for conversions to small integers where there is no direct underlying platform support and so manual clamping of the input to the appropriate range will be required before converting.
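The manual clamping for small integer types could look roughly like the sketch below, which clamps the input to the target range before converting. The helper name is hypothetical, and the actual codegen would likely use platform min/max instructions rather than a call to `Math.Clamp`.

```csharp
using System;

public static class SmallIntegerClampSketch
{
    // Hypothetical helper (not a .NET API): saturating double -> byte conversion.
    public static byte ToByteSaturating(double value)
    {
        if (double.IsNaN(value))
        {
            return 0; // Math.Clamp would pass NaN through, so handle it first
        }

        // Clamp to [0, 255] so the conversion below is always in range.
        double clamped = Math.Clamp(value, byte.MinValue, byte.MaxValue);
        return (byte)clamped; // truncates toward zero
    }
}
```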
For x86 and x64, the platform only directly supports converting to int32 with additional support for converting to int64 on x64. Conversions to small integer types and unsigned types will require the same manual clamping as ARM32/ARM64. Additionally, since the underlying platform returns a sentinel value (0x80000000 or 0x80000000_00000000) we will need to have logic to handle this.
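One possible shape for the sentinel handling is sketched below. Because the raw hardware result cannot be observed portably from C#, `ModelX86ConvertToInt32` is a software model of the x86/x64 behavior (it assumes the sentinel is returned for NaN as well as for out-of-range input). Both method names are hypothetical, and the real fix-up would be emitted by the JIT.

```csharp
using System;

public static class SentinelFixupSketch
{
    // Software model of the x86/x64 conversion (assumption for illustration):
    // any NaN or out-of-range input produces the sentinel 0x80000000.
    public static int ModelX86ConvertToInt32(double value)
    {
        bool inRange = value > -2147483649.0 && value < 2147483648.0;
        return inRange ? (int)value : unchecked((int)0x80000000);
    }

    // Hypothetical fix-up: only inspect the input when the sentinel is seen,
    // so in-range conversions pay for a single compare.
    public static int ConvertToInt32WithFixup(double value)
    {
        int raw = ModelX86ConvertToInt32(value);
        if (raw != int.MinValue)
        {
            return raw; // common case: result was in range
        }
        if (double.IsNaN(value))
        {
            return 0; // NaN converts to 0
        }
        // Either a genuine int.MinValue result or an overflow; the sign decides.
        return value > 0 ? int.MaxValue : int.MinValue;
    }
}
```

The design choice here is to check the result rather than the input, which keeps the in-range path short at the cost of a slower path for overflow and NaN.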
Perf Impact
For the conversions that need additional handling, there is going to be additional cost and measurable perf impact.
On a Skylake processor, simple conversions would go from ~7 cycles to roughly 19 to 24 cycles, a theoretical "worst case" of ~3.4x the time.
The impact measured by Rust was much lower, closer to a 0.8x regression in real-world scenarios involving heavy floating-point to integer conversions (in particular an "RGB JPEG encoding" algorithm).
Additional Considerations
We should additionally expose a set of unsafe APIs that perform the "platform specific" conversions. These would allow developers who understand the cross-platform differences and need the perf to still get the underlying behavior.
It is likely not worth waiting for additional asks on this, as other languages, such as Rust, have already exposed such APIs on their end. Additionally, we will likely require them for our own code in the BCL, including for Vector<T>, Vector64<T>, Vector128<T>, and Vector256<T>. This will likewise allow the mentioned vector types to be consistent by default and provide fast fallbacks where appropriate.
I would propose these are exposed as ConvertTo*Unsafe, matching the existing ConvertTo* algorithms we have in several other locations.
This would result in the following additions (the commented-out APIs already exist or are approved and will be implemented in .NET 7):
namespace System
{
    public struct Double
    {
        public static int ConvertToInt32(double value);
        public static int ConvertToInt32Unsafe(double value);
        public static long ConvertToInt64(double value);
        public static long ConvertToInt64Unsafe(double value);
        public static uint ConvertToUInt32(double value);
        public static uint ConvertToUInt32Unsafe(double value);
        public static ulong ConvertToUInt64(double value);
        public static ulong ConvertToUInt64Unsafe(double value);
    }
    public struct Half
    {
        public static int ConvertToInt32(Half value);
        public static int ConvertToInt32Unsafe(Half value);
        public static long ConvertToInt64(Half value);
        public static long ConvertToInt64Unsafe(Half value);
        public static uint ConvertToUInt32(Half value);
        public static uint ConvertToUInt32Unsafe(Half value);
        public static ulong ConvertToUInt64(Half value);
        public static ulong ConvertToUInt64Unsafe(Half value);
    }
    public struct Single
    {
        public static int ConvertToInt32(float value);
        public static int ConvertToInt32Unsafe(float value);
        public static long ConvertToInt64(float value);
        public static long ConvertToInt64Unsafe(float value);
        public static uint ConvertToUInt32(float value);
        public static uint ConvertToUInt32Unsafe(float value);
        public static ulong ConvertToUInt64(float value);
        public static ulong ConvertToUInt64Unsafe(float value);
    }
}
namespace System.Numerics
{
    public static class Vector
    {
        // public static Vector<int> ConvertToInt32(Vector<float> value);
        public static Vector<int> ConvertToInt32Unsafe(Vector<float> value);
        // public static Vector<long> ConvertToInt64(Vector<double> value);
        public static Vector<long> ConvertToInt64Unsafe(Vector<double> value);
        // public static Vector<uint> ConvertToUInt32(Vector<float> value);
        public static Vector<uint> ConvertToUInt32Unsafe(Vector<float> value);
        // public static Vector<ulong> ConvertToUInt64(Vector<double> value);
        public static Vector<ulong> ConvertToUInt64Unsafe(Vector<double> value);
    }
}
namespace System.Runtime.Intrinsics
{
    public static class Vector64
    {
        // public static Vector64<int> ConvertToInt32(Vector64<float> value);
        public static Vector64<int> ConvertToInt32Unsafe(Vector64<float> value);
        // public static Vector64<long> ConvertToInt64(Vector64<double> value);
        public static Vector64<long> ConvertToInt64Unsafe(Vector64<double> value);
        // public static Vector64<uint> ConvertToUInt32(Vector64<float> value);
        public static Vector64<uint> ConvertToUInt32Unsafe(Vector64<float> value);
        // public static Vector64<ulong> ConvertToUInt64(Vector64<double> value);
        public static Vector64<ulong> ConvertToUInt64Unsafe(Vector64<double> value);
    }
    public static class Vector128
    {
        // public static Vector128<int> ConvertToInt32(Vector128<float> value);
        public static Vector128<int> ConvertToInt32Unsafe(Vector128<float> value);
        // public static Vector128<long> ConvertToInt64(Vector128<double> value);
        public static Vector128<long> ConvertToInt64Unsafe(Vector128<double> value);
        // public static Vector128<uint> ConvertToUInt32(Vector128<float> value);
        public static Vector128<uint> ConvertToUInt32Unsafe(Vector128<float> value);
        // public static Vector128<ulong> ConvertToUInt64(Vector128<double> value);
        public static Vector128<ulong> ConvertToUInt64Unsafe(Vector128<double> value);
    }
    public static class Vector256
    {
        // public static Vector256<int> ConvertToInt32(Vector256<float> value);
        public static Vector256<int> ConvertToInt32Unsafe(Vector256<float> value);
        // public static Vector256<long> ConvertToInt64(Vector256<double> value);
        public static Vector256<long> ConvertToInt64Unsafe(Vector256<double> value);
        // public static Vector256<uint> ConvertToUInt32(Vector256<float> value);
        public static Vector256<uint> ConvertToUInt32Unsafe(Vector256<float> value);
        // public static Vector256<ulong> ConvertToUInt64(Vector256<double> value);
        public static Vector256<ulong> ConvertToUInt64Unsafe(Vector256<double> value);
    }
}