Description
Please add hardware intrinsic support for division on Intel platforms.
Rationale and Usage
The Intel div instruction is special: the 32-bit form takes a 64-bit dividend (EDX:EAX) and a 32-bit divisor, and returns both the quotient (in EAX) and the remainder (in EDX). The quotient must fit in 32 bits, otherwise the instruction raises a divide error (#DE).
This would be very useful, not only for speeding up Math.DivRem (https://github.com/dotnet/coreclr/issues/757) but especially for several decimal operations, which would make it more feasible to move the decimal code (and maybe several number functions) to C#.
Using div in C++ provided significant speedups in https://github.com/dotnet/coreclr/issues/10642.
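To make the div semantics described above concrete, here is a minimal portable sketch of what the proposed 64-by-32 unsigned DivRem would compute, written with plain ulong arithmetic. The class name DivRemReference is hypothetical, and throwing OverflowException where the hardware would raise #DE is an assumption of this model, not part of the proposal.

```csharp
using System;

// Hypothetical reference model of the proposed
// X86Base.DivRem(uint hi, uint low, uint divisor, out uint remainder).
public static class DivRemReference
{
    // Divides the 64-bit value (hi:low) by a 32-bit divisor, the way the
    // 32-bit x86 DIV instruction divides EDX:EAX by a 32-bit operand.
    public static uint DivRem(uint hi, uint low, uint divisor, out uint remainder)
    {
        if (divisor == 0)
            throw new DivideByZeroException();

        // DIV raises #DE when the quotient does not fit in 32 bits,
        // which happens exactly when hi >= divisor. (Assumed: throw here.)
        if (hi >= divisor)
            throw new OverflowException("Quotient does not fit in 32 bits.");

        ulong dividend = ((ulong)hi << 32) | low;
        ulong quotient = dividend / divisor;
        // Recover the remainder without a second division.
        remainder = (uint)(dividend - quotient * divisor);
        return (uint)quotient;
    }
}
```

For example, DivRem(1, 0, 3, out r) divides 2^32 by 3 and returns 1431655765 with r == 1. The hi < divisor precondition is what makes the chained calls in the example below safe, since each remainder is always smaller than the divisor.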
Example:
The following code performs division of a 96-bit unsigned number by a 32-bit number, updating the 96-bit number in place and returning the remainder.
The example is simplified from actual code and contains only the "worst case" execution path.
public uint Div96By32(uint[] num, uint denominator)
{
    if (System.Runtime.Intrinsics.X86.X86Base.IsSupported)
    {
        uint remainder;
        // num[2] is the most significant word. Each step divides (remainder:num[i]);
        // since remainder < denominator, every quotient fits in 32 bits.
        num[2] = System.Runtime.Intrinsics.X86.X86Base.DivRem(0, num[2], denominator, out remainder);
        num[1] = System.Runtime.Intrinsics.X86.X86Base.DivRem(remainder, num[1], denominator, out remainder);
        num[0] = System.Runtime.Intrinsics.X86.X86Base.DivRem(remainder, num[0], denominator, out remainder);
        return remainder;
    }
    else { ... }
}
It performs the same computation as https://github.com/dotnet/corert/blob/d82d460a8530a57e4915060be37fb42c7a661f48/src/System.Private.CoreLib/shared/System/Decimal.DecCalc.cs#L228
That code uses two 64-bit divisions (much slower, especially in 32-bit mode) and several multiplications (a workaround, since using '%' would currently double the number of executed div instructions).
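For comparison, a minimal portable sketch of the approach described above: two 64-bit divisions, with each remainder recovered by multiply-and-subtract instead of '%'. The class name Div96Portable is hypothetical, and this is a simplified illustration of the technique, not the linked DecCalc code.

```csharp
using System;

// Hypothetical sketch: divide the 96-bit number num[2]:num[1]:num[0]
// (num[2] most significant) by a 32-bit denominator in place, using only
// 64-bit division, and return the remainder.
public static class Div96Portable
{
    public static uint Div96By32(uint[] num, uint denominator)
    {
        if (denominator == 0)
            throw new DivideByZeroException();

        // First 64-bit division: the upper 64 bits (num[2]:num[1]).
        ulong high64 = ((ulong)num[2] << 32) | num[1];
        ulong q1 = high64 / denominator;
        // Multiply-and-subtract instead of '%', so only one division is issued.
        uint r1 = (uint)(high64 - q1 * denominator);
        num[2] = (uint)(q1 >> 32);
        num[1] = (uint)q1;

        // Second 64-bit division: (r1:num[0]). Because r1 < denominator,
        // the quotient fits in 32 bits.
        ulong low64 = ((ulong)r1 << 32) | num[0];
        uint q0 = (uint)(low64 / denominator);
        uint r0 = (uint)(low64 - (ulong)q0 * denominator);
        num[0] = q0;
        return r0;
    }
}
```

Each 64-bit division here is the expensive part on 32-bit targets, where it typically becomes a helper call; with the proposed intrinsic, each step would instead be a single 64-by-32 div.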
Proposed API
The API might need some design work, but I think it makes sense to keep it similar to Math.DivRem, as below.
namespace System.Runtime.Intrinsics.X86
{
    public static class X86Base
    {
        /// Performs unsigned division of the 64-bit value (hi, low) by a 32-bit divisor,
        /// returning the quotient, and returning the remainder as an out parameter
        public static uint DivRem(uint hi, uint low, uint divisor, out uint remainder);
        /// Performs signed division of the 64-bit value (hi, low) by a 32-bit divisor,
        /// returning the quotient, and returning the remainder as an out parameter
        public static int DivRem(int hi, uint low, int divisor, out int remainder);

        [Intrinsic]
        public static class X64
        {
            /// Performs unsigned division of the 128-bit value (hi, low) by a 64-bit divisor,
            /// returning the quotient, and returning the remainder as an out parameter
            public static ulong DivRem(ulong hi, ulong low, ulong divisor, out ulong remainder);
            /// Performs signed division of the 128-bit value (hi, low) by a 64-bit divisor,
            /// returning the quotient, and returning the remainder as an out parameter
            public static long DivRem(long hi, ulong low, long divisor, out long remainder);
        }
    }
}
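As a hedged illustration of what two of the proposed overloads would compute, here is a minimal sketch modelling the signed 64-by-32 variant (IDIV) with long arithmetic and the unsigned 128-by-64 variant (64-bit DIV) with System.UInt128, which requires .NET 7 or later. The class name ProposedDivRemModels is hypothetical, and throwing exceptions where the hardware would raise #DE is an assumption of the model.

```csharp
using System;

// Hypothetical reference models of two of the proposed overloads.
// Assumption: the models throw where the instructions would raise #DE.
public static class ProposedDivRemModels
{
    // Model of X86Base.DivRem(int hi, uint low, int divisor, out int remainder),
    // which would map to the 32-bit IDIV on EDX:EAX.
    public static int DivRem(int hi, uint low, int divisor, out int remainder)
    {
        if (divisor == 0)
            throw new DivideByZeroException();

        // Reassemble the signed 64-bit dividend: hi sign-extended, low zero-extended.
        long dividend = ((long)hi << 32) | low;

        // C# '/' truncates toward zero and the remainder takes the sign of the
        // dividend, as with IDIV. (long.MinValue / -1 throws OverflowException.)
        long quotient = dividend / divisor;
        long rem = dividend - quotient * divisor;

        // IDIV faults when the quotient does not fit in 32 bits.
        if (quotient < int.MinValue || quotient > int.MaxValue)
            throw new OverflowException("Quotient does not fit in 32 bits.");

        remainder = (int)rem;
        return (int)quotient;
    }

    // Model of X86Base.X64.DivRem(ulong hi, ulong low, ulong divisor, out ulong remainder),
    // which would map to the 64-bit DIV on RDX:RAX.
    public static ulong DivRem(ulong hi, ulong low, ulong divisor, out ulong remainder)
    {
        if (divisor == 0)
            throw new DivideByZeroException();
        // The 64-bit quotient fits exactly when hi < divisor.
        if (hi >= divisor)
            throw new OverflowException("Quotient does not fit in 64 bits.");

        UInt128 dividend = new UInt128(hi, low);
        UInt128 quotient = dividend / divisor;
        remainder = (ulong)(dividend - quotient * divisor);
        return (ulong)quotient;
    }
}
```

For instance, the signed model maps (hi = -1, low = 0), that is -2^32, divided by 3 to quotient -1431655765 and remainder -1, following the truncate-toward-zero convention that Math.DivRem already uses.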
Related functionality that would make sense to add.
It could use the intrinsic for good performance on x86 platforms.
namespace System
{
    public static class Math
    {
        // Unsigned overload of public static int DivRem(int a, int b, out int result);
        public static uint DivRem(uint a, uint b, out uint result);
        // Unsigned overload of public static long DivRem(long a, long b, out long result);
        public static ulong DivRem(ulong a, ulong b, out ulong result);
    }
}
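A minimal sketch of the semantics the proposed unsigned Math.DivRem overloads would have, matching the shape of the existing signed Math.DivRem(int a, int b, out int result). Since System.Math cannot be extended from user code, the class name MathUnsigned is a hypothetical stand-in.

```csharp
using System;

// Hypothetical stand-in for the proposed unsigned Math.DivRem overloads.
public static class MathUnsigned
{
    // Returns a / b and stores a % b in result.
    public static uint DivRem(uint a, uint b, out uint result)
    {
        uint quotient = a / b;      // throws DivideByZeroException when b == 0
        result = a - quotient * b;  // remainder without a second division
        return quotient;
    }

    // Returns a / b and stores a % b in result.
    public static ulong DivRem(ulong a, ulong b, out ulong result)
    {
        ulong quotient = a / b;     // throws DivideByZeroException when b == 0
        result = a - quotient * b;  // remainder without a second division
        return quotient;
    }
}
```

Computing the remainder as a - quotient * b mirrors the multiplication workaround mentioned for the decimal code; a JIT that recognized these overloads could instead emit a single div and take both results from it.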
Details
A full description of the instruction can be found in https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf
Updates
- 2018-09-04: I've tried to add signed divide, and tried to restructure the post. I am unsure which versions would be useful, so I am adding both that came to mind.
- 2020-06-15:
  - Updated to use the new X86Base https://github.com/dotnet/runtime/blob/master/src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/X86Base.cs
  - Removed 32 by 32 bit and 64 by 64 bit division, since it is better to add the missing overloads to Math and recognize them as intrinsics