-
Couldn't load subscription status.
- Fork 5.2k
Minimal fix for Issue 620 #40535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minimal fix for Issue 620 #40535
Conversation
|
Fixes #620 |
|
|
Sample regression: Source code: Original: New version: |
Added test case: Runtime_620.cs Added lvForceLoadNormalize
bceadec to
43980f1
Compare
|
@CarolEidt @dotnet/jit-contrib PTAL |
|
runtime (Libraries Build Linux_musl x64 Debug) Failing after 14m due to #40416 |
src/coreclr/src/jit/morph.cpp
Outdated
| // | ||
| // For loads signed/unsigned differences do matter. | ||
| // | ||
| if (varTypeIsUnsigned(lvaTable[lclNum].lvType) == varTypeIsUnsigned(typ)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be an else if? If not, I don't understand the "otherwise" comment above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will change this to an else if,
I reordered these two during a refactoring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of additional methods had regressions:
21 (350.00% of base) : System.Private.CoreLib.dasm - System.Numerics.ConstantHelper:GetInt16WithAllBitsSet():short
19 (316.67% of base) : System.Private.CoreLib.dasm - System.Numerics.ConstantHelper:GetSByteWithAllBitsSet():byte
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally the BYTEMARK test IDEA also has a significant performace/codesize regression.
This existed before the change to "else if".
before: IDEA: Iterations/sec: 5542.04724 Index: 84.76411
after: IDEA: Iterations/sec: 3428.97638 Index: 52.44527
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No other methods in the benchmarks had code diffs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you have a look at the IDEA regression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it has many unsafe casts and is a awkward port of the C code:
The regression occurs in
private static void cipher_idea(byte[] xin, byte[] xout, int offset, char[] Z)
Where we inline the MUL and low16 methods, into the inner loop.
This forces x1, x4, t2 and t1 to become GT_LCL_FLD and thus memory accesses.
MUL(ref x1, Z[idx++]);
MUL(ref x4, Z[idx++]);
MUL(ref t2, Z[idx++]);
MUL(ref t1, Z[idx++]);
// These were macros in the original C code
/* #define low16(x) ((x) & 0x0FFFF) */
private static char low16(int x)
{
return (char)((x) & 0x0FFFF);
}
/* #define MUL(x,y) (x=mul(low16(x),y)) */
private static void MUL(ref char x, char y)
{
x = mul(low16(x), y);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My other fix does better with a much smaller regression:
before: IDEA: Iterations/sec: 5542.04724 Index: 84.76411
after: IDEA: Iterations/sec: 5244.63054 Index: 80.21521
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can change/fix the source code that could be another option:
Original:
/* #define MUL(x,y) (x=mul(low16(x),y)) */
private static void MUL(ref char x, char y)
{
x = mul(low16(x), y);
}
Modification:
/* #define MUL(x,y) (x=mul(low16(x),y)) */
private static char MUL(char x, char y)
{
return mul(low16(x), y);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried with the above code modification and with the change all version produce the same Assembly code
and the execution times are about the same as the original version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - and thanks for adding the test!
|
Thanks @briansull for the analysis. Would you be willing to open an issue to address the regression in .NET 6? I think it would make sense to add the optimization you'd proposed adding in your original fix, as well as possibly suggesting the improvement to the benchmark. |
* Minimal fix for Issue 620 Added test case: Runtime_620.cs Added lvForceLoadNormalize * Changed conditional to "else if"
Turns a GT_IND(GT_ADDR(GT_LCL_VAR) into a GT_LCL_FLD when necessary for correctness.
Added test case: Runtime_620.cs
Added lvForceLoadNormalize