Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Extend the DIV/MOD dividend into RDX:RAX only if needed #1241

Merged
merged 2 commits into from
May 11, 2016

Conversation

mikedn
Copy link

@mikedn mikedn commented Jul 15, 2015

When the divisor is a power of 2 GT_UMOD/GT_UDIV are emitted as INS_shr/INS_and. This codepath doesn't require the dividend to be in RDX:RAX.
This includes the special case where the divisor is 1 and the dividend can be moved direclty to the target register without going through RAX. This case doesn't appear to be ever hit, probably it is handled earlier.

Fixes #1207

There are quite a few other issues in this area (including the fact that RAX and RDX are reserved anyway) but fixing those would probably require a discussion first. For now a small fix. I can do more if the JIT team agrees.

@dnfclas
Copy link

dnfclas commented Jul 15, 2015

@mikedn, Thanks for signing the contribution license agreement so quickly! Actual humans will now validate the agreement and then evaluate the PR.

Thanks, DNFBOT;

@BruceForstall
Copy link

/cc @CarolEidt @briansull @cmckinsey

@cmckinsey
Copy link

@mikedn, thanks for the contribution. Someone will review the code soon. One thing to mention, whenever changes are made that impact the code-sequences generated by the JIT, we typically do 2 things: run it over our extensive regression test base, and examine asm (assembly) diffs and benchmark impact.

At the moment the capability for non-MS devs to do this is not available. We are planning to expose this capability in the future, but until then I'll probably have someone (probably me) run this change internally for you just to make sure things look good. So this is just a heads up might be a delay of a few days while this gets done.

@mikedn
Copy link
Author

mikedn commented Jul 24, 2015

but until then I'll probably have someone (probably me) run this change internally for you just to make sure things look good

Sounds like a lot of trouble for such a small change, especially that this may require additional work. Feel free to close this until the diff capability becomes available. I'd love to see such assembly diffs myself.

@cmckinsey
Copy link

@mikedn Mike, apologies for the delay. There are 396 functions with diffs in our regression base, all are good improvements: 2206 bytes of code size reduction. One nice improvement on hot path in mscorlib::System.Buffer:Memmove(long,long,long)

This change looks good to me. And agree about the remaining portions of code that are still using RAX:RDX when in fact any GPR would work. We would be happy to continue to accept PRs in this area. Perhaps we can rig up a crude way for you to inspect mscorlib/corefx asm diffs for small changes using COMPLUS_JitDisasm so you're not blocked on our turn-around time.

@mikedn
Copy link
Author

mikedn commented Oct 6, 2015

@cmckinsey Thank you for review!

The RAX:RDX issue is more or less caused by LSRA's getKillSetForNode special casing GT_MOD & co. It appears that this information could have been passed from lowering to LSRA via TreeNodeInfo but maybe it was decided that the additional space that would be required in TreeNodeInfo is not worth it. If that's indeed the case then the only remaining option is to move this optimization from codegen to lowering so the register allocation doesn't see the GT_MOD at all. Unfortunately that may be a bit problematic in the case of signed integers - it uses CDQ and this isn't directly representable in IR. Some input regarding what's the best approach (changing getKillSetForNode or moving the optimization) is needed here.

There are a few other issues related to the constant divisors that are subject to this optimization. One funny case is that this optimization works for negative divisors but fgMorphModByConst steals it because it checks isPow2(cons) instead of isPow2(abs(cons)) like lowering does. These are probably minor issue as the kind of constants that have such problems are less likely to show up as divisors.

@cmckinsey
Copy link

@CarolEidt for feedback on the LSRA interaction with GT_MOD.

Regarding the missing cases, I think there is a lot of opportunity for improvement handling DIV/MOD by constants, including MUL by constants. Agreed negative constants may not be as frequent. But we're considering whether to make deeper investments here, perhaps as aggressive as VC/LLVM compilers. Just speaking aloud, but you should still commit your PR.

@mikedn
Copy link
Author

mikedn commented Oct 18, 2015

I rebased the original commit and added 2 more commits which show/fix some of the DIV/MOD issues.

The first of the added commits is a trivial change in fgShouldUseMagicNumberDivide to prevent it from taking over the cases that involve negative divisors.

The second commit moves the DIV/MOD optimization from codegen to lowering for unsigned integers. This is trivial to do as it simply requires replacing GT_UDIV/GT_UMOD with GT_RSZ/GT_AND and adjusting the constant accordingly.

Ideally the same thing should be done for signed integers but I've yet to figure out how, I need to make a copy of the dividend and that seems to require a separate statement and local variable. If this can be done then LSRA doesn't need to be changed.

@ghost
Copy link

ghost commented Oct 18, 2015

@dotnet-bot, test this please.

The directed graph test was failing only in windows_debug job (Jenkins URL):

Regression

GC.Stress_Tests_DirectedGraph._Stress_Tests_DirectedGraph_cmd (from GC.XUnitWrapper)
Failing for the past 1 build (Since Failed#214 )
Took 10 min.
Stacktrace

MESSAGE:

cmdLine:d:\j\workspace\dotnet_coreclr\debug_windows_nt_prtest\bin\tests\Windows_NT.x64.debug\GC\Stress\Tests\DirectedGraph.cmd Timed Out Raw Output :d:\j\workspace\dotnet_coreclr\debug_windows_nt_prtest\bin\tests\Windows_NT.x64.debug\Reports\GC\Stress\Tests\DirectedGraph.output.txt To Run the test :Step 1. set Core_Root=d:\j\workspace\dotnet_coreclr\debug_windows_nt_prtest\bin\tests\Windows_NT.x64.debug\Tests\Core_Root Step 2. d:\j\workspace\dotnet_coreclr\debug_windows_nt_prtest\bin\tests\Windows_NT.x64.debug\GC\Stress\Tests\DirectedGraph.cmd
+++++++++++++++++++
STACK TRACE:
at GC.Stress_Tests_DirectedGraph._Stress_Tests_DirectedGraph_cmd() in d:\j\workspace\dotnet_coreclr\debug_windows_nt_prtest\tests\src\TestWrappers_x64_debug\GC\GC.XUnitWrapper.cs:line 1375

@cmckinsey
Copy link

The new changes to move this to lower LGTM. @sivarv and @CarolEidt to take a look and provide Mike some guidance on the interaction issue with LSRA.

@mikedn
Copy link
Author

mikedn commented Oct 24, 2015

@cmckinsey Thanks. Note that as things are now LSRA doesn't need to be modified. Since the unsigned case is handled in lowering only the signed case passes through LSRA and it needs rax and rdx because it uses cdq. It's possible to replace cdq with a mov and a sar to avoid this, IMO, restrictive register requirement but that's another story.

Also, I think it the code would simpler if the optimization could be done in one place. Unfortunately it seems that the signed case is difficult to handle in lower. Maybe it's just me being used to linear IR but this tree IR seems a bit counterproductive.

@cmckinsey
Copy link

@sivarv can you tell mikedn how to make a copy of the dividend in lower so we can have signed/unsigned div/mod expansion at the same place in the phase ordering.

@sivarv
Copy link
Member

sivarv commented Oct 27, 2015

@mikedn Lowering::TreeNodeInfoInit() - this routine is meant for specifying register requirements and not supposed to make IR modifications. Lowering::LowerNode() is the place where we should place all IR lowering transformations. For example, in this case Lowering::LowerNode() could add a case for unsigned DIV/MOD and lower the node appropriately. And in TreeNodeInfoInit() we just need to assert that we don't get to see an unsigned div/mod which is a power of 2.

Look at some of the routines that LowerNode() in turn is calling for examples of how to make IR modifications in Lower. In most cases we might not be required to create new nodes and can just do in-place manipulation of gentree nodes. Where in-place manipulation not possible, we create new gentree nodes, update tree node links and update linear order links (e.g. fgInsertLinearNodeBefore). Rationalize.cpp is another file where you will find some examples of IR manipulation.

@mikedn
Copy link
Author

mikedn commented Nov 16, 2015

Thanks for assistance. I've move both signed and unsigned div/mod pow2 optimizations in LowerNode. I also added some tests for the various div/mod by const cases as the existing tests don't seem to cover this area well.

I'm not 100% sure that it's a good idea to implement the signed case in lowering because the generated code is slightly larger - partly due to the use of a shift instead of cdq and partly due some unfortunate register moves that are introduced in some cases. That said, the code is IMO simpler now (because everything happens in one place) and it is arch independent. The fact that cdq is no longer used means registers other than rdx and rax can be used, that may be a good thing but I've yet to see a diff in mscorlib/corefx/roslyn that shows this.

@@ -499,6 +499,16 @@ void Lowering::LowerNode(GenTreePtr* ppTree, Compiler::fgWalkData* data)
LowerAdd(ppTree, data);
break;

case GT_UDIV:
case GT_UMOD:
LowerUDivUMod(*ppTree);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about LowerUnsignedDivOrMod()

@BruceForstall
Copy link

@cmckinsey @sivarv I haven't looked through diffs, internal test runs, etc. Any concerns here that need to be addressed before merging?

Presumably all the changes should be squashed.

@CarolEidt
Copy link

The changes look very good. I would like to see the commits squashed, but otherwise I think we should get this merged. I don't think anyone has taken these changes and done an internal test run, but I believe these changes are disjoint enough from desktop/coreclr differences that it should not be an issue.

@mikedn
Copy link
Author

mikedn commented May 10, 2016

Later today I'll squash this down to 2 commits, one for the tests and one for the actual change (it would be nice to run the tests before and after the actual change). I'll also update commit message asm examples as some have become stale.

Optimizing GT_DIV/GT_UDIV/GT_MOD/GT_UMOD by power of 2 in codegen is problematic because the xarch DIV instruction has special register requirements. By the time codegen decides to perform the optimization the rax and rdx registers have been already allocated by LSRA even though they're not always needed (as it happens in the case of unsigned division where CDQ isn't used).

Since the JIT can't represent a CDQ instruction in its IR an arithmetic shift (GT_RSH) has been instead to extract the dividend sign. xarch's SAR is larger than CDQ but it has the advantage that it doesn't require specific registers. Also, arithmetic shifts are available on architectures other than xarch.

Example: method "static int foo(int x) => x / 8;" is now compiled to
mov      eax, ecx
mov      edx, eax
sar      edx, 31
and      edx, 7
add      edx, eax
mov      eax, edx
sar      eax, 3

instead of
mov      eax, ecx
cdq
and      edx, 7
add      eax, edx
sar      eax, 3

As a side-effect of this change the optimization now also works when the divisor is too large to be contained. Previously this wasn't possible because the divisor constant needed to be modified during codegen but the constant was already loaded into a register.

Example: method "static ulong foo(ulong x) => x / 4294967296;" is now compiled to
mov      rax, rcx
shr      rax, 32

whereas before a DIV instruction was used.

This change also fixes an issue in fgShouldUseMagicNumberDivide. The optimization that is done in lower can handle negative power of 2 divisors but fgShouldUseMagicNumberDivide handled those cases because it didn't check the absolute value of the divisor.

Example: method "static int foo(int x) => return x / -2;" is now compiled to

mov      eax, ecx
mov      edx, eax
shr      edx, 31
add      edx, eax
sar      edx, 1
mov      eax, edx
neg      eax

instead of
mov      eax, 0x7FFFFFFF
imul     edx:eax, ecx
mov      eax, edx
sub      eax, ecx
mov      edx, eax
shr      edx, 31
add      eax, edx
@mikedn
Copy link
Author

mikedn commented May 11, 2016

Squashed and updated commit comment to include various generated assembly examples. Also removed a TODO from lowerarm64.cpp as it no longer applies:

// TODO-ARM64-CQ: Optimize a divide by power of 2 as we do for AMD64

@BruceForstall
Copy link

Fun.

11:29:05 ERROR: Step ‘Publish xUnit test result report’ aborted due to exception: 
11:29:05 
java.io.IOException: remote file operation failed: /Users/dotnet-bot/j/workspace/dotnet_coreclr/master/checked_osx_prtest at hudson.remoting.Channel@64c6a200:dci-macpro-10: hudson.remoting.ChannelClosedException: channel is already closed
11:29:05    at hudson.FilePath.act(FilePath.java:986)
11:29:05    at hudson.FilePath.act(FilePath.java:968)
11:29:05    at org.jenkinsci.plugins.xunit.XUnitProcessor.performTests(XUnitProcessor.java:145)
11:29:05    at org.jenkinsci.plugins.xunit.XUnitProcessor.performXUnit(XUnitProcessor.java:88)
11:29:05    at org.jenkinsci.plugins.xunit.XUnitPublisher.perform(XUnitPublisher.java:142)
11:29:05    at org.jenkinsci.plugins.xunit.XUnitPublisher.perform(XUnitPublisher.java:134)
11:29:05    at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
11:29:05    at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:782)
11:29:05    at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:723)
11:29:05    at hudson.model.Build$BuildExecution.post2(Build.java:185)
11:29:05    at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:668)
11:29:05    at hudson.model.Run.execute(Run.java:1763)
11:29:05    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
11:29:05    at hudson.model.ResourceController.execute(ResourceController.java:98)
11:29:05    at hudson.model.Executor.run(Executor.java:410)
11:29:05 Caused by: hudson.remoting.ChannelClosedException: channel is already closed
11:29:05    at hudson.remoting.Channel.send(Channel.java:578)
11:29:05    at hudson.remoting.Request.call(Request.java:130)
11:29:05    at hudson.remoting.Channel.call(Channel.java:780)
11:29:05    at hudson.FilePath.act(FilePath.java:979)
11:29:05    ... 14 more
11:29:05 Caused by: java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@7ca39482[name=dci-macpro-10]
11:29:05    at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208)
11:29:05    at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:628)
11:29:05    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
11:29:05    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
11:29:05    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
11:29:05    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
11:29:05    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
11:29:05    at java.lang.Thread.run(Thread.java:745)
11:29:05 Caused by: java.io.IOException: Connection reset by peer
11:29:05    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
11:29:05    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
11:29:05    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
11:29:05    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
11:29:05    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)
11:29:05    at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
11:29:05    at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
11:29:05    at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
11:29:05    ... 6 more

@BruceForstall
Copy link

@dotnet-bot test OSX x64 Checked Build and Test

@BruceForstall BruceForstall merged commit 0075514 into dotnet:master May 11, 2016
@omariom
Copy link

omariom commented May 11, 2016

👍

@sivarv
Copy link
Member

sivarv commented May 12, 2016

On 1st Mar'16, @pgavlin had run asm diffs on desktop with @mikedn 's changes and we have found the following code size regressions:

Division by const +2:

RyuJit:  Dividend is in RAX
Cdq
Sub eax, edx
Sar eax, 1    // result computed in eax

This is what generated with Mikedn’s changes:
Mov edx, ecx    // dividend is in ecx
Shr edx, 31
Add edx, ecx
Sar edx, 1       // result computed in edx

Similarly, for computing N % Pow(2)

RyuJIT generates: for N % 8
Cdq                         // dividend in rax
And edx, 7
Add eax, edx
And eax, 7
Sub eax, edx        // result in eax
This is what generated with MikeDn’s changes:
Mov r8d, edx             // dividend in edx
Sar r8d, 31
And r8d, 7
Add r8d, edx
And r8d, -8
Sub edx, r8d            // result in edx

As you can see from above that these code changes generate one instruction more than what RyuJIT previously generates.

@mikedn - Just wondering whether you have reworked your fix since then. If the code size regression exists for the above cases, we might want to track with a Git issue.

@mikedn
Copy link
Author

mikedn commented May 12, 2016

Just wondering whether you have reworked your fix since then.

Nope. I mentioned this before code review, the code is slightly larger because it uses a move and an arithmetic shift instead of cdq. The move can show up even if cdq is used since cdq requires specific registers but the arithmetic shift is always 2 bytes larger than cdq.

@BruceForstall
Copy link

@mikedn Unfortunately, I discovered that this change led to a JIT assert in some internal testing (namely, ngen of System.Windows.Forms.dll with an x64 RyuJIT). I've reverted this change (on the TFS side of the TFS<->GitHub mirror; it will propagate over to GitHub soon). Someone internal to Microsoft will need to generate a repro case for you, or else take your change and debug the problem.

The assert:

Assert failure(PID 33656 [0x00008378], Thread: 17792 [0x4580]): Assertion failed 'addr->OperIsAddrMode() || (addr->IsCnsIntOrI() && addr->isContained()) || !addr->isContained()' in 'System.Windows.Forms.CheckedListBox:OnDrawItem(ref):this' (IL size 1216)

    File: e:\dd\projectk\src\ndp\clr\src\jit\emitxarch.cpp Line: 2698
    Image: C:\Windows\Microsoft.NET\Framework64\v4.0.rb1605209\mscorsvw.exe

The tree:

***** BB41, stmt 82 (embedded)
     (  6,  8) [003723] ------------             *  stmtExpr  void  (embedded) (IL 0x109...  ???)
N1045 (  3,  2) [000115] ------------             |        /--*  lclVar    ref    V00 this         u:2 REG rcx $80
N1047 (  1,  4) [002642] ------------             |        +--*  const     long   344 field offset Fseq[idealCheckSize] REG NA $10b
N1049 (  4,  6) [002643] -------N----             |     /--*  +         byref  REG NA $356
N1051 (  6,  8) [000116] ----GO------             |  /--*  indir     int    REG rcx <l:$685, c:$2ef>
N1053 (  6,  8) [003669] DA--GO------             \--*  st.lclVar int    V172 cse1         rcx REG rcx RV

During codegen:

Generating BB41, stmt 71        Holding variables: [rbx rsi rdi r12-r15]

Generating: N1043 (  3,  2) [000114] ------------             *  lclVar    int    V05 loc3         u:3 r12 (last use) REG r12 RV $31a
Generating: N1045 (  3,  2) [000115] ------------             *  lclVar    ref    V00 this         u:2 REG rcx $80
IN00db:        mov      rcx, gword ptr [V00 rbp+10H]
                            GC regs: 00000040 {rsi} => 00000042 {rcx rsi}
Generating: N1047 (  1,  4) [002642] ------------             *  const     long   344 field offset Fseq[idealCheckSize] REG NA $10b
Generating: N1049 (  4,  6) [002643] -------N----             *  +         byref  REG NA $356
Generating: N1051 (  6,  8) [000116] ----GO------             *  indir     int    REG rcx <l:$685, c:$2ef>
… assert …

@mikedn
Copy link
Author

mikedn commented May 18, 2016

I'll try to get the relevant piece of code from System.Windows.Forms and compile it using coreclr. Anyway, that assert seems familiar, I may have an idea about what's going on.

@mikedn
Copy link
Author

mikedn commented May 18, 2016

The assert can be reproduced using the following program (distilled from CheckedListBox:OnDrawItem):

class Program
{
    int i;

    [MethodImpl(MethodImplOptions.NoInlining)]
    int Test(int h)
    {
        int x = h + 4;
        int f = (x - i) / 2;
        if (i > h)
            return 0;
        return f;
    }

    static void Main()
    {
        new Program().Test(42);
    }
}

CSE introduces an embedded statement for i. When LowerSignedDivOrMod uses fgInsertEmbeddedFormTemp a new top level statement is created and the embedded statement moves to it. But the embedded statement hasn't been lowered yet and now it will never be lowered because it moves before the current statement.

LowerArrElem faced a similar situation and solved it by manually lowering the embedded statement contained in the new top level statement. Not exactly pretty but effective.

dotnet-bot pushed a commit to dotnet-bot/coreclr that referenced this pull request May 18, 2016
…Forms.dll

====================
007551: Merge pull request dotnet#1241 from mikedn/modopt

Extend the DIV/MOD dividend into RDX:RAX only if needed
====================

Assert failure(PID 33656 [0x00008378], Thread: 17792 [0x4580]): Assertion failed 'addr->OperIsAddrMode() || (addr->IsCnsIntOrI() && addr->isContained()) || !addr->isContained()' in 'System.Windows.Forms.CheckedListBox:OnDrawItem(ref):this' (IL size 1216)

    File: e:\dd\projectk\src\ndp\clr\src\jit\emitxarch.cpp Line: 2698
    Image: C:\Windows\Microsoft.NET\Framework64\v4.0.rb1605209\mscorsvw.exe

The tree:

***** BB41, stmt 82 (embedded)
     (  6,  8) [003723] ------------             *  stmtExpr  void  (embedded) (IL 0x109...  ???)
N1045 (  3,  2) [000115] ------------             |        /--*  lclVar    ref    V00 this         u:2 REG rcx $80
N1047 (  1,  4) [002642] ------------             |        +--*  const     long   344 field offset Fseq[idealCheckSize] REG NA $10b
N1049 (  4,  6) [002643] -------N----             |     /--*  +         byref  REG NA $356
N1051 (  6,  8) [000116] ----GO------             |  /--*  indir     int    REG rcx <l:$685, c:$2ef>
N1053 (  6,  8) [003669] DA--GO------             \--*  st.lclVar int    V172 cse1         rcx REG rcx RV

During codegen:

Generating BB41, stmt 71        Holding variables: [rbx rsi rdi r12-r15]

Generating: N1043 (  3,  2) [000114] ------------             *  lclVar    int    V05 loc3         u:3 r12 (last use) REG r12 RV $31a
Generating: N1045 (  3,  2) [000115] ------------             *  lclVar    ref    V00 this         u:2 REG rcx $80
IN00db:        mov      rcx, gword ptr [V00 rbp+10H]
                            GC regs: 00000040 {rsi} => 00000042 {rcx rsi}
Generating: N1047 (  1,  4) [002642] ------------             *  const     long   344 field offset Fseq[idealCheckSize] REG NA $10b
Generating: N1049 (  4,  6) [002643] -------N----             *  +         byref  REG NA $356
Generating: N1051 (  6,  8) [000116] ----GO------             *  indir     int    REG rcx <l:$685, c:$2ef>
... assert ...

(This is rollback dotnet#2: the TFS/GitHub mirror unfortunately undid rollback CS#1605814 with CS#1605840. This change should avoid that problem.)

[tfs-changeset: 1605917]
@mikedn
Copy link
Author

mikedn commented Jun 10, 2016

So, what do we do with this? Can this PR be reopened or should I create a new PR with the proper fix?

At a minimum the associated issue should be reopened.

@RussKeldorph
Copy link

I reopened #1207. I would probably just open a new PR and mention this one for continuity.

@ghost
Copy link

ghost commented Jun 10, 2016

we typically do 2 things: run it over our extensive regression test base, and examine asm (assembly) diffs and benchmark impact. At the moment the capability for non-MS devs to do this is not available. We are planning to expose this capability in the future

@cmckinsey, are there still plans on making those diff tools available for community?

@RussKeldorph
Copy link

@russellhadley

@russellhadley
Copy link

@jasonwilliams200OK, @mikedn, I've put some tools out at https://github.com/dotnet/jitutils that allow diffing of jit dasm to be done across multiple platforms. Take a look. Fair warning these are new so there could be some issues - bring me in early if something goes wrong.

@mikedn mikedn deleted the modopt branch June 18, 2016 06:00
@mikedn mikedn restored the modopt branch June 18, 2016 06:23
CarolEidt added a commit to CarolEidt/coreclr that referenced this pull request Jan 9, 2020
Anipik pushed a commit that referenced this pull request Feb 13, 2020
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Extend the DIV/MOD dividend into RDX:RAX only if needed

Commit migrated from dotnet/coreclr@0075514
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…ystem.Windows.Forms.dll

====================
007551: Merge pull request dotnet/coreclr#1241 from mikedn/modopt

Extend the DIV/MOD dividend into RDX:RAX only if needed
====================

Assert failure(PID 33656 [0x00008378], Thread: 17792 [0x4580]): Assertion failed 'addr->OperIsAddrMode() || (addr->IsCnsIntOrI() && addr->isContained()) || !addr->isContained()' in 'System.Windows.Forms.CheckedListBox:OnDrawItem(ref):this' (IL size 1216)

    File: e:\dd\projectk\src\ndp\clr\src\jit\emitxarch.cpp Line: 2698
    Image: C:\Windows\Microsoft.NET\Framework64\v4.0.rb1605209\mscorsvw.exe

The tree:

***** BB41, stmt 82 (embedded)
     (  6,  8) [003723] ------------             *  stmtExpr  void  (embedded) (IL 0x109...  ???)
N1045 (  3,  2) [000115] ------------             |        /--*  lclVar    ref    V00 this         u:2 REG rcx $80
N1047 (  1,  4) [002642] ------------             |        +--*  const     long   344 field offset Fseq[idealCheckSize] REG NA $10b
N1049 (  4,  6) [002643] -------N----             |     /--*  +         byref  REG NA $356
N1051 (  6,  8) [000116] ----GO------             |  /--*  indir     int    REG rcx <l:$685, c:$2ef>
N1053 (  6,  8) [003669] DA--GO------             \--*  st.lclVar int    V172 cse1         rcx REG rcx RV

During codegen:

Generating BB41, stmt 71        Holding variables: [rbx rsi rdi r12-r15]

Generating: N1043 (  3,  2) [000114] ------------             *  lclVar    int    V05 loc3         u:3 r12 (last use) REG r12 RV $31a
Generating: N1045 (  3,  2) [000115] ------------             *  lclVar    ref    V00 this         u:2 REG rcx $80
IN00db:        mov      rcx, gword ptr [V00 rbp+10H]
                            GC regs: 00000040 {rsi} => 00000042 {rcx rsi}
Generating: N1047 (  1,  4) [002642] ------------             *  const     long   344 field offset Fseq[idealCheckSize] REG NA $10b
Generating: N1049 (  4,  6) [002643] -------N----             *  +         byref  REG NA $356
Generating: N1051 (  6,  8) [000116] ----GO------             *  indir     int    REG rcx <l:$685, c:$2ef>
... assert ...

(This is rollback dotnet/coreclr#2: the TFS/GitHub mirror unfortunately undid rollback CSdotnet/coreclr#1605814 with CSdotnet/coreclr#1605840. This change should avoid that problem.)

[tfs-changeset: 1605917]


Commit migrated from dotnet/coreclr@ce8e7e3
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RyuJIT: dangling instructions after converting modulus to bitwise AND