Updating to latest #1

Merged
merged 104 commits into manofstick:master on Jan 16, 2017

Conversation

manofstick
Owner

...Hopefully this is in my repository...

RussKeldorph and others added 30 commits January 6, 2017 13:14
This test runs out of memory on x86 when its huge expressions combine with
STRESS_CLONE_EXPR.  In this case, the "OptimizationSensitive" label isn't ideal,
but the meaning is what we need: don't run this under JitStress modes.
This is disabled at build time for x86, which means that if x64-built tests are
run on other architectures, e.g. ARM or ARM64, this change won't take effect.
Remove lazy initialization of Task.CompletedTask
There are two kinds of transition penalties:
1. Transition from 256-bit AVX code to 128-bit legacy SSE code.
2. Transition from 128-bit legacy SSE code to either 128-bit or
   256-bit AVX code. This only happens if there was a preceding
   AVX256->legacy SSE transition penalty.

The primary goal is to remove the #1 AVX to SSE transition penalty.

Added two emitter flags: contains256bitAVXInstruction indicates
whether the JIT method contains 256-bit AVX code, and containsAVXInstruction
indicates whether the method contains 128-bit or 256-bit AVX code.

Issue VZEROUPPER in the prolog if the method contains 128-bit or 256-bit
AVX code, to avoid the legacy SSE to AVX transition penalty; this could
happen in the reverse PInvoke situation. Issue VZEROUPPER in the epilog
if the method contains 256-bit AVX code, to avoid the AVX to legacy
SSE transition penalty.

To limit the code-size increase, we only issue VZEROUPPER before a
PInvoke call to a user-defined function if the JIT method contains
256-bit AVX code, assuming the user-defined function contains legacy
SSE code. There is no need to issue VZEROUPPER after the PInvoke call,
because the #2 SSE to AVX transition penalty won't happen once the #1
AVX to SSE transition has been taken care of before the PInvoke call.
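
A minimal sketch of that policy, not the actual CoreCLR emitter code: the flag names contains256bitAVXInstruction and containsAVXInstruction come from this message, while the Emitter struct and the gen* helpers below are hypothetical stand-ins.

```cpp
// Sketch of when VZEROUPPER is issued, per the policy described above.
struct Emitter
{
    bool contains256bitAVXInstruction = false; // method has 256-bit AVX code
    bool containsAVXInstruction       = false; // method has any (128/256-bit) AVX code

    void emitVzeroupper()
    {
        // Placeholder: the real emitter would encode the VZEROUPPER instruction here.
    }
};

void genPrologVzeroupper(Emitter& emit)
{
    // Avoid the legacy SSE -> AVX penalty, e.g. for reverse PInvoke callers.
    if (emit.containsAVXInstruction)
        emit.emitVzeroupper();
}

void genEpilogVzeroupper(Emitter& emit)
{
    // Avoid the AVX -> legacy SSE penalty for callers that use legacy SSE.
    if (emit.contains256bitAVXInstruction)
        emit.emitVzeroupper();
}

void genBeforePInvokeCall(Emitter& emit)
{
    // Pay the code-size cost only when 256-bit AVX code is present; nothing is
    // emitted after the call, since penalty #1 is already handled here.
    if (emit.contains256bitAVXInstruction)
        emit.emitVzeroupper();
}
```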

We measured a ~1% to 3% performance gain on TechEmPower plaintext and
verified that the VTune AVX/SSE events OTHER_ASSISTS.AVX_TO_SSE and
OTHER_ASSISTS.SSE_TO_AVX have been reduced to 0.

Fix #7240

move setContainsAVX flags to lower, refactor to a smaller method

refactor, fix typo in comments

fix format error
Disable hugeexpr1 test under JitStress modes
- Re-enable the default build archiving in jitdiff scenarios so that the
  windows _bld job will publish correct build artifacts for the
  ubuntu_jitdiff_tst job to consume.
- Use relative paths in the generated scripts to avoid drive letters in
  bash scripts.
- Fix exit code reporting in bash wrapper scripts.
- Upgrade jit-dasm to a version with improved error handling
- Upgrade jit-analyze to use the same dependencies as jit-dasm
This change moves the implementation of live variable analysis from a
single function into a class in which the per-block portion of the
algorithm is contained in its own function. There is no functional
change.
Enable HighEntropyVA in mscorlib.dll
Encapsulate live var analysis in its own class.
Update PInvoke inlining restrictions for CoreRT
Add a variant that uses a class and an out parameter instead of returning
a struct by value. This variant is similar to version 3 from the Benchmarks
Game site, but with validation added and parallelism removed.

See related analysis in #8837. According to xunit-perf runs, this version's
performance is improved (~10%) by enabling the model inlining policy. When
the model policy is enabled, the inliner will inline the two outermost calls
to `ChildTreeNodes` in the innermost loop.

Also, make sure the new and the original versions build the same way in
release and debug.
The most common uses of CreateLinkedTokenSource involve passing in one or two tokens, but even for these cases we end up allocating an array of registrations.  This commit fixes that.
* Default gcAllowVeryLargeObjects to true

* Update tests for gcAllowVeryLargeObjects

These 2 tests appear to rely on gcAllowVeryLargeObjects being false, in that they expect an exception to be thrown when allocating a very large array.

Increase the number of array elements so that an OutOfMemoryException is always thrown no matter what value gcAllowVeryLargeObjects has. 81*98*58*36*74*4 is 4,906,065,024, which is larger than 2^32, the theoretical limit on the total number of array elements.
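
As a standalone sanity check of that arithmetic (a C++ snippet for illustration only, not part of the tests):

```cpp
// Verify the element-count product quoted above and that it exceeds 2^32.
static_assert(81ULL * 98 * 58 * 36 * 74 * 4 == 4906065024ULL,
              "product of the array dimensions as stated");
static_assert(81ULL * 98 * 58 * 36 * 74 * 4 > (1ULL << 32),
              "total element count exceeds the 2^32 limit");
```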
Enable PInvoke analyzer for S.P.CoreLib.dll
On non-hardware accelerated platforms, reduce the inner iteration count in
SeekUnroll. This test was timing out in x86 release builds because the
iteration count was too high.
Currently, only checked runs can be triggered for x86. This adds a trigger
for release runs as well.
If only the live-out set for a block changes during live variable
analysis, the LVA algorithm does not need to be re-run. As per the
dataflow equations for LVA:

    liveOut(block) = union(liveIn(s) for all s in successors(block))

Thus, unless a change to the live-out set for a block propagates through
to a change to the live-in set for that block, it will not affect the
live-in/out sets of any of its predecessors and liveness need not be
re-run.
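
To illustrate the early-out, here is a minimal, self-contained sketch of the fixed-point loop; the Block layout and the 64-variable bitset are hypothetical simplifications, not the JIT's actual data structures, and the standard companion equation liveIn(block) = use(block) | (liveOut(block) & ~def(block)) is assumed.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified block: a 64-variable bitset instead of the JIT's
// real variable-set representation.
struct Block
{
    uint64_t useSet  = 0;            // variables read before any write in this block
    uint64_t defSet  = 0;            // variables written in this block
    uint64_t liveIn  = 0;
    uint64_t liveOut = 0;
    std::vector<Block*> succs;       // successor blocks
};

// One pass over the blocks; returns true if any live-in set changed,
// i.e. another pass is needed.
bool LivenessPass(std::vector<Block*>& blocks)
{
    bool liveInChanged = false;
    for (Block* block : blocks)
    {
        // liveOut(block) = union(liveIn(s) for all s in successors(block))
        uint64_t newLiveOut = 0;
        for (Block* succ : block->succs)
        {
            newLiveOut |= succ->liveIn;
        }

        // liveIn(block) = use(block) | (liveOut(block) & ~def(block))
        uint64_t newLiveIn = block->useSet | (newLiveOut & ~block->defSet);

        // A change that only affects liveOut (e.g. because the variable is
        // defined here before any use) cannot affect any predecessor's sets,
        // so it does not force another iteration.
        if (newLiveIn != block->liveIn)
        {
            liveInChanged = true;
        }

        block->liveIn  = newLiveIn;
        block->liveOut = newLiveOut;
    }
    return liveInChanged;
}

void ComputeLiveness(std::vector<Block*>& blocks)
{
    // Iterate to a fixed point of the live-in sets.
    while (LivenessPass(blocks))
    {
    }
}
```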
dotnet-bot and others added 28 commits January 11, 2017 20:55
Remove AVX/SSE transition penalties
Fix initialization of resolution sets
Add optional argument to skip unmount for rootfs
The LastConsumedNode used in genCheckConsumeNode was not initialized for arm64.
Fixing this exposed several places where nodes were being consumed twice or in the wrong order.
In addition, since GT_PUTARG_STK doesn't define a register, its dstCount needs to be zero. This is enabled by checking IsValue() instead of checking for a type of TYP_VOID in the default case of TreeNodeInfoInit. This was missed for both arm and arm64.
The JIT can't eliminate range checks if it can't "see" Length, and it falls back to loop cloning, which generates a lot of code. Even in cases where not all range checks can be eliminated and loop cloning is used anyway, it's still preferable to have fewer range checks.

For example, SortExceptions is ~140 bytes shorter after this change, despite the fact that loop cloning is still being used.
Change children order in GenTreeBoundsChk. Fix #8077
Fix putArgStk dstCount and ConsumeReg errors
Update CoreClr to beta-24912-04 (master)
The inliner's code-size estimating state machine keeps count of
matches, but the count was only used in an assert that checked
that the count did not overflow.

The assert fired when jit stress drove the inliner to evaluate a
huge method as a potential inline candidate and the count reached
the overflow value.

This change removes the counting and the related assert.

Closes #8932.
JIT: remove match accounting from inliner state machine
CoreFX is going to be merging changes soon that will break how we
consume them to do our testing. To give us time to react, we'll fix
the version of the repository we build to a commit before the
changes. We'll also download artifacts from a saved build (produced
before the change took place) so the layout is as we expect.

The issue tracking cleaning this up is #8937
Stop build from leaving localpkg cache in src
This is a result of an attempt to bring up CoreCLR on ARM64 Android.
The bring-up is on hold now, but I want to check in the changes
that added ARM64 asm helpers and fixed general Linux ARM64 issues.
…or-ci

Use a fixed version of CoreFX for testing
…upport (#8939)

* Check if xsave is enabled by the OS before calling xgetbv in XmmYmmStateSupport. Fix #8903

* Add ebx to clobbered registers.
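
A minimal sketch of that kind of guard for GCC/Clang on x86 (not the actual XmmYmmStateSupport code; the function names are made up, and the bit positions are the architecturally defined CPUID/XCR0 bits):

```cpp
#include <cpuid.h>  // __get_cpuid (GCC/Clang, x86)

// Read XCR0 via xgetbv with ecx = 0; only legal once OSXSAVE is confirmed.
static unsigned long long ReadXcr0()
{
    unsigned int eax, edx;
    __asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
    return ((unsigned long long)edx << 32) | eax;
}

// Returns true if the OS has enabled saving of both XMM and YMM state.
bool AvxStateEnabledByOS()
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid takes care of preserving ebx (cf. the clobber fix above).
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;

    const unsigned int kOsXsaveBit = 1u << 27; // CPUID.1:ECX.OSXSAVE
    const unsigned int kAvxBit     = 1u << 28; // CPUID.1:ECX.AVX

    // Executing xgetbv without OSXSAVE set raises #UD, so check it first.
    if ((ecx & kOsXsaveBit) == 0 || (ecx & kAvxBit) == 0)
        return false;

    // XCR0 bit 1 = SSE (XMM) state, bit 2 = AVX (YMM) state; both must be set.
    return (ReadXcr0() & 0x6) == 0x6;
}
```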
Thread abort was not allowed in finally blocks, but the same logic was being applied to thread interrupt as well. There is nothing special about thread interrupt that requires it to not work in finally blocks.
These cases are actually possible because Windows APIs are inconsistent in their behavior where invalid handles are concerned. Depending on the invalid handle's value, a multi-wait can fail with ERROR_INVALID_HANDLE, while a single-wait on an invalid handle can return WAIT_TIMEOUT.
Enable thread interrupt in finally blocks, remove some invalid asserts
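
For context on the invalid-handle behavior mentioned above, a minimal Windows-only probe (the bogus handle value is arbitrary, and the exact results can vary with the handle value and OS version, which is the point of the commit):

```cpp
#include <windows.h>
#include <stdio.h>

int main()
{
    // An arbitrary value that is not a real handle in this process.
    HANDLE bogus = (HANDLE)(ULONG_PTR)0x12345;

    // Single wait on the invalid handle.
    SetLastError(0);
    DWORD single = WaitForSingleObject(bogus, 0);
    printf("WaitForSingleObject:    result=%lu, last error=%lu\n",
           single, GetLastError());

    // Multi wait that includes the same invalid handle.
    HANDLE handles[2] = { GetCurrentProcess(), bogus };
    SetLastError(0);
    DWORD multi = WaitForMultipleObjects(2, handles, FALSE, 0);
    printf("WaitForMultipleObjects: result=%lu, last error=%lu\n",
           multi, GetLastError());
    return 0;
}
```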
@manofstick manofstick merged commit 07af7c9 into manofstick:master Jan 16, 2017