8.5% runtime performance degradation for GNU-toolchain 1.45.x vs. 1.44.x in Windows #75374
Description
Hello,
For some time now I've been writing a chess engine in Rust. I develop on Windows at this time, but because I wanted to use only open source software for this engine, the GNU toolchain has been my primary target. I've been using the MSVC toolchain for profiling. Generally, this was the speed ranking:
- GNU toolchain
- MSVC toolchain using the LLD linker (installed on MSYS2)
- MSVC toolchain with its normal MS linker
In version 1.45.x, I've seen a great performance drop for the GNU toolchain, when running the compiled program.
Steps to reproduce:
Clone this repository: https://github.com/mvanthoor/rustic
Install these toolchains:
stable-x86_64-pc-windows-gnu
stable-x86_64-pc-windows-msvc (default)
1.44.1-x86_64-pc-windows-gnu
1.44.1-x86_64-pc-windows-msvc
Compile the engine; make sure RUSTFLAGS is set to "-C target-cpu=skylake".
Run the engine "rustic.exe".
What it will do is a so-called "perft": a performance test. It calculates moves in a given position to a given depth, and outputs the found move counts and speed per depth to the screen. The point of "perft" is to match the numbers known to be correct values, and to make it as fast as possible. You do this for a few hundred positions: if all the numbers match up to the known correct ones, move generation in your engine can be considered bug-free.
As said, in version 1.45, with the GNU toolchain, I have observed a great performance drop. The results on my computer are below. The values given are those for Perft 6, which in this case is the most interesting.
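As a rough illustration of what a perft run does, here is a minimal sketch. It is not the engine's actual code: the `Position` trait and the `Toy` game (every position has exactly three moves) are hypothetical stand-ins for a real chess move generator, chosen so the expected leaf counts are easy to check by hand.

```rust
use std::time::Instant;

/// Hypothetical minimal interface for a game position. A real chess engine
/// would more likely make and unmake moves on a mutable board.
trait Position: Sized {
    /// All positions reachable in exactly one move.
    fn successors(&self) -> Vec<Self>;
}

/// Count the leaf nodes of the move tree at exactly `depth` plies.
fn perft<P: Position>(pos: &P, depth: u32) -> u64 {
    if depth == 0 {
        return 1;
    }
    pos.successors()
        .iter()
        .map(|next| perft(next, depth - 1))
        .sum()
}

/// Toy stand-in for a chess position: always exactly three legal moves.
struct Toy;

impl Position for Toy {
    fn successors(&self) -> Vec<Self> {
        vec![Toy, Toy, Toy]
    }
}

fn main() {
    for depth in 1..=6 {
        let start = Instant::now();
        let leaves = perft(&Toy, depth);
        let ms = start.elapsed().as_millis();
        // With a 0 ms duration this floating-point division yields
        // infinity, which Rust formats as "inf".
        let leaves_per_sec = (leaves as f64 / (ms as f64 / 1000.0)).floor();
        println!(
            "Perft {}: {} ({} ms, {} leaves/sec)",
            depth, leaves, ms, leaves_per_sec
        );
    }
}
```

For the toy game the count at depth n is simply 3^n. For chess, the counts are compared against published reference values for each test position.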
The reason is that everything below perft 6 is only a few seconds or fractions of seconds, so any deviation is within the margin of error on a modern CPU, while estimated run time for perft 7 in the position I'm using would be about 2.75 hours. The numbers for Perft 5 and 6 are always consistent, with a margin of +/- 0.5 seconds on my system.
As you can see, a compile made with the GNU 1.45.x toolchain ran in 217 seconds... slowest of the bunch, down from a class-leading performance of 200 seconds in version 1.44.x. The MSVC toolchain with the normal MS-linker did get a nice speed boost in version 1.45.
Therefore, to obtain maximum speed, I would need to stick with GNU version 1.44.1. Even if I switch to MSVC version 1.45.2 to be able to upgrade, this is still slower than GNU 1.44.1.
(Somewhere along the line during the last few versions, the MSVC/LLD combo first lost speed, to about 212 seconds of total run time, and then gained speed again, to about 207 seconds. I have seen compiles with this toolchain running at 203 seconds for Perft 6. Of all the toolchains across versions, this is the least consistent.)
If you have any questions or need something tested, please ask. In the next Rust version, I'd very much like to see the GNU toolchain regain the speed it had with 1.44.1 and earlier.
===============================================================================
Summary
Perft 6 results:
- 1.44.1 GNU : 200.23 seconds
- 1.45.2 MSVC : 202.88 seconds
- 1.45.2 MSVC/LLD : 202.90 seconds
- 1.44.1 MSVC : 206.70 seconds
- 1.44.1 MSVC/LLD : 207.27 seconds
- 1.45.2 GNU : 217.18 seconds
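The percentage in the title can be reproduced from the Perft 6 times in the logs. The snippet below is a small worked calculation, with `slowdown_percent` as a hypothetical helper name. The millisecond values are copied from the logs in this report.

```rust
/// Relative slowdown of `candidate_ms` versus `baseline_ms`, in percent.
/// A positive result means the candidate is slower than the baseline.
fn slowdown_percent(baseline_ms: u64, candidate_ms: u64) -> f64 {
    (candidate_ms as f64 - baseline_ms as f64) / baseline_ms as f64 * 100.0
}

fn main() {
    // Perft 6 times in milliseconds, taken from the logs in this report.
    let gnu_1_44_1 = 200_227;
    let gnu_1_45_2 = 217_176;
    let msvc_1_45_2 = 202_875;

    println!(
        "GNU 1.45.2 vs GNU 1.44.1:  {:+.2}%",
        slowdown_percent(gnu_1_44_1, gnu_1_45_2)
    );
    println!(
        "MSVC 1.45.2 vs GNU 1.44.1: {:+.2}%",
        slowdown_percent(gnu_1_44_1, msvc_1_45_2)
    );
}
```

The first figure comes out just under 8.5%, far outside the +/- 0.5 second run-to-run margin mentioned earlier.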
===============================================================================
1.44.1-x86_64-pc-windows-gnu
RUSTFLAGS=-C target-cpu=skylake
Perft 1: 48 (0 ms, inf leaves/sec)
Perft 2: 2039 (0 ms, inf leaves/sec)
Perft 3: 97862 (2 ms, 48931000 leaves/sec)
Perft 4: 4085603 (101 ms, 40451514 leaves/sec)
Perft 5: 193690690 (4706 ms, 41158242 leaves/sec)
Perft 6: 8031647685 (200227 ms, 40112710 leaves/sec)
Total time spent: 205036 ms
Execution speed: 40136970 leaves/second
===============================================================================
stable-x86_64-pc-windows-gnu
RUSTFLAGS=-C target-cpu=skylake
Perft 1: 48 (0 ms, inf leaves/sec)
Perft 2: 2039 (0 ms, inf leaves/sec)
Perft 3: 97862 (2 ms, 48931000 leaves/sec)
Perft 4: 4085603 (101 ms, 40451514 leaves/sec)
Perft 5: 193690690 (4668 ms, 41493292 leaves/sec)
Perft 6: 8031647685 (217176 ms, 36982206 leaves/sec)
Total time spent: 221947 ms
Execution speed: 37078779 leaves/second
===============================================================================
1.44.1-x86_64-pc-windows-msvc
RUSTFLAGS=-C target-cpu=skylake
Perft 1: 48 (0 ms, inf leaves/sec)
Perft 2: 2039 (0 ms, inf leaves/sec)
Perft 3: 97862 (2 ms, 48931000 leaves/sec)
Perft 4: 4085603 (105 ms, 38910504 leaves/sec)
Perft 5: 193690690 (4827 ms, 40126515 leaves/sec)
Perft 6: 8031647685 (206698 ms, 38856920 leaves/sec)
Total time spent: 211632 ms
Execution speed: 38886009 leaves/second
===============================================================================
stable-x86_64-pc-windows-msvc
RUSTFLAGS=-C target-cpu=skylake
Perft 1: 48 (0 ms, inf leaves/sec)
Perft 2: 2039 (0 ms, inf leaves/sec)
Perft 3: 97862 (2 ms, 48931000 leaves/sec)
Perft 4: 4085603 (102 ms, 40054931 leaves/sec)
Perft 5: 193690690 (4728 ms, 40966728 leaves/sec)
Perft 6: 8031647685 (202875 ms, 39589144 leaves/sec)
Total time spent: 207707 ms
Execution speed: 39620830 leaves/second
===============================================================================
1.44.1-x86_64-pc-windows-msvc
RUSTFLAGS=-C target-cpu=skylake
Using lld-link.exe through MSYS2
Perft 1: 48 (0 ms, inf leaves/sec)
Perft 2: 2039 (0 ms, inf leaves/sec)
Perft 3: 97862 (2 ms, 48931000 leaves/sec)
Perft 4: 4085603 (105 ms, 38910504 leaves/sec)
Perft 5: 193690690 (4839 ms, 40027007 leaves/sec)
Perft 6: 8031647685 (207272 ms, 38749313 leaves/sec)
Total time spent: 212218 ms
Execution speed: 38778632 leaves/second
===============================================================================
stable-x86_64-pc-windows-msvc
RUSTFLAGS=-C target-cpu=skylake
Using lld-link.exe through MSYS2
Perft 1: 48 (0 ms, inf leaves/sec)
Perft 2: 2039 (0 ms, inf leaves/sec)
Perft 3: 97862 (2 ms, 48931000 leaves/sec)
Perft 4: 4085603 (102 ms, 40054931 leaves/sec)
Perft 5: 193690690 (4729 ms, 40958065 leaves/sec)
Perft 6: 8031647685 (202895 ms, 39585242 leaves/sec)
Total time spent: 207728 ms
Execution speed: 39616825 leaves/second
===============================================================================
I had a lot of issues with cargo-bisect-rustc; many errors. Also, when it does work, it seems to only compile the project and treat a successful build as OK, without measuring run time. I'll have to look into this more. I'd appreciate some assistance. For now, I did the bisection by hand.
Start: nightly-2020-04-23-x86_64-pc-windows-gnu (last 1.44.0 nightly) GOOD
2020-05-14: GOOD
2020-05-18: GOOD
2020-05-20: GOOD
2020-05-21: GOOD
2020-05-22: BAD <==
End: nightly-2020-06-05-x86_64-pc-windows-gnu (last 1.45.0 nightly) BAD
The 1.45.0 nightly from 2020-05-21 still gives the same results as the 1.44.0 nightly from 2020-04-23.
The next one, from 2020-05-22, is the first to have this performance degradation.
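The manual bisection above amounts to a binary search for the first bad nightly. The sketch below shows that search, assuming the nightlies are ordered oldest to newest and that every nightly after the first bad one is also bad. `first_bad` is a hypothetical helper. In practice the `is_bad` check would build the engine with that nightly and compare the Perft 6 time against a threshold; here it only looks up the verdicts recorded above.

```rust
/// Index of the first "bad" entry, assuming entries are ordered oldest to
/// newest and that every entry after the first bad one is also bad.
fn first_bad<T>(items: &[T], is_bad: impl Fn(&T) -> bool) -> Option<usize> {
    let (mut lo, mut hi) = (0, items.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if is_bad(&items[mid]) {
            hi = mid; // the regression is at `mid` or earlier
        } else {
            lo = mid + 1; // the regression is after `mid`
        }
    }
    if lo < items.len() {
        Some(lo)
    } else {
        None
    }
}

fn main() {
    // Nightlies and verdicts as recorded in the manual bisection
    // (true = BAD, false = GOOD).
    let nightlies = [
        ("2020-04-23", false),
        ("2020-05-14", false),
        ("2020-05-18", false),
        ("2020-05-20", false),
        ("2020-05-21", false),
        ("2020-05-22", true),
        ("2020-06-05", true),
    ];

    match first_bad(&nightlies, |&(_, bad)| bad) {
        Some(i) => println!("first bad nightly: {}", nightlies[i].0),
        None => println!("no regression found"),
    }
}
```

With the recorded verdicts this prints `first bad nightly: 2020-05-22`, after probing only three of the seven nightlies.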
I'd appreciate some assistance in using cargo-bisect-rustc to find the PR that caused this slowdown.