-
Notifications
You must be signed in to change notification settings - Fork 388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change hits file instrumentation to be memory mapped #241
Change hits file instrumentation to be memory mapped #241
Conversation
Nice to see someone take this on, I'll try this out when I'm back at work on Monday. Thanks! |
Yeah, not sure why the tests are failing. Works locally |
Hi @Cyberboss, pull the latest master and see if all tests still pass locally |
…overlet into MemoryMappedHitsFile
Codecov Report
@@ Coverage Diff @@
## master #241 +/- ##
==========================================
+ Coverage 95.27% 95.33% +0.06%
==========================================
Files 16 16
Lines 1693 1695 +2
==========================================
+ Hits 1613 1616 +3
+ Misses 80 79 -1
Continue to review full report at Codecov.
|
Think I nailed it now |
Anecdotally at least, this works for me. Ran it a couple of times and coverage matched perfectly. |
I've gave this a spin on my code base, and it seems to count hits correctly. But it is significantly slower than version 2.3.2, where two test projects go from about 50s to respectively 90s and 135s with this change. I think this might be related to what @pjanotti discussed in #181 about multithreaded tests. Even if Interlocking is less bad than using mutexes, it still adds a synchronization between the CPU cores which slows down tests. It would be interesting if @pjanotti could see what the performance is on those projects, since they seemed to be even more penalised by synchronization between parallel unit tests than the projects I test here. This could perhaps be improved by something similar to #181 keeping one hit count memory mapped file for each CPU core in a thread-static variable and increment it without Interlocking. But then the Coverage class must find all these files and add upp the hit counts from them. |
Hi @petli, yes I tried an initial perf improvement with Interlocked and for heavy concurrency scenarios it was really bad, so had to fallback on hit counting per thread. There were some projects on the CoreFx repo that were especially bad (I can't recall exactly which ones, but the names may live in some of my previous comments). I would recommend being careful about adding interlocked in the normal hit path count if considering performance for these concurrent scenarios is relevant. Perhaps an option to use the memory mapped file? (/cc @danmosemsft @ViktorHofer) |
I realised the performance test in coverlet can easily be adjusted to test the threaded behaviour. This really exercises the code, since the contention gets worse if there's lots of threads all hitting the same memory pages and here there are twenty tests all looping intensely over the same code:
Running those 20 parallel tests on 2.3.2 took 30s on my dev machine, with the Interlocking usage in this PR it takes... *twiddles thumbs* ...16 minutes. So before this can be merged it needs to be changed to use ThreadStatic to keep track of one memory mapped file per thread, alas. |
Alright, taken care of |
I was wrong above that the test case needed additional
(PR to fix this: #248) This led me to do an incorrect analysis of the test run I did before the change to thread-local files, the one I wrote about above taking 16 minutes. It does not seem to be a problem with Running just a single run of the performance test with 20000 iterations takes 3s with the old @Cyberboss, have you seen this too or is this just something on my machine? Sorry for leading you down the wrong track here... |
I have yet to look into how to run the performance test but yeah, my own MMF experiments don't bode well with random access incrementing like this. A shared memory implementation (non-file backed) should be infinitely better, I'll see if I can get that working. |
There's now a description in README.md in the master branch on how to run the performance tests, it was hidden down in a unit test file before: |
@Cyberboss Any progress on the perf testing? Keen to get this merged so we can fix our CI pipeline. |
@Cyberboss Sorry to pester, but I really need to know if you're going to be able to get this one sorted or not. If not, I'm going to switch out Coverlet with another tool. |
I'll see if I can't find time to readdress this sometime this week, end of year has gotten pretty busy. |
Thanks buddy, much appreciated. Same here or I would have looked myself! |
@WiredUK @Cyberboss I've played around with this today, but just changing to a pure memory-based file didn't help much. Instead I've tried a hybrid solution where hit counting is done directly in int arrays as before, but passing the counts to the coverage analysis is done via mapped memory. That can hopefully be quicker than writing to a file and thus have time to finish running in I have to clean up the code a bit, but will send a PR tomorrow with this. |
Finally pushed #276. |
Fixes #210
Added benefit of getting kernel level cleanup and atomic-ity when incrementing hit counters