Description
`perf` can only monitor a specific OS process running on a specific CPU core (or on all cores). It is unaware of Haskell's RTS and OS threads.
I expect that running several counters concurrently may give strange, confusing results. Running a counter test at the same time as other (non cpu-instruction-counter) tests will also be confusing.
Now, some test runners (e.g. tasty) do parallel test execution by default. This may be a great source of confusion for an unaware user.
I have several ideas of varying complexity that can help here, but ultimately we have to play around and investigate this.
- Add a visible notice to the README telling users not to run counters concurrently
- Make a global lock that is taken by `startInstructionCounter`
  - if the lock is taken, the next `startInstructionCounter` can fail with a meaningful error message;
  - or it could just wait until the lock is released, which will sequentialize `cpu-counter` tests
  - BUT: all of this seems hacky and won't help if you concurrently run non cpu-instruction-counter tests
- We can investigate how to actually make it work concurrently:
  - `startInstructionCounter` can return a `Handle` that allows working with this specific counter, tracking information related to it
  - we can use `forkOn` to run on a specific capability, which usually corresponds to a core. It's implementation dependent, but we only work on Linux so it's probably fine
    - but there's probably a more reliable way to fork onto a specific core, I don't know
  - I scanned through a manpage and noticed interesting variables like `PERF_SAMPLE_ID`, `PERF_FORMAT_ID`, `PERF_SAMPLE_GROUP`. I didn't look any closer yet, but maybe this can be used for reliably tracking several counters. This stackoverflow question may be related, but I didn't read closely.
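To make the lock and `forkOn` ideas from the list more concrete, here is a rough sketch. Nothing in it is part of cpu-instruction-counter: `counterLock`, `CounterBusy`, `withCounterOrFail`, `withCounterBlocking` and `runOnCapability` are hypothetical names, and the actual counter calls would go inside the action passed to these helpers. It only uses `MVar` and `forkOn` from `base`.

```haskell
import Control.Concurrent (forkOn)
import Control.Concurrent.MVar
  (MVar, newEmptyMVar, newMVar, putMVar, takeMVar, tryTakeMVar, withMVar)
import Control.Exception
  (Exception, SomeException, finally, mask, throwIO, try)
import System.IO.Unsafe (unsafePerformIO)

-- | Hypothetical process-wide lock; not part of cpu-instruction-counter.
counterLock :: MVar ()
counterLock = unsafePerformIO (newMVar ())
{-# NOINLINE counterLock #-}

-- | Hypothetical error used by the fail-fast variant.
data CounterBusy = CounterBusy deriving Show
instance Exception CounterBusy

-- | Fail-fast: throw 'CounterBusy' if another counter section is active.
withCounterOrFail :: IO a -> IO a
withCounterOrFail action = mask $ \restore -> do
  acquired <- tryTakeMVar counterLock
  case acquired of
    Nothing -> throwIO CounterBusy
    Just () -> restore action `finally` putMVar counterLock ()

-- | Blocking: wait for the lock, which serializes counter sections.
withCounterBlocking :: IO a -> IO a
withCounterBlocking action = withMVar counterLock (\() -> action)

-- | Run an action in a thread pinned to capability @cap@ and wait for it.
-- A capability is an RTS concept; it is not guaranteed to map to one core.
runOnCapability :: Int -> IO a -> IO a
runOnCapability cap action = do
  result <- newEmptyMVar
  _ <- forkOn cap (try action >>= putMVar result)
  takeMVar result >>= rethrow
  where
    rethrow :: Either SomeException b -> IO b
    rethrow = either throwIO pure

main :: IO ()
main = do
  -- A nested section stands in for a second test starting concurrently.
  busy <- try (withCounterOrFail (withCounterOrFail (pure ())))
  print (busy :: Either CounterBusy ())   -- Left CounterBusy
  -- The lock was released on the way out, so this succeeds.
  n <- withCounterBlocking (runOnCapability 0 (pure (42 :: Int)))
  print n                                  -- 42
```

As noted in the list, a lock like this only serializes code that goes through it. Tests that never touch the counter can still run in parallel and disturb the measurement.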
I can only be sure about the first option (warn users in the README). In any case, `cpu-instruction-counter` works only on Linux and uses FFI, so the best practice should be that all instruction-counting tests/benchmarks live in a separate executable that runs with `+RTS -N1` (for example, baked in at link time with `-with-rtsopts=-N1`), which eliminates the problem.
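A dedicated executable could also check at startup that it really is running on one capability. The following is a minimal sketch of such a guard, assuming the check is done with `getNumCapabilities` from `base`; `capabilityError` and `requireSingleCapability` are hypothetical helpers, not something the library provides.

```haskell
import Control.Concurrent (getNumCapabilities)
import System.Exit (die)

-- | Hypothetical helper: explain what is wrong with a capability count,
-- or return 'Nothing' when it is exactly one.
capabilityError :: Int -> Maybe String
capabilityError 1 = Nothing
capabilityError n = Just
  ("instruction-counter tests expect +RTS -N1, but the RTS has "
    ++ show n ++ " capabilities")

-- | Hypothetical guard: abort before any counter is started
-- unless the RTS was started with a single capability.
requireSingleCapability :: IO ()
requireSingleCapability = do
  caps <- getNumCapabilities
  maybe (pure ()) die (capabilityError caps)

main :: IO ()
main = do
  requireSingleCapability
  -- Counter-based tests/benchmarks would run here, one after another.
  putStrLn "single capability confirmed"
```

This fails loudly if someone overrides the RTS options on the command line, instead of silently producing confusing counts.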