Recently @eerovaher brought up this good question: it would be nice to see which code is actually exercised by the benchmarks (codecov-style) without having to dig through each benchmark module.
Not sure how hard this is to do. Running coverage during the actual benchmarking would skew the timings, so the coverage measurement would have to be a separate job that exercises the benchmarks for coverage only, not for timing. 💭
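A minimal sketch of what that separate coverage-only job could look like, using only the stdlib `trace` module: call each `time_*` method once (no repetitions, no timing) and record which lines ran. `TimeSuite`, `setup`, and `time_sum` here are hypothetical stand-ins for a real asv-style benchmark class; a real job would import the actual benchmark modules and hand the line counts to a coverage reporter instead of printing.

```python
import trace

# Hypothetical asv-style benchmark class, standing in for a real
# benchmark module; real code would import the actual benchmarks.
class TimeSuite:
    def setup(self):
        self.data = list(range(100))

    def time_sum(self):
        sum(self.data)

def run_benchmarks_once():
    """Call each time_* method exactly once: coverage only, no timing."""
    suite = TimeSuite()
    suite.setup()
    for name in dir(suite):
        if name.startswith("time_"):
            getattr(suite, name)()

# count=True records per-line hit counts; trace=False suppresses
# the line-by-line printout while executing.
tracer = trace.Trace(count=True, trace=False)
tracer.runfunc(run_benchmarks_once)

# results.counts maps (filename, lineno) -> hit count; a coverage job
# would serialize this (or use coverage.py) rather than time anything.
covered_lines = {lineno for (fname, lineno) in tracer.results().counts}
print(sorted(covered_lines)[:5])
```

Since the benchmark bodies run only once, the timings produced by the normal benchmark job stay untouched; this job's output is purely the set of executed lines.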