Some benchmarks simply fail, and we are not tracking that. Example: https://github.com/nodejs/node/pull/59173.