Order of benchmarks seems to bias results #864
Hey @sinclairzx81, thank you for your feedback. This is certainly a serious issue. I am not an expert on Node internals or how GC works, beyond the CS 201 type of information :) Do you have any proposals that are specific to this repo? I had an idea to run every test in a separate GitHub Action job, which would provide the isolation, but then I was worried about the jobs being placed on different machines with different specs. Running them in a separate process on the same machine sounds like a good middle ground. Wondering what it would take to do so, given our current setup.
Is there anything, perhaps, in the settings of the benchmarking library? Maybe @caderek can recommend something?
@moltar Hi, thanks for the quick reply. I agree that each test should be run on the same machine (not distributed across different machines on CI). My initial thought on this was to just run each test in its own V8 isolate. I guess this could be achieved a couple of ways, but perhaps the easiest is to have a CLI command that is able to target an individual test. This would enable something like the following in `package.json`:

```jsonc
// runs spectypes only
{
  "scripts": {
    "start": "npm run compile:spectypes && ts-node index.ts --test spectypes"
  }
}
```

So, to run the complete test suite, you could just append the remaining tests to the `start` script:

```jsonc
// runs everything
{
  "scripts": {
    "start": "npm run compile:spectypes && ts-node index.ts --test spectypes && ts-node index.ts --test zod && ... (omitted)"
  }
}
```

As the `start` command would likely get prohibitively long, Google's zx package could be utilized to run a JavaScript-based shell execution script that runs each test and awaits them in turn:

```js
// runner.mjs
// build spectypes
await $`npm run compile:spectypes`
// load cases somehow, and execute in series (with each test run as a separate OS process)
for (const testCase of loadCases()) {
  await $`ts-node index.ts --test ${testCase}`
}
```

```sh
$ npx zx runner.mjs
```

I had a quick look through the code; I guess the only complication that might arise is handling aggregate results (writing benchmark output to a shared results file).

Open to thoughts.
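For the "load cases somehow" step above, a minimal sketch, assuming each case is a `.ts` file in a `cases/` directory (the layout and file naming are assumptions, not necessarily this repo's actual structure):

```ts
// loadCases: hypothetical helper that derives test case names from the files
// in a cases/ directory, so the runner can pass each name to --test.
import { readdirSync } from 'fs'

export function loadCases(): string[] {
  return readdirSync('./cases')
    .filter((file) => file.endsWith('.ts') && file !== 'index.ts')
    .map((file) => file.replace(/\.ts$/, ''))
}
```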
The results json format is pretty simple: an object in an array for each benchmark/module combination. That can be easily rewritten such that every isolated benchmark run appends its individual result to an already existing results json file. Benchmarks will obviously not run in parallel, so there won't be any file transaction issues. Two more thoughts/questions that come to my mind are:
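For illustration, that append step could look roughly like the following sketch; the `BenchmarkResult` shape, field names, and file handling here are assumptions, not the repo's actual types:

```ts
import * as fs from 'fs'

// Assumed shape: one object per benchmark/module combination.
interface BenchmarkResult {
  benchmark: string // e.g. "parseSafe"
  name: string      // the validation module, e.g. "zod"
  ops: number       // operations per second
  margin: number    // error margin in percent
}

// Append a single isolated run's result to the shared results file.
function appendResult(file: string, result: BenchmarkResult): void {
  const existing: BenchmarkResult[] = fs.existsSync(file)
    ? JSON.parse(fs.readFileSync(file, 'utf8'))
    : []
  existing.push(result)
  fs.writeFileSync(file, JSON.stringify(existing, null, 2))
}
```

Since the runs are serial, the read-modify-write cycle needs no locking.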
@hoeck Hi!
This is a good question. I'm thinking maybe the best way to orchestrate these tests is to organize the execution by package (vs by test). In this regard, the process would start for a given validation package (e.g. myzod), execute all the tests for that package, then exit. The next package would be run immediately thereafter, in its own process (sketched below). The thinking here is that if performance degrades during the tests run for a single package, that's generally going to be a consequence of that package throttling V8 in some way. Package authors may be able to identify bottlenecks in throughput if they know the sequence in which the tests were run, and if the results show clear degradation across tests run in sequence.
It might be a good idea to omit packages from the import when they aren't being tested. I should note that in the testing I've done so far, I have been importing all the tests in the suite and haven't experienced a problem with them all being there. Degradation seems more tied to order of execution, with the importing of these packages seeming ok. Hope this helps!
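A minimal sketch of that by-package orchestration, assuming a `--test <package>` flag like the one discussed above (the package list and command are illustrative):

```ts
// Run each validation package's benchmarks in its own OS process, in series,
// so one package's effect on V8 (JIT deopts, heap pressure) cannot leak into
// the next package's numbers.
import { execFileSync } from 'child_process'

const packages = ['spectypes', 'zod', 'myzod'] // illustrative subset

for (const pkg of packages) {
  execFileSync('npx', ['ts-node', 'index.ts', '--test', pkg], {
    stdio: 'inherit', // stream each run's output to the parent terminal
  })
}
```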
@sinclairzx81 @moltar is anyone else volunteering for this? Otherwise I'm going to try to implement @sinclairzx81's suggestions.
@hoeck Hi, that's great news :) I wasn't planning on undertaking these updates myself (as that work is best left to those benchmarking), but I can assist with the PR process by running some local tests and providing feedback prior to merge if that's helpful. Keep me posted!
That would be great! 👍🏼 😁
@sinclairzx81 could you please try the changes from my PR? When you execute the benchmarks, each case now runs in its own process. You can also run individual cases on their own. And most important: does the bias of the ordering go away, or does it persist?
@hoeck Hi sure, just ran your branch and it looks like that did the trick. Seems many of the libraries there are giving much better results across the board now, with some crossing the 1 million operations per second mark, which is nice to see! Hopefully the updates weren't too bad to implement. Also +1 for adding the ability to run individual cases, very useful! Good work mate!
Thx for the feedback @sinclairzx81 😊 and I am glad that this removed the result bias. I will fix up the linter issues with the PR tomorrow so that we can merge it.
@hoeck amazing work, as usual! 👍🏼 I'm wondering if we should temporarily add a notice banner at the top explaining this change, maybe with a link to this issue for details?
That makes sense, I'll add a simple info message at the top of the readme.
Hi, thanks for this project.
I'm currently looking at submitting a new runtime type checker for comparative performance benchmarks, but have noticed that the order in which each validator is run seems to degrade subsequent benchmark results. By moving tests around, I'm able to greatly improve a test's results simply by running it first.
I'm thinking this is likely due to some of the validators utilizing more memory overhead (implicating the GC), or that V8's optimization heuristics have broken down, causing V8 to take the slow path (this is quite likely given how validation routines may require dynamic property access on objects to complete). To rule out the GC, I started the project with

```sh
npm run compile:spectypes && node --expose-gc -r ts-node/register ./index.ts
```

and called `global.gc()` to force collection prior to the tests running (see the sketch at the end of this post), but no luck. So the problem is most likely related to V8's optimization heuristics.

To give all tests an even footing, I'm wondering if it would be possible to reshape these tests such that each is run within its own node process (rather than running all tests in the same process). This would give each a fresh instance in which to do its thing, and should yield more accurate results.
Open to thoughts if this is something worth looking into.
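For illustration, the forced-collection step can be guarded roughly like this (a sketch: `runCase` is a hypothetical wrapper, and `global.gc` is only defined when node is launched with `--expose-gc`):

```ts
// Force a full collection before timing each case, so garbage left over from
// earlier cases cannot bias the measurement. A no-op without --expose-gc.
function runCase(benchmark: () => void): void {
  if (typeof global.gc === 'function') {
    global.gc()
  }
  benchmark()
}
```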