Status of running node core benchmarks #127
This now seems to be working, and is ready for testing. @mcollina, you mentioned you had some PRs to try it on. At the moment it is set up to test a PR's impact on master. There are four parameters available. Please take a look and let me know how it goes! Example of output: URL to launch from: NOTE: be careful to understand how long the run you are submitting will take.
I'm currently on vacation; I'll get this going ASAP when I come back (1 week).
ProTip: to multiply by 60, just read the seconds as minutes.
Thoughts: currently it's taking AGES to clone node core. What are thoughts on cloning once on the machine, then copying it over to the workspace and doing a pull from origin master? Perhaps we could update the central clone once a week? Investigating why @mcollina's run failed - will update with findings...
How to: Running core benchmarks on Node.js CI

Current function
The current Jenkins job aims to let you compare the impact of a PR on a particular branch. For example, how does PR XXXX affect performance on master?

How to configure the job
To start, visit https://ci.nodejs.org/job/benchmark-node-micro-benchmarks/build?delay=0sec

BRANCH - the branch or tag to compare against; defaults to master
PULL_ID - the pull request to test
CATEGORY - a category of benchmarks from the node/benchmark folder in core
FILTER - reduces the benchmarks in scope further; only benchmarks whose filename contains FILTER are included

On running, a minimum of 30 iterations of each benchmark/variation is executed per build. Once they have all finished, Rscript summarises the results, suggesting improvements/regressions as well as confidence intervals.
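For illustration, here is roughly what a job configured this way does under the hood. This is only a sketch, not the actual Jenkins job script: the directory layout, binary names and output filename are assumptions, and only the compare.js/compare.R tooling from node core is taken as given.

#!/bin/bash
# Sketch of the benchmark-node-micro-benchmarks flow (assumed, not the real job script).
set -e

BRANCH=master        # job parameters; hypothetical example values
PULL_ID=12345
CATEGORY=buffers
FILTER=concat

git clone https://github.com/nodejs/node.git node && cd node

# Build the baseline branch
git checkout "$BRANCH"
./configure && make -j"$(nproc)"
cp out/Release/node ./node-baseline

# Fetch and build the pull request under test
git fetch origin "pull/$PULL_ID/head:pr-$PULL_ID"
git checkout "pr-$PULL_ID"
./configure && make -j"$(nproc)"
cp out/Release/node ./node-pr

# Run the selected benchmarks against both binaries, then summarise with R
node benchmark/compare.js --old ./node-baseline --new ./node-pr \
  --filter "$FILTER" "$CATEGORY" > output.csv
Rscript benchmark/compare.R < output.csv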
@gareth-ellis It would probably make sense to PR this as a doc somewhere in the repo.
@AndreasMadsen @mcollina @mscdex, if you have some cycles, do you want to have a go with the job, using the instructions above, and see how you get on? Also got a PR up for the above doc, with a view to getting something together we can send out to a wider audience and get people using this job and core benchmarks. (PR in #144 )
I would love to see it in action but right now I don't have anything I would like to benchmark :(
I think the script should automatically rebase the branch, because a difference in the V8 engine could skew the results and provide wrong feedback. Maybe this should be a checkbox, so we can verify across different release lines. I ran the tests on my laptop in June and got no regressions (nodejs/node#12857 (comment)) - this was at the time of V8 5.9. However the branch is still using an extremely old V8 engine, and we got some very weird results running it now:
I kicked off the job again: https://ci.nodejs.org/job/benchmark-node-micro-benchmarks/15/. It would be good to have access to the raw data, so I can use R to plot graphs... is that pushed somewhere?
The script checks out the
What are the weird results? You have to remember that there is a small risk of a false positive, and with many independent tests that risk becomes large. At only 5% confidence (one star), the risk of getting 5 false positives in 44 tests is 4.590962%. The plot is only valid for 44 independent tests.

plot code:

library(ggplot2);

c1 = data.frame(
  false.positive=0:44,
  probability=dbinom(0:44, 44, 0.05),
  confidence='*'
)
c2 = data.frame(
  false.positive=0:44,
  probability=dbinom(0:44, 44, 0.01),
  confidence='**'
)
c3 = data.frame(
  false.positive=0:44,
  probability=dbinom(0:44, 44, 0.001),
  confidence='***'
)
c = rbind(rbind(c1, c2), c3)

p = ggplot(c, aes(x=false.positive)) +
  geom_bar(aes(y=probability, fill=confidence), stat="identity") +
  facet_grid(confidence ~ .) +
  scale_x_continuous(breaks=0:44, limits=c(-0.5, 8))
print(p)
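(For reference, the 4.590962% figure above is just the binomial probability of seeing exactly 5 false positives in 44 independent tests at the 5% level; assuming Rscript is on your PATH, you can reproduce it with a one-liner:)

Rscript -e 'dbinom(5, 44, 0.05)'
# prints: [1] 0.04590962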
My bad. So, this is fine. The script works beautifully then. I think we should document it in the main repo; it would be of great help to everyone.
Excellent - I will see what I can do about providing the raw data. Yes, we plan to get the info out to collaborators to start using, but thought it would be good to have some people test first :) I'll add a bit of a description similar to what @AndreasMadsen wrote to explain how it works, and include it in the doc in #144. Are there any other issues or quirks you noticed that should be documented?
@gareth-ellis +1 to all that you just said. Running all the http benchmarks without a filter takes forever (44 hours on my laptop, and I only got to 60%). We should not allow that, or strongly suggest that a filter needs to be added. Running a job for 3-4 days does not sound right. IMHO the filter might well be mandatory for this script. Posting the result to the PR would be a huge plus, but I do not know if that is easy to do.
I agree. Statistically, due to the false positive risk, it is also not very useful to run that many tests. I think we should just check the expected benchmark time using a precompiled list, as shown earlier in #58 (comment).
@mcollina, for getting to the raw data: if you click on your job, you should be able to open the workspace and then browse to the results file. I'll modify the script to move it to the top level, so there is less moving around. I think the gotcha here could be that the workspace may be cleared before each run; I will rename the file so I can be sure. Otherwise, another option may be copying it to a folder on the benchmarking machine and having it synced over to the website for download. We do something similar for the charts (at https://benchmarking.nodejs.org), so it should in theory be possible. Then we could post a link in the job output to where the file will be. For posting to a PR, I know there are some bots in the org - are there any I can pass data to that will post to a PR? In the job I already have the PR number, so I have the information to send and where it needs to go.
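(One bot-free option for the PR posting, sketched under assumptions: the job would need a GitHub token with permission to comment, exposed here as GITHUB_TOKEN, and the Rscript summary written to a hypothetical results.txt; it can then call the standard issue-comments API, which also works for pull requests:)

# Sketch: post the benchmark summary back to the PR via the GitHub API.
# Assumes $GITHUB_TOKEN, $PULL_ID and results.txt as described above, and jq installed.
curl -sS -X POST \
  -H "Authorization: token $GITHUB_TOKEN" \
  -H "Content-Type: application/json" \
  -d "$(jq -Rs '{body: .}' results.txt)" \
  "https://api.github.com/repos/nodejs/node/issues/$PULL_ID/comments"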
Both approaches make sense. Even uploading it to a gist on GitHub (together with a chart, maybe) would be fantastic. Something that lasts for a bit of time, with some meta-info (SHAs being tested).
OK, will look into that. In the meantime, CSVs now get saved in the root of the workspace with a date/timestamp: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/ws/
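(If you want the raw data locally, something like the following should work, assuming the workspace is readable anonymously; the filename here is hypothetical - pick the actual timestamped CSV from the workspace listing, and run compare.R from a nodejs/node checkout:)

# Sketch: fetch a results CSV from the Jenkins workspace and re-summarise it locally
curl -sLO "https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/ws/output-20171115-1030.csv"
Rscript benchmark/compare.R < output-20171115-1030.csv
# or load the CSV straight into R with read.csv() for custom plots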
Are we ready to announce to the rest of the team that they can start using this, or is there any other testing feedback we should get first?
@mhdawson I don't think there is anything to lose. It is not like making this public can cause anything else to fail or break.
@gareth-ellis is there a reason that the number of runs is not exposed as a job parameter?
The main reason was that we were using the default number of iterations set per benchmark. I can add the RUNS parameter if you like; just appreciate that if you set the number too low, the results may not be particularly useful, and if you set it too high, with enough benchmarks in scope, it could take a long time to run.
I looked at the code and saw that the script will use it, so I added it, because we have a PR (nodejs/node#16888 (comment)) that is very noisy, and AFAICT it would give better results with more runs but shorter iterations.
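(For context, a sketch of how a RUNS value would feed into the core tooling, assuming the job simply forwards it to compare.js's --runs option; the binary paths, filter and category below are made up:)

# Sketch: raise the number of runs (compare.js defaults to 30) for a noisy benchmark
node benchmark/compare.js --old ./node-baseline --new ./node-pr \
  --runs 60 --filter headers http > output.csv
Rscript benchmark/compare.R < output.csv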
In theory, it will only make a tiny difference. After 30-100 runs, the difference should be less than the numerical precision. In practice, decreasing the number of iterations can change the optimization behavior.
@refack what are the outliers?
@refack I have made a long comment in the other issue (nodejs/node#16888 (comment)), hopefully that will clarify things.
Gareth opened nodejs/node#17359 to let collaborators know about it.
Anything more to be discussed in this issue? The benchmark job is now available, publicized, and initial issues resolved.
Since the PR was closed once I'd merged in my changes, I figured I'd open this to track current progress on getting this working.
This is a continuation of #58 .
Current status is that I have the job set up; however, we currently fail because Rscript is missing. I have opened nodejs/build#821 to get this installed.
In the meantime I am going to create a branch in benchmarking with a change to not use Rscript, which we can then use for continued testing.