Description
Is there something wrong with the benchmark machine? For some reason it keeps stalling/hanging, or maybe the connection to Jenkins is silently being severed? I've tried restarting the job a couple of times now. Normally the `async_hooks` benchmarks should only take about 50 minutes or less on a decent modern machine...
Originally posted by @mscdex in nodejs/node#38785 (comment)
The above comment is referencing https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1027/, https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1028/, https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1029/, ... It seems that any benchmark using `http.createServer` gets disconnected.
Also, it seems the machine is running Ubuntu 16.04; maybe it's due to an update. Let me know if there's something I can do to help with that.
Activity
Linkgoron commented on May 24, 2021
This also reproduces on my local machine (macOS Catalina)
Linkgoron commented on May 24, 2021
I assume that it's a similar issue to nodejs/node#36871. There's a timeout error that's getting emitted on the request in `_test-double-benchmarker`. A simple workaround that worked locally for me was to add an error handler on the http request (one currently exists only on the response), which was also added to `http2` a few months ago. I thought that there might be an issue with `server.close`, but at least for me awaiting it didn't solve the problem (I also think that the error was emitted before the server closed).

IMO there's probably a real underlying issue in HTTP, and this is just putting a band-aid on the problem, but it would at least allow benchmarking again.
rvagg commented on May 25, 2021
I've done a big update and cleanup on both benchmark machines, including clearing out workspaces and temp files (although I have a bad feeling I might have been too liberal with my removals because these machines have some very specific workflows that may be putting things into unexpected places 🤞). They've been rebooted so let's see if they behave any differently now.
We could upgrade to 18.04, but that might take input from maintainers of the benchmarking work - is a jump in OS likely to have any meaningful impact on benchmark numbers? Does it matter?
aduh95 commented on May 25, 2021
I might be wrong, but it seems the (only?) benchmark CI that is run on nodejs/node PRs is `benchmark-node-micro-benchmarks`, which gives the relative perf difference that a PR introduces. In this case, bumping the OS should not be a problem. Could we go straight to v20.04 maybe?

While we're on the topic of maintenance of the benchmark CI, do you know if the script that spawns the benchmark CI run is in this repo? I couldn't find it; I'd like to change it to do a shallow git clone instead of a deep one.

I've spawned https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1030/, let's see how it performs.
rvagg commented on May 25, 2021
Options are in https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/configure if you have access to see what's going on.
It does a `git clone https://github.com/nodejs/benchmarking.git` - is that the one you want to be shallow? Beyond that it runs `benchmarking/experimental/benchmarks/community-benchmark/run.sh` from that clone, so any further git operations are executed from that script.

aduh95 commented on May 25, 2021
Thanks for the info, the script I was looking for is this one: https://github.com/nodejs/benchmarking/blob/master/experimental/benchmarks/community-benchmark/run.sh. The repo is read-only; I've asked in nodejs/TSC#822 (comment) where we can move this script.
The job seems to be stuck again, it hasn't output anything in the last hour...
rvagg commented on May 25, 2021
sxa commented on May 25, 2021
I had a look on the machine and it looked like your job had indeed "stalled" by some definition - the load on the machine was effectively zero. While it was going, I attempted to initiate another run from another user account, and it looks like that caused a port conflict in your job, which caused it to end. This suggests that your job was, in fact, still progressing in some fashion, even though it wasn't visibly using much CPU or producing additional output, so it's possible it would have eventually run to completion.
I've re-initiated your job as https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1031/console which is now running on the `-1` performance machine (since I marked `-2` offline for now) and we'll see how that one progresses and whether it succumbs to the same sort of stalling. I'll continue with some experiments on the `-2` machine for now...

I would definitely be in favour of upgrading the machine to 20.04 in principle, although the two benchmarking machines are of a specific type so we'd need to be certain that they could be upgraded cleanly...
targos commented on May 25, 2021
@aduh95 Have you tried to run the benchmarks locally? Maybe something is broken and they cannot end.
aduh95 commented on May 25, 2021
For some reason the benchmarks involving http cannot even start on my machine (probably some issue with my config), but I think @mcollina is able to run them on a personal server.
targos commented on May 25, 2021
It's the `async_hooks` benchmarks that hang in https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1031/console

Linkgoron commented on May 25, 2021
Just to clarify what I've stated earlier - at least locally, for me, I see similar issues with the http-server benchmark in the `async_hooks` benchmarks. There are two issues here that I see: one, the benchmark doesn't wait for the server to close before starting the next benchmark; the other is that sometimes we get timeouts from the request, which throw (emit an error on the request) in the child process and cause issues.

aduh95 commented on May 25, 2021
Only the `async_hooks` benchmarks that involve http (`benchmark/async_hooks/async-resource-vs-destroy.js` and `benchmark/async_hooks/http-server.js`). I've noticed the same behaviour with `benchmark/http/cluster.js` and `benchmark/http/simple.js`.

aduh95 commented on Jun 13, 2021
Is there something I can do to help this move forward? I think upgrading to 20.04 would be a good first step, even if it doesn't solve the stalling issue.