
Benchmark machine maintenance  #2656

Closed
@aduh95

Description


Is there something wrong with the benchmark machine? For some reason it keeps stalling/hanging, or maybe the connection to Jenkins is silently being severed? I've tried restarting the job a couple of times now. Normally the async_hooks benchmarks should only take about 50 minutes or less on a decent modern machine...

Originally posted by @mscdex in nodejs/node#38785 (comment)

The comment above refers to https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1027/, https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1028/, https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1029/, ... It seems that every benchmark that uses http.createServer gets disconnected.
It also seems the machine is running Ubuntu 16.04; maybe the problem is related to an update. Let me know if there's anything I can do to help with that.

Activity

Linkgoron commented on May 24, 2021 (Member)

This also reproduces on my local machine (macOS Catalina).

Linkgoron commented on May 24, 2021 (Member)

I assume it's an issue similar to nodejs/node#36871. A timeout error is being emitted on the request in _test-double-benchmarker. A simple workaround that worked for me locally was to add an error handler to the HTTP request (currently there is one only on the response, not on the request); the same kind of handler was added for http2 a few months ago. I thought there might be a problem with server.close, but, at least for me, awaiting it didn't solve the problem (I also think the error was emitted before the server closed).

IMO there's probably a real underlying issue in HTTP, and this is just putting a band-aid on the problem, but it would at least allow benchmarking again.
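The request-side error handler workaround can be sketched as follows. This is a minimal, hypothetical illustration, not the actual contents of benchmark/_test-double-benchmarker.js: runRequestLoop, its durationMs parameter and the { throughput, errors } result are names made up here. It only shows where an 'error' listener on the http.ClientRequest goes, next to the listeners that are already on the response.

```javascript
'use strict';
const http = require('http');

// Hypothetical sketch, not the real benchmark/_test-double-benchmarker.js:
// send GET requests one after another until `durationMs` has elapsed and
// count how many completed. The point is the 'error' listener on the request.
function runRequestLoop(url, durationMs) {
  return new Promise((resolve) => {
    const start = process.hrtime.bigint();
    let throughput = 0;
    let errors = 0;

    const elapsedMs = () => Number(process.hrtime.bigint() - start) / 1e6;

    function next() {
      if (elapsedMs() >= durationMs) {
        resolve({ throughput, errors });
        return;
      }
      let settled = false;
      // Count each request exactly once, whichever event arrives first.
      const settle = (ok) => {
        if (settled) return;
        settled = true;
        if (ok) throughput++;
        else errors++;
        next();
      };
      const req = http.get(url, (res) => {
        res.resume(); // discard the body so that 'end' is emitted
        res.on('end', () => settle(true));
        res.on('error', () => settle(false));
      });
      // Without this listener, an 'error' emitted on the request (a timeout
      // or a reset socket, for example) is unhandled and crashes the process.
      req.on('error', () => settle(false));
    }

    next();
  });
}

module.exports = { runRequestLoop };
```

Counting failed requests instead of silently ignoring them keeps a misbehaving run visible in the output while still letting the child process finish and report, which is the "band-aid" trade-off described in the comment.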

rvagg commented on May 25, 2021 (Member)

I've done a big update and cleanup on both benchmark machines, including clearing out workspaces and temp files (although I have a bad feeling I may have been too liberal with my removals, because these machines have some very specific workflows that may be putting things into unexpected places 🤞). They've been rebooted, so let's see if they behave any differently now.

We could upgrade to 18.04, but that might need input from the maintainers of the benchmarking work: is a jump in OS version likely to have any meaningful impact on benchmark numbers? Does it matter?

aduh95 commented on May 25, 2021 (Contributor, Author)

> We could upgrade to 18.04, but that might take input from maintainers of the benchmarking work - is a jump in OS likely to have any meaningful impact on benchmark numbers? Does it matter?

I might be wrong, but it seems the only (?) benchmark CI job that is run on nodejs/node PRs is benchmark-node-micro-benchmarks, which reports the relative performance difference that a PR introduces. In that case, bumping the OS should not be a problem. Could we maybe go straight to 20.04?
While we're on the topic of benchmark CI maintenance, do you know whether the script that spawns the benchmark CI run is in this repo? I couldn't find it; I'd like to change it to do a shallow git clone instead of a full one.

> They've been rebooted so let's see if they behave any differently now.

I've spawned https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1030/; let's see how it performs.

rvagg commented on May 25, 2021 (Member)

The options are in https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/configure, if you have access and want to see what's going on.

It runs git clone https://github.com/nodejs/benchmarking.git; is that the clone you want to be shallow? Beyond that, it runs ". benchmarking/experimental/benchmarks/community-benchmark/run.sh" from that clone, so any further git operations are executed by that script.

aduh95 commented on May 25, 2021 (Contributor, Author)

Thanks for the info. The script I was looking for is this one: https://github.com/nodejs/benchmarking/blob/master/experimental/benchmarks/community-benchmark/run.sh. The repo is read-only, so I've asked in nodejs/TSC#822 (comment) where we can move this script.

> I've spawn https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1030/, let's see how it performs.

The job seems to be stuck again; it hasn't produced any output in the last hour...

rvagg commented on May 25, 2021 (Member)

root@test-nearform--intel-ubuntu1604-x64-2:~# ps auxww | grep ^iojs
iojs       1900  0.8  0.3 9239832 210320 ?      Ssl  04:14   4:08 /usr/bin/java -Xmx128m -jar /home/iojs/slave.jar -jnlpUrl https://ci.nodejs.org/computer/test-nearform_intel-ubuntu1604-x64-2/slave-agent.jnlp -secret 1c49efc3534392967acc0521aa0d82594c7ce9e54bdff5ff6df669e8337ce79d
iojs      11016  0.0  0.0  11356  3188 ?        S    09:17   0:00 bash -xe /tmp/jenkins7987259599734232559.sh
iojs      95339  0.0  0.0 596216 40300 ?        Sl   09:29   0:01 ./node-master benchmark/compare.js --old ./node-master --new ./node-pr -- async_hooks
iojs      95340  0.0  0.0   6012   664 ?        S    09:29   0:00 tee output250521-092939.csv
iojs     133265  0.0  0.0 331720 34016 ?        Sl   09:57   0:00 ./node-master /w/bnch-comp/node/benchmark/async_hooks/async-resource-vs-destroy.js
iojs     133288  0.1  0.0 670556 55736 ?        Sl   09:57   0:09 /w/bnch-comp/node/node-master /w/bnch-comp/node/benchmark/async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path=/ asyncMethod=callbacks type=async-resource
sxa commented on May 25, 2021 (Member)

I had a look at the machine and it looked like your job had indeed "stalled" by some definition: the load on the machine was effectively zero. While it was running, I attempted to start another run from a different user account, and it looks like that caused a port conflict in your job, which made it end. This suggests your job was in fact still progressing in some fashion, even though it wasn't visibly using much CPU or producing additional output, so it's possible it would eventually have run to completion.

12:15:26 node:events:371
12:15:26       throw er; // Unhandled 'error' event
12:15:26       ^
12:15:26 
12:15:26 Error: listen EADDRINUSE: address already in use :::12346
12:15:26     at Server.setupListenHandle [as _listen2] (node:net:1306:16)
12:15:26     at listenInCluster (node:net:1354:12)
12:15:26     at Server.listen (node:net:1441:7)
12:15:26     at main (/w/bnch-comp/node/benchmark/async_hooks/async-resource-vs-destroy.js:175:6)
12:15:26     at /w/bnch-comp/node/benchmark/common.js:42:9
12:15:26     at processTicksAndRejections (node:internal/process/task_queues:78:11)
12:15:26 Emitted 'error' event on Server instance at:
12:15:26     at emitErrorNT (node:net:1333:8)
12:15:26     at processTicksAndRejections (node:internal/process/task_queues:83:21) {
12:15:26   code: 'EADDRINUSE',
12:15:26   errno: -98,
12:15:26   syscall: 'listen',
12:15:26   address: '::',
12:15:26   port: 12346
12:15:26 }
12:15:26 ++ cat output250521-092939.csv
12:15:26 ++ Rscript benchmark/compare.R

I've restarted your job as https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1031/console, which is now running on the -1 performance machine (I've marked -2 offline for now). We'll see how that one progresses and whether it succumbs to the same kind of stalling. I'll continue with some experiments on the -2 machine in the meantime.

In principle I would definitely be in favour of upgrading the machine to 20.04, although the two benchmarking machines are of a specific type, so we'd need to be certain that they can be upgraded cleanly.

targos commented on May 25, 2021 (Member)

@aduh95 Have you tried running the benchmarks locally? Maybe something is broken and they can't finish.

aduh95 commented on May 25, 2021 (Contributor, Author)

For some reason the benchmarks involving http can't even start on my machine (probably an issue with my config), but I think @mcollina is able to run them on a personal server.

targos commented on May 25, 2021 (Member)

Linkgoron commented on May 25, 2021 (Member)

To clarify what I said earlier: at least locally, I see similar issues with the http-server benchmark among the async_hooks benchmarks. I see two separate issues here. First, the benchmark doesn't wait for the server to close before starting the next benchmark. Second, it sometimes looks like the request times out, which throws (emits an 'error' event on the request) in the child process and causes problems.
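The first issue, not waiting for the server to close, can be illustrated with a small sketch. It assumes for simplicity that benchmarks run one after another in a single process; judging by the ps output above, the real harness runs each benchmark file in its own child process, so benchmark/compare.js and benchmark/common.js may be structured differently. closeServer, listen and runSequentially are hypothetical helpers, not existing Node.js APIs. The listen helper also turns a failure such as the EADDRINUSE in the log above into a rejected promise instead of an unhandled 'error' event.

```javascript
'use strict';
const http = require('http');

// Hypothetical helpers, not taken from benchmark/common.js.

// Resolve once the server has stopped listening and its connections are gone.
function closeServer(server) {
  return new Promise((resolve, reject) => {
    server.close((err) => (err ? reject(err) : resolve()));
    // Idle keep-alive sockets would otherwise delay the close callback.
    // closeAllConnections() is not available in older Node.js releases.
    if (typeof server.closeAllConnections === 'function') {
      server.closeAllConnections();
    }
  });
}

// Resolve when listening; reject on errors such as EADDRINUSE instead of
// letting them surface as an unhandled 'error' event.
function listen(server, port) {
  return new Promise((resolve, reject) => {
    server.once('error', reject);
    server.listen(port, () => {
      server.off('error', reject);
      resolve();
    });
  });
}

// Run each benchmark against a fresh server on the same port, and only move
// on once the previous server has fully closed.
async function runSequentially(benchmarks, port) {
  for (const bench of benchmarks) {
    const server = http.createServer((req, res) => res.end('ok'));
    await listen(server, port);
    try {
      await bench(port);
    } finally {
      await closeServer(server);
    }
  }
}

module.exports = { closeServer, listen, runSequentially };
```

Awaiting closeServer in a finally block means the port is released even when a benchmark throws, so the next run does not start while the previous server still holds it.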

aduh95 commented on May 25, 2021 (Contributor, Author)

It's the async_hooks benchmarks that hang in https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1031/console

More precisely, only the async_hooks benchmarks that involve http hang (benchmark/async_hooks/async-resource-vs-destroy.js and benchmark/async_hooks/http-server.js). I've noticed the same behaviour with benchmark/http/cluster.js and benchmark/http/simple.js.

aduh95 commented on Jun 13, 2021 (Contributor, Author)

Is there something I can do to help this move forward? I think upgrading to 20.04 would be a good first step, even if it doesn't solve the stalling issue.

34 remaining items


Metadata


Assignees

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests


    Benchmark machine maintenance · Issue #2656 · nodejs/build