Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test Runner is slow #47663

Closed
egoroof opened this issue Apr 21, 2023 · 11 comments
Closed

Test Runner is slow #47663

egoroof opened this issue Apr 21, 2023 · 11 comments
Labels
performance Issues and PRs related to the performance of Node.js. test_runner Issues and PRs related to the test runner subsystem.

Comments

@egoroof
Copy link

egoroof commented Apr 21, 2023

Version

18.16.0 same with 20.0.0

Platform

Ubuntu 22.04.2 and Windows 11

Subsystem

test_runner

What steps will reproduce the bug?

Run some simple tests with Nodejs Test Runner. For example:

git clone https://github.com/egoroof/browser-id3-writer.git
cd browser-id3-writer
npm ci
npm run build
node --test --test-reporter spec test/

How often does it reproduce? Is there a required condition?

always

What is the expected behavior? Why is that the expected behavior?

I expect native test runner to run at least as Mocha speed or even faster.

What do you see instead?

duration_ms 1977 when on same tests Mocha runs in 38ms (50x faster, yeah)

Additional information

I also tested it on nodejs 20.0.0 and got same results. Same on Windows 11 and Ubuntu 22.04.2

I was happy to drop Mocha as a dependency, but was not expect to such a slow down.

You can see the switch from Mocha to native Test Runner in this commit egoroof/browser-id3-writer@6d29a06

@VoltrexKeyva VoltrexKeyva added test_runner Issues and PRs related to the test runner subsystem. performance Issues and PRs related to the performance of Node.js. labels Apr 22, 2023
@meyfa
Copy link
Contributor

meyfa commented Apr 22, 2023

Just my educated guess as someone who's been watching test_runner development from the outside:

The overhead likely comes from Node.js spawning many child processes, one per test file 1. Meanwhile, Mocha is running all tests in the same process.

I generated a performance profile 2 like this (in Bash on Linux):

$ node --prof --test --test-reporter spec test/ &
$ echo $!
11456 # PID of main process
$ node --prof-process isolate-*-11456-v8.log > processed_main.txt
$ node --prof-process isolate-*-11616-v8.log > processed_test_file.txt

There were 20 isolate-*.log files generated in total, and your project has 19 test files. The main profile has the following summary:

 [Summary]:
   ticks  total  nonlib   name
      6    0.3%    0.4%  JavaScript
   1541   82.4%   97.5%  C++
    479   25.6%   30.3%  GC
    290   15.5%          Shared libraries
     34    1.8%          Unaccounted

and the one sample test file:

 [Summary]:
   ticks  total  nonlib   name
      0    0.0%    0.0%  JavaScript
    844   77.0%  100.0%  C++
      6    0.5%    0.7%  GC
    252   23.0%          Shared libraries

In both cases it's almost entirely C++. In the C++ section of the profiles, __write is always dominating (total 43% in main and total 62% in child process), followed by __lll_lock_wait. Unfortunately I have no idea how to interpret the results any further since I'm not familiar with the C++ code.

Footnotes

  1. https://nodejs.org/dist/latest-v18.x/docs/api/test.html#test-runner-execution-model

  2. https://nodejs.org/en/docs/guides/simple-profiling

@cjihrig
Copy link
Contributor

cjihrig commented Apr 22, 2023

There is indeed some overhead here from using child processes. You can see mocha slow down on this particular test suite if you run it with --parallel, which is more similar to what Node's test runner does.

The more interesting thing is that there was a huge slowdown on this suite between Node 18.12.1 and 18.13.0. In 18.12.1, the performance is on par with mocha --parallel. v18.13.0 is visibly slower. 18.13.0 had a few test runner changes, but I would be willing to bet that the change that introduced the TAP parser is the cause of the slowdown.

I think we should confirm if that is indeed the culprit. If it is, we can try to optimize the existing code. It would also be worth looking at removing the TAP parser and lexer completely and using a different serialization format such as JSON or even V8's serializer. I think we could serialize the reporter events directly to the parent process (minus the Error object in the test:fail event) more quickly than TAP. Unfortunately, changing that format would be a breaking change because 334bb17 publicly tied us to using TAP in this scenario.

@egoroof

This comment was marked as abuse.

@MoLow
Copy link
Member

MoLow commented Apr 22, 2023

this might very well be a duplicate of #47365,
besides that, I am +1 on the suggestion made by @cjihrig to report using v8 serialization (perhaps by using IPC)

@cjihrig
Copy link
Contributor

cjihrig commented Apr 23, 2023

this might very well be a duplicate of #47365

Was the cause of that issue introduced in 18.13.0? If not, then there is at least one other performance issue at play.

perhaps by using IPC

I would be hesitant to spawn the child processes with an IPC channel. That changes how some APIs work, and could lead to different behavior when a file is run with --test vs. without it.

@MoLow
Copy link
Member

MoLow commented Apr 23, 2023

Was the cause of that issue introduced in 18.13.0? If not, then there is at least one other performance issue at play.

Yep, usage of SafePromiseAllSettledReturnVoid was added here #45214, and it was released at 18.13.0

I would be hesitant to spawn the child processes with an IPC channel. That changes how some APIs work, and could lead to different behavior when a file is run with --test vs. without it.

I am curios, what does it change? the idea was to use IPC to avoid breaking TAP

@cjihrig
Copy link
Contributor

cjihrig commented Apr 23, 2023

Yep, usage of SafePromiseAllSettledReturnVoid was added here #45214, and it was released at 18.13.0

That's great to hear.

I am curios, what does it change?

process.channel, process.connected, process.disconnect(), process.send(), and possibly other things that I'm forgetting.

the idea was to use IPC to avoid breaking TAP

I'm not 100% sure what you mean here, but I don't think we would break TAP. This would just be the communication protocol between the CLI runner and the child processes. It should never be visible to users unless they were spawning child processes with NODE_TEST_CONTEXT=child.

It's only really a breaking change because we have documented that NODE_TEST_CONTEXT=child outputs TAP. We could make it a non-breaking change by changing the CLI to use NODE_TEST_CONTEXT=some-value-other-than-child (actual value to be determined). In that case we could document that the new communication protocol is an internal implementation detail that does not follow semver. We should probably do that for v21 no matter what so that we aren't bound to a specific format forever.

@egoroof

This comment was marked as abuse.

@cjihrig
Copy link
Contributor

cjihrig commented May 3, 2023

I've noticed a few things when running your test suite locally with node 20.1.0:

  • The time elapsed as reported by the test runners seems to be pretty different. I ran the test suites with the time command, and the elapsed time reported by Node's test runner seems to be much closer to what is reported by time.
  • I added a three second timeout in the tests. Node properly accounted for this, while mocha completely ignored it when reporting elapsed time.
  • When mocha runs all of the tests in a single process, it is indeed faster. According to time, mocha clocks in around 131ms on my machine, versus 175ms for Node's test runner. However, I don't think running the tests in the same process is a good fundamental design. For example, when I add a process.exit() into one of the test files, mocha exits and does not report anything.
  • When running mocha with --parallel, which is more similar to how Node's test runner works, time reports mocha being around 261ms, while Node is still at 175ms.

There is definitely still room for improvement in Node's test runner, but I don't think the numbers are quite as drastic as reported in the previous comment.

@egoroof

This comment was marked as abuse.

@egoroof egoroof closed this as completed May 3, 2023
@egoroof

This comment was marked as abuse.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Issues and PRs related to the performance of Node.js. test_runner Issues and PRs related to the test runner subsystem.
Projects
None yet
Development

No branches or pull requests

5 participants