Version
v20.10.0
Platform
Linux one-87-c49d8fd65-xg6j9 5.4.247-162.350.amzn2.x86_64 #1 SMP Tue Jun 27 22:03:59 UTC 2023 x86_64 GNU/Linux
Subsystem
vm
What steps will reproduce the bug?
We encountered a large performance difference in one particular scenario: running a vm script with the `timeout` option, like:
```js
const { runInContext, createContext } = require('vm');
const http = require('http');

const context = createContext({});

const server = http.createServer(async (req, res) => {
  await runInContext(
    ``,      // empty code
    context, // empty context
    {
      timeout: 300, // but with the timeout option set
    });
  res.write('Hello, World!\n');
  res.end();
});

server.listen(8080, () => {
  console.log('Server is running on http://localhost:8080');
});
```
Then run an Apache ab benchmark: `ab -n1000 -c100 http://localhost:8080/`.
The programs run in pods of two different k8s clusters, so I also built an image, using a Dockerfile like:
```dockerfile
# FROM amd64/node:21.7.1
# FROM amd64/node:16.16.0
FROM amd64/node:20.10.0
# FROM amd64/node:20.11.0
# FROM node:20.10.0

RUN apt-get update && apt-get install -y apache2-utils

WORKDIR /root/home
COPY . .

# generate application version
RUN TZ=Asia/Shanghai date +%Y_%m%d_%H%M > .version

# startup cmd
ENTRYPOINT [ "/bin/bash", "/root/home/entrypoint.sh" ]
```
The entrypoint.sh looks like:
```bash
#!/bin/bash
(sleep 3 && time curl localhost:8080/) &
(sleep 3 && echo "start apache ab benchmark") &
(sleep 4 && ab -n1000 -c100 localhost:8080/) &
node /root/home/test.js
```
If Docker is installed, you can test it directly with `docker run -it lellansin/test-vm-scripts:0.1`.
We got:
- Total time 60s with 15 QPS on the problem machine:
  - uname: `Linux xx 5.4.247-162.350.amzn2.x86_64 #1 SMP Tue Jun 27 22:03:59 UTC 2023 x86_64 GNU/Linux`
  - CPU usage stays below 10%
- Total time 10s with 100 QPS on the normal machine:
  - uname: `Linux xx 5.10.184-175.731.amzn2.x86_64 #1 SMP Tue Jun 27 21:48:55 UTC 2023 x86_64 GNU/Linux`
  - CPU can reach 100%
How often does it reproduce? Is there a required condition?
It currently reproduces consistently on the problem machine.
What is the expected behavior? Why is that the expected behavior?
The performance of the normal machine and the problem machine should not differ this drastically.
What do you see instead?
During my troubleshooting, I noticed that this Tuesday (2024-03-26) Node.js released v20.12.0, which includes `vm: fix V8 compilation cache support for vm.Script`.
I then tested upgrading from v20.10.0 to v20.12.0, and the new version does fix this problem. Still, hitting the V8 compilation cache seems more like a workaround than a final solution. Wanting to know the root cause, I would like to confirm what specifically triggers this problem.
According to this code, the difference between setting `timeout` and not setting it is that a Watchdog object is initialized; its constructor creates a dedicated event loop and a corresponding thread.
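For illustration, the pattern is roughly the following (a minimal, self-contained sketch of the mechanism as I understand it, not the actual Node.js source; everything except the `Watchdog` name is illustrative):

```cpp
#include <uv.h>
#include <cstdio>

class Watchdog {
 public:
  Watchdog(uint64_t ms, bool* timed_out) : timed_out_(timed_out) {
    uv_loop_init(&loop_);  // a dedicated event loop per watchdog
    uv_async_init(&loop_, &async_, [](uv_async_t* h) {
      uv_stop(h->loop);    // woken by the destructor: let the thread exit
    });
    uv_timer_init(&loop_, &timer_);
    timer_.data = this;
    uv_timer_start(&timer_, [](uv_timer_t* h) {
      auto* self = static_cast<Watchdog*>(h->data);
      *self->timed_out_ = true;  // the guarded code ran past the timeout
      uv_stop(h->loop);
    }, ms, 0);
    uv_thread_create(&thread_, [](void* arg) {  // and a dedicated thread
      auto* self = static_cast<Watchdog*>(arg);
      uv_run(&self->loop_, UV_RUN_DEFAULT);
    }, this);
  }

  ~Watchdog() {
    uv_async_send(&async_);   // wake the loop in case the timer never fired
    uv_thread_join(&thread_);
    uv_close(reinterpret_cast<uv_handle_t*>(&async_), nullptr);
    uv_close(reinterpret_cast<uv_handle_t*>(&timer_), nullptr);
    uv_run(&loop_, UV_RUN_DEFAULT);  // flush the close callbacks
    uv_loop_close(&loop_);
  }

 private:
  uv_loop_t loop_;
  uv_async_t async_;
  uv_timer_t timer_;
  uv_thread_t thread_;
  bool* timed_out_;
};

int main() {
  bool timed_out = false;
  {
    Watchdog wd(50, &timed_out);  // arm a 50 ms timeout
    uv_sleep(100);                // simulate work that overruns it
  }                               // destructor tears everything down again
  std::printf("timed_out = %d\n", timed_out);
  return 0;
}
```

If this matches what Node.js does, every `runInContext` call with `timeout` set pays for a fresh `uv_loop_init` plus a `uv_thread_create`/`uv_thread_join` round trip, which is exactly the per-request cost that behaves so differently on the two machines.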
The CPU utilization on the problem machine is always below 10%. It looks like a certain resource is restricted, or the main thread of Node.js is waiting for something.
Through `ps -eLf`, I observed that the Node.js process's thread count can scale from 11 up to 28 on the normal machine, while on the problem machine it stays at 11.
I wrote the following libuv test code:
```cpp
#include <iostream>
#include <pthread.h>
#include <uv.h>

using namespace std;

struct ThreadData {
  uv_loop_t* loop;
  uv_timer_t timer;
  bool timed_out;
};

void timer_cb(uv_timer_t* handle) {
  ThreadData* data = static_cast<ThreadData*>(handle->data);
  data->timed_out = true;
  uv_stop(data->loop);
}

void* threadFunc(void* arg) {
  ThreadData* data = static_cast<ThreadData*>(arg);
  data->loop = new uv_loop_t;
  uv_loop_init(data->loop);  // one event loop per thread, like the watchdog
  data->timed_out = false;
  uv_timer_init(data->loop, &data->timer);
  data->timer.data = data;
  uv_timer_start(&data->timer, timer_cb, 100, 0);  // 100 ms one-shot timer
  uv_run(data->loop, UV_RUN_DEFAULT);
  // Close the timer handle and flush its close callback before closing the
  // loop, otherwise uv_loop_close() fails with UV_EBUSY.
  uv_close(reinterpret_cast<uv_handle_t*>(&data->timer), nullptr);
  uv_run(data->loop, UV_RUN_DEFAULT);
  uv_loop_close(data->loop);
  delete data->loop;
  return nullptr;
}

int main() {
  const int NUM_THREADS = 10;
  pthread_t threads[NUM_THREADS];
  ThreadData threadData[NUM_THREADS];
  for (int i = 0; i < NUM_THREADS; ++i) {
    if (pthread_create(&threads[i], nullptr, threadFunc, &threadData[i])) {
      cerr << "Error creating thread " << i << endl;
      return 1;
    }
  }
  for (int i = 0; i < NUM_THREADS; ++i) {
    pthread_join(threads[i], nullptr);
  }
  int timed_out_count = 0;
  for (int i = 0; i < NUM_THREADS; ++i) {
    if (threadData[i].timed_out) {
      timed_out_count++;
    }
  }
  cout << "Timed out threads: " << timed_out_count << endl;
  return 0;
}
```
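(For reference, I built it with something along the lines of `g++ test.cc -o uv_test -luv -lpthread`; exact flags may vary by environment.)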
Running it, I found no significant difference between the normal machine and the problem machine.
Additional information
No response