Description
When running nginx 1.18.0 with njs 0.6.0 on a high-traffic server, we noticed that some of the nginx worker processes were using 100% of their CPU time.
Using `lsof`, `tcpdump`, and `strace`, I verified that data was still flowing through the affected connections. Using `perf` to sample the process in question and analyze the hot spots, I found that the worker spends 99.98% of its CPU time walking to the end of the linked list `*busy` in https://github.com/nginx/nginx/blob/release-1.18.0/src/core/ngx_buf.c#L195.
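For reference, this is roughly the code in question: a paraphrase of `ngx_chain_update_chains()` as it looks in nginx 1.18.0 (comments are mine, and details may differ slightly from the actual source). The `for` loop has to walk to the tail of `*busy` on every call in order to append the new output links, and the release loop below it stops at the first link whose buffer still holds unsent data, so one stuck link keeps every link queued behind it:

```c
void
ngx_chain_update_chains(ngx_pool_t *p, ngx_chain_t **free, ngx_chain_t **busy,
    ngx_chain_t **out, ngx_buf_tag_t tag)
{
    ngx_chain_t  *cl;

    if (*out) {
        if (*busy == NULL) {
            *busy = *out;

        } else {
            /* O(n) walk to the tail of *busy -- the hot loop seen in perf */
            for (cl = *busy; cl->next; cl = cl->next) { /* void */ }

            cl->next = *out;
        }

        *out = NULL;
    }

    while (*busy) {
        cl = *busy;

        /* stop at the first link whose buffer is not fully sent;
         * everything behind it stays on the busy chain */
        if (ngx_buf_size(cl->buf) != 0) {
            break;
        }

        if (cl->buf->tag != tag) {
            *busy = cl->next;
            ngx_free_chain(p, cl);
            continue;
        }

        cl->buf->pos = cl->buf->start;
        cl->buf->last = cl->buf->start;

        *busy = cl->next;
        cl->next = *free;
        *free = cl;
    }
}
```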
I analyzed a core image of the worker process and confirmed that the `*busy` chain is indeed extremely long, piled up with buffers carrying the tag `ngx_stream_proxy_module`. Some of the chain links refer to the same `buf` object, and some of the `buf` objects point into the range of `session->upstream->upstream_buf` while the others point into `session->upstream->downstream_buf`.
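To make that concrete, the sketch below shows the kind of check involved. It is a hypothetical debug helper (the name `classify_busy_chain` and the logging are mine, not part of nginx or njs); on the core image I performed the equivalent walk by hand in the debugger:

```c
#include <ngx_config.h>
#include <ngx_core.h>
#include <ngx_stream.h>

/* Hypothetical helper: walk a *busy chain and classify each link's buf by
 * whether its data lies inside the proxy's upstream_buf or downstream_buf. */
static void
classify_busy_chain(ngx_stream_session_t *s, ngx_chain_t *busy)
{
    size_t        n, in_up, in_down;
    ngx_buf_t    *b, *up, *down;
    ngx_chain_t  *cl;

    up = &s->upstream->upstream_buf;
    down = &s->upstream->downstream_buf;

    n = in_up = in_down = 0;

    for (cl = busy; cl; cl = cl->next) {
        b = cl->buf;
        n++;

        /* buf whose data points into upstream_buf's storage */
        if (b->pos >= up->start && b->pos < up->end) {
            in_up++;
        }

        /* buf whose data points into downstream_buf's storage */
        if (b->pos >= down->start && b->pos < down->end) {
            in_down++;
        }
    }

    ngx_log_error(NGX_LOG_INFO, s->connection->log, 0,
                  "busy chain: %uz links, %uz in upstream_buf, "
                  "%uz in downstream_buf", n, in_up, in_down);
}
```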
I believe the issue is that in the njs stream module, `buf` objects from both directions are mixed and appended to the same `*busy` chain. Suppose `ngx_stream_js_body_filter` is handling a chain of one `buf` object from upstream and a chain of two `buf` objects from downstream, and the first `buf` is not sent entirely, so it is moved to the `*busy` chain. Even if the two following `buf` objects from downstream are fully sent, they are still appended to the `*busy` chain and are not moved to the `*free` chain, because the chain link holding the first `buf`, which is not “empty”, sits ahead of them. However, when control returns to `ngx_stream_proxy_process`, things look different: there the `busy` chain is kept separately for each direction, so the two `buf` objects are put onto `u->free` and become available for reuse. Since the `u->free` chain is last-in-first-out, these objects are reused almost immediately. Meanwhile, the `*busy` chain in `ngx_stream_js_body_filter` still holds references to them, so the chain links there may never be freed and start to pile up.
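For comparison, here is roughly the relevant part of `ngx_stream_proxy_process()` in nginx 1.18.0, abridged and quoted from memory, so details may differ. Note that `out` and `busy` are selected per direction, while `u->free` is shared by both directions, which is exactly what allows the fully sent buffers to be recycled while the single `*busy` chain in the njs filter still references them:

```c
/* Abridged from src/stream/ngx_stream_proxy_module.c (nginx 1.18.0). */

    if (from_upstream) {
        src = pc;
        dst = c;
        b = &u->upstream_buf;
        out = &u->downstream_out;
        busy = &u->downstream_busy;      /* per-direction busy chain */
        /* ... */

    } else {
        src = c;
        dst = pc;
        b = &u->downstream_buf;
        out = &u->upstream_out;
        busy = &u->upstream_busy;        /* per-direction busy chain */
        /* ... */
    }

    /* ... later, in the proxying loop: */

            if (*out || *busy || dst->buffered) {
                rc = ngx_stream_top_filter(s, *out, from_upstream);

                /* ... */

                /* u->free is shared by both directions, so a buf released
                 * here is immediately available for reuse by the other one */
                ngx_chain_update_chains(c->pool, &u->free, busy, out,
                                        (ngx_buf_tag_t) &ngx_stream_proxy_module);
            }
```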
The root problem is that all modules along the buffer-processing chain must share the same view of whether the `buf` object referenced by a chain link is busy or not. Since the `*busy` chain is kept separate per direction in `ngx_stream_proxy_module`, it should also be separated per direction in `ngx_stream_js_module`.
Since applying the following patch in our production environment, the symptom has not reappeared.
From 7d0a302c591ac64642bae578b0eb9e4f63b83611 Mon Sep 17 00:00:00 2001
From: Miao Wang <shankerwangmiao@gmail.com>
Date: Wed, 11 Aug 2021 11:44:12 +0800
Subject: [PATCH] stream: separate busy chain for two directions
---
nginx/ngx_stream_js_module.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/nginx/ngx_stream_js_module.c b/nginx/ngx_stream_js_module.c
index e7b29a1..24054e9 100644
--- a/nginx/ngx_stream_js_module.c
+++ b/nginx/ngx_stream_js_module.c
@@ -49,7 +49,7 @@ typedef struct {
ngx_buf_t *buf;
ngx_chain_t **last_out;
ngx_chain_t *free;
- ngx_chain_t *busy;
+ ngx_chain_t *busy[2];
ngx_int_t status;
#define NGX_JS_EVENT_UPLOAD 0
#define NGX_JS_EVENT_DOWNLOAD 1
@@ -611,7 +611,8 @@ ngx_stream_js_body_filter(ngx_stream_session_t *s, ngx_chain_t *in,
if (out != NULL || dst == NULL || dst->buffered) {
rc = ngx_stream_next_filter(s, out, from_upstream);
- ngx_chain_update_chains(c->pool, &ctx->free, &ctx->busy, &out,
+ ngx_chain_update_chains(c->pool, &ctx->free,
+ &ctx->busy[!!from_upstream], &out,
(ngx_buf_tag_t) &ngx_stream_js_module);
} else {
--
2.20.1