Skip to content

nginx worker goes to 100% cpu utilization with njs_stream #413

Closed
@shankerwangmiao

Description

@shankerwangmiao

When using nginx 1.18.0 with njs 0.6.0 on a high traffic server, it can be spotted some of the nginx worker processes are utilizing 100% of cpu time.

Using lsof, tcpdump, and strace, it can be verified that the data stream is running through. Using perf to get samples from the process in question and analyze hot spot, I can find that it takes the worker 99.98% of cpu time to reach the end of the linked list *busy in https://github.com/nginx/nginx/blob/release-1.18.0/src/core/ngx_buf.c#L195 .

I analyze the core image of the worker process and confirmed that the *busy chain is really really long, piled up with buffers with the tag ngx_stream_proxy_module. Some of the chain objects are referring the same buf object, and some of the buf objects are within the range of session->upstream->upstream_buf and the others session->upstream-> downstream_buf.

I believe this issue is because in njs_stream module, buf objects from both directions are mixed and appended into the same *busy chain. Suppose ngx_stream_js_body_filter is handling a chain of one buf object from upstream and a chain of two buf objects from downstream and the first buf is not sent entirely and moved to *busy chain. If the two following buf objects from downstream are fully sent, they will still be appended to the *busy chain, and not moved to the *free chain because there is one chain containing the first buf, which is not “empty”. However, when the control flow returns to ngx_stream_proxy_process, things become different: busy chain is separate for the two direction and the two buf objects will be put into u->free and be reused. Considering u->free chain is last-in-first-out, these objects will be soon reused. Remember now there are still references from the *busy chain in ngx_stream_js_body_filter to these objects. The chain objects there may have no chance to be freed and begin to pile up.

The root problem is that modules in the chain of buf processing should have the same view whether the buf object referred by a chain is busy or not. Since the *busy chain is separated in ngx_stream_proxy_module, we should also separate them in ngx_stream_js_module.

Since applying the following patch on our production environment, the symptom has not happened again.

From 7d0a302c591ac64642bae578b0eb9e4f63b83611 Mon Sep 17 00:00:00 2001
From: Miao Wang <shankerwangmiao@gmail.com>
Date: Wed, 11 Aug 2021 11:44:12 +0800
Subject: [PATCH] stream: seperate busy chain for two directions

---
 nginx/ngx_stream_js_module.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/nginx/ngx_stream_js_module.c b/nginx/ngx_stream_js_module.c
index e7b29a1..24054e9 100644
--- a/nginx/ngx_stream_js_module.c
+++ b/nginx/ngx_stream_js_module.c
@@ -49,7 +49,7 @@ typedef struct {
     ngx_buf_t              *buf;
     ngx_chain_t           **last_out;
     ngx_chain_t            *free;
-    ngx_chain_t            *busy;
+    ngx_chain_t            *busy[2];
     ngx_int_t               status;
 #define NGX_JS_EVENT_UPLOAD   0
 #define NGX_JS_EVENT_DOWNLOAD 1
@@ -611,7 +611,8 @@ ngx_stream_js_body_filter(ngx_stream_session_t *s, ngx_chain_t *in,
     if (out != NULL || dst == NULL || dst->buffered) {
         rc = ngx_stream_next_filter(s, out, from_upstream);
 
-        ngx_chain_update_chains(c->pool, &ctx->free, &ctx->busy, &out,
+        ngx_chain_update_chains(c->pool, &ctx->free,
+                                &ctx->busy[!!from_upstream], &out,
                                 (ngx_buf_tag_t) &ngx_stream_js_module);
 
     } else {
-- 
2.20.1

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions