[BUG] Ran Out of Memory for Shared Memory on passive node #3656

@NicoFrLy

OpenSIPS version you are running

version: opensips 3.5.5 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, HP_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: 2ef56f0dc
main.c compiled on  with gcc 12

Describe the bug

We have two OpenSIPS clusters. On each, the passive node often crashes once all of its shared memory has been consumed (out of memory).

Before the crash we see this message repeatedly:
/var/log/syslog:May 28 09:38:25 opensips[518839]: May 28 09:38:25 [518839] ERROR:dialog:receive_dlg_repl: Failed to process a binary packet!

And then, during the crash:
/var/log/syslog:May 28 09:38:25 opensips[518839]: May 28 09:38:25 [518839] ERROR:dialog:receive_dlg_repl: Failed to process a binary packet!
/var/log/syslog:May 28 09:38:25 opensips[518839]: May 28 09:38:25 [518839] ERROR:dialog:receive_dlg_repl: Failed to process a binary packet!
/var/log/syslog:May 28 09:38:26 opensips[518847]: May 28 09:38:26 [518847] ERROR:core:qm_malloc_dbg: not enough free shm memory (0 bytes left, need 0), please increase the "-m" command line parameter!
/var/log/syslog:May 28 09:38:26 opensips[518847]: May 28 09:38:26 [518847] ERROR:cgrates:cgr_loaded_callback: out of shm mem!
/var/log/syslog:May 28 09:38:26 opensips[518847]: May 28 09:38:26 [518847] CRITICAL:core:sig_usr: segfault in process pid: 518847, id: 27
/var/log/syslog:May 28 09:38:26 kernel: [1805610.883896] opensips[518847]: segfault at 0 ip 00007f6f5f1ac134 sp 00007fff8f70ff20 error 6 in cgrates.so[7f6f5f19b000+19000] likely on CPU 6 (core 0, socket 12)
/var/log/syslog:May 28 09:38:26 kernel: [1805610.883930] Code: 00 00 00 00 83 e0 f8 88 03 5b c3 66 66 2e 0f 1f 84 00 00 00 00 00 53 48 8b 57 30 48 89 fb 48 8b 47 28 66 0f 6f 05 5c c0 00 00 <48> 89 02 48 89 50 08 0f 11 47 28 e8 1c f4 fe ff 48 89 df 5b b9 6b
/var/log/syslog:May 28 09:38:26 opensips[518820]: Thank you for running opensips
/var/log/syslog:May 28 09:38:26 systemd[1]: opensips.service: Main process exited, code=exited, status=11/n/a
/var/log/syslog:May 28 09:38:26 systemd[1]: opensips.service: Failed with result 'exit-code'.
/var/log/syslog:May 28 09:38:26 systemd[1]: opensips.service: Consumed 12min 4.469s CPU time.
/var/log/syslog:May 28 09:38:26 systemd[1]: opensips.service: Scheduled restart job, restart counter is at 47.
/var/log/syslog:May 28 09:38:26 systemd[1]: Stopped opensips.service - OpenSIPS is a very fast and flexible SIP (RFC3261) server.
/var/log/syslog:May 28 09:38:26 systemd[1]: opensips.service: Consumed 12min 4.469s CPU time.
/var/log/syslog:May 28 09:38:26 systemd[1]: Starting opensips.service - OpenSIPS is a very fast and flexible SIP (RFC3261) server...
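
To see at a glance which module and function emit the errors in a capture like the one above, the OpenSIPS part of each line can be tallied. This is only a minimal sketch, assuming the `[pid] LEVEL:module:function: message` layout seen in these logs; `parse_line` and `summarize` are hypothetical helper names, not part of OpenSIPS.

```python
import re
from collections import Counter

# Matches the OpenSIPS part of a syslog line, e.g.
# "[518847] ERROR:core:qm_malloc_dbg: not enough free shm memory ..."
# Assumption: the layout is "[pid] LEVEL:module:function: message".
LOG_RE = re.compile(
    r"\[(?P<pid>\d+)\] (?P<level>[A-Z]+):(?P<module>\w+):(?P<func>\w+): (?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict with pid/level/module/func/msg, or None if the line
    does not contain an OpenSIPS log record (kernel or systemd lines)."""
    match = LOG_RE.search(line)
    return match.groupdict() if match else None


def summarize(lines):
    """Count log records per (level, module, function)."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[(record["level"], record["module"], record["func"])] += 1
    return counts


# A few lines copied from the capture above.
SAMPLE = [
    "/var/log/syslog:May 28 09:38:25 opensips[518839]: May 28 09:38:25 [518839] "
    "ERROR:dialog:receive_dlg_repl: Failed to process a binary packet!",
    "/var/log/syslog:May 28 09:38:26 opensips[518847]: May 28 09:38:26 [518847] "
    "ERROR:cgrates:cgr_loaded_callback: out of shm mem!",
    "/var/log/syslog:May 28 09:38:26 opensips[518847]: May 28 09:38:26 [518847] "
    "CRITICAL:core:sig_usr: segfault in process pid: 518847, id: 27",
    "/var/log/syslog:May 28 09:38:26 systemd[1]: opensips.service: "
    "Failed with result 'exit-code'.",
]

for key, count in summarize(SAMPLE).most_common():
    print(count, *key)
```

Running it over the full syslog (for example `summarize(open("/var/log/syslog"))`) should show whether the `dialog` replication errors start well before the first `out of shm mem` message.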

To Reproduce

It is not easy to reproduce on our preproduction servers, as I don't know what is causing this memory leak.
We have the cgrates module enabled, and it seems to be the module causing the shared memory leak (to be confirmed).
The leak grows with the number of calls (it looks like calls are not released from memory).
There is no issue on the active nodes (shared and private memory usage both look normal).
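
To put a number on how fast the passive node leaks, periodic readings of shared memory usage (for instance the `shmem:used_size` statistic) can be fitted with a straight line. The sketch below assumes the readings have already been collected as `(unix_time, used_bytes)` pairs; `growth_rate` and `seconds_until_full` are hypothetical helper names, and the linear model is an assumption, not something confirmed for this leak.

```python
def growth_rate(samples):
    """Least-squares slope of used bytes over time, in bytes per second.

    samples: list of (unix_time_seconds, used_bytes) pairs, with at least
    two distinct timestamps.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("timestamps must not all be equal")
    cov = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    return cov / var_t


def seconds_until_full(samples, total_bytes):
    """Estimate the seconds left until shared memory is exhausted.

    Returns None when usage is flat or shrinking (no leak visible).
    total_bytes is the shared memory size given with the "-m" option.
    """
    rate = growth_rate(samples)
    if rate <= 0:
        return None
    _, last_used = samples[-1]
    return (total_bytes - last_used) / rate
```

Comparing the slope on the passive node against the active node, and against the call rate over the same period, would help confirm whether the growth really follows the number of calls.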

Expected behavior
No memory leak and no error message.

Relevant System Logs

OS/environment information

  • Operating System: Debian 12 Bookworm (up to date)
  • OpenSIPS installation: Installation from APT repository.
  • other relevant information:

Additional context
I don't know exactly when it started, but it seems to be related to the upgrade from 3.4 to 3.5.2 a few months ago.
