[CRASH] Repeat crash of backup service Opensips once upgraded to 3.5.7 #3720

@NicoFrLy

OpenSIPS version you are running

version: opensips 3.5.7 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, HP_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: a6ddbb970
main.c compiled on  with gcc 12

Crash Core Dump

opensips-01.log

Describe the traffic that generated the bug
It happens with only a few calls. We saw these errors on the active node:

Sep 15 22:00:39 OPENSIPS-02 opensips[1396185]: Sep 15 22:00:39 [1396185] ERROR:dialog:replicate_dialog_value: Failed to replicate dialog values
Sep 15 22:00:39 OPENSIPS-02 opensips[1396200]: Sep 15 22:00:39 [1396200] ERROR:core:tcp_async_connect: poll error: flags 1c
Sep 15 22:00:39 OPENSIPS-02 opensips[1396200]: Sep 15 22:00:39 [1396200] ERROR:core:tcp_async_connect: failed to retrieve SO_ERROR [server=192.168.X.X:5080] (111) Connection refused
Sep 15 22:00:39 OPENSIPS-02 opensips[1396200]: Sep 15 22:00:39 [1396200] ERROR:proto_bin:proto_bin_send: async TCP connect failed
Sep 15 22:00:39 OPENSIPS-02 opensips[1396200]: Sep 15 22:00:39 [1396200] ERROR:clusterer:msg_send: send() to 192.168.X.X:5080 for proto bin/8 failed
Sep 15 22:00:39 OPENSIPS-02 opensips[1396200]: Sep 15 22:00:39 [1396200] ERROR:clusterer:do_action_trans_0: Failed to send ping to node [2]
Sep 15 22:00:40 OPENSIPS-02 opensips[1396180]: Sep 15 22:00:40 [1396180] ERROR:dialog:replicate_dialog_deleted: Failed to replicate deleted dialog
Sep 15 22:00:40 OPENSIPS-02 opensips[1396185]: Sep 15 22:00:40 [1396185] ERROR:dialog:replicate_dialog_value: Failed to replicate dialog values
Sep 15 22:00:50 OPENSIPS-02 opensips[1396193]: Sep 15 22:00:50 [1396193] CRITICAL:cgrates:cgr_ref_acc_ctx:
Sep 15 22:00:50 OPENSIPS-02 opensips[1396193]: >>> ref=-1 ctx=0x7fbaf18da558 gone negative!
Sep 15 22:00:50 OPENSIPS-02 opensips[1396193]: It seems you have hit a programming bug.
Sep 15 22:00:50 OPENSIPS-02 opensips[1396193]: Please help us make OpenSIPS better by reporting it at https://github.com/OpenSIPS/opensips/issues
Sep 15 22:00:56 OPENSIPS-02 opensips[1396188]: Sep 15 22:00:56 [1396188] CRITICAL:cgrates:cgr_ref_acc_ctx:
Sep 15 22:00:56 OPENSIPS-02 opensips[1396188]: >>> ref=-1 ctx=0x7fbaf192fb50 gone negative!
Sep 15 22:00:56 OPENSIPS-02 opensips[1396188]: It seems you have hit a programming bug.
Sep 15 22:00:56 OPENSIPS-02 opensips[1396188]: Please help us make OpenSIPS better by reporting it at https://github.com/OpenSIPS/opensips/issues
Sep 15 22:00:59 OPENSIPS-02 opensips[1396189]: Sep 15 22:00:59 [1396189] CRITICAL:cgrates:cgr_ref_acc_ctx:
Sep 15 22:00:59 OPENSIPS-02 opensips[1396189]: >>> ref=-1 ctx=0x7fbaf18e8390 gone negative!
Sep 15 22:00:59 OPENSIPS-02 opensips[1396189]: It seems you have hit a programming bug.
Sep 15 22:00:59 OPENSIPS-02 opensips[1396189]: Please help us make OpenSIPS better by reporting it at https://github.com/OpenSIPS/opensips/issues
Sep 15 22:01:31 OPENSIPS-02 opensips[1396181]: Sep 15 22:01:31 [1396181] ERROR:core:tcp_async_connect: poll error: flags 1c
Sep 15 22:01:31 OPENSIPS-02 opensips[1396181]: Sep 15 22:01:31 [1396181] ERROR:core:tcp_async_connect: failed to retrieve SO_ERROR [server=192.168.X.X:5080] (111) Connection refused
Sep 15 22:01:31 OPENSIPS-02 opensips[1396181]: Sep 15 22:01:31 [1396181] ERROR:proto_bin:proto_bin_send: async TCP connect failed
Sep 15 22:01:31 OPENSIPS-02 opensips[1396181]: Sep 15 22:01:31 [1396181] ERROR:clusterer:msg_send: send() to 192.168.X.X:5080 for proto bin/8 failed
Sep 15 22:01:31 OPENSIPS-02 opensips[1396181]: Sep 15 22:01:31 [1396181] ERROR:clusterer:msg_send_retry: msg_send() to node [2] failed
Sep 15 22:01:31 OPENSIPS-02 opensips[1396181]: Sep 15 22:01:31 [1396181] ERROR:dialog:replicate_dialog_value: Error sending in cluster: 1
Sep 15 22:01:31 OPENSIPS-02 opensips[1396181]: Sep 15 22:01:31 [1396181] ERROR:dialog:replicate_dialog_value: Failed to replicate dialog values
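To make the excerpt above easier to triage, here is a minimal sketch that tallies log lines by level, module and function. It assumes the `[pid] LEVEL:module:function:` layout visible in the lines above; the names `LINE_RE` and `summarize` are hypothetical helpers, not part of OpenSIPS, and the sample lines are copied from the log in this report.

```python
import re
from collections import Counter

# Matches the OpenSIPS part of a syslog line: "[pid] LEVEL:module:function:"
# (layout assumed from the excerpt above).
LINE_RE = re.compile(
    r"\[(?P<pid>\d+)\] (?P<level>[A-Z]+):(?P<module>\w+):(?P<func>\w+):"
)

def summarize(lines):
    """Count log lines per (level, module, function) triple."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            counts[(m["level"], m["module"], m["func"])] += 1
    return counts

# Three lines taken from the log in this report.
sample = [
    "Sep 15 22:00:39 OPENSIPS-02 opensips[1396185]: Sep 15 22:00:39 [1396185] "
    "ERROR:dialog:replicate_dialog_value: Failed to replicate dialog values",
    "Sep 15 22:00:50 OPENSIPS-02 opensips[1396193]: Sep 15 22:00:50 [1396193] "
    "CRITICAL:cgrates:cgr_ref_acc_ctx:",
    "Sep 15 22:00:56 OPENSIPS-02 opensips[1396188]: Sep 15 22:00:56 [1396188] "
    "CRITICAL:cgrates:cgr_ref_acc_ctx:",
]

if __name__ == "__main__":
    for key, n in summarize(sample).most_common():
        print(n, *key)
```

Running it over the full log file (for example `summarize(open("opensips-01.log"))`) should show how often the cgrates refcount message appears relative to the clusterer and dialog replication errors.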

To Reproduce

Upgrade to OpenSIPS 3.5.7 with the cgrates module loaded.
Run 2 nodes with keepalived, sharing one "VIP" tag. Errors are seen on both nodes: the backup node crashes every few minutes, and the active node logs "It seems you have hit a programming bug."

Relevant System Logs

OS/environment information

  • Operating System: Debian 12
  • OpenSIPS installation: APT Debian repo
  • other relevant information:

Additional context

We upgraded to pick up the fix for issue #3656. Maybe that fix introduced a regression?
