
Rocket.Chat stability issue when many users disconnected and connected again #21182

@emikolajczak

Description:

Hi,
From time to time we observe really serious Rocket.Chat stability issues. Due to a network problem, about half of our 3000 connected users were disconnected and then reconnected to Rocket.Chat. This caused close to 100% CPU load on the Rocket.Chat Docker containers and made the service unavailable (messages greyed out, clients disconnecting). We also saw high container RAM usage and heavy garbage collector activity (screenshots below). It was probably caused by a presence broadcast storm, because when we switched on "Disable Presence Broadcast", everything went back to normal after a couple of minutes. Switching it off again brought back the same problems and unavailability.
Leaving "Disable Presence Broadcast" switched on permanently is not an option, because user presence is then not shown correctly.
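
To illustrate why a presence broadcast storm seems plausible at this scale, here is a rough back-of-the-envelope sketch. It assumes (this is not confirmed Rocket.Chat behaviour) that every presence change is delivered to every online user, and that each reconnect produces two presence changes (offline, then online). The function name `presence_fanout` and its parameters are hypothetical and only used for this estimate.

```python
def presence_fanout(online_users: int,
                    reconnecting_users: int,
                    events_per_reconnect: int = 2) -> int:
    """Estimate presence notifications delivered during a mass reconnect.

    Assumption (not verified against Rocket.Chat internals): each presence
    change is fanned out to every online user.
    """
    # Each reconnecting user is assumed to emit "offline" and then "online".
    status_changes = reconnecting_users * events_per_reconnect
    # Every status change is assumed to reach every online user.
    return status_changes * online_users


if __name__ == "__main__":
    # Numbers from this report: about 3000 users online, about half reconnecting.
    total = presence_fanout(online_users=3000, reconnecting_users=1500)
    print(f"{total:,} presence notifications")
```

Under these assumptions the fan-out grows with the product of reconnecting users and online users, which would be consistent with the load disappearing once presence broadcast is disabled.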

Steps to reproduce:

  1. Server with about 3000 users online
  2. For example, restart the reverse proxy (HAProxy, Nginx...)
  3. Users are disconnected and connected again
  4. Rocket.Chat is unavailable due to high load

Expected behavior:

Disconnecting and reconnecting a large number of users at the same time should not make Rocket.Chat unavailable

Actual behavior:

Rocket.Chat is unavailable

Server Setup Information:

  • Version of Rocket.Chat Server: 3.9.7
  • Operating System: CentOS 7
  • Deployment Method: docker-compose
  • Number of Running Instances: 33
  • DB Replicaset Oplog: YES
  • NodeJS Version: v12.18.4
  • MongoDB Version: 4.0.14

Client Setup Information

  • Desktop App or Browser Version: All
  • Operating System: All

Additional context

Docker containers CPU usage
image

Users sessions
image

Garbage collector
image

Heavy rocketchat.users collection updates
image

Relevant logs:
