Description
Currently, we do not set any AckWait or MaxAckPending options, so they default to 30 seconds and 1000 pending acks. This is a problem for slow consumers like the activitylog, userlog and others.
AckWait should be increased to align NATS's expectations with how long a consumer actually takes to ack an event. This is an upper bound: when NATS does not receive an ack in time, it resends the event, causing even more load on the consumer.
MaxAckPending should be decreased to limit the number of events that are processed concurrently. This is also an upper bound: NATS will stop sending events to the consumer when this limit is reached. It will still queue events internally and send them once the consumer has acked events.
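To make the MaxAckPending semantics concrete, here is a minimal in-memory sketch of the limit described above. It is a toy model, not NATS code: the `pendingWindow` type and its methods are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// pendingWindow is a hypothetical toy model of the MaxAckPending limit:
// delivery stops once `max` events are unacknowledged and resumes after an ack.
type pendingWindow struct {
	max     int // upper bound on unacknowledged events
	pending int // events delivered but not yet acked
}

// tryDeliver reports whether another event may be sent to the consumer.
// It returns false while the limit is reached; the event stays queued.
func (w *pendingWindow) tryDeliver() bool {
	if w.pending >= w.max {
		return false
	}
	w.pending++
	return true
}

// ack frees one slot so that a queued event can be delivered.
func (w *pendingWindow) ack() {
	if w.pending > 0 {
		w.pending--
	}
}

func main() {
	w := &pendingWindow{max: 2}
	for i := 1; i <= 3; i++ {
		fmt.Printf("event %d delivered: %v\n", i, w.tryDeliver())
	}
	w.ack() // the consumer acknowledges one event
	fmt.Printf("after ack, event 3 delivered: %v\n", w.tryDeliver())
}
```

With a limit of 2, the third event is held back until one of the first two has been acked.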
While the userlog and frontend services have a configurable MAX_WORKERS setting, the activitylog service is hardcoded to 1. But NATS will happily send it 1000 events and resend them if they have not been acked within 30 seconds. The activitylog is actually very slow when processing events, taking longer than 30 seconds, which is a different issue we need to investigate.
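A rough back-of-the-envelope model shows why these defaults hurt a single-worker consumer. The sketch below assumes that all MaxAckPending events are delivered in one burst and that every event takes the same time to process; both are simplifying assumptions, and `ackedInTime` is a hypothetical helper, not part of any library.

```go
package main

import (
	"fmt"
	"time"
)

// ackedInTime estimates how many events from one delivered burst are acked
// before AckWait expires. Assumptions (not from NATS itself): the whole burst
// of maxAckPending events arrives at once, each event takes perEvent to
// process, and an event finishing exactly at the deadline counts as acked.
func ackedInTime(workers, maxAckPending int, ackWait, perEvent time.Duration) int {
	if workers <= 0 || maxAckPending <= 0 || perEvent <= 0 {
		return 0
	}
	// Each worker completes floor(ackWait/perEvent) events before the deadline.
	perWorker := int(ackWait / perEvent)
	acked := workers * perWorker
	if acked > maxAckPending {
		acked = maxAckPending
	}
	return acked
}

func main() {
	// Defaults named in this issue: AckWait 30s, MaxAckPending 1000, one worker.
	// The 1s processing time per event is purely illustrative.
	const pending = 1000
	acked := ackedInTime(1, pending, 30*time.Second, time.Second)
	fmt.Printf("acked in time: %d, candidates for redelivery: %d\n", acked, pending-acked)
}
```

Under these assumptions only 30 of the 1000 events are acked in time, and the rest become candidates for redelivery. If a single event takes longer than AckWait, none are.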
Expected behavior
We should limit the services to a reasonable default for a single-node machine:
| Service | AckWait | MaxAckPending |
|---|---|---|
| activitylog | 3 | 30 |
| userlog | 1 | 30 |
| sse | ? | ? |
| others | ? | ? |
The defaults need to be configurable.
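One way to make the defaults configurable is to read per-service overrides from the environment. The following is a minimal sketch under stated assumptions: the `ConsumerConfig` type, the `loadConsumerConfig` helper and the `*_ACK_WAIT` / `*_MAX_ACK_PENDING` variable names are all hypothetical and do not exist in opencloud or reva today.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// ConsumerConfig holds the two consumer limits discussed in this issue.
// Hypothetical type, for illustration only.
type ConsumerConfig struct {
	AckWait       time.Duration
	MaxAckPending int
}

// loadConsumerConfig starts from def and applies overrides found via getenv.
// The variable names (<prefix>_ACK_WAIT, <prefix>_MAX_ACK_PENDING) are
// assumptions. getenv is injected so the function can be tested without
// touching the process environment.
func loadConsumerConfig(prefix string, def ConsumerConfig, getenv func(string) string) (ConsumerConfig, error) {
	cfg := def
	if v := getenv(prefix + "_ACK_WAIT"); v != "" {
		d, err := time.ParseDuration(v) // accepts values like "45s" or "2m"
		if err != nil || d <= 0 {
			return def, fmt.Errorf("invalid %s_ACK_WAIT %q", prefix, v)
		}
		cfg.AckWait = d
	}
	if v := getenv(prefix + "_MAX_ACK_PENDING"); v != "" {
		n, err := strconv.Atoi(v)
		if err != nil || n <= 0 {
			return def, fmt.Errorf("invalid %s_MAX_ACK_PENDING %q", prefix, v)
		}
		cfg.MaxAckPending = n
	}
	return cfg, nil
}

func main() {
	// Fallback values are the current effective defaults named in this issue.
	def := ConsumerConfig{AckWait: 30 * time.Second, MaxAckPending: 1000}
	cfg, err := loadConsumerConfig("ACTIVITYLOG_EVENTS", def, os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("AckWait=%s MaxAckPending=%d\n", cfg.AckWait, cfg.MaxAckPending)
}
```

Invalid values are rejected instead of being silently replaced, so a typo in a deployment does not quietly fall back to 1000 pending acks.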
Related Issues
This is the underlying cause for issues like:
- Activitylog service creates endless load on the backend after uploading #716
- Increased CPU load when idle #779
We may be dumping thousands of events on the activitylog service. We could reduce the load by limiting the number of concurrent requests further, at the cost of having to consume events for a longer period of time.
Todo
- Reva contains an events abstraction that needs to learn about AckWait: add ConsumerOptions reva#205
  - opencloud and reva need to use the new Consume*WithOptions to set an AckWait
  - they need to be configurable in the helm chart
- go-micro natsjs, the events implementation for NATS, needs a config option to support MaxAckPending, because we cannot set this through the go-micro ConsumeOption struct used in the stream.Consume interface.
  - we currently replace the go-micro plugins with https://github.com/kobergj/plugins/ - we should use an opencloud repository
  - opencloud and reva need to use the MaxAckPending option for natsjs
  - they need to be configurable in the helm chart

Or maybe replace the go-micro events interface with a direct use of NATS: one less dependency on micro.
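Whichever layer ends up owning these settings, a functional-options API is one possible shape for passing them down. The sketch below is only an illustration of that pattern: `consumerOptions`, `WithAckWait`, `WithMaxAckPending` and `newConsumerOptions` are hypothetical names and are not the actual reva, go-micro or NATS client API.

```go
package main

import (
	"fmt"
	"time"
)

// consumerOptions is a hypothetical option set for an events consumer.
type consumerOptions struct {
	ackWait       time.Duration
	maxAckPending int
}

// ConsumerOption mutates consumerOptions (functional-options pattern).
type ConsumerOption func(*consumerOptions)

// WithAckWait sets how long the server waits for an ack before resending.
func WithAckWait(d time.Duration) ConsumerOption {
	return func(o *consumerOptions) { o.ackWait = d }
}

// WithMaxAckPending sets the upper bound on unacknowledged events.
func WithMaxAckPending(n int) ConsumerOption {
	return func(o *consumerOptions) { o.maxAckPending = n }
}

// newConsumerOptions starts from the defaults named in this issue
// (30 seconds, 1000 pending acks) and applies the given overrides in order.
func newConsumerOptions(opts ...ConsumerOption) consumerOptions {
	o := consumerOptions{ackWait: 30 * time.Second, maxAckPending: 1000}
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	// The values used here are illustrative, not recommendations.
	o := newConsumerOptions(WithAckWait(2*time.Minute), WithMaxAckPending(30))
	fmt.Println(o.ackWait, o.maxAckPending)
}
```

Callers that pass no options keep today's behavior, which would let services adopt the new limits one at a time.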