Set sane defaults for nats consumers #783

@butonic

Description

Currently, we do not set any AckWait or MaxAckPending options. They default to 30 seconds and 1000 pending acks. This is a problem for slow consumers like the activitylog, userlog and others.

AckWait should be increased to align NATS' expectations with how long it takes a consumer to actually ack an event. This is an upper bound: when NATS does not receive an ack in time it will resend the event, causing even more load on the consumer.
MaxAckPending should be decreased to limit the number of events that are processed concurrently. This is also an upper bound: NATS will stop sending events to the consumer when this limit is reached. It will still queue events internally and send them once the consumer has acked events.

While the userlog and frontend services have a configurable MAX_WORKERS setting, the activitylog service is hardcoded to 1. But NATS will happily send it 1000 events and resend them if they have not been acked within 30 seconds. The activitylog is actually very slow when processing events (longer than 30 seconds), which is a different issue we need to investigate.

Expected behavior

We should limit the services to a reasonable default for a single-node machine:

Service      AckWait  MaxAckPending
activitylog  3        30
userlog      1        30
sse          ?        ?
others       ?        ?

The defaults need to be configurable.

Related Issues

This is the underlying cause for issues like:

We may be dumping thousands of events on the activitylog service. We could reduce the load by limiting the number of concurrent requests further, at the cost of having to consume events for a longer period of time.

Todo

  • Reva contains an events abstraction that needs to learn about AckWait: add ConsumerOptions reva#205
    • opencloud and reva need to use the new Consume*WithOptions to set an AckWait
    • they need to be configurable in the helm chart
  • go-micro natsjs, the events implementation for NATS, needs a config option to support MaxAckPending because we cannot set this through the go-micro ConsumeOption struct used in the stream.Consume interface.
    • we currently replace the go-micro plugins with https://github.com/kobergj/plugins/ - we should use an opencloud repository
    • opencloud and reva need to use the MaxAckPending option for natsjs
    • they need to be configurable in the helm chart

Or maybe replace the go-micro events interface with direct use of NATS: one less dependency on go-micro.
