
Some events are sent twice #67

@ePaul


Symptoms

Testing a full snapshot creation in one of our systems, I found 405555 log lines about events having been submitted to Nakadi, while only 347889 snapshot events should have been created, i.e. about 57666 submissions were duplicates. (The whole snapshot creation took 91 minutes.)

Looking into the logs, I find that some events are sent twice:

2018-01-30 12:43:49,732 INFO defaultTaskScheduler-1 org.zalando.nakadiproducer.transmission.impl.EventTransmissionService.sendEvent(EventTransmissionService.java:51): Event 762966 locked by 67c108fb-e689-4595-a28b-e78ab4030726 was sucessfully transmitted to nakadi
...
2018-01-30 12:44:25,618 INFO pool-6-thread-1 org.zalando.nakadiproducer.transmission.impl.EventTransmissionService.sendEvent(EventTransmissionService.java:51): Event 762966 locked by 2f9e651b-49ef-4f7c-a1ba-2d5d976abed2 was sucessfully transmitted to nakadi

Analysis

The event sending (from the event log) works like this:

  • update all DB entries that have no lockTime, or whose lockTime lies before now, setting the lockTime to 10 minutes in the future (and storing the task's lock identifier)
  • fetch all entries which are locked by my identifier
  • for each entry:
    • send the event for that entry
    • delete the entry from the DB.

The whole thing runs as a task scheduled every second.
I think the problem occurs if that loop takes longer than 10 minutes: in that case the locks expire, and the next task can lock the same entries again (and start processing them) while the first one is still working through them.
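
For illustration, here is a minimal sketch of that cycle. The class, repository and sender types are hypothetical stand-ins, not the actual nakadi-producer classes:

import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.UUID;

// Hypothetical sketch of the cycle described above, not the real classes.
class EventTransmissionSketch {

    interface EventLogRepository {
        // roughly: UPDATE event_log SET locked_by = ?, locked_until = ?
        //          WHERE locked_until IS NULL OR locked_until < now()
        void lockAllUnlocked(String lockId, Instant lockedUntil);
        List<Long> findIdsLockedBy(String lockId);
        void delete(long eventId);
    }

    interface EventSender {
        void send(long eventId); // one POST to Nakadi per event
    }

    private static final Duration LOCK_DURATION = Duration.ofMinutes(10);

    private final EventLogRepository repository;
    private final EventSender sender;

    EventTransmissionSketch(EventLogRepository repository, EventSender sender) {
        this.repository = repository;
        this.sender = sender;
    }

    // Runs as a scheduled task, once per second.
    void sendEvents() {
        String lockId = UUID.randomUUID().toString();

        // Step 1: lock *all* due entries for the next 10 minutes.
        repository.lockAllUnlocked(lockId, Instant.now().plus(LOCK_DURATION));

        // Step 2: fetch everything we just locked.
        List<Long> lockedIds = repository.findIdsLockedBy(lockId);

        // Step 3: send and delete, one entry at a time. If this loop takes
        // longer than LOCK_DURATION, the locks expire while we are still in
        // here, and the next scheduled run re-locks and re-sends the
        // remaining entries, producing the duplicates seen above.
        for (long eventId : lockedIds) {
            sender.send(eventId);
            repository.delete(eventId);
        }
    }
}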

Proposed solution

The first step (which locks the entries) should get a limit, so that it only locks as many entries as one thread can comfortably send in a time well below 10 minutes (to make sure that even in extraordinary cases they are all processed within the 10-minute window).
In our example, we had a submission rate of 70-100 events/s (from our Berlin data center to a Nakadi instance in AWS Ireland, using single-event POSTs with an average event size of 3 KB), so a limit of 200 should be more than enough, and might even speed things up by allowing more parallel processing. With batch submission (#48), higher rates can likely be reached, so the limit should then be increased accordingly.
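
A possible shape for the limited lock step, assuming a Postgres-style database; the table and column names are made up, and nakadi-producer's actual schema may differ:

import java.time.Instant;
import org.springframework.jdbc.core.JdbcTemplate;

// Hypothetical limited lock step; table and column names are assumptions.
class LimitedLocker {

    private final JdbcTemplate jdbcTemplate;

    LimitedLocker(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Locks at most 'limit' due entries and returns how many were locked.
    int lockSomeUnlocked(String lockId, Instant lockedUntil, int limit) {
        // A plain UPDATE has no LIMIT, so select the ids in a subquery
        // (Postgres syntax; other databases may need something else).
        return jdbcTemplate.update(
            "UPDATE event_log SET locked_by = ?, locked_until = ? "
                + "WHERE id IN (SELECT id FROM event_log "
                + "             WHERE locked_until IS NULL OR locked_until < now() "
                + "             ORDER BY id LIMIT ?)",
            lockId, java.sql.Timestamp.from(lockedUntil), limit);
    }
}

At the observed 70-100 events/s, a batch of 200 is done in two to three seconds, far inside the 10-minute lock window.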

An alternative would be to regularly check the current time in the processing loop, and simply stop when it comes close to the lock expiry time, letting the next scheduled task take over.
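
As a variant of step 3 in the sketch above (same hypothetical class and fields), the processing loop would then check a deadline derived from the lock expiry:

    // Variant of step 3: stop well before the lock expires, and let the
    // next scheduled task pick up whatever is left.
    void sendEvents(List<Long> lockedIds, Instant lockedUntil) {
        // Safety margin so that an in-flight POST cannot overrun the lock.
        Instant deadline = lockedUntil.minus(Duration.ofMinutes(1));

        for (long eventId : lockedIds) {
            if (Instant.now().isAfter(deadline)) {
                // The remaining entries stay locked until expiry; the next
                // scheduled run then locks and processes them.
                return;
            }
            sender.send(eventId);
            repository.delete(eventId);
        }
    }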
