New-offset is off between 0.2.0 and 0.3.0, resulting in reprocessing last record (or many records) on worker restart #48

jkgenser · 2020-11-30T14:14:23Z

Checklist

I have included information about relevant versions
I have verified that the issue persists when using the master branch of Faust.

Steps to reproduce

Nearly every time a worker is restarted, it replays the last message it received. For example, if a worker has processed messages [0,1,2,3,4] and is then restarted, it re-processes item 4. This corrupts any analytics based on stateful counts. In the trivial case of incrementing a counter in a table, this can be reproduced consistently by stopping and restarting a worker: the count for the last id keeps incrementing even though no new messages were produced to the underlying topic.
When using the group_by functionality to re-partition a stream, I am finding that it replays ALL of the messages, producing far more duplicates than a simple +1 to counts.
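
To illustrate how a single replayed record can arise, here is a minimal plain-Python simulation. It is not Faust code. It assumes (as the issue title suggests, but the report does not confirm) that the committed offset ends up pointing at the last processed record rather than at the next record to read, which is what Kafka expects. The names `run_worker` and `simulate` are made up for this sketch.

```python
from collections import Counter


def run_worker(log, start_offset, counts):
    """Count every record from start_offset onward; return the last offset processed."""
    last = None
    for offset in range(start_offset, len(log)):
        counts[log[offset]] += 1
        last = offset
    return last


def simulate(commit_next):
    """Process a 5-record log, 'restart' the worker, and return the final count.

    commit_next=True  -> commit last processed offset + 1 (Kafka convention)
    commit_next=False -> commit last processed offset (hypothetical off-by-one)
    """
    log = ["id-1"] * 5  # offsets 0..4, all for the same key
    counts = Counter()

    last = run_worker(log, 0, counts)
    committed = last + 1 if commit_next else last

    # Restart: the worker resumes from the committed offset.
    # No new records were produced in between.
    run_worker(log, committed, counts)
    return counts["id-1"]


print("off-by-one commit:", simulate(commit_next=False))
print("next-offset commit:", simulate(commit_next=True))
```

With the off-by-one commit, the restart resumes at offset 4 and counts that record a second time; with the next-offset commit, the restart finds nothing left to process.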

Expected behavior

Do not replay the most recent message.

Actual behavior

Replays messages on restart.

Full traceback

Paste the full traceback (if there is any)

Versions

Python version: 3.7
Faust version: 0.3.0
Operating system: ubuntu 18.04
Kafka version: latest
RocksDB version (if applicable)


patkivikram added a commit that referenced this issue Nov 30, 2020

Fixing issues #47 and #48

f125677

patkivikram mentioned this issue Nov 30, 2020

Fix recovery issue in transaction and reprocessing message in consumer #49

Merged

patkivikram added a commit that referenced this issue Nov 30, 2020

Fix recovery issue in transaction and reprocessing message in consumer (#49) * Fixing issues #47 and #48 * fix linting

be1e6db

patkivikram closed this as completed Dec 6, 2020
