[] I have included information about relevant versions
[] I have verified that the issue persists when using the master branch of Faust.
Steps to reproduce
Configure the application with processing_guarantee="exactly_once".
Publish 5 messages to a topic, keyed by id.
Repartition the topic using group_by(new_id).
Increment a count in a table keyed by new_id.
Initially, the worker starts, processes the messages, and writes the correct data to the changelog topic.
Then I send SIGTERM to stop the worker.
When I restart the worker, it gets stuck in recovery, as shown in the log below.
I suspect an issue with the transactional producer: a transaction may be getting aborted, which leaves the worker unable to recover.
Expected behavior
The worker recovers and resumes processing events.
Actual behavior
The worker hangs during recovery.
Full traceback
Log showing this behavior.
[2020-11-29 15:55:45,969] [114] [WARNING] [^---Recovery]: No event received for active tp TP(topic='meteor-submission-count-by-workflow-changelog', partition=0) in the last 30.0 seconds (last event received 1.04 minute ago)
[2020-11-29 15:55:50,974] [114] [WARNING] [^---Recovery]: No event received for active tp TP(topic='meteor-submission-count-by-workflow-changelog', partition=0) in the last 30.0 seconds (last event received 1.13 minute ago)
[2020-11-29 15:55:55,970] [114] [WARNING] [^---Recovery]: No event received for active tp TP(topic='meteor-submission-count-by-workflow-changelog', partition=0) in the last 30.0 seconds (last event received 1.21 minute ago)
[2020-11-29 15:56:00,976] [114] [WARNING] [^---Recovery]: No event received for active tp TP(topic='meteor-submission-count-by-workflow-changelog', partition=0) in the last 30.0 seconds (last event received 1.29 minute ago)
[2020-11-29 15:56:05,972] [114] [WARNING] [^---Recovery]: No event received for active tp TP(topic='meteor-submission-count-by-workflow-changelog', partition=0) in the last 30.0 seconds (last event received 1.38 minute ago)
[2020-11-29 15:56:10,977] [114] [WARNING] [^---Recovery]: No event received for active tp TP(topic='meteor-submission-count-by-workflow-changelog', partition=0) in the last 30.0 seconds (last event received 1.46 minute ago)
[2020-11-29 15:56:15,974] [114] [WARNING] [^---Recovery]: No event received for active tp TP(topic='meteor-submission-count-by-workflow-changelog', partition=0) in the last 30.0 seconds (last event received 1.54 minute ago)
[2020-11-29 15:56:20,979] [114] [WARNING] [^---Recovery]: No event received for active tp TP(topic='meteor-submission-count-by-workflow-changelog', partition=0) in the last 30.0 seconds (last event received 1.63 minute ago)
[2020-11-29 15:56:25,975] [114] [WARNING] [^---Recovery]: No event received for active tp TP(topic='meteor-submission-count-by-workflow-changelog', partition=0) in the last 30.0 seconds (last event received 1.71 minute ago)
[2020-11-29 15:56:30,980] [114] [WARNING] [^---Recovery]: No event received for active tp TP(topic='meteor-submission-count-by-workflow-changelog', partition=0) in the last 30.0 seconds (last event received 1.79 minute ago)
[2020-11-29 15:56:35,977] [114] [WARNING] [^---Recovery]: No event received for active tp TP(topic='meteor-submission-count-by-workflow-changelog', partition=0) in the last 30.0 seconds (last event received 1.88 minute ago)
[2020-11-29 15:56:40,982] [114] [WARNING] [^---Recovery]: Recovery has not flushed buffers in the last 120.0 seconds (last flush was 2.00 minutes ago). Current total buffer size: 5
Versions
Python version: 3.7
Faust version: 0.3.0
Operating system: Ubuntu 18.04
Kafka version: 2.6.0
RocksDB version (if applicable)