-
Notifications
You must be signed in to change notification settings - Fork 2.3k
MySQL queues: reduce the impact of racing by popping the queue 1 by 1 #1355
MySQL queues: reduce the impact of racing by popping the queue 1 by 1 #1355
Conversation
Codecov Report
@@ Coverage Diff @@
## dev #1355 +/- ##
============================================
+ Coverage 64.30% 64.31% +0.01%
- Complexity 2846 2847 +1
============================================
Files 241 241
Lines 14317 14316 -1
Branches 1409 1410 +1
============================================
+ Hits 9206 9208 +2
+ Misses 4330 4327 -3
Partials 781 781
Continue to review full report at Codecov.
|
Pull Request Test Coverage Report for Build 3229
💛 - Coveralls |
@s50600822 @jvemugunta @mashurex As primary contributors to the MySQL module, we would appreciate it if you could help us out with reviewing this pull request. |
@ggrekhov @s50600822 @kishorebanala Here's what happens and why it's trying to do them all at once:
In a highly concurrent environment I can see why this leads to issues. The most satisfactory solution would probably not be to pop each individual one after collecting them as this is going to lead to data consistency issues and (probably) the same task being handled twice. I think a better solution would be to implement a time based visibility column that would allow us to collect and safely mark a task in the queue as 'invisible' for some window of time (like the Then we could implement a more efficient batch pop group of statements that won't introduce the deadlocks. EDIT |
@ggrekhov Please makes changes as suggested above. |
Hey @ggrekhov, Are you still considering working on this feature? |
Hi @kishorebanala. To be honest I didn't get why my solution is wrong. @mashurex, you say: I admit that my implementation brings a bit new behavior of By the way, this PR has been deployed in our production instance for several months already - we didn't experience any inconsistency issues in our workflows. |
@apanicker-nflx @kishorebanala @ggrekhov @mashurex Could y'all follow up with @ggrekhov 's comment? It looks like the Postgres one was approved and merged: #1741 I'm not sure what the difference would be between Postgres and Mysql implementation where one would be ok, but other not in this instance. |
Merging this because the corresponding change in postgres was merged and the OSS community seems to agree. Please post an issue if this causes any issues with the MySQL persistence implementation. |
As I understand, the current implementation is done in this way (everything or nothing) just because there is no way to get the updated row IDs from a MySQL response. And therefore the only consistent behavior is to completely cancel the partial update as it conflicts with another worker.
But it hurts a lot if there is a huge queue of tasks and a lot of workers trying to poll - they get stuck in a deadlock trying to do
set popped = true
for a bunch of messages.In this change the messages are marker as popped one by one for each row and the popped ones are returned.
EDIT
Closes #577