-
Notifications
You must be signed in to change notification settings - Fork 2.3k
mysql unack cause sub_workflow executed twice #1859
Comments
@jvemugunta @rickfish @s50600822 As contributors to this module and users of the MySQL persistence/queueing layers, could you please help debug this issue? Thanks. |
I noticed the same issue with postgres persistence, went to check the query and it is the same:
I am using distributed locking with redis and pgsql as storage backend, and for some tasks they are being polled twice. |
@HenryLauu, @apanicker-nflx I imagine this only happens if you have more than one instance of Conductor running, right? Otherwise I think there is only one thread devoted to listening for each system task so it would not be possible for simultaneous polling of the SUB_WORKFLOW queue. Or am I missing something? I am not saying that you should only have one instance running, just trying to clarify the scenario. There is definitely a possibility (if more than one instance of conductor is running) that both instances will get the same SUB_WORKFLOW task. In SystemTaskExecutor.pollAndExecute(), if queueDAO.pop() is called in one instance and it has not yet gone through the list of taskIds and called executionService.ackTaskReceived(), the other instance could then get the same task on its pollAndExecute() call. Not sure how to fix this. It is definitely a small gap where it could be an issue but the possibility definitely exists. Note that this has nothing to do with the functionality of processAllUnacks()... |
I changed the query of processAllUnacks to decrement -60 instead of increment +60 and after a full night running some tests it seems to poll only each task once. I wrote a dummy workflow with three HTTP tasks calling a route that returns its value after 60 seconds (to simulate delay / processing time) and it seems to work well now... Three tasks per workflow, 3 route calls, as follows:
|
would it be better to set default value = -60 and allow users to set their desired value at config file? BTW, I think config value should be positive and negative sign is processed in the code automatically. |
My 2 cents: I am not sure that this is a good candidate for a property. Just describing what that property controls is difficult. Also, I think that it would be difficult for someone to figure out what to use as that property's value and why would it vary by installation anyway? I think that if -60 works for you that it can just be hard-coded. Just my opinion, curious what others think... |
Hello @rickfish @apanicker-nflx this is obviously concurrent bug which need to be fixed. Please make a consideration. |
@HenryLauu @apanicker-nflx @deluxor I created PR #1875 to fix this. Unfortunately, it caused the MySQLQueueDAOTest.processUnacksTest to fail. I will have to look into it. |
processAllUnacks schedule thread set popped=false before this message was acked by SystemTaskExecutor. below are timeline:
I don't understand this sql :
UPDATE queue_message SET popped = false WHERE popped = true AND TIMESTAMPADD(SECOND,60,CURRENT_TIMESTAMP) > deliver_on
should it be TIMESTAMPADD(SECOND,-60,CURRENT_TIMESTAMP) > deliver_on? not add but minus?
@apanicker-nflx
The text was updated successfully, but these errors were encountered: