This repository has been archived by the owner on Dec 13, 2023. It is now read-only.

Operation: ( processUnacks ) failed on key: [conductor_queue.test.UNACK._deciderQueue.c ] #3755

Open
dpozinen opened this issue Aug 30, 2023 · 0 comments
Labels
type: bug bugs/ bug fixes


Describe the bug
RedisDynoQueue starts failing unpredictably when using the memory db type. The logs show:
Operation: ( processUnacks ) failed on key: [conductor_queue.test.UNACK._deciderQueue.c ].

After overriding the class to add more detailed logging, I can see that the cause is an NPE in the Redis mock (JedisMock.zrangeByScoreWithScores):

ERROR [up-be-conductor-server,,] 1 --- [ool-17-thread-1] c.n.d.q.r.RedisDynoQueue                 : Error while processing unacks. Operation: ( processUnacks ) failed on key: [conductor_queue.test.UNACK._deciderQueue.c ].
java.lang.RuntimeException: Operation: ( processUnacks ) failed on key: [conductor_queue.test.UNACK._deciderQueue.c ].
at com.netflix.dyno.queues.redis.QueueUtils.executeWithRetry(QueueUtils.java:47) ~[dyno-queues-redis-2.0.22.jar!/:2.0.22]
at com.netflix.dyno.queues.redis.QueueUtils.execute(QueueUtils.java:29) ~[dyno-queues-redis-2.0.22.jar!/:2.0.22]
at com.netflix.dyno.queues.redis.RedisDynoQueue.processUnacks(RedisDynoQueue.java:1426) ~[classes!/:2.0.22]
at com.netflix.dyno.queues.redis.RedisDynoQueue.lambda$new$1(RedisDynoQueue.java:144) ~[classes!/:2.0.22]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: redis.clients.jedis.exceptions.JedisException: java.lang.NullPointerException
at com.netflix.conductor.redis.jedis.JedisMock.zrangeByScoreWithScores(JedisMock.java:958) ~[conductor-redis-persistence-3.13.8.jar!/:3.13.8]
at com.netflix.dyno.queues.redis.RedisDynoQueue.lambda$processUnacks$23(RedisDynoQueue.java:1435) ~[classes!/:2.0.22]
at com.netflix.dyno.queues.redis.QueueUtils.executeWithRetry(QueueUtils.java:36) ~[dyno-queues-redis-2.0.22.jar!/:2.0.22]
... 9 more
Caused by: java.lang.NullPointerException
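
The outer RuntimeException message only names the operation and key, so the NPE is visible only once the full cause chain is logged. A minimal sketch of walking that chain to surface the innermost exception is shown here. The class `RootCauseDemo` and the method `rootCauseOf` are hypothetical names used for illustration and are not part of Conductor or dyno-queues; `IllegalStateException` stands in for `JedisException` so the example compiles without the Jedis dependency.

```java
public class RootCauseDemo {

    // Follows getCause() until the innermost exception is reached.
    // Hypothetical helper, not an existing Conductor or dyno-queues API.
    static Throwable rootCauseOf(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Rebuild the shape of the chain from the stack trace above:
        // RuntimeException -> JedisException (stand-in) -> NullPointerException
        Throwable npe = new NullPointerException();
        Throwable jedisLike = new IllegalStateException("java.lang.NullPointerException", npe);
        Throwable wrapped = new RuntimeException(
                "Operation: ( processUnacks ) failed on key: "
                        + "[conductor_queue.test.UNACK._deciderQueue.c ].",
                jedisLike);

        Throwable root = rootCauseOf(wrapped);
        System.out.println(root.getClass().getName()); // prints java.lang.NullPointerException
    }
}
```

Logging the root cause alongside the wrapper message would make this failure identifiable without overriding RedisDynoQueue, assuming the logging call site can be changed.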

I've looked at the mock library used, and it hasn't been updated since 2015.

Details
Conductor version: 3.13.8
Persistence implementation: memory
Platform: Macbook Pro M1
Docker Engine: 20.10.23

Additional context
I am running Conductor locally inside Docker and executing random workflows as part of a test. It seems to happen with any kind of workflow, as long as it runs long enough (1m+).
It happens at random points in the workflow too, and sometimes (although rarely) does not happen at all, even on the same workflow. Conductor does not recover after this error: once encountered, it is logged indefinitely and the workflow is not executed.

I realize that the memory db option isn't stable, but I think I'm using it as intended. I also realize that this is potentially a bug in the mock library, but either way it is impacting Conductor. I think moving away from that library, or forking it to fix potential bugs, is the way to go here, since it hasn't been updated since 2015.

dpozinen added the type: bug label on Aug 30, 2023