This repository was archived by the owner on Jan 6, 2023. It is now read-only.
IndexOutOfBoundsException from aeron #900
Open
Description
I tried upgrading our Onyx system to 0.14.6 this morning and I'm getting errors on startup; it looks like every task is blowing up. There are two versions of the error:
19-10-22 20:16:14 robert-downey-jr-master-5c869697c5-dkm7w WARN [onyx.messaging.aeron.status-publisher:40] - Aeron status channel error
java.lang.Thread.run Thread.java: 748
org.agrona.concurrent.AgentRunner.run AgentRunner.java: 164
org.agrona.concurrent.AgentRunner.doDutyCycle AgentRunner.java: 283
io.aeron.ClientConductor.doWork ClientConductor.java: 191
io.aeron.ClientConductor.service ClientConductor.java: 896
io.aeron.DriverEventsAdapter.receive DriverEventsAdapter.java: 63
org.agrona.concurrent.broadcast.CopyBroadcastReceiver.receive CopyBroadcastReceiver.java: 116
io.aeron.DriverEventsAdapter.onMessage DriverEventsAdapter.java: 123
io.aeron.command.ImageBuffersReadyFlyweight.sourceIdentity ImageBuffersReadyFlyweight.java: 239
org.agrona.concurrent.UnsafeBuffer.getStringAscii UnsafeBuffer.java: 1085
org.agrona.concurrent.UnsafeBuffer.getStringAscii UnsafeBuffer.java: 1134
org.agrona.concurrent.UnsafeBuffer.boundsCheck0 UnsafeBuffer.java: 1716
java.lang.IndexOutOfBoundsException: index=124 length=822083584 capacity=4096
The second is task-specific and is thrown in :lifecycle/poll-recover:
19-10-22 20:16:14 robert-downey-jr-master-5c869697c5-dkm7w WARN [onyx.peer.task-lifecycle:177] -
java.lang.Thread.run Thread.java: 748
java.util.concurrent.ThreadPoolExecutor$Worker.run ThreadPoolExecutor.java: 624
java.util.concurrent.ThreadPoolExecutor.runWorker ThreadPoolExecutor.java: 1149
...
clojure.core.async/thread-call/fn async.clj: 434
onyx.peer.task-lifecycle/start-task-lifecycle!/fn task_lifecycle.clj: 1155
onyx.peer.task-lifecycle/run-task-lifecycle! task_lifecycle.clj: 551
onyx.peer.task-lifecycle.TaskStateMachine/next-replica! task_lifecycle.clj: 961
onyx.messaging.messenger-state/next-messenger-state! messenger_state.clj: 92
onyx.messaging.messenger-state/transition-messenger messenger_state.clj: 83
onyx.messaging.aeron.messenger.AeronMessenger/update-publishers messenger.clj: 112
onyx.messaging.aeron.messenger/transition-publishers messenger.clj: 51
clojure.core/group-by core.clj: 7146
clojure.core/reduce core.clj: 6828
clojure.core.protocols/fn/G protocols.clj: 13
clojure.core.protocols/fn protocols.clj: 75
clojure.core.protocols/seq-reduce protocols.clj: 24
clojure.core/seq core.clj: 137
...
clojure.core/keep/fn core.clj: 7341
onyx.messaging.aeron.messenger/transition-publishers/fn messenger.clj: 50
onyx.messaging.aeron.publisher/reconcile-pub publisher.clj: 291
onyx.messaging.aeron.publisher.Publisher/start publisher.clj: 198
onyx.messaging.aeron.endpoint-status.EndpointStatus/start endpoint_status.clj: 79
io.aeron.Aeron.addSubscription Aeron.java: 263
io.aeron.ClientConductor.addSubscription ClientConductor.java: 495
io.aeron.ClientConductor.addSubscription ClientConductor.java: 521
io.aeron.ClientConductor.awaitResponse ClientConductor.java: 945
io.aeron.ClientConductor.service ClientConductor.java: 896
io.aeron.DriverEventsAdapter.receive DriverEventsAdapter.java: 63
org.agrona.concurrent.broadcast.CopyBroadcastReceiver.receive CopyBroadcastReceiver.java: 116
io.aeron.DriverEventsAdapter.onMessage DriverEventsAdapter.java: 123
io.aeron.command.ImageBuffersReadyFlyweight.sourceIdentity ImageBuffersReadyFlyweight.java: 239
org.agrona.concurrent.UnsafeBuffer.getStringAscii UnsafeBuffer.java: 1085
org.agrona.concurrent.UnsafeBuffer.getStringAscii UnsafeBuffer.java: 1134
org.agrona.concurrent.UnsafeBuffer.boundsCheck0 UnsafeBuffer.java: 1716
java.lang.IndexOutOfBoundsException: index=124 length=808517632 capacity=4096
clojure.lang.ExceptionInfo: Handling uncaught exception thrown inside task lifecycle :lifecycle/poll-recover. Killing the job. -> Exception type: java.lang.IndexOutOfBoundsException. Exception message: index=124 length=808517632 capacity=4096
job-id: #uuid "00000000-0000-0000-0000-000000000003"
metadata: {:job-id #uuid "00000000-0000-0000-0000-000000000003", :job-hash "7ba27abbd73fa66ec2351c328b997173d84067333d334ca41584c39e0669f"}
peer-id: #uuid "3236c6dd-c980-caac-12e3-d339f7c564ad"
task-name: :prepare-pending-state-tx
I'm not even quite sure how to begin figuring out what's going wrong here. I started poking around but haven't made much headway.
The problem doesn't occur in 0.14.5, and I see that Aeron was upgraded in 0.14.6. We do set a large term buffer length (-Daeron.term.buffer.length=8388608), but that is still far smaller than the length in the thrown exception. I've also tried without that setting, and the errors still occur.
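One way to start digging is to look at the bogus length values themselves. The sketch below is not Onyx or Aeron code; it only takes the numbers from the two exceptions above and prints them as hex and as raw bytes. The helper name `describe` is made up for illustration, and the little-endian byte order is an assumption (typical for x86 hosts). Both values turn out to contain bytes in the printable ASCII digit range, which might mean the length prefix is being read at the wrong offset and is picking up string data, though that is only a guess.

```python
import struct

# Values copied from the exceptions in this report.
INDEX = 124
CAPACITY = 4096
TERM_BUFFER_LENGTH = 8388608  # from -Daeron.term.buffer.length
REPORTED_LENGTHS = (822083584, 808517632)


def describe(length: int) -> dict:
    """Hypothetical helper: show one reported length as hex and raw bytes."""
    # Assumption: the 4-byte length prefix is read little-endian.
    raw = struct.pack("<i", length)
    return {
        "hex": f"0x{length:08x}",
        "bytes_le": raw,
        # Replace non-printable bytes with '.' so ASCII text stands out.
        "printable": "".join(chr(b) if 32 <= b < 127 else "." for b in raw),
        # Same kind of check that fails in the stack trace: the read would run past the buffer.
        "exceeds_capacity": INDEX + length > CAPACITY,
        "exceeds_term_buffer": length > TERM_BUFFER_LENGTH,
    }


if __name__ == "__main__":
    for n in REPORTED_LENGTHS:
        print(n, describe(n))
```

The two lengths come out as 0x31000000 and 0x30310000, so neither looks like a plausible string length for a 4096-byte buffer, and neither is related to the configured term buffer length. That would fit the observation that removing the setting changes nothing.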
Does anyone have any advice?