Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to stabilize the WindowedSubscriber test that uses VTS time advancing to close window and test comments. #39418

Merged
merged 1 commit into from
Mar 29, 2024

Conversation

anuchandy
Copy link
Member

No description provided.

@anuchandy anuchandy force-pushed the adjust-next-window-on-timeout-test branch from 16459c4 to b760bde Compare March 28, 2024 18:23
@anuchandy
Copy link
Member Author

/azp run java - eventhubs - tests

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@anuchandy
Copy link
Member Author

/azp run java - servicebus - tests

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@anuchandy anuchandy force-pushed the adjust-next-window-on-timeout-test branch from b760bde to 360eb36 Compare March 28, 2024 21:15
@anuchandy
Copy link
Member Author

/azp run java - servicebus - tests

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

} else {
logger.atError().log("VirtualTimeScheduler unavailable (unexpected close by the test).");
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Liudmila and I’ve been investigating a strange random CI error with the following stack trace –

[ForkJoinPool-1-worker-1] WARN com.azure.messaging.servicebus.WindowedSubscriber$WindowWork - {"az.sdk.message":"Terminating the work. Error while scheduling or waiting for timeout.","exception":"Scheduler unavailable","demand":1,"workId":0,"pending":1}
reactor.core.Exceptions$ReactorRejectedExecutionException: Scheduler unavailable
	at reactor.core.Exceptions.failWithRejected(Exceptions.java:271)
	at reactor.core.publisher.Operators.onRejectedExecution(Operators.java:1024)
	at reactor.core.publisher.MonoDelay.subscribe(MonoDelay.java:64)
	at reactor.core.publisher.Mono.subscribe(Mono.java:4491)
	at reactor.core.publisher.Mono.subscribeWith(Mono.java:4606)
	at reactor.core.publisher.Mono.subscribe(Mono.java:4458)
	at reactor.core.publisher.Mono.subscribe(Mono.java:4394)
	at reactor.core.publisher.Mono.subscribe(Mono.java:4366)
	at com.azure.messaging.servicebus.WindowedSubscriber$WindowWork.beginTimeoutTimer(WindowedSubscriber.java:859)
	at com.azure.messaging.servicebus.WindowedSubscriber$WindowWork.init(WindowedSubscriber.java:700)
	at com.azure.messaging.servicebus.WindowedSubscriber$WindowWork.access$1100(WindowedSubscriber.java:576)
	at com.azure.messaging.servicebus.WindowedSubscriber.initWorkOnce(WindowedSubscriber.java:386)
	at com.azure.messaging.servicebus.WindowedSubscriber.drainLoop(WindowedSubscriber.java:264)
	at com.azure.messaging.servicebus.WindowedSubscriber.drain(WindowedSubscriber.java:223)
	at com.azure.messaging.servicebus.WindowedSubscriber.enqueueRequestImpl(WindowedSubscriber.java:144)
	at com.azure.messaging.servicebus.WindowedSubscriberFluxWindowIsolatedTest.lambda$shouldPickEnqueuedWindowRequestsOnSubscriptionReady$19(WindowedSubscriberFluxWindowIsolatedTest.java:326)
	at reactor.test.DefaultStepVerifierBuilder$DefaultStepVerifier.toVerifierAndSubscribe(DefaultStepVerifierBuilder.java:862)
	at reactor.test.DefaultStepVerifierBuilder$DefaultStepVerifier.verify(DefaultStepVerifierBuilder.java:831)
	at reactor.test.DefaultStepVerifierBuilder$DefaultStepVerifier.verify(DefaultStepVerifierBuilder.java:823)
	at reactor.test.DefaultStepVerifierBuilder.verifyComplete(DefaultStepVerifierBuilder.java:690)
	at com.azure.messaging.servicebus.WindowedSubscriberFluxWindowIsolatedTest.shouldPickEnqueuedWindowRequestsOnSubscriptionReady(WindowedSubscriberFluxWindowIsolatedTest.java:344)

All the tests in this class are running under VirtualTimeScheduler (VTS) and once in a while the test "shouldPickEnqueuedWindowRequestsOnSubscriptionReady“ fails with "Scheduler unavailable".

From the stack trace this error comes from the scheduler (should be VTS in case of test) when WindowedSubscriber tries to register the timeout as part of starting the first WindowWork.

While in the real-world scenario the scheduler for timeout will be the global ParallelScheduler not VTS, both schedulers can throw this error only when it’s shutdown.

In the test we shut down the local VTS (scoped to the test) only when the test is completed, not in the middle of the test run. It’s known that VTS infra tracks the current VTS instance in a static member, so we had seen similar "scheduler unavailable" when two test using VTS infra runs in parallel and completion of one disposes the static current VTS. But as of today, we run all VTS tests in Isolated mode, so this should not happen.

Given this is so random failure and also observed happening only when CI is loaded, adding this logging function to shed some light into what’s going on here. The tests log using this function in case the VTS gets closed unexpectedly, this helps us,

  1. To see if the problem is local to the test, I.e., test run forward and closes the VTS.
  2. To see if the problem triggered from outside of the test, i.e., the VTS infra itself tear down the static current VTS holder.
  3. To see if there is for some reason VTS is not even gets injected.

@anuchandy anuchandy merged commit bf1dad9 into Azure:main Mar 29, 2024
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants