Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][test] Fix CoordinatorService, MetadataStore and MockZooKeeper leaks in tests #15638

Merged
merged 10 commits into from
Jul 13, 2022

Conversation

nicoloboschi
Copy link
Contributor

@nicoloboschi nicoloboschi commented May 17, 2022

Motivation

There are some tests that occasionally fails with non-sense mockito errors like this one:

org.apache.pulsar.broker.PulsarServerException: 
org.mockito.exceptions.misusing.WrongTypeOfReturnValue: 
BrokerService cannot be returned by getConfiguration()
getConfiguration() should return ServiceConfiguration
***
If you're unsure why you're getting above error read on.
Due to the nature of the syntax above problem might occur because:
1. This exception *might* occur in wrongly written multi-threaded tests.
   Please refer to Mockito FAQ on limitations of concurrency testing.
2. A spy is stubbed using when(spy.foo()).then() syntax. It is safer to stub spies - 
   - with doReturn|Throw() family of methods. More in javadocs for Mockito.spy() method.

	at org.apache.pulsar.broker.service.BrokerService.close(BrokerService.java:685)
	at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
Caused by: org.mockito.exceptions.misusing.WrongTypeOfReturnValue: 
BrokerService cannot be returned by getConfiguration()
getConfiguration() should return ServiceConfiguration
***
If you're unsure why you're getting above error read on.
Due to the nature of the syntax above problem might occur because:
1. This exception *might* occur in wrongly written multi-threaded tests.
   Please refer to Mockito FAQ on limitations of concurrency testing.
2. A spy is stubbed using when(spy.foo()).then() syntax. It is safer to stub spies - 
   - with doReturn|Throw() family of methods. More in javadocs for Mockito.spy() method.

	at org.apache.pulsar.broker.PulsarService.getConfiguration(PulsarService.java:608)
	at org.apache.pulsar.broker.service.BrokerService.shutdownEventLoopGracefully(BrokerService.java:846)
	at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710)
	at org.apache.pulsar.broker.service.BrokerService.shutdownEventLoopGracefully(BrokerService.java:846)
	at org.apache.pulsar.broker.service.BrokerService.closeAsync(BrokerService.java:757)
	at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710)
	at org.apache.pulsar.broker.service.BrokerService.closeAsync(BrokerService.java:720)
	at org.apache.pulsar.broker.service.BrokerService.close(BrokerService.java:680)
	... 2 more

After investigating I noticed that MetadataStore threads can still receive notifications even if the current test method is finished. Also other thread pools are not shutdown in the correct way and the test must wait for their terminations in order to avoid mixing up Mockito mocked objects.

Modifications

  • MockZookKeeper close method now really shutdown ZK
  • MetadataStore is always closed in MockedBookKeeperTestCase based tests
  • Fixed a couple of tests which manually create executors and Netty event loop groups and reordered the resources disposal (the metadatastore must be closed after all)

This is related to past flaky-tests like:

fix #15774

  • no-need-doc

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label May 17, 2022
@nicoloboschi nicoloboschi force-pushed the fix-tests-close-managed-ledger branch from e16c266 to bd66618 Compare May 17, 2022 13:04
@nicoloboschi nicoloboschi changed the title [fix][tests] Ensure thread pools are disposed after some tests [fix] Ensure resources are disposed in a sync way while closing ManagedLedgerFactory and MetadataStore May 17, 2022
@nicoloboschi nicoloboschi reopened this May 17, 2022
@Technoboy- Technoboy- closed this May 18, 2022
@Technoboy-
Copy link
Contributor

Will reopen later to re-run the test.

@Technoboy- Technoboy- reopened this May 18, 2022
@Technoboy- Technoboy- closed this May 18, 2022
@Technoboy- Technoboy- reopened this May 18, 2022
@nicoloboschi
Copy link
Contributor Author

/pulsarbot rerun-failure-checks

@nicoloboschi nicoloboschi force-pushed the fix-tests-close-managed-ledger branch from 280557d to 86b7d07 Compare May 19, 2022 08:25
@nicoloboschi
Copy link
Contributor Author

Test failed is not related, I opened an issue: #15676

@nicoloboschi nicoloboschi requested review from Jason918, eolivelli, merlimat and lhotari and removed request for Jason918 and eolivelli May 20, 2022 08:22
Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement!

@nicoloboschi
Copy link
Contributor Author

/pulsarbot rerun-failure-checks

1 similar comment
@nicoloboschi
Copy link
Contributor Author

/pulsarbot rerun-failure-checks

@lhotari
Copy link
Member

lhotari commented Jun 3, 2022

@nicoloboschi Please rebase this PR.

It looks like this could help with these issues which happen in PulsarFunctionTlsTest.tearDown:

"main" #1 prio=5 os_prio=0 cpu=67703.73ms elapsed=3524.96s tid=0x00007f0b9c0243f0 nid=0xafb waiting on condition  [0x00007f0ba017c000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@17.0.3/Native Method)
	- parking to wait for  <0x00000000ce100010> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(java.base@17.0.3/LockSupport.java:211)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.3/AbstractQueuedSynchronizer.java:715)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@17.0.3/AbstractQueuedSynchronizer.java:1047)
	at java.util.concurrent.CountDownLatch.await(java.base@17.0.3/CountDownLatch.java:230)
	at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl.shutdown(ManagedLedgerFactoryImpl.java:635)
	at org.apache.pulsar.broker.ManagedLedgerClientFactory.close(ManagedLedgerClientFactory.java:134)
	at org.apache.pulsar.broker.PulsarService.closeAsync(PulsarService.java:460)
	at org.apache.pulsar.broker.PulsarService.close(PulsarService.java:374)
	at org.apache.pulsar.functions.worker.PulsarFunctionTlsTest.tearDown(PulsarFunctionTlsTest.java:182)

This is the 4th most flaky issue currently.

@nicoloboschi nicoloboschi force-pushed the fix-tests-close-managed-ledger branch 2 times, most recently from 8006c99 to cf21267 Compare June 6, 2022 11:48
@github-actions
Copy link

github-actions bot commented Jul 7, 2022

The pr had no activity for 30 days, mark with Stale label.

@nicoloboschi nicoloboschi force-pushed the fix-tests-close-managed-ledger branch from d63a9b2 to 5cd2170 Compare July 12, 2022 08:02
@nicoloboschi
Copy link
Contributor Author

@lhotari PTAL

@github-actions github-actions bot removed the Stale label Jul 13, 2022
@nicoloboschi nicoloboschi changed the title [fix] Ensure resources are disposed in a sync way while closing ManagedLedgerFactory and CoordinationService [fix] Always close CoordinatorService, fix MetadataStore and MockZooKeeper leaks in tests Jul 13, 2022
@nicoloboschi nicoloboschi changed the title [fix] Always close CoordinatorService, fix MetadataStore and MockZooKeeper leaks in tests [fix][test] Fix CoordinatorService, MetadataStore and MockZooKeeper leaks in tests Jul 13, 2022
@lhotari lhotari merged commit 9c75904 into apache:master Jul 13, 2022
wuxuanqicn pushed a commit to wuxuanqicn/pulsar that referenced this pull request Jul 14, 2022
…eaks in tests (apache#15638)

* [fix][tests] Ensure thread pools are disposed after some tests

* fix comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-not-needed Your PR changes do not impact docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CoordinationServiceImpl.close hangs and doesn't complete
5 participants