Skip to content

improving flaky SSLCertificateTest #179

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 12, 2018

Conversation

the-thing
Copy link
Contributor

SSLCertificateTest fails quite often on Travis.

I was able to recreate SSLCertificateTest false positive failing tests locally using the same Travis CI Docker image snapshot as https://travis-ci.org/quickfix-j/quickfixj worker - travisci/ci-garnet:packer-1512502276-986baf0.

It seems that sometimes MINA produces exceptions other than SSLHandshakeException, where org.apache.mina.filter.ssl.SslFilter#exceptionCaught(NextFilter nextFilter, IoSession session, Throwable cause) can stop executing the filter chain (it actually has a special handling routine for WriteToClosedSessionException) , hence the exception is never captured by SSLCertificateTest.

22:17:21.792 [NioProcessor-8] INFO q.m.s.SSLCertificateTest$TestInitiator -
org.apache.mina.core.write.WriteToClosedSessionException: null
at org.apache.mina.core.polling.AbstractPollingIoProcessor.clearWriteRequestQueue(AbstractPollingIoProcessor.java:633)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.removeNow(AbstractPollingIoProcessor.java:576)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.writeBuffer(AbstractPollingIoProcessor.java:923)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.flushNow(AbstractPollingIoProcessor.java:840)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.flush(AbstractPollingIoProcessor.java:767)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$700(AbstractPollingIoProcessor.java:68)
at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1125)
at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

The fix is to register test exception handler before MINA org.apache.mina.filter.ssl.SslFilter. I run the test over 20 times on mentioned Docker image and I could not see the issue anymore (on average it took less than 3 consecutive runs to see false positive failure).

@chrjohn chrjohn added this to the QFJ 2.0.1 milestone Mar 12, 2018
@chrjohn
Copy link
Member

chrjohn commented Mar 12, 2018

Hi Marcin,

great, thanks for the PR. I also stumbled upon this recently and temporarily fixed this on my local branch by overriding the exceptionCaught() method in QFJs SSLFilter so that MINAs SslFilter did not do any filtering of Exceptions. But your solution is of course cleaner and more straightforward. :)

Cheers,
Chris.

@chrjohn chrjohn merged commit a00e6f4 into quickfix-j:master Mar 12, 2018
chrjohn pushed a commit that referenced this pull request Mar 12, 2018
@the-thing the-thing deleted the flaky_ssl_certificate_test branch March 12, 2018 11:44
@chrjohn
Copy link
Member

chrjohn commented Apr 6, 2018

Hi Marcin,

the last build failed with a NPE in the test. https://api.travis-ci.org/v3/job/363043029/log.txt

This seems to happen once in a while. Stack trace excerpt:

[ERROR] shouldFailWhenUsingEmptyServerTrustore(quickfix.mina.ssl.SSLCertificateTest)  Time elapsed: 0.545 s  <<< ERROR!
java.lang.NullPointerException
	at quickfix.mina.ssl.SSLCertificateTest$TestConnector.findSSLSession(SSLCertificateTest.java:571)
	at quickfix.mina.ssl.SSLCertificateTest$TestConnector.assertNotAuthenticated(SSLCertificateTest.java:616)
	at quickfix.mina.ssl.SSLCertificateTest.shouldFailWhenUsingEmptyServerTrustore(SSLCertificateTest.java:459)
...

The corresponding code is this:

SSLFilter sslFilter = (SSLFilter) filterChain.get(SSLSupport.FILTER_NAME);
SSLSession sslSession = sslFilter.getSslSession(ioSession);

So it seems the SSLFilter is not set with the IoSession. I guess this can be checked before? Or is this highlighting another problem?

Thanks,
Chris.

@the-thing
Copy link
Contributor Author

Hi Christoph,

I will have a closer look at the end of the weekend. BTW. Have you spotted other test cases than shouldFailWhenUsingEmptyServerTrustore failing in similar fashion or is it just this one?

Thanks,
Marcin

@chrjohn
Copy link
Member

chrjohn commented Apr 7, 2018

Hi Marcin,

after your fixes to the exception filter I could only observe this NPE. However, I also saw this before your changes.
See my comment here #180 (comment)

Thanks and cheers,
Chris.

@chrjohn chrjohn modified the milestones: QFJ 2.0.1, QFJ 2.1.0 Jul 31, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants