test: Fix random test failures in test/plugin/test_out_forward.rb #1881

Merged


fujimotos
Member

This is a fix for the test failures occurring on Travis CI every now and then.

For actual cases of this failure, please see job#4143.2, job#4066.5 and job#4167.2.

How the failure happens

The failing test cases are supposed to test the handling of "broken ack responses".
In short, they are designed to perform the following steps:

  1. Create an in_forward server (which sends back a broken ack response).
  2. Create an out_forward server.
  3. Send some data and check the handling of the broken ack response.
  4. Shut down both forwarding servers.

The problem is that another flush attempt may occur between steps 3 and 4.
If one does, it raises an unexpected NoNodesAvailable exception (since the destination
node is already labelled as "unavailable" due to the broken ack response in step 3).
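
The failure mode described above can be sketched with a small toy model. This is not Fluentd code: `ToyNode` and `ToyForwardOutput` are hypothetical classes invented for illustration, and the exception class is redefined locally. The sketch assumes that a single broken ack is enough to mark the node unavailable, which compresses whatever the real plugin does.

```ruby
# Toy model of the failure mode; none of these classes are Fluentd's own.
class NoNodesAvailable < StandardError; end

class ToyNode
  attr_reader :available

  def initialize
    @available = true
  end

  # Simulates sending a chunk and getting a broken ack response back,
  # after which the node is treated as unavailable (an assumption).
  def send_chunk_with_broken_ack
    @available = false
  end
end

class ToyForwardOutput
  def initialize(nodes)
    @nodes = nodes
  end

  # One flush attempt: pick an available node, or fail if there is none.
  def flush
    node = @nodes.find(&:available)
    raise NoNodesAvailable, 'no nodes are available' if node.nil?

    node.send_chunk_with_broken_ack
    :ack_failed
  end
end

output = ToyForwardOutput.new([ToyNode.new])

# Step 3: the flush that the test intends to exercise.
first = output.flush

# A stray flush between steps 3 and 4 finds no usable node.
second_error =
  begin
    output.flush
    nil
  rescue NoNodesAvailable => e
    e
  end
```

In this model the first flush is the tested path, and any later flush fails for a reason the test never meant to check.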

In most environments, this possibility does not pose any real problem because the test execution
is reasonably fast (so there is very little room for an additional flush attempt to occur). But since
the CI server is known to become very slow from time to time, we need to account for it.

Solution

This patch fixes this issue by taking additional care to prevent unintended buffer flushes:

  • Set a sufficiently long flush_thread_interval (30 seconds).
  • Disable flush_buffer_at_cleanup of the testing driver for the test cases.
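
As a rough illustration of why these two settings help, the following toy calculation counts the flush attempts that could reach the already-unavailable node. It is only a sketch: `unintended_flushes` is a hypothetical helper, the model assumes the flush thread simply wakes once per interval, and the 5-second gap and 1-second interval are made-up numbers, not measurements from Travis CI or values taken from the test file.

```ruby
# Toy model, not Fluentd code. Counts flush attempts that could happen
# after step 3, given a delay (gap_seconds) before shutdown completes.
def unintended_flushes(gap_seconds:, flush_thread_interval:, flush_buffer_at_cleanup:)
  # Assumption: the flush thread wakes up once per interval.
  periodic = (gap_seconds.to_f / flush_thread_interval).floor
  # Assumption: the test driver adds one final flush at cleanup if enabled.
  at_cleanup = flush_buffer_at_cleanup ? 1 : 0
  periodic + at_cleanup
end

# Hypothetical slow CI run with a short interval and cleanup flush enabled.
slow_ci_before = unintended_flushes(gap_seconds: 5,
                                    flush_thread_interval: 1.0,
                                    flush_buffer_at_cleanup: true)

# Same gap with the settings described in this patch.
slow_ci_after = unintended_flushes(gap_seconds: 5,
                                   flush_thread_interval: 30.0,
                                   flush_buffer_at_cleanup: false)
```

Under these assumptions, a 30-second interval leaves no periodic flush inside any gap shorter than 30 seconds, and disabling the cleanup flush removes the remaining attempt.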

Also, this should resolve the annoying "unexpected error while after_shutdown" warnings
produced during test execution.

@fujimotos fujimotos force-pushed the sf/fix-random-failure-in-out-forward branch from 7f55d8d to 8fdbe31 Compare March 7, 2018 08:25
I notice that the following test cases (which are supposed to test the
handling of broken ack responses) are randomly failing on Travis CI:

 1. 'a destination node not supporting responses by disconnection'
 2. 'a destination node not supporting responses by just ignoring'

The root problem is that, since the destination node gets labelled as
"unavailable" after an ack failure, there really shouldn't be any flush
attempt after the testing steps (or it will raise a NoNodesAvailable
exception).

In most environments, this possibility does not pose much of a problem
because the test execution is sufficiently fast (so there is very
little room for an additional flush attempt to occur).

But since the CI server is known to become very slow from time to time,
we need to account for it.
@fujimotos fujimotos force-pushed the sf/fix-random-failure-in-out-forward branch from 8fdbe31 to 29d2f7e Compare March 7, 2018 09:32
@repeatedly repeatedly self-assigned this Mar 10, 2018
@repeatedly repeatedly merged commit e0df711 into fluent:master Mar 10, 2018
@repeatedly
Member

Thanks! Just merged.

@fujimotos fujimotos deleted the sf/fix-random-failure-in-out-forward branch July 5, 2018 02:43