Skip to content

Flaky test: TestAddCertsToWatch_remove_ca #2586

Closed

Description

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
https://travis-ci.com/github/albertteoh/jaeger/jobs/406895163

=== RUN   TestAddCertsToWatch_remove_ca
    cert_watcher_test.go:307: 
        	Error Trace:	cert_watcher_test.go:307
        	Error:      	Should be true
        	Test:       	TestAddCertsToWatch_remove_ca
--- FAIL: TestAddCertsToWatch_remove_ca (0.00s)

Expected behavior
Passing test.

What troubleshooting steps did you try?
Adding the following at the end of the test:

fmt.Println(logObserver.All())

Prints:

[
{{warn 2020-10-25 16:37:28.214382 +1100 AEDT m=+0.001567434  Certificate has been removed, using the last known version undefined } [{certificate 15 0 /var/folders/3p/yms48z2s6v7c8fy2m_1481g00000gn/T/ca.crt611136954 <nil>}]}
{{warn 2020-10-25 16:37:28.214443 +1100 AEDT m=+0.001628482  Certificate has been removed, using the last known version undefined } [{certificate 15 0 /var/folders/3p/yms48z2s6v7c8fy2m_1481g00000gn/T/clientCa.crt633105361 <nil>}]}
]

I believe the root cause is a race where the following waitUntil finishes after the first detected "Certificate has been removed..." log, and before the second clientCAFile deletion is detected by the certWatcher:

waitUntil(func() bool {
	return logObserver.FilterMessage("Certificate has been removed, using the last known version").Len() > 0
}, 100, time.Millisecond*100)

Causing the second assertion to fail:

assert.True(t, logObserver.FilterMessage("Certificate has been removed, using the last known version").FilterField(zap.String("certificate", clientCaFile.Name())).Len() > 0)

Proposed Fix

Modify the log observer check to wait until at least 2 cert deletions are seen (>= 2 for readability):

return logObserver.FilterMessage("Certificate has been removed, using the last known version").Len() >= 2

Confirmed that this change doesn't just pass tests due to timeout and hence, does not negatively impact test execution time:

  • With >= 2:
=== RUN   TestAddCertsToWatch_remove_ca
--- PASS: TestAddCertsToWatch_remove_ca (0.10s)
  • Whereas with >=3 still passes but requires a 10s wait to timeout the waitUntil:
=== RUN   TestAddCertsToWatch_remove_ca
--- PASS: TestAddCertsToWatch_remove_ca (10.19s)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    buggood first issueGood for beginnershelp wantedFeatures that maintainers are willing to accept but do not have cycles to implement

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions