RSDK-13386 Do not emit logs from testAppender if the associated test has completed#6025
RSDK-13386 Do not emit logs from testAppender if the associated test has completed#6025benjirewis wants to merge 5 commits into
testAppender if the associated test has completed#6025Conversation
|
Test failures are from tests that assert on the now-missing logs; looking into it. |
|
@benjirewis just throwing out a half baked idea (given the test failure fallout): instead of removing the logs, you could have the problematic test pass in a custom logger that finds a way to avoid the problem. Either with an heuristic (e.g: set the level to errors only) or with a more formal switch/Close method on test end that informs the logger to stop emitting logs. Or some combination of the two. |
testAppender if the associated test has completed
Thanks! I think this is a good solution and should better help avoid this issue in the future. Threw some synchronization into |
| func (tapp *testAppender) Write(entry zapcore.Entry, fields []zapcore.Field) error { | ||
| tapp.tb.Helper() | ||
|
|
||
| log := func(toPrint []string) { |
There was a problem hiding this comment.
This looks a bit unwieldy. Was there a reason to only hold the lock around the tapp.tb.Log call? Particularly given you've already went for a RWMutex. All of the work here seems to be CPU (read: non-blocking) stuff, so locking is broadly safe. And because it's RW, we'll fully utilize the CPU to the same degree we already are.
There was a problem hiding this comment.
Was there a reason to only hold the lock around the
tapp.tb.Logcall?
Switched to RLocking around the whole method. I had started off by minimally holding the lock around the read of completed and the call that would cause the race with test completion (tapp.tb.Log). So, no, not much of a reason beyond making a "small" change. RLocking around the entire method also avoids creating a log entry that we're never going to log, which is nice.
Also some questions that are just for my own edification:
This looks a bit unwieldy.
As in the callback with mutex logic inside could be easily miswritten or misunderstood?
All of the work here seems to be CPU (read: non-blocking) stuff, so locking is broadly safe
In other words: because there's no network or disk IO, for example, in this method, there's no chance of holding the lock for any longer than it takes to the CPU to build up a list of strings for fields and encode the zap entry?
And because it's RW, we'll fully utilize the CPU to the same degree we already are
Because multiple concurrent Writes can RLock?
There was a problem hiding this comment.
As in the callback with mutex logic inside could be easily miswritten or misunderstood?
To me, having to define a lambda to wrap a small set of instructions for a small handful of early returns I consider unwieldy. Sometimes unwieldy is the only solution that meets other constraints.
Sometimes there's a risk of misunderstanding code written that way (e.g: the processResource lambda -- which further has a section of goutils.PanicCapturingGo). But that's when the function gets large. In this case it's small enough to be hard to misread.
For what it's worth, I think having three early returns that each have to remember to log is also unwieldy. But in that case I preferred the early returns with unwieldy penalty over if blocks (and perhaps even nested if-blocks).
In other words: because there's no network or disk IO, for example, in this method, there's no chance of holding the lock for any longer than it takes to the CPU to build up a list of strings for fields and encode the zap entry?
Yep exactly. And when reasoning about I/O in the worst case sense "holding the lock for any longer than" could actually mean "hold the lock forever because the internet died".
Because multiple concurrent Writes can RLock?
Yep
| func NewTestAppender(tb testing.TB) Appender { | ||
| return &testAppender{tb} | ||
| tapp := &testAppender{tb: tb} | ||
| tb.Cleanup(func() { |
There was a problem hiding this comment.
What's the consequence of every test appender buying into this behavior?
There was a problem hiding this comment.
(Probably negligibly) increased latency of Write calls and inability for any test to observe any log that is emitted after the test completes.
w.r.t. a doNotLogAfterCompletion parameter or something similar (not that that's what you're suggesting) I'm struggling to come up with a test where you'd want a logger that is allowed to emit logs after the test completes.
There was a problem hiding this comment.
(Probably negligibly) increased latency of Write calls and inability for any test to observe any log that is emitted after the test completes.
Not worried about that one -- as you so annotated.
I'm struggling to come up with a test where you'd want a logger that is allowed to emit logs after the test completes.
Getting closer -- I think the philosophical "problem" this patch creates is that we can mask tests that leak goroutines (which subsequently log). Whether tests leak goroutines directly or tests correctly call "Close", but Close is buggy.
goleak itself is pretty poor at identifying leaks because it only works on the test suite boundary. And not per-test. So it's almost necessary for short-lived goroutine leaks to require being the last test run to show up.
dgottlieb
left a comment
There was a problem hiding this comment.
So I assume one thing worth checking is the test in question is failing because it's supposed to be leaking a goroutine and not waiting for it.
Assuming that's true, I'm fine with disabling the testing log output at test finish. But I do want to make sure the choice is intentional to be giving up coverage it is empircally giving us of (mostly test-specific) goroutine leaks.
RSDK-13386
What
Adds logic to the
testAppenderto avoid emitting logs if the associated test has already completed.Why
These logs could emit after tests have already completed, causing data races like the following:
That data race is complaining about the fact that a reader is trying to check if the testing object is
donewhile the testing object is writingdoneto betrue. In other words, a test is logging at manager.go:939 after the test has finished.We should modify our test appender to be aware of when the associated test has completed.
Testing
Verified that if I put a
time.Sleep(20 * time.Millisecond)before theRestart context canceledlog and rungo test -v -count=100 ./module/modmanager -run TestCrashShortCircuit -race, I get a race like the one shown above (or a panic complaining about a test logging after the test completes). I do not get the same race when running with my changes.