RSDK-13386 Do not emit logs from `testAppender` if the associated test has completed by benjirewis · Pull Request #6025 · viamrobotics/rdk

benjirewis · 2026-05-19T19:12:49Z

What

Adds logic to the testAppender to avoid emitting logs if the associated test has already completed.

Why

These logs could emit after tests have already completed, causing data races like the following:

WARNING: DATA RACE
Read at 0x00c0000f300b by goroutine 116:
  testing.(*common).destination()
      /home/testbot/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.1.linux-arm64/src/testing/testing.go:1055 +0x17c
  testing.(*common).log()
      /home/testbot/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.1.linux-arm64/src/testing/testing.go:1027 +0x98
  testing.(*common).Log()
      /home/testbot/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.1.linux-arm64/src/testing/testing.go:1180 +0x6c
  go.viam.com/rdk/logging.(*testAppender).Write()
      /__w/rdk/rdk/logging/testing.go:73 +0x534
  go.viam.com/rdk/logging.(*impl).Write()
      /__w/rdk/rdk/logging/impl.go:313 +0x44c
  go.viam.com/rdk/logging.(*impl).Infow()
      /__w/rdk/rdk/logging/impl.go:483 +0x98
  go.viam.com/rdk/module/modmanager.(*Manager).startModule.(*Manager).newOnUnexpectedExitHandler.func2()
      /__w/rdk/rdk/module/modmanager/manager.go:939 +0x628
  go.viam.com/utils/pexec.(*managedProcess).manage.func3()
      /home/testbot/go/pkg/mod/go.viam.com/utils@v0.4.19/pexec/managed_process.go:351 +0xfc

Previous write at 0x00c0000f300b by goroutine 27:
  testing.tRunner.func1()
      /home/testbot/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.1.linux-arm64/src/testing/testing.go:1921 +0x5e8
  runtime.deferreturn()
      /home/testbot/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.1.linux-arm64/src/runtime/panic.go:589 +0x5c
  testing.(*T).Run.gowrap1()
      /home/testbot/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.1.linux-arm64/src/testing/testing.go:1997 +0x3c

Goroutine 116 (running) created at:
  go.viam.com/utils/pexec.(*managedProcess).manage()
      /home/testbot/go/pkg/mod/go.viam.com/utils@v0.4.19/pexec/managed_process.go:348 +0x2bc
  go.viam.com/utils/pexec.(*managedProcess).start.func1()
      /home/testbot/go/pkg/mod/go.viam.com/utils@v0.4.19/pexec/managed_process.go:288 +0x54
  go.viam.com/utils.ManagedGo.func1()
      /home/testbot/go/pkg/mod/go.viam.com/utils@v0.4.19/runtime.go:184 +0x34
  go.viam.com/utils.PanicCapturingGoWithCallback.func1()
      /home/testbot/go/pkg/mod/go.viam.com/utils@v0.4.19/runtime.go:176 +0x6c

Goroutine 27 (finished) created at:
  testing.(*T).Run()
      /home/testbot/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.1.linux-arm64/src/testing/testing.go:1997 +0x6e0
  testing.runTests.func1()
      /home/testbot/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.1.linux-arm64/src/testing/testing.go:2477 +0x74
  testing.tRunner()
      /home/testbot/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.1.linux-arm64/src/testing/testing.go:1934 +0x164
  testing.runTests()
      /home/testbot/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.1.linux-arm64/src/testing/testing.go:2475 +0x734
  testing.(*M).Run()
      /home/testbot/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.1.linux-arm64/src/testing/testing.go:2337 +0xaf4
  main.main()
      _testmain.go:73 +0x100
==================

That data race is complaining about the fact that a reader is trying to check if the testing object is done while the testing object is writing done to be true. In other words, a test is logging at manager.go:939 after the test has finished.

We should modify our test appender to be aware of when the associated test has completed.

Testing

Verified that if I put a time.Sleep(20 * time.Millisecond) before the Restart context canceled log and run go test -v -count=100 ./module/modmanager -run TestCrashShortCircuit -race, I get a race like the one shown above (or a panic complaining about a test logging after the test completes). I do not get the same race when running with my changes.

benjirewis · 2026-05-20T13:23:02Z

Test failures are from tests that assert on the now-missing logs; looking into it.

dgottlieb · 2026-05-20T14:08:48Z

@benjirewis just throwing out a half baked idea (given the test failure fallout): instead of removing the logs, you could have the problematic test pass in a custom logger that finds a way to avoid the problem. Either with an heuristic (e.g: set the level to errors only) or with a more formal switch/Close method on test end that informs the logger to stop emitting logs. Or some combination of the two.

This reverts commit 9375949.

benjirewis · 2026-05-20T16:20:31Z

with a more formal switch/Close method on test end that informs the logger to stop emitting logs

Thanks! I think this is a good solution and should better help avoid this issue in the future. Threw some synchronization into testAppender that seems to get rid of my local race reproduction. Let me know what you think.

dgottlieb · 2026-05-20T17:00:41Z

 func (tapp *testAppender) Write(entry zapcore.Entry, fields []zapcore.Field) error {
 	tapp.tb.Helper()
+
+	log := func(toPrint []string) {


This looks a bit unwieldy. Was there a reason to only hold the lock around the tapp.tb.Log call? Particularly given you've already went for a RWMutex. All of the work here seems to be CPU (read: non-blocking) stuff, so locking is broadly safe. And because it's RW, we'll fully utilize the CPU to the same degree we already are.

Was there a reason to only hold the lock around the tapp.tb.Log call?

Switched to RLocking around the whole method. I had started off by minimally holding the lock around the read of completed and the call that would cause the race with test completion (tapp.tb.Log). So, no, not much of a reason beyond making a "small" change. RLocking around the entire method also avoids creating a log entry that we're never going to log, which is nice.

Also some questions that are just for my own edification:

This looks a bit unwieldy.

As in the callback with mutex logic inside could be easily miswritten or misunderstood?

All of the work here seems to be CPU (read: non-blocking) stuff, so locking is broadly safe

In other words: because there's no network or disk IO, for example, in this method, there's no chance of holding the lock for any longer than it takes to the CPU to build up a list of strings for fields and encode the zap entry?

And because it's RW, we'll fully utilize the CPU to the same degree we already are

Because multiple concurrent Writes can RLock?

As in the callback with mutex logic inside could be easily miswritten or misunderstood?

To me, having to define a lambda to wrap a small set of instructions for a small handful of early returns I consider unwieldy. Sometimes unwieldy is the only solution that meets other constraints.

Sometimes there's a risk of misunderstanding code written that way (e.g: the processResource lambda -- which further has a section of goutils.PanicCapturingGo). But that's when the function gets large. In this case it's small enough to be hard to misread.

For what it's worth, I think having three early returns that each have to remember to log is also unwieldy. But in that case I preferred the early returns with unwieldy penalty over if blocks (and perhaps even nested if-blocks).

In other words: because there's no network or disk IO, for example, in this method, there's no chance of holding the lock for any longer than it takes to the CPU to build up a list of strings for fields and encode the zap entry?

Yep exactly. And when reasoning about I/O in the worst case sense "holding the lock for any longer than" could actually mean "hold the lock forever because the internet died".

Because multiple concurrent Writes can RLock?

Yep

dgottlieb · 2026-05-20T17:01:31Z

 func NewTestAppender(tb testing.TB) Appender {
-	return &testAppender{tb}
+	tapp := &testAppender{tb: tb}
+	tb.Cleanup(func() {


What's the consequence of every test appender buying into this behavior?

(Probably negligibly) increased latency of Write calls and inability for any test to observe any log that is emitted after the test completes.

w.r.t. a doNotLogAfterCompletion parameter or something similar (not that that's what you're suggesting) I'm struggling to come up with a test where you'd want a logger that is allowed to emit logs after the test completes.

(Probably negligibly) increased latency of Write calls and inability for any test to observe any log that is emitted after the test completes.

Not worried about that one -- as you so annotated.

I'm struggling to come up with a test where you'd want a logger that is allowed to emit logs after the test completes.

Getting closer -- I think the philosophical "problem" this patch creates is that we can mask tests that leak goroutines (which subsequently log). Whether tests leak goroutines directly or tests correctly call "Close", but Close is buggy.

goleak itself is pretty poor at identifying leaks because it only works on the test suite boundary. And not per-test. So it's almost necessary for short-lived goroutine leaks to require being the last test run to show up.

dgottlieb

So I assume one thing worth checking is the test in question is failing because it's supposed to be leaking a goroutine and not waiting for it.

Assuming that's true, I'm fine with disabling the testing log output at test finish. But I do want to make sure the choice is intentional to be giving up coverage it is empircally giving us of (mostly test-specific) goroutine leaks.

Remove OUE logs that may emit after tests complete

9375949

benjirewis requested a review from jmatth May 19, 2026 19:12

viambot added the safe to test This pull request is marked safe to test from a trusted zone label May 19, 2026

benjirewis added 2 commits May 20, 2026 12:07

Do not log in test appender if test is completed

23d59e7

Revert "Remove OUE logs that may emit after tests complete"

8db108d

This reverts commit 9375949.

benjirewis changed the title ~~RSDK-13386 Remove OUE logs that may emit after tests complete~~ RSDK-13386 Do not emit logs from testAppender if the associated test has completed May 20, 2026

Revert manager test

00f84b6

viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels May 20, 2026

benjirewis requested a review from dgottlieb May 20, 2026 16:25

dgottlieb reviewed May 20, 2026

View reviewed changes

RLock all of Write

b238020

viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels May 20, 2026

dgottlieb approved these changes May 20, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RSDK-13386 Do not emit logs from `testAppender` if the associated test has completed#6025

RSDK-13386 Do not emit logs from `testAppender` if the associated test has completed#6025
benjirewis wants to merge 5 commits into
viamrobotics:mainfrom
benjirewis:avoid-late-oue-logs

benjirewis commented May 19, 2026 •

edited

Loading

Uh oh!

benjirewis commented May 20, 2026

Uh oh!

dgottlieb commented May 20, 2026 •

edited

Loading

Uh oh!

benjirewis commented May 20, 2026

Uh oh!

dgottlieb May 20, 2026

Uh oh!

benjirewis May 20, 2026

Uh oh!

dgottlieb May 20, 2026

Uh oh!

dgottlieb May 20, 2026

Uh oh!

benjirewis May 20, 2026

Uh oh!

dgottlieb May 20, 2026

Uh oh!

dgottlieb left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

benjirewis commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

Testing

Uh oh!

benjirewis commented May 20, 2026

Uh oh!

dgottlieb commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

benjirewis commented May 20, 2026

Uh oh!

dgottlieb May 20, 2026

Choose a reason for hiding this comment

Uh oh!

benjirewis May 20, 2026

Choose a reason for hiding this comment

Uh oh!

dgottlieb May 20, 2026

Choose a reason for hiding this comment

Uh oh!

dgottlieb May 20, 2026

Choose a reason for hiding this comment

Uh oh!

benjirewis May 20, 2026

Choose a reason for hiding this comment

Uh oh!

dgottlieb May 20, 2026

Choose a reason for hiding this comment

Uh oh!

dgottlieb left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

benjirewis commented May 19, 2026 •

edited

Loading

dgottlieb commented May 20, 2026 •

edited

Loading