
Conversation

@vflaux
Contributor

@vflaux vflaux commented Sep 22, 2025

What does it do?

Ensure tests using log capture with testutils.LogsUnderTestWithLogLevel() are not run in parallel.
Also changes some tests to use the testutils package for log capture.

This introduces a helper function, testutils.IsParallel().
A parallel test panics when t.Setenv() is called; this is used to determine whether the test is marked to run in parallel or not.
This is a hack, as the testing package does not expose this information.
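
For illustration, a minimal sketch of how such a helper could be implemented (the actual code in this PR may differ; the probe variable name is made up, and imports of os and testing are assumed):

// IsParallel reports whether the current test was marked with t.Parallel().
// It abuses the fact that t.Setenv panics in a parallel test, since the
// testing package does not expose this information directly.
func IsParallel(t *testing.T) (parallel bool) {
    t.Helper()
    defer func() {
        if recover() != nil {
            parallel = true
        }
    }()
    // Re-setting the variable to its current value is a no-op for sequential
    // tests but panics for parallel ones. Note that calling t.Setenv also
    // prevents the test from calling t.Parallel() afterwards.
    t.Setenv("TESTUTILS_PARALLEL_PROBE", os.Getenv("TESTUTILS_PARALLEL_PROBE"))
    return false
}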

Motivation

Some tests that capture logs run in parallel, which may cause tests to fail randomly.

More

  • Yes, this PR title follows Conventional Commits
  • Yes, I added unit tests
  • Yes, I updated end user documentation accordingly

Ensure tests using log capture with testutils are not run in parallel.
Also changes some tests to use the testutils package for log capture.
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 22, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign ivankatliarchuk for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the internal Issues or PRs related to internal code label Sep 22, 2025
@k8s-ci-robot k8s-ci-robot added the provider (Issues or PRs related to a provider), source, and needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.) labels Sep 22, 2025
@k8s-ci-robot
Contributor

Hi @vflaux. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 22, 2025
@mloiseleur
Collaborator

The use case and the idea make sense to me.
For the implementation, I am unsure 🤔.

@mloiseleur
Collaborator

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.) label and removed the needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.) label Sep 23, 2025
@ivankatliarchuk
Member

Are there any data race issues with log capture at the moment? The solution is a bit tricky; I'm not sure this is a recommended practice.

The t.Setenv() panic approach is brittle and could break with changes in future Go releases, so it looks quite fragile.

In a future Go version it could be that all tests run in parallel by default, so most likely we should instead be fixing data races where possible.

  1. proposal: testing: run tests in parallel by default golang/go#21214
  2. proposal: testing: run tests in parallel by default golang/go#73805

There are no examples that are currently failing, but the most recent issues I have witnessed are ones where data races occur.

@vflaux
Contributor Author

vflaux commented Sep 23, 2025

I do agree that the implementation is not great. It would be better if the testing package exposed this. It may be worth opening a proposal on the Go project.

I created this mainly to detect potential issues in the current codebase where log capture is used, and I wanted to share it.
These tests should be run sequentially, as we use a singleton logger throughout the code. We could run these tests in parallel if we were able to pass a logger to the different objects under test.
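
For illustration, a rough sketch of what that could look like (the function and names here are hypothetical, not actual code from this repository; logrus and its hooks/test package are assumed to be imported):

// reconcile stands in for code under test that takes an injected logger
// instead of calling the package-level logrus functions.
func reconcile(logger logrus.FieldLogger, name string) {
    logger.Infof("reconciling %s", name)
}

func TestReconcile(t *testing.T) {
    t.Parallel() // safe: each test owns its logger and capture hook
    logger, hook := test.NewNullLogger()
    reconcile(logger, "example")
    if got := hook.LastEntry().Message; got != "reconciling example" {
        t.Errorf("unexpected log entry: %q", got)
    }
}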

I don't have failing CI examples, but I did encounter some issues while writing tests locally (fixed by removing t.Parallel()).

some libraries do not work well with -race detectors. Example stretchr/testify#187

It's a good example of a use case where checking whether parallel execution is enabled or not would be useful. Testify could check and fail with a meaningful message in this case instead of reporting wrong results.

While I'm not fully convinced that we should merge IsParallel() as it is, I do think we should keep the changes in the tests.
Wdyt?

@ivankatliarchuk
Member

Maybe worth opening a proposal on the go project.

Here is golang/go#70017

I dunno. From my perspective, as many things as possible should be immutable.

These tests should be run sequentially as we use a singleton logger throughout the code.

We are not using a singleton logger. Example:

func LogsUnderTestWithLogLevel(level log.Level, t *testing.T) *test.Hook {
It creates a new logger for each test run. Possibly the solution needs to be adjusted.

The logrus library may have some data race issues, but I thought we had managed to fix all relevant issues in this codebase.

@vflaux
Contributor Author

vflaux commented Sep 24, 2025

LogsUnderTestWithLogLevel() calls log.AddHook(), log.SetOutput() and log.SetLevel(), which all use log.std internally.
All calls to log.Info(), log.Debug(), etc. also go through log.std, which is a singleton (https://github.com/sirupsen/logrus/blob/cb253f3080f18ec7e55b4c8f15b62fe0a806f130/exported.go#L11).

If tests using LogsUnderTestWithLogLevel() run in parallel, the expected log record may actually come from another test.
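
As a contrived sketch of that cross-talk (the test names are made up; test.NewGlobal from logrus's hooks/test package attaches a hook to the standard logger, similar in spirit to LogsUnderTestWithLogLevel):

func TestA(t *testing.T) {
    t.Parallel()
    hook := test.NewGlobal() // hooks the shared log.std logger
    log.Info("from A")
    // With TestB running concurrently, the last entry seen by this hook
    // may be "from B" instead of "from A", so this can fail intermittently.
    if hook.LastEntry().Message != "from A" {
        t.Errorf("unexpected entry: %q", hook.LastEntry().Message)
    }
}

func TestB(t *testing.T) {
    t.Parallel()
    log.Info("from B") // also goes through log.std, visible to TestA's hook
}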

@ivankatliarchuk
Member

Agree that there is still shared state. So why not fix the design of those tests instead of building hacks around the problem? Just to consider: the tests share unavoidable global state, and we are testing integration-like behavior.

Or, for example, we do not need to change the log level for each test; it could always be DEBUG, to catch everything.

It depends on what the “race” is: the logger is global at the moment, and the -race flag detects concurrent access, which is expected. Sometimes disabling parallel execution (t.Parallel()) is the right fix, and sometimes it’s just hiding a deeper issue.

If we remove the -race detector, t.Parallel() is going to be safe. Go’s -race detector doesn’t just look for corruption; it flags any concurrent access to the same memory location without explicit synchronization.

If we still want to allow t.Parallel() in some tests, but serialize only the logger ones, we can introduce a global mutex.

var logTestMu sync.Mutex

func TestLoggerIntegration(t *testing.T) {
    // Serialize only the tests that touch the global logger; other tests
    // remain free to use t.Parallel().
    logTestMu.Lock()
    defer logTestMu.Unlock()
    // ...
}

This change addresses the issue before the problem has actually been made visible. Instead, we should make the data races reliably visible and reproducible in CI, ensuring consistency. Once the races are exposed in a reproducible way, we can then focus on implementing the exact proposed fix or some other one. I'm not a huge fan of t.Parallel(), especially when CI is not beefy, but disabling parallel execution as a first option without making the problem visible does not sound like the right first fix, and sometimes it’s just hiding a deeper issue.

