@aletourneau aletourneau commented Nov 7, 2025

Increase wait timeout from 5s to 60s to reduce intermittent failures.

At least locally, the test frequently needs to wait well over the current 5s before records are collected, as shown in the debug logs.


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • [N/A] Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [N/A] Run local packaging test showing all targets (including any new ones) build.
  • [N/A] Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • [N/A] Documentation required for this feature

Backporting

  • [N/A] Backport to latest stable release.

Debug / Valgrind logs

$ valgrind --leak-check=full ./bin/flb-rt-core_chunk_trace 
==1274892== Memcheck, a memory error detector
==1274892== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1274892== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==1274892== Command: ./bin/flb-rt-core_chunk_trace
==1274892== 
Test trace...                                   [2025/11/07 14:13:46.344880856] [ info] [fluent bit] version=4.2.0, commit=f879a93bc7, pid=1274892
[2025/11/07 14:13:46.384589051] [ info] [storage] ver=1.5.3, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/11/07 14:13:46.385211200] [ info] [simd    ] disabled
[2025/11/07 14:13:46.385578287] [ info] [cmetrics] version=1.0.5
[2025/11/07 14:13:46.385980015] [ info] [ctraces ] version=0.6.6
[2025/11/07 14:13:46.399130338] [ info] [input:emitter:trace-emitter] initializing
[2025/11/07 14:13:46.399790620] [ info] [input:emitter:trace-emitter] storage_strategy='memory' (memory only)
[2025/11/07 14:13:46.435622122] [ info] [input:emitter:trace-emitter] thread instance initialized
[2025/11/07 14:13:46.463921842] [ info] [sp] stream processor started
[2025/11/07 14:13:46.467972372] [ info] [engine] Shutdown Grace Period=1, Shutdown Input Grace Period=0
[2025/11/07 14:13:46.538441225] [ info] [fluent bit] version=4.2.0, commit=f879a93bc7, pid=1274892
[2025/11/07 14:13:46.538894287] [ info] [storage] ver=1.5.3, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/11/07 14:13:46.539035786] [ info] [simd    ] disabled
[2025/11/07 14:13:46.539141456] [ info] [cmetrics] version=1.0.5
[2025/11/07 14:13:46.539289590] [ info] [ctraces ] version=0.6.6
[2025/11/07 14:13:46.540957336] [ info] [input:dummy:dummy.0] initializing
[2025/11/07 14:13:46.587519012] [ info] [output:stdout:stdout.0] worker #0 started
[2025/11/07 14:13:46.541093806] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2025/11/07 14:13:46.582230401] [ info] [sp] stream processor started
[2025/11/07 14:13:46.585862300] [ info] [engine] Shutdown Grace Period=1, Shutdown Input Grace Period=0
[0] dummy.0: [[1762542827.056090136, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1762542828.033435617, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1762542829.033474779, {}], {"message"=>"dummy"}]
[2025/11/07 14:13:50.47970231] [ info] [test] flush record
[0] dummy.0: [[1762542830.033873764, {}], {"message"=>"dummy"}]
[2025/11/07 14:13:50.594354534] [ info] [test] collected records, waited 4 seconds
[2025/11/07 14:13:50.605990204] [ warn] [engine] service will shutdown in max 1 seconds
[2025/11/07 14:13:50.607959037] [ info] [engine] pausing all inputs..
[2025/11/07 14:13:50.610133972] [ info] [input] pausing dummy.0
[2025/11/07 14:13:51.53749179] [ info] [engine] service has stopped (0 pending tasks)
[2025/11/07 14:13:51.54001796] [ info] [input] pausing dummy.0
[2025/11/07 14:13:51.58154295] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2025/11/07 14:13:51.69416871] [ info] [output:stdout:stdout.0] thread worker #0 stopped
[2025/11/07 14:13:51.104944282] [ warn] [engine] service will shutdown in max 1 seconds
[2025/11/07 14:13:51.105167077] [ info] [engine] pausing all inputs..
[2025/11/07 14:13:52.33169863] [ info] [engine] service has stopped (0 pending tasks)
[ OK ]
SUCCESS: All unit tests have passed.
==1274892== 
==1274892== HEAP SUMMARY:
==1274892==     in use at exit: 0 bytes in 0 blocks
==1274892==   total heap usage: 5,157 allocs, 5,157 frees, 3,897,925 bytes allocated
==1274892== 
==1274892== All heap blocks were freed -- no leaks are possible
==1274892== 
==1274892== For lists of detected and suppressed errors, rerun with: -s
==1274892== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Wait time required before records were collected, over 10 consecutive runs of the test

$ for i in {1..10}; do ./bin/flb-rt-core_chunk_trace 2>&1 | grep collected; done 
[2025/11/07 14:17:06.980204356] [ info] [test] collected records, waited 10 seconds
[2025/11/07 14:17:19.55182541] [ info] [test] collected records, waited 6 seconds
[2025/11/07 14:17:30.53825194] [ info] [test] collected records, waited 4 seconds
[2025/11/07 14:17:40.56687053] [ info] [test] collected records, waited 3 seconds
[2025/11/07 14:17:55.55847414] [ info] [test] collected records, waited 8 seconds
[2025/11/07 14:18:05.55402091] [ info] [test] collected records, waited 3 seconds
[2025/11/07 14:18:19.55083289] [ info] [test] collected records, waited 7 seconds
[2025/11/07 14:18:43.57154926] [ info] [test] collected records, waited 17 seconds
[2025/11/07 14:18:53.56144980] [ info] [test] collected records, waited 3 seconds
[2025/11/07 14:19:03.55392549] [ info] [test] collected records, waited 3 seconds

Fluent Bit is licensed under Apache 2.0; by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Tests
    • Enhanced test reliability with an adaptive wait mechanism that polls for results over a configurable duration instead of using fixed timeouts.

Signed-off-by: Alexandre Létourneau <letourneau.alexandre@gmail.com>

coderabbitai bot commented Nov 7, 2025

Walkthrough

Modifies test timing behavior in the core chunk trace test by introducing a configurable maximum wait macro and replacing a fixed 5-second sleep with a loop that polls for records every second up to 60 seconds, while logging elapsed time.
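
The polling behavior described above can be sketched in isolation. This is a minimal, self-contained illustration and not the code from the PR: only the `FLB_TEST_MAX_WAIT` name and its value of 60 come from the summary, while `wait_for_records`, `ready_fn`, and `ready_after_n` are hypothetical names introduced here. The `interval` parameter is also an assumption, added so the sketch can be exercised without real sleeping.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Upper bound on the wait; the PR summary sets this macro to 60 seconds. */
#define FLB_TEST_MAX_WAIT 60

/* Hypothetical callback type: returns non-zero once records were collected. */
typedef int (*ready_fn)(void *ctx);

/*
 * Check ready() up to max_polls times, sleeping `interval` seconds after
 * each unsuccessful check. Returns the number of sleeps performed before
 * ready() succeeded (equal to the seconds waited when interval is 1),
 * or -1 if the limit was reached without success.
 */
static int wait_for_records(ready_fn ready, void *ctx,
                            int max_polls, unsigned int interval)
{
    int waited;

    assert(ready != NULL);

    for (waited = 0; waited < max_polls; waited++) {
        if (ready(ctx)) {
            return waited;          /* early exit: no need to wait longer */
        }
        sleep(interval);
    }

    /* One last check after the final sleep. */
    return ready(ctx) ? waited : -1;
}

/*
 * Hypothetical stand-in for the test's record check: reports "not ready"
 * for the first *ctx calls, then "ready" on every later call.
 */
static int ready_after_n(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return 0;
    }
    return 1;
}
```

Called as `wait_for_records(check, ctx, FLB_TEST_MAX_WAIT, 1)`, the return value is comparable to the "waited N seconds" figure in the logs above, and a fast run costs only as long as the records take to arrive rather than the full limit.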

Changes

| Cohort / File(s) | Summary |
|---|---|
| Test timing improvements: tests/runtime/core_chunk_trace.c | Introduces FLB_TEST_MAX_WAIT macro set to 60 seconds; replaces fixed 5-second sleep with polling loop that checks for records every second and logs total wait time |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

  • Single, focused file change in test code
  • Straightforward replacement of fixed delay with configurable polling loop
  • Logic remains contained and testable

Poem

🐰 A rabbit hops with patience true,
No more fixed waits of five—sixty's new!
Each second ticks while records align,
Flexible timings make tests shine.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and directly describes the main change: increasing the timeout in the core_chunk_trace runtime test from 5 to 60 seconds to reduce test flakiness. |
📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a4c158d and 32a5197.

📒 Files selected for processing (1)
  • tests/runtime/core_chunk_trace.c (2 hunks)
🔇 Additional comments (2)
tests/runtime/core_chunk_trace.c (2)

29-29: LGTM: Timeout constant is well-justified.

The 60-second maximum wait is appropriate given the PR's debug logs showing actual waits up to 17 seconds.


105-109: LGTM: Polling loop correctly implements early exit with timeout.

The implementation properly exits as soon as records are collected or after 60 seconds, and the logging provides useful timing visibility. This is a clear improvement over the fixed 5-second sleep.

