@CharlieR-o-o-t
Contributor

@CharlieR-o-o-t CharlieR-o-o-t commented Aug 1, 2025

Fix endless loop on fluent-bit reload.

Fixes #10670, #10518, #9927, #9354

Short description:
mk_event_timeout_destroy() should be called only after flb_engine_shutdown(). Otherwise, the shutdown timer will be removed right after the chunk retry finishes. This may cause all inputs to stop, the engine to become blocked during reloads, and the shutdown logic to never be triggered because config->shutdown_fd was already released.
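
To make the ordering problem concrete, here is a minimal sketch in C. It is a toy model under stated assumptions, not the actual code in src/flb_engine.c: `toy_shutdown`, its parameters, and the idea that one pending task completes per timer tick are all hypothetical. The only thing it tries to show is that, if the timer is destroyed on the first shutdown event while tasks are still pending, no later event arrives to re-run the shutdown check.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model only. The names below are hypothetical and are not part of
 * the Fluent Bit or Monkey event-loop API.
 *
 * Returns the tick on which the engine exits, or -1 if it has not exited
 * after max_ticks iterations (which stands in for "hangs forever").
 */
int toy_shutdown(bool destroy_timer_early, int pending_tasks,
                 int grace, int max_ticks)
{
    bool timer_active = true;   /* stands in for the shutdown timer event */
    int grace_count = 0;        /* how many timer ticks have been handled */
    int tick;

    assert(grace > 0 && max_ticks > 0);

    for (tick = 1; tick <= max_ticks; tick++) {
        if (!timer_active) {
            /* No timer event can fire, so the check below never runs. */
            continue;
        }

        /* The shutdown timer event fired. */
        if (destroy_timer_early) {
            timer_active = false;   /* old ordering: destroy on the event */
        }

        grace_count++;
        if (pending_tasks > 0 && grace_count < grace) {
            pending_tasks--;        /* assumption: one task ends per tick */
            continue;               /* wait for the next timer tick */
        }

        /* Final shutdown sequence. */
        timer_active = false;       /* new ordering: destroy before exit */
        return tick;
    }

    return -1;
}
```

In this model, `toy_shutdown(false, 2, 5, 100)` returns 3, because the timer keeps ticking until both tasks are done. `toy_shutdown(true, 2, 5, 100)` returns -1, because the timer is gone after the first tick and the loop never reaches the final sequence again.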

Testing
I reported bug #10670, which is well documented with exact steps to reproduce.
Tested

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Bug Fixes
    • Improved shutdown process to ensure the shutdown timer remains active during the grace period and task flushing, leading to a more reliable and graceful engine shutdown.

Signed-off-by: Siarhei Rasiukevich <s.rasiukevich@gmail.com>
@coderabbitai

coderabbitai bot commented Aug 1, 2025

Walkthrough

The change in src/flb_engine.c moves the destruction of the shutdown timer event from the initial handling of the FLB_ENGINE_SHUTDOWN event to the final shutdown sequence. This ensures the timer remains active during the shutdown grace period and task flushing, and is only destroyed right before the engine exits.

Changes

Cohort / File(s) | Change Summary
Engine Shutdown Timer Handling (src/flb_engine.c) | Relocated the destruction of the shutdown timer event from the early shutdown event handler to the final shutdown sequence, ensuring the timer persists throughout the shutdown grace period and task flushing.

Sequence Diagram(s)

sequenceDiagram
    participant Engine
    participant ShutdownTimer

    Engine->>Engine: Detect FLB_ENGINE_SHUTDOWN event
    Engine->>ShutdownTimer: Keep timer active (no destruction)
    Engine->>Engine: Process pending tasks and chunks
    Engine->>ShutdownTimer: Destroy timer (after tasks/chunks processed)
    Engine->>Engine: Complete shutdown and exit

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~7 minutes

Assessment against linked issues

Objective Addressed Explanation
Prevent endless loop and ensure proper handling of SIGHUP/SIGTERM and log processing after reload (#10670)
Ensure shutdown timer event is managed correctly during shutdown and reload (#10670)

Assessment against linked issues: Out-of-scope changes

No out-of-scope changes found.

Poem

In the engine’s gentle hum at night,
A timer once vanished out of sight.
Now it lingers, calm and wise,
Guiding shutdown ‘til goodbyes.
No more loops that never end—
The rabbit winks, “All signals mend!”
🐇✨

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2b9741c and cafcca4.

📒 Files selected for processing (1)
  • src/flb_engine.c (1 hunks)
🔇 Additional comments (1)
src/flb_engine.c (1)

1104-1107: LGTM! Critical fix for shutdown timer lifecycle management.

This change properly fixes the endless loop issue by ensuring the shutdown timer event remains active throughout the entire shutdown grace period and task flushing phase. Moving the mk_event_timeout_destroy call to the final shutdown sequence (right before function return) prevents premature timer destruction that could cause the shutdown process to hang indefinitely.

The previous early destruction of the timer likely contributed to the endless loop bug described in issue #10670, where Fluent Bit would stop processing signals and fail to complete the reload/shutdown process properly.

@edsiper
Member

edsiper commented Aug 2, 2025

@CharlieR-o-o-t thank you !

@edsiper edsiper merged commit 42e29d6 into fluent:master Aug 2, 2025
61 checks passed
@CharlieR-o-o-t
Contributor Author

CharlieR-o-o-t commented Aug 7, 2025

Hello @edsiper ,
It looks like this fix was not included in the new release (Fluent Bit 4.0.7).
Do you know if it will be part of v4.0.8?

@cosmo0920
Contributor

Ah, sorry, I overlooked this PR. I sent a backport PR in #10709.
It will be included in v4.0.8.

@zwscn2014

Ah, sorry, I overlooked this PR. I sent a backport PR in #10709. It will be included in v4.0.8.

Thank you for including this PR in v4.0.8! Could you please let me know if there is an estimated release date for this version?

@CharlieR-o-o-t
Contributor Author

CharlieR-o-o-t commented Aug 10, 2025

One more fix PR: #10720

@cosmo0920, could you take a look? It's an important fix.

Development

Successfully merging this pull request may close these issues.

🐛 Endless loop on reload: Fluent Bit stops log processing and handling of SIGHUP/SIGTERM

4 participants