feat: Implement graceful shutdown for tool and sidecar containers #1729
base: main
Conversation
- Add cleanup() method to ToolExecutionTracker to close Redis connection
- Add cleanup() class method to LogPublisher to close Kombu and Redis connections
- Update LogProcessor signal handler to set shutdown flag
- Add shutdown flag check in monitor_logs() loop to exit gracefully
- Add cleanup() method to LogProcessor and call it in finally block
- Update ToolEntrypoint signal handler to raise SystemExit with proper exit code

Both containers now properly handle SIGTERM/SIGINT signals from Docker/Kubernetes, update the tool execution status, and clean up Redis connections before exiting.
Summary by CodeRabbit
Walkthrough
These changes implement graceful shutdown handling across multiple components. A global shutdown signal mechanism is introduced in LogProcessor that monitors for termination requests and triggers cleanup in LogPublisher and ToolExecutionTracker before exit. The signal handler now exits with a signal-based status code.
Sequence Diagram

```mermaid
sequenceDiagram
participant OS as Operating System
participant Handler as Signal Handler
participant LP as LogProcessor
participant Tracker as ToolExecutionTracker
participant Pub as LogPublisher
participant Redis as Redis/Kombu
OS->>Handler: Signal received (SIGTERM/SIGINT)
Handler->>LP: Set _shutdown_requested = True
Handler->>Handler: raise SystemExit(128 + signum)
rect rgba(200, 150, 255, 0.2)
Note over LP: monitor_logs() detects shutdown
LP->>LP: Check _shutdown_requested
LP->>Tracker: Update status to FAILED<br/>Error: "Process terminated by signal"
end
rect rgba(100, 200, 255, 0.2)
Note over LP: cleanup() invoked (try/finally)
LP->>Tracker: cleanup()
Tracker->>Redis: Close connection
Redis-->>Tracker: Closed
LP->>Pub: cleanup()
Pub->>Redis: Release Kombu connection
Pub->>Redis: Close Redis client
Redis-->>Pub: Closed
end
LP->>LP: Exit monitor loop
Handler->>OS: Exit with signal code
```
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
✨ Finishing touches
Test Results
Summary
Runner Tests - Full Report
SDK1 Tests - Full Report
Actionable comments posted: 0
🧹 Nitpick comments (3)
unstract/core/src/unstract/core/pubsub_helper.py (1)
193-207: Cleanup implementation looks good, but the checks are always truthy; consider `logging.exception`.

The `if cls.kombu_conn` and `if cls.r` checks on lines 197 and 203 are always truthy, since these are class-level attributes initialized unconditionally at class definition (lines 23-29). The checks won't protect against uninitialized connections.

Also, per the static analysis hints, consider using `logging.exception` instead of `logging.error` to include the stack trace for easier debugging.

🔎 Suggested improvement
```diff
     @classmethod
     def cleanup(cls) -> None:
         """Close publisher connections for graceful shutdown."""
         try:
-            if cls.kombu_conn:
-                cls.kombu_conn.release()
-                logging.debug("LogPublisher Kombu connection released")
+            cls.kombu_conn.release()
+            logging.debug("LogPublisher Kombu connection released")
         except Exception as e:
-            logging.error(f"Failed to close Kombu connection: {e}")
+            logging.exception(f"Failed to close Kombu connection: {e}")
         try:
-            if cls.r:
-                cls.r.close()
-                logging.debug("LogPublisher Redis connection closed")
+            cls.r.close()
+            logging.debug("LogPublisher Redis connection closed")
         except Exception as e:
-            logging.error(f"Failed to close Redis connection: {e}")
+            logging.exception(f"Failed to close Redis connection: {e}")
```

unstract/core/src/unstract/core/tool_execution_status.py (1)
218-225: Cleanup logic is correct; minor suggestions for consistency.

The `if self.redis_client` check on line 221 is always truthy, since `redis_client` is initialized unconditionally in `__init__` (line 66). Consider using `logger.exception` instead of `logger.error` to include the stack trace for debugging, consistent with the static analysis hint.

🔎 Suggested improvement
```diff
     def cleanup(self) -> None:
         """Close Redis connection for graceful shutdown."""
         try:
-            if self.redis_client:
-                self.redis_client.close()
-                logger.debug("ToolExecutionTracker Redis connection closed")
+            self.redis_client.close()
+            logger.debug("ToolExecutionTracker Redis connection closed")
         except Exception as e:
-            logger.error(f"Failed to close Redis connection: {e}")
+            logger.exception(f"Failed to close Redis connection: {e}")
```

tool-sidecar/src/unstract/tool_sidecar/log_processor.py (1)
267-277: Cleanup method structure is solid; consider `logger.exception` for stack traces.

Wrapping each cleanup call in its own try/except is good design, ensuring one failure doesn't prevent the other from executing. Per the static analysis hints, consider using `logger.exception` to include stack traces for debugging.

🔎 Suggested improvement
```diff
     def cleanup(self) -> None:
         """Clean up resources on shutdown."""
         logger.info("Cleaning up resources...")
         try:
             self.tool_execution_tracker.cleanup()
         except Exception as e:
-            logger.error(f"Failed to cleanup ToolExecutionTracker: {e}")
+            logger.exception(f"Failed to cleanup ToolExecutionTracker: {e}")
         try:
             LogPublisher.cleanup()
         except Exception as e:
-            logger.error(f"Failed to cleanup LogPublisher: {e}")
+            logger.exception(f"Failed to cleanup LogPublisher: {e}")
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (4)
- tool-sidecar/src/unstract/tool_sidecar/log_processor.py
- unstract/core/src/unstract/core/pubsub_helper.py
- unstract/core/src/unstract/core/tool_execution_status.py
- unstract/sdk1/src/unstract/sdk1/tool/entrypoint.py
🧰 Additional context used
🧬 Code graph analysis (1)
unstract/core/src/unstract/core/pubsub_helper.py (2)
tool-sidecar/src/unstract/tool_sidecar/log_processor.py (1)
- cleanup (267-277)
unstract/core/src/unstract/core/tool_execution_status.py (1)
- cleanup (218-225)
🪛 Ruff (0.14.10)
unstract/core/src/unstract/core/pubsub_helper.py
200-200: Do not catch blind exception: Exception
(BLE001)
201-201: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
206-206: Do not catch blind exception: Exception
(BLE001)
207-207: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
tool-sidecar/src/unstract/tool_sidecar/log_processor.py
272-272: Do not catch blind exception: Exception
(BLE001)
273-273: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
276-276: Do not catch blind exception: Exception
(BLE001)
277-277: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
unstract/core/src/unstract/core/tool_execution_status.py
224-224: Do not catch blind exception: Exception
(BLE001)
225-225: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (4)
unstract/sdk1/src/unstract/sdk1/tool/entrypoint.py (1)
15-23: LGTM!

The signal handler correctly implements the Unix convention for signal-based exit codes (128 + signal number). Raising `SystemExit` allows any `finally` blocks in the call stack to execute, enabling proper cleanup.
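For readers unfamiliar with the convention, here is a minimal, self-contained sketch of the pattern described above, not the actual entrypoint.py code; the handler name and the placeholder tool body are illustrative only.

```python
import signal


def _handle_signal(signum, frame):
    """Translate a termination signal into SystemExit so cleanup code still runs."""
    # Unix convention: a process killed by signal N reports exit code 128 + N.
    raise SystemExit(128 + signum)


# Register the handler for the signals Docker/Kubernetes send on shutdown.
signal.signal(signal.SIGTERM, _handle_signal)
signal.signal(signal.SIGINT, _handle_signal)

try:
    pass  # ... run the tool here ...
finally:
    # Still executes: SystemExit unwinds the stack like any other exception.
    print("cleaning up before exit")
```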
tool-sidecar/src/unstract/tool_sidecar/log_processor.py (3)

31-42: Signal handler and class-level flag look good.

The pattern of using a class-level flag for signal communication is appropriate for this single-process sidecar. Since Python's signal handlers run in the main thread, the flag access is safe without additional synchronization.
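A minimal sketch of that class-level-flag pattern, with simplified names rather than the project's actual LogProcessor implementation:

```python
import signal


class Processor:
    """Sketch: a class-level flag flipped from the signal handler."""

    _shutdown_requested = False  # shared by all instances; read by the monitor loop

    @classmethod
    def _signal_handler(cls, signum, frame):
        # Python runs signal handlers in the main thread, so a plain
        # boolean assignment needs no extra locking here.
        cls._shutdown_requested = True


# Typically registered once at process startup.
signal.signal(signal.SIGTERM, Processor._signal_handler)
signal.signal(signal.SIGINT, Processor._signal_handler)
```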
235-242: Graceful shutdown handling in the monitor loop is well implemented.

The shutdown check placed at the start of each loop iteration ensures a timely response to termination signals. Updating the execution status to `FAILED` with a descriptive error message provides good observability for operators investigating terminated jobs.
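For illustration, a simplified loop with that check-first structure; the tracker class and its update call are placeholders, and the real status API may differ.

```python
import time

shutdown_requested = False  # would be set to True by a SIGTERM/SIGINT handler


class FakeTracker:
    """Placeholder for the execution tracker used in this sketch."""

    def update_status(self, status, error):
        print(f"status={status}, error={error}")


def monitor_logs(tracker):
    while True:
        # Check the flag before doing any work so a pending termination
        # request is honored without waiting on the next batch of logs.
        if shutdown_requested:
            tracker.update_status("FAILED", "Process terminated by signal")
            break
        # ... read and publish the next chunk of tool output ...
        time.sleep(0.5)
```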
344-347: LGTM!

The `try/finally` pattern ensures `cleanup()` is called regardless of how `monitor_logs()` exits, whether through normal completion, an exception, or a signal-triggered break. This is the correct approach for resource cleanup.
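The shape of that guarantee, as a sketch with illustrative names only:

```python
class SidecarProcessor:
    """Stand-in for the sidecar processor used in this sketch."""

    def cleanup(self):
        print("closing Redis/Kombu connections")


def monitor(processor):
    pass  # placeholder for the monitoring loop


def main():
    processor = SidecarProcessor()
    try:
        monitor(processor)  # may return normally, raise, or be cut short by SystemExit
    finally:
        # Runs on every exit path, including the signal-triggered one.
        processor.cleanup()


if __name__ == "__main__":
    main()
```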



Summary
Implements proper graceful shutdown handling for both the tool container and sidecar container when running in Docker/Kubernetes.
Problem
Both signal handlers in `entrypoint.py` and `log_processor.py` only logged signals but didn't actually stop execution or clean up resources. When Docker/Kubernetes sends SIGTERM, the containers continue running until they are forcibly killed.
Changes
1. ToolExecutionTracker (`unstract/core/src/unstract/core/tool_execution_status.py`)
   - Added a `cleanup()` method to close the Redis connection
2. LogPublisher (`unstract/core/src/unstract/core/pubsub_helper.py`)
   - Added a `cleanup()` class method to close the Kombu and Redis connections
3. LogProcessor (`tool-sidecar/src/unstract/tool_sidecar/log_processor.py`)
   - Added a `_shutdown_requested` class-level flag set by the signal handler
   - `monitor_logs()` checks the flag and exits gracefully, updating the tool execution status to FAILED
   - Added a `cleanup()` method to close all connections
   - `main()` now calls `processor.cleanup()` in a `finally` block
4. ToolEntrypoint (`unstract/sdk1/src/unstract/sdk1/tool/entrypoint.py`)
   - Signal handler now raises `SystemExit(128 + signum)` for a proper exit with the standard signal exit code
Why Both Handlers Are Needed
Both signal handlers are required because they run in separate containers within the same Kubernetes pod:
- Tool container (`entrypoint.py`): runs the actual tool code
- Sidecar container (`log_processor.py`): monitors tool logs and publishes to Redis

When Kubernetes terminates a pod, both containers receive SIGTERM independently.
Testing
- Send `docker kill --signal=SIGTERM` to the containers and verify graceful shutdown
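One possible way to script that check from Python, using only the standard library; the entrypoint path is hypothetical, and the expected exit code of 143 assumes the 128 + signum convention adopted in this PR.

```python
import signal
import subprocess
import time

# Hypothetical script that installs the SIGTERM handler; replace with the
# actual tool or sidecar invocation.
proc = subprocess.Popen(["python", "my_tool_entrypoint.py"])

time.sleep(2)                  # give the process time to register its handlers
proc.send_signal(signal.SIGTERM)
code = proc.wait(timeout=30)   # graceful shutdown should finish well before a SIGKILL

# SystemExit(128 + SIGTERM) -> exit code 143 instead of a hard kill.
assert code == 143, f"expected 143, got {code}"
print("graceful shutdown verified")
```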