-
Notifications
You must be signed in to change notification settings - Fork 555
ci: Improve the stability of CI tests #2467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughRefactors ManagedProcess to replace a multi-process stdout tee/sed pipeline with a single subprocess and a dedicated output thread that reads, prefixes, logs, and optionally prints lines. Adds lifecycle changes for startup, termination, thread joining, and safe stream closure. Introduces _proc and _output_thread fields and updates termination logic. Changes
Sequence Diagram(s)sequenceDiagram
participant TR as Test Runner
participant MP as ManagedProcess
participant SP as Subprocess (Popen)
participant OT as Output Thread
participant LOG as Log File
participant STD as Stdout (optional)
TR->>MP: __enter__() / start()
MP->>SP: Popen(cmd, text=True, stdout=PIPE, stderr=STDOUT)
MP->>OT: start(process_output(SP.stdout))
OT->>SP: Non-blocking read lines
loop For each line
OT->>LOG: write("[CMD] line")
alt live print enabled
OT->>STD: print(line)
end
end
TR-->>MP: __exit__()
MP->>SP: terminate process tree
MP->>SP: wait()
MP->>OT: join(timeout=5s)
MP->>SP: close stdout safely
MP->>MP: cleanup data dir
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (7)
tests/utils/managed_process.py (7)
18-22
: Select on pipes is not portable; thread import is fine
select.select
on file descriptors works on POSIX but not on Windows pipes. Since this utility may run in CI across different runners, consider a portable approach (e.g., a blocking reader loop in a dedicated thread) or guard theselect
usage with a platform fallback.Would you like a simplified, portable output loop (blocking read in the thread) that avoids
select
entirely?
121-136
: Avoid closing stdout while the output thread may still be alive; improve logging style
- After
join(timeout=5)
, guard the stream close to only happen if the thread finished; otherwise, skip closing to avoid spurious exceptions.- Prefer structured logging with
exc_info=True
over f-strings for exceptions.Apply this diff:
# Wait for output processing thread to finish if ( hasattr(self, "_output_thread") and self._output_thread and self._output_thread.is_alive() ): self._logger.info("Waiting for output thread to finish...") self._output_thread.join(timeout=5) # Give it 5 seconds to finish - # Now safely close file descriptors - if self._proc and self._proc.stdout: - try: - self._proc.stdout.close() - except Exception as e: - self._logger.warning(f"Error closing stdout: {e}") + # Now safely close file descriptors (only if output thread has finished) + if ( + self._proc + and self._proc.stdout + and (not getattr(self, "_output_thread", None) or not self._output_thread.is_alive()) + ): + try: + self._proc.stdout.close() + except Exception: + self._logger.warning("Error closing stdout", exc_info=True) + elif getattr(self, "_output_thread", None) and self._output_thread.is_alive(): + self._logger.warning("Output thread did not finish within timeout; skipping stdout close")
171-181
: Use line buffering and explicit encoding for more robust text handling
- Add
bufsize=1
to request line-buffering on our side and improve responsiveness.- Specify
encoding="utf-8"
anderrors="replace"
to avoid decode errors causing thread crashes.Apply this diff:
self._proc = subprocess.Popen( self.command, env=self.env or os.environ.copy(), cwd=self.working_dir, stdin=subprocess.DEVNULL, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, + bufsize=1, # line-buffered to improve responsiveness in text mode start_new_session=True, # Isolate process group to prevent kill 0 from affecting parent - text=True, # Use text mode for easier line processing + text=True, # Use text mode for easier line processing + encoding="utf-8", + errors="replace", )
224-225
: Prefer structured logging with traceback over f-stringsUse
exc_info=True
to preserve stack traces and avoid eager string formatting.Apply this diff:
- self._logger.warning(f"Line processing error: {e}") + self._logger.warning("Line processing error", exc_info=True) @@ - self._logger.warning(f"Output processing error: {e}") + self._logger.warning("Output processing error", exc_info=True)Also applies to: 231-233
234-235
: Name the output thread for easier diagnosticsA thread name helps during debugging and when analyzing thread dumps.
Apply this diff:
- self._output_thread = threading.Thread(target=process_output, daemon=True) + self._output_thread = threading.Thread( + target=process_output, + name=f"ManagedProcessOutput-{self._command_name}", + daemon=True, + ) self._output_thread.start()
297-317
: Avoid variable shadowing; rename loop var to improve readability
_proc
(loop var) shadowsself._proc
. Rename to avoid confusion.Apply this diff:
- for _proc in psutil.process_iter(["name", "cmdline"]): + for ps_proc in psutil.process_iter(["name", "cmdline"]): try: if ( - _proc.name() == self._command_name - or _proc.name() in self.stragglers + ps_proc.name() == self._command_name + or ps_proc.name() in self.stragglers ): self._logger.info( - "Terminating Existing %s %s", _proc.name(), _proc.pid + "Terminating Existing %s %s", ps_proc.name(), ps_proc.pid ) - terminate_process_tree(_proc.pid, self._logger) + terminate_process_tree(ps_proc.pid, self._logger) for cmdline in self.straggler_commands: - if cmdline in " ".join(_proc.cmdline()): + if cmdline in " ".join(ps_proc.cmdline()): self._logger.info( "Terminating Existing CmdLine %s %s %s", - _proc.name(), - _proc.pid, - _proc.cmdline(), + ps_proc.name(), + ps_proc.pid, + ps_proc.cmdline(), ) - terminate_process_tree(_proc.pid, self._logger) + terminate_process_tree(ps_proc.pid, self._logger)
171-236
: Optional simplification: drop select and use a blocking line reader in the dedicated threadGiven the output is handled in its own thread and the process is terminated before joining, a blocking reader loop avoids the complexity and portability issues of
select
on file descriptors:
- Portable across OSes (Windows pipes are not
select
-able).- Simpler and less error-prone.
Proposed sketch (for reference; replace the loop body accordingly):
with open(self._log_path, "w", encoding="utf-8") as f: for line in iter(self._proc.stdout.readline, ""): formatted = f"[{self._command_name.upper()}] {line.rstrip()}" if self.display_output: print(formatted, flush=True) f.write(formatted + "\n") f.flush()This works because:
- After
terminate_process_tree(...)
, the child's stdout pipe is closed;readline()
returns "" and the thread exits promptly.- No
select
, no platform caveats.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
tests/utils/managed_process.py
(5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
tests/utils/managed_process.py (2)
88-91
: New internal fields look goodAdding
_proc
and_output_thread
simplifies lifecycle management and avoids the fragile multi-process tee/sed pipeline.
116-120
: Orderly shutdown sequence LGTMTerminating the process tree first and waiting for the main process is the right order to stop output generation before handling the output thread and stream closure.
Do you have a gitlab pipeline with trtllm tests? Current pipeline only ran vllm tests |
@tanmayv25 can we get this merged to see if it helps stabilize? |
Unfortunately it is failing request migration tests. Need to spend some time to look into it. |
Overview:
The process piping in certain scenario was causing unexpected hangs. Using a non-blocking thread to read and write the data from the process.
Summary by CodeRabbit
Tests
Refactor