[WIP][Bugfix] Fix xgrammar nanobind leaked objects at shutdown #34690
haosdent wants to merge 1 commit into vllm-project:main
…ct#26363) Fix nanobind leak warnings for GrammarMatcher and CompiledGrammar objects when using xgrammar as the structured output backend. The root cause was that per-request xgrammar objects were not released during shutdown: Scheduler.shutdown() did not clear its requests dict, and LLMEngine.__del__() did not trigger the shutdown chain at all in in-process mode.
Signed-off-by: haosdent <haosdent@gmail.com>
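The reference-holding behavior described in the commit message can be illustrated with a minimal sketch. The classes below (FakeGrammar, FakeRequest, FakeScheduler) are hypothetical stand-ins and not vLLM or xgrammar code; they only show that an object stays alive while a dict still refers to its owner, and is released once the dict is cleared.

```python
import gc
import weakref


class FakeGrammar:
    """Hypothetical stand-in for a per-request grammar object."""


class FakeRequest:
    """Hypothetical request that owns a grammar object."""

    def __init__(self) -> None:
        self.grammar = FakeGrammar()


class FakeScheduler:
    """Hypothetical scheduler that tracks requests by id."""

    def __init__(self) -> None:
        self.requests: dict[str, FakeRequest] = {}

    def shutdown(self) -> None:
        # Clearing the dict drops the last strong references to each
        # request and, through it, to the grammar it owns.
        self.requests.clear()


def grammar_released_after_shutdown() -> bool:
    scheduler = FakeScheduler()
    request = FakeRequest()
    scheduler.requests["req-0"] = request
    grammar_ref = weakref.ref(request.grammar)
    del request  # the scheduler dict is now the only owner

    assert grammar_ref() is not None  # still alive: the dict holds it
    scheduler.shutdown()
    gc.collect()  # not required for this acyclic graph in CPython
    return grammar_ref() is None
```

Without the clear() call in shutdown(), the grammar would stay reachable for as long as the scheduler itself does, which in the reported case meant until interpreter teardown.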
Code Review
This pull request addresses a nanobind object leak at shutdown when using xgrammar. The changes introduce proper cleanup logic in several places. In scheduler.py, self.requests is cleared during shutdown to release references to per-request objects. In llm_engine.py, engine_core.shutdown() is now called from LLMEngine.__del__, ensuring the shutdown sequence is triggered in-process, consistent with AsyncLLM. Finally, in structured_output/__init__.py, ThreadPoolExecutors are properly shut down to prevent dangling resources. These changes seem correct and effectively address the reported memory leak. My only concern is the reliance on __del__ for cleanup in LLMEngine, which can be unreliable.
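As a rough illustration of the executor cleanup mentioned in the summary, the sketch below shuts down a ThreadPoolExecutor and makes the cleanup method safe to call more than once. FakeStructuredOutputManager and its attribute names are assumptions made for this example, not the actual vLLM class; only ThreadPoolExecutor.shutdown() and its cancel_futures parameter (Python 3.9+) are real standard-library API.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeStructuredOutputManager:
    """Hypothetical manager; attribute names are illustrative only."""

    def __init__(self) -> None:
        self.backend: object | None = object()
        self.executor: ThreadPoolExecutor | None = ThreadPoolExecutor(
            max_workers=1
        )

    def clear_backend(self) -> None:
        if self.executor is not None:
            # cancel_futures=True cancels queued work that has not
            # started; tasks already running are allowed to finish.
            self.executor.shutdown(wait=True, cancel_futures=True)
            self.executor = None
        # Resetting to None makes a second call a harmless no-op.
        self.backend = None
```

Guarding on None before shutting down is one way to get idempotency; the PR may achieve it differently.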
    if engine_core := getattr(self, "engine_core", None):
        engine_core.shutdown()
Using __del__ for cleanup is unreliable. It's not guaranteed to be called, especially during interpreter shutdown or if reference cycles exist. This can lead to resource leaks, which this PR aims to fix. Consider providing an explicit shutdown() method on LLMEngine or implementing it as a context manager for more deterministic cleanup.
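One way the suggestion above could look is sketched here, assuming a hypothetical FakeEngine wrapper around a hypothetical FakeEngineCore. Neither class is part of vLLM. The explicit shutdown() is idempotent, the context manager gives deterministic cleanup, and __del__ is kept only as a best-effort fallback.

```python
class FakeEngineCore:
    """Hypothetical core that counts how often it is shut down."""

    def __init__(self) -> None:
        self.shutdown_calls = 0

    def shutdown(self) -> None:
        self.shutdown_calls += 1


class FakeEngine:
    """Hypothetical engine with explicit, idempotent shutdown."""

    def __init__(self) -> None:
        self.engine_core = FakeEngineCore()
        self._closed = False

    def shutdown(self) -> None:
        # getattr guards against a partially constructed instance.
        if getattr(self, "_closed", True):
            return
        self._closed = True
        self.engine_core.shutdown()

    def __enter__(self) -> "FakeEngine":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Runs on normal exit and on exceptions alike.
        self.shutdown()

    def __del__(self) -> None:
        # Fallback only: Python does not guarantee this is called.
        try:
            self.shutdown()
        except Exception:
            pass
```

Usage under these assumptions would be `with FakeEngine() as engine: ...`, after which the core has been shut down exactly once regardless of when the object is garbage collected.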
Purpose
Fixes #26363
Fix nanobind leak warnings (GrammarMatcher, CompiledGrammar) when using xgrammar as the structured output backend. The warnings appear at process exit.

Root cause: Per-request xgrammar nanobind objects were not released during shutdown due to two gaps:
- Scheduler.shutdown() did not clear its self.requests dict, so Request -> StructuredOutputRequest -> XgrammarGrammar (holding matcher and ctx nanobind objects) remained referenced until Python interpreter teardown, when nanobind metadata may already be freed.
- LLMEngine.__del__() did not call engine_core.shutdown(), so in in-process mode (VLLM_ENABLE_V1_MULTIPROCESSING=0) the entire cleanup chain was never triggered. (Compare: AsyncLLM.__del__() already calls self.shutdown().)

Fix (3 targeted changes):
- scheduler.py: Add self.requests.clear() in Scheduler.shutdown() to release per-request xgrammar objects deterministically during shutdown.
- llm_engine.py: Add engine_core.shutdown() in LLMEngine.__del__() to ensure the cleanup chain runs in in-process mode (mirrors existing AsyncLLM pattern).
- structured_output/__init__.py: Shut down ThreadPoolExecutors in clear_backend() to cancel pending grammar compilations, and set self.backend = None to make repeated calls idempotent.

Test Plan
- VLLM_ENABLE_V1_MULTIPROCESSING=0 (in-process).
- pytest tests/v1/core/test_scheduler.py
- pytest tests/v1/engine/test_llm_engine.py
- pytest tests/v1/structured_output/

Test Result

- Reproduction script (default multiprocess mode): leaked instances, leaked types, leaked functions all absent.
- Reproduction script (in-process mode, VLLM_ENABLE_V1_MULTIPROCESSING=0): destroy_process_group warning remains (unrelated to xgrammar).

Unit tests:
- tests/v1/structured_output/: 27 passed
- tests/v1/core/test_scheduler.py: 86 passed, 1 skipped, 1 pre-existing failure (test_async_scheduling_pp_allows_rescheduling_with_output_placeholders, which fails on main as well)
- tests/v1/engine/test_llm_engine.py: 4 passed, 1 pre-existing failure (GPU memory issue in test environment), 1 pre-existing failure (HuggingFace download issue in test environment)