
[WIP][Bugfix] Fix xgrammar nanobind leaked objects at shutdown #34690

Open
haosdent wants to merge 1 commit into vllm-project:main from haosdent:fix-26363


haosdent (Contributor) commented Feb 17, 2026

Purpose

Fixes #26363

Fix nanobind leak warnings (GrammarMatcher, CompiledGrammar) when using xgrammar as the structured output backend. The warnings appear at process exit:

nanobind: leaked 2 instances!
 - leaked instance of type "GrammarMatcher"
 - leaked instance of type "CompiledGrammar"
nanobind: leaked 2 types!
nanobind: leaked 16 functions!

Root cause: Per-request xgrammar nanobind objects were not released during shutdown due to two gaps:

  1. Scheduler.shutdown() did not clear its self.requests dict, so the reference chain Request -> StructuredOutputRequest -> XgrammarGrammar (which holds the matcher and ctx nanobind objects) stayed alive until Python interpreter teardown, by which point nanobind's metadata may already have been freed.

  2. LLMEngine.__del__() did not call engine_core.shutdown(), so in in-process mode (VLLM_ENABLE_V1_MULTIPROCESSING=0) the entire cleanup chain was never triggered. (Compare: AsyncLLM.__del__() already calls self.shutdown().)
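The first gap can be reproduced in miniature. The sketch below is an illustration, not vLLM code: the *Stub classes are hypothetical stand-ins for Request, StructuredOutputRequest, XgrammarGrammar and the native matcher, and it assumes CPython reference counting, under which dropping the last reference frees an object immediately. weakref.finalize records whether the innermost object is freed during shutdown() or only later.

```python
import weakref


class MatcherStub:
    """Hypothetical stand-in for a nanobind-backed object such as GrammarMatcher."""


class XgrammarGrammarStub:
    # Holds the "native" matcher, as XgrammarGrammar holds matcher/ctx.
    def __init__(self) -> None:
        self.matcher = MatcherStub()


class StructuredOutputRequestStub:
    def __init__(self) -> None:
        self.grammar = XgrammarGrammarStub()


class RequestStub:
    def __init__(self) -> None:
        self.structured_output_request = StructuredOutputRequestStub()


class SchedulerStub:
    def __init__(self) -> None:
        self.requests: dict[str, RequestStub] = {}

    def shutdown(self, clear_requests: bool) -> None:
        if clear_requests:
            # Drop the last strong references while the runtime is still intact.
            self.requests.clear()


def matcher_released_at_shutdown(clear_requests: bool) -> bool:
    released: list[bool] = []
    scheduler = SchedulerStub()
    req = RequestStub()
    scheduler.requests["req-0"] = req
    # Record the moment the innermost object is actually freed.
    weakref.finalize(req.structured_output_request.grammar.matcher,
                     released.append, True)
    del req  # the scheduler's dict now holds the only reference
    scheduler.shutdown(clear_requests)
    return bool(released)
```

With clear_requests=False the matcher outlives shutdown(); for a real nanobind object that is exactly the case where it survives until interpreter teardown.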

Fix (3 targeted changes):

  • scheduler.py: Add self.requests.clear() in Scheduler.shutdown() to release per-request xgrammar objects deterministically during shutdown.
  • llm_engine.py: Add engine_core.shutdown() in LLMEngine.__del__() to ensure the cleanup chain runs in in-process mode (mirrors existing AsyncLLM pattern).
  • structured_output/__init__.py: Shut down ThreadPoolExecutors in clear_backend() to cancel pending grammar compilations, and set self.backend = None to make repeated calls idempotent.
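A condensed sketch of how the three changes fit together. It uses simplified hypothetical classes (the *Sketch names are not vLLM's) and an assumed call order in which the engine core clears the structured-output backend and then shuts down the scheduler. It shows the idempotent clear_backend() guard, ThreadPoolExecutor.shutdown(cancel_futures=True) (Python 3.9+) dropping queued compilations, and the getattr-guarded __del__ that mirrors the diff.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


class StructuredOutputManagerSketch:
    """Hypothetical simplification of the manager in structured_output/__init__.py."""

    def __init__(self) -> None:
        self.backend: object | None = object()  # placeholder for the xgrammar backend
        self.executor = ThreadPoolExecutor(max_workers=2)  # grammar compilation pool
        self.executor_shutdowns = 0

    def clear_backend(self) -> None:
        if self.backend is None:
            return  # already cleared, so repeated calls are no-ops
        # Cancel queued compilations that have not started; don't block on running ones.
        self.executor.shutdown(wait=False, cancel_futures=True)
        self.executor_shutdowns += 1
        self.backend = None


class SchedulerSketch:
    def __init__(self) -> None:
        self.requests: dict[str, object] = {}

    def shutdown(self) -> None:
        # Release per-request objects (and the grammar objects they hold) now,
        # instead of at interpreter teardown.
        self.requests.clear()


class EngineCoreSketch:
    def __init__(self) -> None:
        self.structured_output_manager = StructuredOutputManagerSketch()
        self.scheduler = SchedulerSketch()

    def shutdown(self) -> None:
        # Ordering assumed for this sketch.
        self.structured_output_manager.clear_backend()
        self.scheduler.shutdown()


class LLMEngineSketch:
    def __init__(self) -> None:
        self.engine_core = EngineCoreSketch()

    def __del__(self) -> None:
        # getattr guards against a partially constructed engine whose
        # __init__ failed before engine_core was assigned.
        if engine_core := getattr(self, "engine_core", None):
            engine_core.shutdown()
```

Note that cancel_futures only cancels work that has not started yet; a compilation already running in a worker thread is left to finish.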

Test Plan

  1. Run the reproduction script from issue #26363 ([Bug]: xgrammar cleanup leakage) and verify that no nanobind leak warnings appear at exit.
  2. Test both modes: default multiprocess and VLLM_ENABLE_V1_MULTIPROCESSING=0 (in-process).
  3. Run existing unit tests:
    • pytest tests/v1/core/test_scheduler.py
    • pytest tests/v1/engine/test_llm_engine.py
    • pytest tests/v1/structured_output/
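The exit-time check in step 1 can be automated by running the script in a fresh interpreter and scanning its output for the warning lines quoted above. This is a hypothetical helper, not part of the vLLM test suite; the script contents are left to the caller, and both stdout and stderr are scanned so the check does not depend on which stream nanobind writes to. Passing in_process=True exports VLLM_ENABLE_V1_MULTIPROCESSING=0, which covers step 2.

```python
from __future__ import annotations

import os
import re
import subprocess
import sys

# Matches summary lines such as "nanobind: leaked 2 instances!" from issue #26363.
LEAK_PATTERN = re.compile(
    r"^nanobind: leaked \d+ (?:instances|types|functions)!", re.MULTILINE
)


def find_nanobind_leaks(output: str) -> list[str]:
    """Return every nanobind leak summary line found in `output`."""
    return LEAK_PATTERN.findall(output)


def run_and_check(script: str, in_process: bool = False) -> list[str]:
    """Run `script` (Python source) in a fresh interpreter; return any leak lines."""
    env = dict(os.environ)
    if in_process:
        # In-process engine mode, as in step 2 of the test plan.
        env["VLLM_ENABLE_V1_MULTIPROCESSING"] = "0"
    result = subprocess.run(
        [sys.executable, "-c", script],
        env=env,
        capture_output=True,
        text=True,
        check=False,
    )
    # Leak warnings are printed at interpreter exit, so scan both complete streams.
    return find_nanobind_leaks(result.stdout + "\n" + result.stderr)
```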

Test Result

Reproduction script (default multiprocess mode):

  • Script completed successfully, generated correct structured JSON output.
  • No nanobind leak warnings: the leaked instances, leaked types, and leaked functions messages are all absent.

Reproduction script (in-process mode, VLLM_ENABLE_V1_MULTIPROCESSING=0):

  • Script completed successfully, generated correct structured JSON output.
  • No nanobind leak warnings. Only a pre-existing PyTorch NCCL destroy_process_group warning remains (unrelated to xgrammar).

Unit tests:

  • tests/v1/structured_output/: 27 passed
  • tests/v1/core/test_scheduler.py: 86 passed, 1 skipped, 1 pre-existing failure (test_async_scheduling_pp_allows_rescheduling_with_output_placeholders -- fails on main as well)
  • tests/v1/engine/test_llm_engine.py: 4 passed, 1 pre-existing failure (GPU memory issue in test environment), 1 pre-existing failure (HuggingFace download issue in test environment)

…ct#26363)

Fix nanobind leak warnings for GrammarMatcher and CompiledGrammar
objects when using xgrammar as the structured output backend.

The root cause was that per-request xgrammar objects were not released
during shutdown: Scheduler.shutdown() did not clear its requests dict,
and LLMEngine.__del__() did not trigger the shutdown chain at all in
in-process mode.

Signed-off-by: haosdent <haosdent@gmail.com>

gemini-code-assist bot left a comment


Code Review

This pull request addresses a nanobind object leak at shutdown when using xgrammar. The changes introduce proper cleanup logic in several places. In scheduler.py, self.requests is cleared during shutdown to release references to per-request objects. In llm_engine.py, engine_core.shutdown() is now called from LLMEngine.__del__, ensuring the shutdown sequence is triggered in-process, consistent with AsyncLLM. Finally, in structured_output/__init__.py, ThreadPoolExecutors are properly shut down to prevent dangling resources. These changes seem correct and effectively address the reported memory leak. My only concern is the reliance on __del__ for cleanup in LLMEngine, which can be unreliable.

Comment on lines +423 to +424
if engine_core := getattr(self, "engine_core", None):
    engine_core.shutdown()

high

Using __del__ for cleanup is unreliable. It's not guaranteed to be called, especially during interpreter shutdown or if reference cycles exist. This can lead to resource leaks, which this PR aims to fix. Consider providing an explicit shutdown() method on LLMEngine or implementing it as a context manager for more deterministic cleanup.
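One way to act on this suggestion, sketched with hypothetical names (ManagedEngine, EngineCoreStub) that are not an existing LLMEngine API: make shutdown() explicit and idempotent, expose it through the context-manager protocol, and keep __del__ only as a best-effort fallback.

```python
from __future__ import annotations


class EngineCoreStub:
    """Hypothetical stand-in for the engine core."""

    def __init__(self) -> None:
        self.shutdown_count = 0

    def shutdown(self) -> None:
        self.shutdown_count += 1


class ManagedEngine:
    """Sketch: explicit, idempotent shutdown plus context-manager support."""

    def __init__(self) -> None:
        self.engine_core: EngineCoreStub | None = EngineCoreStub()

    def shutdown(self) -> None:
        # Detach first so repeated calls (explicit, __exit__, __del__) are no-ops.
        engine_core, self.engine_core = getattr(self, "engine_core", None), None
        if engine_core is not None:
            engine_core.shutdown()

    def __enter__(self) -> ManagedEngine:
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.shutdown()

    def __del__(self) -> None:
        # Best-effort fallback only; not guaranteed to run at interpreter exit.
        self.shutdown()
```

Because shutdown() detaches engine_core before calling it, the explicit call, __exit__, and a later __del__ can all run without shutting the core down twice.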

haosdent changed the title [Bugfix] Fix xgrammar nanobind leaked objects at shutdown to [WIP][Bugfix] Fix xgrammar nanobind leaked objects at shutdown on Feb 17, 2026

Labels

bug (Something isn't working), structured-output, v1


Development

Successfully merging this pull request may close these issues.

[Bug]: xgrammar cleanup leakage
