Skip to content

Conversation

@rdspring1
Copy link
Collaborator

@rdspring1 rdspring1 commented Sep 4, 2025

This PR enables the direct bindings for nvfuser executor if DIRECT_BINDINGS_SUPPORTED_VERSION >= 0.2.34.

  • Legacy bindings will remain in the nvfuser pip wheel under import nvfuser, but it will be deprecated and no longer receive new operations.
  • dockers/ubuntu-cuda/Dockerfile, scripts/validate_build.py, thunder/tests/conftest.py, thunder/executors/nvfuserex.py contain import nvfuser for version checks and FusionCache.reset(). These files will need to be updated when legacy bindings are removed.

What is direct bindings?

  • TL;DR: It is direct pybind11 mapping from nvfuser CPP API to python.
  • The python API is the same as legacy bindings, so Thunder is minimally affected.
  • Direct bindings does not have a static Fusion Definition cache like legacy bindings.

Why direct bindings?

  • The legacy bindings utilized an intermediate representation from nvFuser's underlying FusionIR where only the definition ops were exposed in this intermediate representation to allow for graph caching. By moving to direct bindings, we are allowing for the possibility of adding more bindings to expose the scheduling of operations from python to allow users to more directly interact with nvFuser internals from python. One example where exposing scheduling operations is useful is multi-gpu support where the mesh dimensions of a tensor are annotated via a scheduling operation.
  • It is difficult to expose multi-gpu scheduling primitives with legacy bindings.
  • The static Fusion Definition cache can be a memory hog and cumbersome to update.

@rdspring1 rdspring1 force-pushed the nvf_direct branch 2 times, most recently from 48aa575 to 27d9531 Compare September 4, 2025 22:49
rdspring1 added a commit to NVIDIA/Fuser that referenced this pull request Sep 5, 2025
This PR is a set of fixes for issues found from
Lightning-AI/lightning-thunder#2502.
* Increase `max_length` default in `FusionDefinition` to `9999`
* Rename `num_dims` function to `ndim` property in TensorView
* Use size of logical domain without reductions as number of dimensions
instead of `TensorView::nDims()`
# TODO Review splititng very large fusions or removing the max length restriction completely
# See "Very large nvFuser fusions hit max_length"
fd = FusionDefinition(max_length=MAX_LENGTH)
fd = FusionDefinition()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvfuser 0.2.32 bumped the default max length for legacy bindings to 9999 while the direct bindings does not have a fusion cache.

else:
self.set_fuel(FUEL_LEVEL.UNLIMITED)

env_var_save_serde = os.getenv("ENABLE_NVFUSER_SERIALIZATION", None)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direct bindings does not support serialization. Removing it for simplicity.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious to know if there is a plan to support this in future.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Serialization would need to be added on top of [RFC] Create LRU cache for direct bindings. #4893. It isn't on the roadmap but it has been accounted for.

@rdspring1 rdspring1 force-pushed the nvf_direct branch 5 times, most recently from 204100a to 296167e Compare September 6, 2025 23:26
@rdspring1 rdspring1 changed the title [WIP] Enable direct bindings in Thunder Enable direct bindings in Thunder Sep 7, 2025
@rdspring1 rdspring1 marked this pull request as ready for review September 7, 2025 00:33
@rdspring1 rdspring1 requested review from kshitij12345 and removed request for kshitij12345 September 7, 2025 00:33
Copy link
Collaborator

@kshitij12345 kshitij12345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @rdspring1

else:
self.set_fuel(FUEL_LEVEL.UNLIMITED)

env_var_save_serde = os.getenv("ENABLE_NVFUSER_SERIALIZATION", None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious to know if there is a plan to support this in future.

@rdspring1
Copy link
Collaborator Author

@t-vi This PR is ready for your review. Also, what are the 10 required tests that haven't run yet?

@t-vi
Copy link
Collaborator

t-vi commented Oct 6, 2025

@rdspring1 unfortunately, the CI is not up currently. I'm running the tests manually now and will merge when they pass.

Copy link
Collaborator

@t-vi t-vi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@t-vi t-vi enabled auto-merge (squash) October 6, 2025 15:47
@t-vi t-vi disabled auto-merge October 6, 2025 15:49
@t-vi t-vi merged commit 8b542cf into Lightning-AI:main Oct 6, 2025
39 of 51 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants