-
Notifications
You must be signed in to change notification settings - Fork 108
Enable direct bindings in Thunder #2502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
48aa575 to
27d9531
Compare
This PR is a set of fixes for issues found from Lightning-AI/lightning-thunder#2502. * Increase `max_length` default in `FusionDefinition` to `9999` * Rename `num_dims` function to `ndim` property in TensorView * Use size of logical domain without reductions as number of dimensions instead of `TensorView::nDims()`
27d9531 to
b320598
Compare
| # TODO Review splititng very large fusions or removing the max length restriction completely | ||
| # See "Very large nvFuser fusions hit max_length" | ||
| fd = FusionDefinition(max_length=MAX_LENGTH) | ||
| fd = FusionDefinition() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvfuser 0.2.32 bumped the default max length for legacy bindings to 9999 while the direct bindings does not have a fusion cache.
| else: | ||
| self.set_fuel(FUEL_LEVEL.UNLIMITED) | ||
|
|
||
| env_var_save_serde = os.getenv("ENABLE_NVFUSER_SERIALIZATION", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Direct bindings does not support serialization. Removing it for simplicity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious to know if there is a plan to support this in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Serialization would need to be added on top of [RFC] Create LRU cache for direct bindings. #4893. It isn't on the roadmap but it has been accounted for.
204100a to
296167e
Compare
296167e to
4167b4c
Compare
b4616ff to
10616ea
Compare
kshitij12345
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @rdspring1
| else: | ||
| self.set_fuel(FUEL_LEVEL.UNLIMITED) | ||
|
|
||
| env_var_save_serde = os.getenv("ENABLE_NVFUSER_SERIALIZATION", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious to know if there is a plan to support this in future.
10616ea to
646e084
Compare
|
@t-vi This PR is ready for your review. Also, what are the 10 required tests that haven't run yet? |
|
@rdspring1 unfortunately, the CI is not up currently. I'm running the tests manually now and will merge when they pass. |
t-vi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @rdspring1 @crcrpar @kshitij12345
This PR enables the direct bindings for nvfuser executor if
DIRECT_BINDINGS_SUPPORTED_VERSION >= 0.2.34.import nvfuser, but it will be deprecated and no longer receive new operations.dockers/ubuntu-cuda/Dockerfile,scripts/validate_build.py,thunder/tests/conftest.py,thunder/executors/nvfuserex.pycontainimport nvfuserfor version checks andFusionCache.reset(). These files will need to be updated when legacy bindings are removed.What is direct bindings?
Why direct bindings?