
Enable PyTorch Models to Share Weights #4123

Merged 9 commits into main on Apr 1, 2022

Conversation

@dyastremsky (Contributor) commented Mar 28, 2022

This PR creates tests for an associated PyTorch backend change, which allows a Triton user to enable multiple instances of a model on the same device to share weights. This is turned off by default and can be enabled via a model config parameter, "ENABLE_WEIGHT_SHARING."
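For reference, a minimal config.pbtxt sketch of how the parameter could be set (the model name, batch size, and instance count below are illustrative, not taken from this PR):

```
name: "resnet50_libtorch"   # illustrative model name
backend: "pytorch"
max_batch_size: 8
instance_group [
  {
    # Two instances on the same GPU; with weight sharing enabled
    # they load a single copy of the model weights.
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
parameters: {
  key: "ENABLE_WEIGHT_SHARING"
  value: {
    string_value: "true"
  }
}
```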

Enabling weight sharing can reduce the memory used for model loading and inference. It should not be used with models that maintain state, since the shared weights would be reused across instances.

Related backend code and documentation change: triton-inference-server/pytorch_backend#54

Review comments on qa/L0_libtorch_shared_weights/test.sh (resolved)
CoderHam previously approved these changes Mar 30, 2022
@dyastremsky dyastremsky merged commit b30700f into main Apr 1, 2022
@dyastremsky dyastremsky deleted the dyas-share-weights branch April 1, 2022 16:34
@joaopcm1996

Does weight sharing mean that multiple instances of a model will not truly run in parallel in their separate CUDA streams?

@dyastremsky (Contributor, Author)

@joaopcm1996 It should not have an impact. Since the models are already trained, we're just sharing the constant (read-only) values.
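To illustrate the point with a minimal PyTorch sketch (illustrative code, not part of the backend or this PR): two "instances" can share a single module's parameters and still launch forward passes on separate CUDA streams, because inference only reads the weights.

```python
import torch

# One copy of the weights, shared by both logical instances.
model = torch.jit.script(torch.nn.Linear(16, 4)).eval()

def run_instance(stream, x):
    # Each instance uses its own CUDA stream; the forward pass only
    # reads the shared parameters, so neither instance mutates them.
    with torch.cuda.stream(stream), torch.no_grad():
        return model(x)

if torch.cuda.is_available():
    model = model.cuda()
    s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()
    x1 = torch.randn(8, 16, device="cuda")
    x2 = torch.randn(8, 16, device="cuda")
    y1, y2 = run_instance(s1, x1), run_instance(s2, x2)
    torch.cuda.synchronize()
```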
