-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gpu: nvidia: Refactor to native parameters for matmul #2111
gpu: nvidia: Refactor to native parameters for matmul #2111
Conversation
6da1758
to
c190dbf
Compare
c190dbf
to
2341c03
Compare
@@ -116,24 +122,25 @@ struct cudnn_matmul_t : cudnn_matmul_base_t { | |||
} | |||
return true; | |||
} | |||
|
|||
std::shared_ptr<cublas_params> params_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason why we want to have shared ownership over the params?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs to be created in the primitive descriptor and it is used by the implementation, attempting to pass a unique_ptr back and forward from impl to primitive might not be desirable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The refactoring looks very nice to me, a long-awaited one to be honest. Thanks!
2341c03
to
a950e3e
Compare
a950e3e
to
b209e92
Compare
Description
This refactor is required to address current issues in nvidia backend matmul. Currently the use of runtime dimensions requires native-specific member objects to be initialised, which are specific to the dimensions (Tensor descriptors, matrix layouts). This is handled currently by cleaning up the members and re-initializing on execution. This is not thread safe.
The refactor proposes abstraction of the members to separate structs, which in the non-runtime dimension case would be initialised with the primitive descriptor and passed to the primitive on its respective creation. When not using runtime dimensions the parameters are cleaned up when the primitive is destroyed.
In the runtime dimension case, the parameters are initialised only with the non native specific members. Inside the execute call a copy is created of the new params struct and only then are the native specific structs initialised for the copy. When using runtime dimensions the params are cleaned up at the end of the host_task running the matmul.
Checklist
General
make test
andmake test_benchdnn_*
) pass locally for each commit?