Matrix Triangular Solve GPU op #5010
Conversation
Can one of the admins verify this patch?

@c0g, thanks for your PR! By analyzing the history of the files in this pull request, we identified @tensorflower-gardener, @ebrevdo and @josh11b to be potential reviewers.
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at
please don't change whitespace in lines you aren't otherwise modifying
Please undo these changes.
    }

    int64 GetCostPerUnit(const TensorShapes& input_matrix_shapes) const final
    {
you probably need to multiply this by Eigen's AddCost / MulCost to get appropriate estimates
I messed up the formatting on the original op due to changing whitespace. Everything in MatrixTriangularSolveOp is original code from TF.
I'm happy to modify the original code to include AddCost/MulCost.
    {
      double rows = static_cast<double>(input_matrix_shapes[0].dim_size(0));
      double num_rhss = static_cast<double>(input_matrix_shapes[1].dim_size(1));
      double cost = rows * rows * num_rhss;
why are you doing this in double precision?
This is the original TensorFlow op code (see above). Possibly because double can represent larger numbers than int? I doubt anyone will try a matrix solve large enough to overflow a 64-bit int, though. Happy to change to int64.
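For reference, here is a minimal sketch of the kind of estimate being suggested: scale the raw flop count by Eigen's per-element cost constants. This is only a sketch, assuming the enclosing op class context, where Scalar, TensorShapes, and kint64max come from the TensorFlow op framework rather than from this snippet.

```cpp
// Sketch only: GetCostPerUnit scaled by Eigen's AddCost/MulCost, assuming the
// enclosing op class provides Scalar, TensorShapes, and kint64max.
int64 GetCostPerUnit(const TensorShapes& input_matrix_shapes) const final {
  double rows = static_cast<double>(input_matrix_shapes[0].dim_size(0));
  double num_rhss = static_cast<double>(input_matrix_shapes[1].dim_size(1));
  // A triangular solve does roughly one multiply and one add per matrix
  // element per right-hand side, so scale rows^2 * num_rhss by the
  // per-element costs of the scalar type.
  double cost = rows * rows * num_rhss *
                (Eigen::TensorOpCost::AddCost<Scalar>() +
                 Eigen::TensorOpCost::MulCost<Scalar>());
  // Computing in double lets a huge problem saturate at kint64max instead of
  // overflowing; convert back to int64 for the thread-pool scheduler.
  return cost >= static_cast<double>(kint64max) ? kint64max
                                                : static_cast<int64>(cost);
}
```

This also touches on the double-precision question above: the intermediate product can exceed what an int64 holds, so it is clamped before converting back.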
    if __name__ == "__main__":
      tf.test.main()
undo whitespace changes
    } else {
      output.noalias() = triangle.solve(rhs);
    }
    trans = perftools::gputools::blas::Transpose::kNoTranspose;
Can you use longer variable names instead of 'trans', 'lda', 'ldb', etc.? Then in the ThenBlasTrsm call you can do e.g. upper_or_lower /* uplo */, etc., which is easier for other users to read.
    cublas_m, cublas_n, 1.0, matrix_ptr, lda, &out_ptr,
    ldb)
    .ok();
    // LOG(INFO) << blas_launch_status;
remove commented out code
    stream
        ->ThenBlasTrsm(perftools::gputools::blas::Side::kRight, uplo, trans,
                       perftools::gputools::blas::Diagonal::kNonUnit,
                       cublas_m, cublas_n, 1.0, matrix_ptr, lda, &out_ptr,
I think this needs to be Scalar(1) instead of 1.0 (which is a double)
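Putting the two suggestions together (descriptive names and a Scalar(1) alpha), here is a hedged sketch of how the launch could read. The names upper_or_lower, transpose_matrix, colmajor_rows, colmajor_cols, leading_dim_matrix, leading_dim_output, matrix_ptr, out_ptr, and context are illustrative stand-ins for values the op already computes, not the PR's actual identifiers.

```cpp
// Sketch only, inside the GPU op's compute path.  Every lower-case identifier
// here is an illustrative stand-in for a value computed earlier in the op;
// `context` is assumed to be the OpKernelContext*.
namespace blas = perftools::gputools::blas;

bool blas_launch_status =
    stream
        ->ThenBlasTrsm(blas::Side::kRight, upper_or_lower /* uplo */,
                       transpose_matrix /* trans */, blas::Diagonal::kNonUnit,
                       colmajor_rows /* m */, colmajor_cols /* n */,
                       Scalar(1) /* alpha as Scalar, not the double 1.0 */,
                       matrix_ptr, leading_dim_matrix /* lda */,
                       &out_ptr, leading_dim_output /* ldb */)
        .ok();
if (!blas_launch_status) {
  context->SetStatus(errors::Internal("cuBLAS trsm launch failed"));
}
```

With names like these, the short /* uplo */-style tags act as cross-references to the BLAS convention rather than being the only documentation of each argument.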
Thanks for the PR! A few comments.
    import tensorflow as tf


    class MatrixTriangularSolveOpTest(tf.test.TestCase):
This is not being tested with GPU support. You need to modify this line in the BUILD file from tf_py_test to cuda_py_test.
Ah, in that case let's leave it as double. You're right.

Do we still need changes?

Jenkins, test this please.
I have not yet made the requested changes.

@ebrevdo: I reverted to the original file to fix whitespace, then made your other requested changes.

I don't see the revert. Please push?

@drpngx you're right, I have no idea what happened to them. The changes should now be up. Sorry for the lag/commit spam.

Jenkins, test this please.

Seems to be passing now; it only looks to be waiting on verification of @ebrevdo's requested changes.
    double rows = static_cast<double>(input_matrix_shapes[0].dim_size(0));
    double num_rhss = static_cast<double>(input_matrix_shapes[1].dim_size(1));
    double cost = rows * rows * num_rhss;
    double cost = rows * rows * num_rhss *
whitespace?
Sorry! Should be gone now.
ebrevdo left a comment:
small nit; otherwise LGTM.
    #if GOOGLE_CUDA
    #include "tensorflow/core/platform/stream_executor.h"
    #endif // GOOGLE_CUDA
Sorry, one last thing: we need an extra space before the //
done
Jenkins, test this please.
Adds a GPU version of the triangular solver op using cuBLAS trsm.