Matrix Triangular Solve GPU op#5010
Conversation
|
Can one of the admins verify this patch? |
|
@c0g, thanks for your PR! By analyzing the history of the files in this pull request, we identified @tensorflower-gardener, @ebrevdo and @josh11b to be potential reviewers. |
| Licensed under the Apache License, Version 2.0 (the "License"); | ||
| you may not use this file except in compliance with the License. | ||
| You may obtain a copy of the License at | ||
|
|
There was a problem hiding this comment.
please don't change whitespace in lines you aren't otherwise modifying
| } | ||
|
|
||
| int64 GetCostPerUnit(const TensorShapes& input_matrix_shapes) const final | ||
| { |
There was a problem hiding this comment.
you probably need to multiply this by Eigen's AddCost / MulCost to get appropriate estimates
There was a problem hiding this comment.
I messed up the formatting on the original op due to changing white space. Everything in MatrixTriangularSolveOp is original code from TF.
I'm happy to modify the original code to include AddCost/MulCost.
| { | ||
| double rows = static_cast<double>(input_matrix_shapes[0].dim_size(0)); | ||
| double num_rhss = static_cast<double>(input_matrix_shapes[1].dim_size(1)); | ||
| double cost = rows * rows * num_rhss; |
There was a problem hiding this comment.
why are you doing this in double precision?
There was a problem hiding this comment.
This is the original tensorflow op code (see above). Possibly because double can represent larger numbers than int? I doubt anyone will try a matrix multiply large enough to overflow a 64 bit int though. Happy to change to int64.
|
|
||
| if __name__ == "__main__": | ||
| tf.test.main() | ||
| tf.test.main() No newline at end of file |
| } else { | ||
| output.noalias() = triangle.solve(rhs); | ||
| } | ||
| trans = perftools::gputools::blas::Transpose::kNoTranspose; |
There was a problem hiding this comment.
can you use longer variable names instead of 'trans', 'lda', 'ldb', etc?
then in ThenBlasTrsm call, you can do e.g. upper_or_lower /* uplo */, ..., etc.
easier for other users to read.
| cublas_m, cublas_n, 1.0, matrix_ptr, lda, &out_ptr, | ||
| ldb) | ||
| .ok(); | ||
| // LOG(INFO) << blas_launch_status; |
| stream | ||
| ->ThenBlasTrsm(perftools::gputools::blas::Side::kRight, uplo, trans, | ||
| perftools::gputools::blas::Diagonal::kNonUnit, | ||
| cublas_m, cublas_n, 1.0, matrix_ptr, lda, &out_ptr, |
There was a problem hiding this comment.
I think this needs to be Scalar(1) instead of 1.0 (which is a double)
|
Thanks for the PR! A few comments. |
| @@ -24,15 +24,17 @@ | |||
| class MatrixTriangularSolveOpTest(tf.test.TestCase): | |||
There was a problem hiding this comment.
This is not being tested with GPU support. You need to modify this line:
from tf_py_test to cuda_py_test.
|
Ah in that case let's leave it as double. You're right. On Oct 17, 2016 9:20 AM, "c0g" notifications@github.com wrote:
|
|
Do we still need changes? |
|
Jenkins, test this please. |
|
I have not yet made the requested changes.
|
|
@ebrevdo : I reverted to the original file to fix white space then made your other requested changes. |
|
I don't see the revert. Please push? |
|
@drpngx you're right I have no idea what happened to them. The changes should now be up. Sorry for the lag/commit spam. |
|
Jenkins, test this please. |
|
Seems to be passing now, only looks to be waiting on verification of @ebrevdo requested changes. |
| double rows = static_cast<double>(input_matrix_shapes[0].dim_size(0)); | ||
| double num_rhss = static_cast<double>(input_matrix_shapes[1].dim_size(1)); | ||
| double cost = rows * rows * num_rhss; | ||
| double cost = rows * rows * num_rhss * |
There was a problem hiding this comment.
Sorry! Should be gone now.
ebrevdo
left a comment
There was a problem hiding this comment.
small nit; otherwise LGTM.
|
|
||
| #if GOOGLE_CUDA | ||
| #include "tensorflow/core/platform/stream_executor.h" | ||
| #endif // GOOGLE_CUDA |
There was a problem hiding this comment.
Sorry, one last thing: we need an extra space before the //
|
Jenkins, test this please. |
Adds a GPU version of the triangular solver op using cuBLAS trsm.