
Support for axis parameter in linalg.gemm #10864

Merged
1 commit merged into apache:master on May 29, 2018

Conversation

asmushetzel
Contributor

Description

This PR adds an optional axis parameter to the linalg.gemm/linalg.gemm2 operators that specifies the axis that indexes the matrix rows. The default is axis = -2, which is the existing behavior of these operators (matrices are encoded by the last two dimensions).
The rationale behind this PR is that in some important use cases (for example, the attention mechanism in the transformer model for neural machine translation), situations requiring a batched matrix-matrix multiply with a non-standard axis for the matrix rows arise naturally. Such computations can always be performed by an explicit swap-axis/transpose operation followed by a batch dot, but this adds significant computational overhead. As the underlying BLAS libraries can natively handle non-consecutive matrix representations without a performance penalty, it is useful to leverage this and expose a higher-level switch so that such additional transpositions can be omitted.
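As a minimal usage sketch (not taken from the PR itself), assuming the Python NDArray API with the new axis argument of linalg.gemm2 and purely illustrative shapes, the workaround and the new switch compare as follows:

```python
import mxnet as mx

# Hypothetical attention-style layout: (batch, rows, heads, cols), i.e. the
# matrix rows are NOT on the second-to-last axis.
A = mx.nd.random.uniform(shape=(2, 4, 3, 5))   # rows m=4 on axis 1, cols k=5 last
B = mx.nd.random.uniform(shape=(2, 5, 3, 6))   # rows k=5 on axis 1, cols n=6 last

# Workaround without this PR: move the row axis next to the column axis,
# run a standard batched gemm, and move the axis back afterwards.
A_t = mx.nd.swapaxes(A, 1, 2)                  # (2, 3, 4, 5)
B_t = mx.nd.swapaxes(B, 1, 2)                  # (2, 3, 5, 6)
C_ref = mx.nd.swapaxes(mx.nd.linalg.gemm2(A_t, B_t), 1, 2)   # (2, 4, 3, 6)

# With the new parameter, the explicit transpositions can be omitted by
# telling gemm2 which axis holds the matrix rows.
C = mx.nd.linalg.gemm2(A, B, axis=-3)          # (2, 4, 3, 6)

print(C.shape, mx.nd.max(mx.nd.abs(C - C_ref)).asscalar())   # difference ~0.0
```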

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • [x] Changes are complete (i.e. I finished coding on this PR)
  • [x] All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • [x] Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • [x] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

@asmushetzel
Contributor Author

One check failed on one node, and it is something completely unrelated :-(

@piiswrong
Contributor

please rebase and try again

@piiswrong
Contributor

Is this behavior for axis common in other frameworks?

@asmushetzel asmushetzel force-pushed the gemm_axis branch 2 times, most recently from b566c4a to 529389e Compare May 11, 2018 12:21
@asmushetzel
Contributor Author

Other frameworks apparently do not offer this additional flexibility. You would have to go through explicit transpose/swap-axis-like functions.

@asmushetzel
Contributor Author

The build system is currently unstable and produces random failures :-(

Contributor

@piiswrong piiswrong left a comment


Is the axis parameter similar to http://deeplearning.net/software/theano/library/tensor/basic.html#theano.tensor.tensordot?

Could you add more clarification in the docs?

@@ -53,13 +54,17 @@ struct LaMatrixMacParam : public dmlc::Parameter<LaMatrixMacParam> {
DMLC_DECLARE_FIELD(beta)
.set_default(1.0)
.describe("Scalar factor multiplied with C.");
DMLC_DECLARE_FIELD(axis)
.set_default(-2)
.describe("Axis corresponding to the matrix rows.");
Contributor


This is a little confusing. Is it the rows of the resulting matrix?

Contributor Author


I changed the operator description and added an example. Let me know if this clarifies things.

Contributor Author


It's not similar to Theano's tensordot.

Contributor


I see. I'm not familiar with the context, but I want to make sure we are not creating new conventions unless we have to.
Does this solve a different use case from Theano's tensordot formulation?

@asmushetzel
Contributor Author

It does solve a different use case. linalg.gemm deals with a batch of gemm operations. The extension in this PR relaxes the constraint that only the leading dimensions are used as batch coordinates (by allowing the coordinate associated with the matrix rows to sit at a different axis).

Batching and tensordot are two different concepts. That is why Theano also has a batched_tensordot() operator. While a single matrix-matrix product can be formulated either as a gemm or as a tensordot, a batch of gemms cannot be formulated as a single tensordot. Suppose we have a batch-gemm on shapes A = (I, M, K) and B = (I, K, N), where the first coordinate is the batch coordinate; batch-gemm then returns shape (I, M, N), while tensordot(A, B, [[2], [1]]) returns shape (I, M, I, N), i.e. it is a completely different computation.
So this PR allows more flexibility about which axes are batch coordinates, but it does not bring in any tensordot functionality.
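A small numpy sketch (with hypothetical shapes) of the shape difference just described:

```python
import numpy as np

I, M, K, N = 2, 3, 4, 5
A = np.random.rand(I, M, K)
B = np.random.rand(I, K, N)

# Batched gemm: one matrix product per batch index, contracting over K.
batch = np.matmul(A, B)                     # shape (I, M, N)

# tensordot also contracts over K, but pairs every batch index of A with
# every batch index of B, so the batch axis appears twice in the result.
full = np.tensordot(A, B, axes=([2], [1]))  # shape (I, M, I, N)

# The batched product is only the "diagonal" of the tensordot result.
print(batch.shape, full.shape)
print(np.allclose(batch, np.einsum('imin->imn', full)))  # True
```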

@piiswrong
Contributor

Ok. Thanks for the explanation.

@piiswrong piiswrong merged commit 4ac76c8 into apache:master May 29, 2018
rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018