Conversation

@yoavg (Contributor) commented May 25, 2017

This introduces two batchable versions of matmul: one in which the first argument is part of the autobatch signature, and a second in which it is not. The first version is triggered in cases where the first argument is shared by more than two matmul nodes.
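The distinction can be sketched as two ways of building a batching signature for a matmul node. This is a simplified toy version, not DyNet's actual signature machinery; the function names and tuple encoding here are illustrative assumptions.

```python
def matmul_sig_shared_first_arg(first_arg_id, dims):
    # Signature includes the identity of the first argument:
    # only matmuls that reuse this exact node batch together,
    # so the shared matrix never needs to be gathered per instance.
    return ("matmul_shared", first_arg_id, tuple(dims))

def matmul_sig_generic(dims):
    # Signature depends only on shapes: any matmuls with matching
    # dimensions can batch, at the cost of gathering both arguments
    # into batched tensors.
    return ("matmul", tuple(dims))

# Two matmuls sharing weight node 7 with shape (64, 128) get the
# same signature, so the autobatcher can group them:
a = matmul_sig_shared_first_arg(7, (64, 128))
b = matmul_sig_shared_first_arg(7, (64, 128))
assert a == b
# A matmul on a different weight node does not join that group:
c = matmul_sig_shared_first_arg(8, (64, 128))
assert a != c
```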

@neubig (Contributor) commented May 26, 2017

Hmm, I really don't like the special handling of matrix multiplies here.
Even if we do introduce something like this I think we should try to be more general.

@yoavg (Contributor, Author) commented May 26, 2017

We can easily count how many different nodes a given node is an arg of, instead of how many matmuls. But why is that better? Where is this relevant besides matmul (and maybe affine transform in the future)?
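The generalized count being discussed can be sketched in a few lines: for each node, count how many distinct consumer nodes take it as an argument. The graph encoding below is a hypothetical stand-in for the real computation graph.

```python
from collections import Counter

def arg_use_counts(graph):
    # graph: list of (node_id, arg_ids) pairs, one per node.
    # Returns, for each node, the number of consumer nodes that
    # use it as an argument (each consumer counted once).
    counts = Counter()
    for _, args in graph:
        for a in set(args):
            counts[a] += 1
    return counts

# Node 0 (e.g. a shared weight matrix) feeds three consumers,
# node 2 feeds two, node 1 feeds one:
graph = [(3, [0, 1]), (4, [0, 2]), (5, [0, 2])]
counts = arg_use_counts(graph)
assert counts[0] == 3 and counts[2] == 2 and counts[1] == 1
```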

@neubig (Contributor) commented May 26, 2017

Besides matmul, the highly relevant ones are affine transform, conv2d, tensor contraction, etc. (I'm probably forgetting several). All of these will be much faster if you share parameters vs. iterating over them.

Everything else with an arity over 2 is somewhat relevant: sharing parameters will at least reduce the need for a memory copy.

Basically, my main priority is either that all nodes be treated the same, or that we have a couple of equivalence classes like "benefits largely from grouping" and "doesn't benefit much from grouping", so that when we implement a new node we can specify which class it belongs to.
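The equivalence-class idea could be as simple as a per-op declaration that new node implementations fill in. The table and helper below are a hypothetical sketch, not anything in DyNet.

```python
# Hypothetical classification: each op type declares whether it
# benefits substantially from grouping on a shared argument.
BENEFITS_FROM_GROUPING = {
    "matmul": True,
    "affine_transform": True,
    "conv2d": True,
    "tensor_contraction": True,
    "add": False,  # only saves a memory copy, per the discussion above
}

def benefits_from_grouping(op):
    # Ops not listed are conservatively treated as not benefiting.
    return BENEFITS_FROM_GROUPING.get(op, False)
```

With a scheme like this, the batching strategy branches on the class rather than on hard-coded node types, which is the generality being asked for.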

@neubig (Contributor) commented May 26, 2017

I thought about this a little more. What about keeping a reference count for each node, then providing this reference count to the autobatch_profile() function? For operations where grouping is beneficial, the profile can then decide to group arguments based on their reference counts. For example, we may want to group together all arguments whose reference count equals the maximum reference count among the node's incoming arguments. This will work for matmul of course (the parameter matrix should have more references than the other arguments), and it will work for AffineTransform as well, because the weight matrix and bias vector should both have the same reference count most of the time. How does this sound? If it seems reasonable, I can try it.
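The "group arguments whose reference count equals the maximum" rule can be sketched directly; the function name and refcount dict here are illustrative assumptions, not the actual autobatch_profile() interface.

```python
def shared_args(arg_ids, refcount):
    # Treat as "shared" every argument whose reference count equals
    # the maximum reference count among this node's arguments.
    m = max(refcount[a] for a in arg_ids)
    return [a for a in arg_ids if refcount[a] == m]

# AffineTransform(b, W, x): W and b are reused across 50 instances,
# while each x is unique, so W and b are selected for grouping:
refcount = {"b": 50, "W": 50, "x": 1}
assert shared_args(["b", "W", "x"], refcount) == ["b", "W"]
```

Note that the max-based rule picks out both the weight matrix and the bias in the AffineTransform case, matching the observation above that they usually share the same reference count.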

@yoavg (Contributor, Author) commented May 26, 2017

I don't fully understand the details of your proposal, but it sounds like extending the count from "is the first arg of a matmul" to "is an arg of anything", plus a cleverer way of setting the threshold. Sure, sounds good to me; go for it.

Another option I thought about (which is a little less automatic) is to add a "shared" flag to nodes. Param nodes will have this on by default; for others it can be turned on. The autobatch_sig and autobatch_concat methods will look at this flag for the args and act accordingly. But if you can get your proposal to work, let's go for it.
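The "shared" flag alternative can be sketched like this. The Node class and signature function are a toy stand-in, assuming only what the comment describes: shared args contribute their identity to the signature, non-shared args do not.

```python
class Node:
    def __init__(self, op, args, shared=False):
        self.op = op
        self.args = args      # argument Nodes
        self.shared = shared  # Param nodes would default this to True

def autobatch_sig(node):
    # Shared args contribute their identity, so only instances that
    # reuse the same shared node batch together; for non-shared args
    # only the op matters in this simplified sketch (shapes omitted).
    parts = [node.op]
    for a in node.args:
        parts.append(("id", id(a)) if a.shared else ("op", a.op))
    return tuple(parts)

# Two matmuls on the same shared parameter batch together:
W = Node("parameter", [], shared=True)
x1, x2 = Node("input", []), Node("input", [])
m1 = Node("matmul", [W, x1])
m2 = Node("matmul", [W, x2])
assert autobatch_sig(m1) == autobatch_sig(m2)
```

The trade-off, as noted above, is that this is less automatic than reference counting: non-parameter nodes only get grouped if someone remembers to set the flag.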

@neubig (Contributor) commented May 26, 2017

Cool, I'll give this a shot.

@neubig neubig changed the title introduce a version of matmul which does not depend on first arg [WIP] make autobatching matrix multiplies more flexible Sep 22, 2017