
Any Explanations about conv_tbc ? Thanks ~~ #172

Closed
mali-nuist opened this issue Jun 6, 2018 · 2 comments
Comments


mali-nuist commented Jun 6, 2018

I read the source code but I can't figure out what the conv_tbc operation is doing. Any explanations?
For example:
If the shape of the input tensor is [10, 20] (batch size 10 and max sentence length 20), the word embedding produces an embed tensor of size [10, 20, 256] (embed size 256). The transposed tensor of size [20, 10, 256] is then fed into the conv as x = conv(x) (the conv has 256 input channels and 512 output channels), which results in a new tensor of shape [20, 10, 512]. What is the conv doing? It seems that the conv treats the embedding axis as the channel axis?
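A minimal shape trace of the flow described above (a sketch only; the vocabulary size here is an arbitrary placeholder):

```python
import torch
import torch.nn as nn

tokens = torch.randint(0, 1000, (10, 20))  # batch=10, max sentence length=20
embed = nn.Embedding(1000, 256)            # embed size 256

x = embed(tokens)       # (10, 20, 256): batch x time x channel
x = x.transpose(0, 1)   # (20, 10, 256): time x batch x channel (TBC)

# conv_tbc then convolves along the time axis, treating the 256-dim embedding
# as the input channels, so with 512 output channels it yields (20, 10, 512).
```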

myleott (Contributor) commented Jun 6, 2018

conv_tbc is the same as torch.nn.Conv1d, but accepts a different input shape.

The input shape for nn.Conv1d is batch x channels x time (BCT), which would require a transpose since the rest of the network operates with time x batch x channel (TBC) tensors. conv_tbc takes time x batch x channel (TBC) input directly.
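For illustration, here is a minimal sketch (not fairseq code) that reproduces the shapes from the question and checks that the low-level torch.conv_tbc op agrees with nn.Conv1d wrapped in two transposes. Note that conv_tbc expects its weight in (kernel_width, in_channels, out_channels) layout, so the Conv1d weight is permuted for the comparison; the kernel width of 3 is an arbitrary choice:

```python
import torch
import torch.nn as nn

T, B, C_in, C_out, K = 20, 10, 256, 512, 3   # time, batch, channels, kernel width

x_tbc = torch.randn(T, B, C_in)              # time x batch x channel (TBC)

# The transpose route: nn.Conv1d wants batch x channel x time (BCT).
conv = nn.Conv1d(C_in, C_out, kernel_size=K, padding=K // 2)
y_bct = conv(x_tbc.permute(1, 2, 0))         # (10, 512, 20)
y_tbc = y_bct.permute(2, 0, 1)               # back to TBC: (20, 10, 512)

# The direct route: torch.conv_tbc consumes TBC input without any transpose.
# Its weight layout is (kernel_width, in_channels, out_channels).
w_tbc = conv.weight.permute(2, 1, 0).contiguous()
y_direct = torch.conv_tbc(x_tbc, w_tbc, conv.bias, K // 2)

print(y_direct.shape)                               # torch.Size([20, 10, 512])
print(torch.allclose(y_tbc, y_direct, atol=1e-4))   # True
```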

mali-nuist (Author) commented

Thanks for your reply! I will check the documentation for torch.nn.Conv1d.

yfyeung pushed a commit to yfyeung/fairseq that referenced this issue Dec 6, 2023
renxida added a commit to llvm/torch-mlir that referenced this issue Jan 24, 2024
This adds a convolution with [time, batch, channel] ordering, as opposed to the default [batch, channel, time]. It is currently implemented by transposing the input and output, but it may need its own implementation in the future, since this is supposed to be an op that gives a speedup. It is used by fairseq
(facebookresearch/fairseq#172).

(In case you were wondering, as I was: this is different from transposed convolution, which uses fractional strides.)

---------

Co-authored-by: Xida Ren <xida.ren.dev@gmail.com>
Co-authored-by: Frederik Harwath <frederik.harwath@amd.com>
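As a rough sketch (not the torch-mlir code) of the transpose-based lowering the commit message describes, conv_tbc can be emulated with an ordinary 1-D convolution by permuting the input, the weight, and the output:

```python
import torch
import torch.nn.functional as F

def conv_tbc_via_conv1d(x_tbc, weight_tbc, bias, pad=0):
    """Emulate conv_tbc with a regular 1-D convolution plus transposes.

    x_tbc:      (time, batch, in_channels)
    weight_tbc: (kernel_width, in_channels, out_channels)
    bias:       (out_channels,)
    """
    x_bct = x_tbc.permute(1, 2, 0)                       # TBC -> BCT
    w_oik = weight_tbc.permute(2, 1, 0).contiguous()     # (kernel, in, out) -> (out, in, kernel)
    y_bct = F.conv1d(x_bct, w_oik, bias, padding=pad)
    return y_bct.permute(2, 0, 1)                        # BCT -> TBC
```

The extra permutes are exactly what conv_tbc is meant to avoid, which is why the commit notes it may eventually deserve its own implementation.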