Funnel-Transformer is a new self-attention model that gradually compresses the sequence of hidden states to a shorter one and hence reduces the computation cost. More importantly, by re-investing the saved FLOPs from length reduction in constructing a deeper or wider model, Funnel-Transformer usually achieves a higher capacity given the same FLOPs. In addition, with a decoder, Funnel-Transformer is able to recover a deep token-level representation for each token from the reduced hidden sequence, which enables standard pretraining.
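To make the length-compression idea concrete, below is a minimal PyTorch sketch, not the released implementation: the class name, pooling choice, and sizes are illustrative. One encoder block mean-pools only the query to halve the sequence length while attending over the full-length keys/values, and a trivial decoder (naive repetition) upsamples back to token level.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FunnelBlockSketch(nn.Module):
    """One illustrative encoder block that halves the sequence length."""

    def __init__(self, d_model=768, n_head=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, h):  # h: [batch, seq_len, d_model], seq_len assumed even
        # Strided mean pooling compresses only the query, so every pooled
        # position still attends over all unpooled tokens (keys/values).
        q = F.avg_pool1d(h.transpose(1, 2), kernel_size=2, stride=2).transpose(1, 2)
        attn_out, _ = self.attn(q, h, h)
        h = self.norm1(q + attn_out)  # now [batch, seq_len // 2, d_model]
        return self.norm2(h + self.ffn(h))


# Stacking such blocks "funnels" a 128-token sequence down to 32 positions;
# a decoder (here: naive repetition) recovers a token-level sequence so that
# standard token-level pretraining objectives can still be applied.
h = torch.randn(2, 128, 768)
for _ in range(2):
    h = FunnelBlockSketch()(h)        # [2, 64, 768], then [2, 32, 768]
full = h.repeat_interleave(4, dim=1)  # [2, 128, 768]: token-level again
```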
For technical details and experimental results, please refer to our paper:

**Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing**
Zihang Dai\*, Guokun Lai\*, Yiming Yang, Quoc V. Le
(\*: equal contribution)
Preprint 2020
- The `data-scripts` folder contains the corresponding source code and instructions, which specify how to access the raw data we used in this work.
- The `tensorflow` folder contains the corresponding source code, which was developed and exactly used for the TPU pretraining & finetuning presented in the paper.
  - The TensorFlow finetuning code mainly supports TPU finetuning on the GLUE benchmark, text classification, SQuAD and RACE.
  - Please refer to `tensorflow/README.md` for details.
- The `pytorch` folder only serves as an example PyTorch implementation of Funnel-Transformer. Hence, the PyTorch code only supports GPU finetuning on the GLUE benchmark & text classification.
  - Please refer to `pytorch/README.md` for details.
The pretrained model checkpoints are summarized in the table below:

| Model Size | PyTorch | TensorFlow | TensorFlow-Full |
|---|---|---|---|
| B10-10-10H1024 | Link | Link | Link |
| B8-8-8H1024 | Link | Link | Link |
| B6-6-6H768 | Link | Link | Link |
| B6-3x2-3x2H768 | Link | Link | Link |
| B4-4-4H768 | Link | Link | Link |
Each `.tar.gz` file contains three items:

- A TensorFlow or PyTorch checkpoint (`model.ckpt-*` or `model.ckpt.pt`) containing the pre-trained weights (note: the TensorFlow checkpoint actually corresponds to 3 files).
- A Word Piece model (`vocab.uncased.txt`) used for (de)tokenization.
- A config file (`net_config.json` or `net_config.pytorch.json`) which specifies the hyperparameters of the model.
You can also use `download_all_ckpts.sh` to download all checkpoints mentioned above.
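As a quick sanity check after downloading, the three items listed above can be inspected with the standard library and `torch.load`. This is a hedged sketch: the archive name is hypothetical, the files are assumed to unpack into the working directory, and the PyTorch weights are assumed to be saved as a plain state dict.

```python
import json
import tarfile

import torch

# Hypothetical archive name; substitute the file you actually downloaded.
with tarfile.open("B6-6-6H768-PT.tar.gz") as tar:
    tar.extractall(".")

# Model hyperparameters (the PyTorch variant of the config file).
with open("net_config.pytorch.json") as f:
    config = json.load(f)
print(config)

# Word Piece vocabulary used for (de)tokenization: one piece per line.
with open("vocab.uncased.txt", encoding="utf-8") as f:
    vocab = f.read().splitlines()
print(len(vocab), "word pieces")

# Pre-trained weights; printing a few entries assumes a state-dict layout.
state = torch.load("model.ckpt.pt", map_location="cpu")
print(list(state)[:5])
```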
For how to use the pretrained models, please refer to `tensorflow/README.md` or `pytorch/README.md` respectively.