
Add Nystromformer #14659

Merged: 62 commits from novice03:add_nystromformer into huggingface:master on Jan 11, 2022

Changes from 32 commits

Commits (62)
4ea35ec
Initial commit
novice03 Dec 7, 2021
c3161cf
Config and modelling changes
novice03 Dec 7, 2021
c585c49
Modelling and test changes
novice03 Dec 7, 2021
b2d4d43
Code quality fixes
novice03 Dec 7, 2021
226069d
Modeling changes and conversion script
novice03 Dec 10, 2021
b4058a0
Minor modeling changes and conversion script
novice03 Dec 30, 2021
a890a24
Modeling changes
novice03 Jan 1, 2022
99b56a5
Correct modeling, add tests and documentation
novice03 Jan 2, 2022
bc4cbec
Code refactor
novice03 Jan 2, 2022
fcab712
Remove tokenizers
novice03 Jan 2, 2022
f39df6c
Merge branch 'add_nystromformer' of https://github.com/novice03/trans…
novice03 Jan 2, 2022
28db895
Code refactor
novice03 Jan 2, 2022
1e5c17d
Update __init__.py
novice03 Jan 2, 2022
6670757
Fix bugs
novice03 Jan 2, 2022
519329f
Update src/transformers/__init__.py
novice03 Jan 2, 2022
e3579da
Update src/transformers/__init__.py
novice03 Jan 2, 2022
d7444f1
Update src/transformers/models/nystromformer/__init__.py
novice03 Jan 2, 2022
8740d6d
Update docs/source/model_doc/nystromformer.mdx
novice03 Jan 2, 2022
4438125
Update src/transformers/models/nystromformer/configuration_nystromfor…
novice03 Jan 2, 2022
577511e
Update src/transformers/models/nystromformer/configuration_nystromfor…
novice03 Jan 2, 2022
b13c5dd
Update src/transformers/models/nystromformer/configuration_nystromfor…
novice03 Jan 2, 2022
bd0da9e
Update src/transformers/models/nystromformer/configuration_nystromfor…
novice03 Jan 2, 2022
51b2a93
Update src/transformers/models/nystromformer/convert_nystromformer_or…
novice03 Jan 2, 2022
a4efb44
Update src/transformers/models/nystromformer/configuration_nystromfor…
novice03 Jan 2, 2022
eac563c
Update modeling and test_modeling
novice03 Jan 2, 2022
163702c
Code refactor
novice03 Jan 2, 2022
ceb83f2
Merge branch 'master' into add_nystromformer
novice03 Jan 2, 2022
089945f
.rst to .mdx
novice03 Jan 2, 2022
091c7ad
doc changes
novice03 Jan 2, 2022
2553d19
Doc changes
novice03 Jan 2, 2022
0129631
Update modeling_nystromformer.py
novice03 Jan 2, 2022
9c74356
Doc changes
novice03 Jan 2, 2022
fd9f168
Fix copies
novice03 Jan 2, 2022
4332b82
Apply suggestions from code review
novice03 Jan 2, 2022
016cb9d
Apply suggestions from code review
novice03 Jan 2, 2022
d208c78
Update configuration_nystromformer.py
novice03 Jan 2, 2022
b394eea
Fix copies
novice03 Jan 2, 2022
05d2d9b
Update tests/test_modeling_nystromformer.py
novice03 Jan 3, 2022
9d648fa
Update test_modeling_nystromformer.py
novice03 Jan 3, 2022
552cc12
Merge branch 'master' into add_nystromformer
novice03 Jan 3, 2022
ca3e934
Merge branch 'huggingface:master' into add_nystromformer
novice03 Jan 5, 2022
c362660
Merge branch 'huggingface:master' into add_nystromformer
novice03 Jan 10, 2022
875a0bc
Apply suggestions from code review
novice03 Jan 10, 2022
748092a
Fix code style
novice03 Jan 10, 2022
55edee2
Update modeling_nystromformer.py
novice03 Jan 10, 2022
6fc762d
Update modeling_nystromformer.py
novice03 Jan 10, 2022
c6a8d35
Fix code style
novice03 Jan 10, 2022
3e45dba
Merge branch 'huggingface:master' into add_nystromformer
novice03 Jan 10, 2022
f971632
Reformat modeling file
novice03 Jan 10, 2022
e891e05
Merge branch 'huggingface:master' into add_nystromformer
novice03 Jan 10, 2022
b678f5c
Update modeling_nystromformer.py
novice03 Jan 10, 2022
7268ec3
Merge branch 'huggingface:master' into add_nystromformer
novice03 Jan 10, 2022
02fe323
Modify NystromformerForMultipleChoice
novice03 Jan 10, 2022
b4770d7
Merge branch 'add_nystromformer' of https://github.com/novice03/trans…
novice03 Jan 10, 2022
5f3a389
Fix code quality
novice03 Jan 10, 2022
e0e4be6
Apply suggestions from code review
novice03 Jan 10, 2022
55cbd32
Code style changes and torch.no_grad()
novice03 Jan 10, 2022
5a0b891
Merge branch 'huggingface:master' into add_nystromformer
novice03 Jan 10, 2022
7e9e50a
make style
novice03 Jan 10, 2022
7da8e1c
Merge branch 'huggingface:master' into add_nystromformer
novice03 Jan 10, 2022
bd77cfe
Merge branch 'add_nystromformer' of https://github.com/novice03/trans…
novice03 Jan 10, 2022
9a5663b
Apply suggestions from code review
novice03 Jan 11, 2022
1 change: 1 addition & 0 deletions README.md
@@ -285,6 +285,7 @@ Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -214,6 +214,8 @@
title: MPNet
- local: model_doc/mt5
title: MT5
- local: model_doc/nystromformer
title: Nyströmformer
- local: model_doc/gpt
title: OpenAI GPT
- local: model_doc/gpt2
1 change: 1 addition & 0 deletions docs/source/index.mdx
@@ -235,6 +235,7 @@ Flax), PyTorch, and/or TensorFlow.
| MobileBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| MPNet | ✅ | ✅ | ✅ | ✅ | ❌ |
| mT5 | ✅ | ✅ | ✅ | ✅ | ✅ |
| Nystromformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| OpenAI GPT | ✅ | ✅ | ✅ | ✅ | ❌ |
| OpenAI GPT-2 | ✅ | ✅ | ✅ | ✅ | ✅ |
| Pegasus | ✅ | ✅ | ✅ | ✅ | ✅ |
71 changes: 71 additions & 0 deletions docs/source/model_doc/nystromformer.mdx
@@ -0,0 +1,71 @@
<!--Copyright 2021 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Nystromformer

## Overview

The Nystromformer model was proposed in [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902)
by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, and Vikas Singh.

The abstract from the paper is the following:

*Transformers have emerged as a powerful tool for a broad range of natural language processing tasks. A key component
that drives the impressive performance of Transformers is the self-attention mechanism that encodes the influence or
dependence of other tokens on each specific token. While beneficial, the quadratic complexity of self-attention on the
input sequence length has limited its application to longer sequences -- a topic being actively studied in the
community. To address this limitation, we propose Nyströmformer -- a model that exhibits favorable scalability as a
function of sequence length. Our idea is based on adapting the Nyström method to approximate standard self-attention
with O(n) complexity. The scalability of Nyströmformer enables application to longer sequences with thousands of
tokens. We perform evaluations on multiple downstream tasks on the GLUE benchmark and IMDB reviews with standard
sequence length, and find that our Nyströmformer performs comparably, or in a few cases, even slightly better, than
standard self-attention. On longer sequence tasks in the Long Range Arena (LRA) benchmark, Nyströmformer performs
favorably relative to other efficient self-attention methods. Our code is available at [this https URL](https://github.com/mlpen/Nystromformer).*

This model was contributed by [novice03](https://huggingface.co/novice03). The original code can be found
[here](https://github.com/mlpen/Nystromformer).
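
The O(n) approximation described in the abstract replaces the full softmax attention matrix with three small
landmark-based matrices (the Nyström factors). The snippet below is a minimal PyTorch sketch of that idea, with
landmarks taken as segment means and the pseudo-inverse computed by `torch.linalg.pinv` for brevity (the paper uses an
iterative approximation); it illustrates the paper's formulation rather than the exact code in
`modeling_nystromformer.py`, and all tensor and function names are illustrative only.

```python
import torch


def nystrom_attention_sketch(q, k, v, num_landmarks=64):
    """Approximate softmax(Q K^T / sqrt(d)) V with the Nyström method.

    q, k, v: (batch, seq_len, head_dim); seq_len is assumed to be divisible
    by num_landmarks. Illustrative sketch only.
    """
    b, n, d = q.shape
    scale = d**-0.5

    # Landmarks: segment means of queries and keys (m = num_landmarks).
    q_landmarks = q.reshape(b, num_landmarks, n // num_landmarks, d).mean(dim=2)
    k_landmarks = k.reshape(b, num_landmarks, n // num_landmarks, d).mean(dim=2)

    # The three Nyström factors; each softmax is over the last dimension.
    kernel_1 = torch.softmax(q @ k_landmarks.transpose(-1, -2) * scale, dim=-1)            # (b, n, m)
    kernel_2 = torch.softmax(q_landmarks @ k_landmarks.transpose(-1, -2) * scale, dim=-1)  # (b, m, m)
    kernel_3 = torch.softmax(q_landmarks @ k.transpose(-1, -2) * scale, dim=-1)            # (b, m, n)

    # No (n x n) attention matrix is ever materialized.
    return kernel_1 @ torch.linalg.pinv(kernel_2) @ (kernel_3 @ v)


# Toy usage: a 4096-token sequence with 64 landmarks.
q, k, v = (torch.randn(2, 4096, 64) for _ in range(3))
print(nystrom_attention_sketch(q, k, v).shape)  # torch.Size([2, 4096, 64])
```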

## NystromformerConfig

[[autodoc]] NystromformerConfig
- all

## NystromformerModel

[[autodoc]] NystromformerModel
- forward

## NystromformerForMaskedLM

[[autodoc]] NystromformerForMaskedLM
- forward

## NystromformerForSequenceClassification

[[autodoc]] NystromformerForSequenceClassification
- forward

## NystromformerForMultipleChoice

[[autodoc]] NystromformerForMultipleChoice
- forward

## NystromformerForTokenClassification

[[autodoc]] NystromformerForTokenClassification
- forward

## NystromformerForQuestionAnswering

[[autodoc]] NystromformerForQuestionAnswering
- forward
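
As a quick way to exercise the classes documented above, a minimal masked-language-modeling sketch follows. Only
`AutoTokenizer` and `NystromformerForMaskedLM` from this library are used; the checkpoint id
`uw-madison/nystromformer-512` is an assumption made here for illustration and may differ from the checkpoint actually
published for this model.

```python
import torch

from transformers import AutoTokenizer, NystromformerForMaskedLM

# NOTE: assumed checkpoint id; substitute the one published on the Hub.
checkpoint = "uw-madison/nystromformer-512"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = NystromformerForMaskedLM.from_pretrained(checkpoint)

inputs = tokenizer(f"Paris is the {tokenizer.mask_token} of France.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Decode the highest-scoring token at the masked position.
mask_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
print(tokenizer.decode(logits[0, mask_index].argmax(dim=-1)))
```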
29 changes: 29 additions & 0 deletions src/transformers/__init__.py
@@ -241,6 +241,10 @@
"models.mobilebert": ["MOBILEBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "MobileBertConfig", "MobileBertTokenizer"],
"models.mpnet": ["MPNET_PRETRAINED_CONFIG_ARCHIVE_MAP", "MPNetConfig", "MPNetTokenizer"],
"models.mt5": ["MT5Config"],
"models.nystromformer": [
"NYSTROMFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP",
"NystromformerConfig",
],
"models.openai": ["OPENAI_GPT_PRETRAINED_CONFIG_ARCHIVE_MAP", "OpenAIGPTConfig", "OpenAIGPTTokenizer"],
"models.pegasus": ["PEGASUS_PRETRAINED_CONFIG_ARCHIVE_MAP", "PegasusConfig", "PegasusTokenizer"],
"models.perceiver": ["PERCEIVER_PRETRAINED_CONFIG_ARCHIVE_MAP", "PerceiverConfig", "PerceiverTokenizer"],
@@ -1122,6 +1126,19 @@
]
)
_import_structure["models.mt5"].extend(["MT5EncoderModel", "MT5ForConditionalGeneration", "MT5Model"])
_import_structure["models.nystromformer"].extend(
[
"NYSTROMFORMER_PRETRAINED_MODEL_ARCHIVE_LIST",
"NystromformerForMaskedLM",
"NystromformerForMultipleChoice",
"NystromformerForQuestionAnswering",
"NystromformerForSequenceClassification",
"NystromformerForTokenClassification",
"NystromformerLayer",
"NystromformerModel",
"NystromformerPreTrainedModel",
]
)
_import_structure["models.openai"].extend(
[
"OPENAI_GPT_PRETRAINED_MODEL_ARCHIVE_LIST",
@@ -2294,6 +2311,7 @@
from .models.mobilebert import MOBILEBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, MobileBertConfig, MobileBertTokenizer
from .models.mpnet import MPNET_PRETRAINED_CONFIG_ARCHIVE_MAP, MPNetConfig, MPNetTokenizer
from .models.mt5 import MT5Config
from .models.nystromformer import NYSTROMFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, NystromformerConfig
from .models.openai import OPENAI_GPT_PRETRAINED_CONFIG_ARCHIVE_MAP, OpenAIGPTConfig, OpenAIGPTTokenizer
from .models.pegasus import PEGASUS_PRETRAINED_CONFIG_ARCHIVE_MAP, PegasusConfig, PegasusTokenizer
from .models.perceiver import PERCEIVER_PRETRAINED_CONFIG_ARCHIVE_MAP, PerceiverConfig, PerceiverTokenizer
@@ -3027,6 +3045,17 @@
MPNetPreTrainedModel,
)
from .models.mt5 import MT5EncoderModel, MT5ForConditionalGeneration, MT5Model
from .models.nystromformer import (
NYSTROMFORMER_PRETRAINED_MODEL_ARCHIVE_LIST,
NystromformerForMaskedLM,
NystromformerForMultipleChoice,
NystromformerForQuestionAnswering,
NystromformerForSequenceClassification,
NystromformerForTokenClassification,
NystromformerLayer,
NystromformerModel,
NystromformerPreTrainedModel,
)
from .models.openai import (
OPENAI_GPT_PRETRAINED_MODEL_ARCHIVE_LIST,
OpenAIGPTDoubleHeadsModel,
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -76,6 +76,7 @@
mobilebert,
mpnet,
mt5,
nystromformer,
openai,
pegasus,
perceiver,
3 changes: 3 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -30,6 +30,7 @@
CONFIG_MAPPING_NAMES = OrderedDict(
[
# Add configs here
("nystromformer", "NystromformerConfig"),
("imagegpt", "ImageGPTConfig"),
("qdqbert", "QDQBertConfig"),
("vision-encoder-decoder", "VisionEncoderDecoderConfig"),
@@ -116,6 +117,7 @@
CONFIG_ARCHIVE_MAP_MAPPING_NAMES = OrderedDict(
[
# Add archive maps here
("nystromformer", "NYSTROMFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("imagegpt", "IMAGEGPT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("qdqbert", "QDQBERT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("fnet", "FNET_PRETRAINED_CONFIG_ARCHIVE_MAP"),
@@ -190,6 +192,7 @@
MODEL_NAMES_MAPPING = OrderedDict(
[
# Add full (and cased) model names here
("nystromformer", "Nystromformer"),
("imagegpt", "ImageGPT"),
("qdqbert", "QDQBert"),
("vision-encoder-decoder", "Vision Encoder decoder"),
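
With the three mappings above in place, the `Auto` API can resolve the new model type by its string name. A small
sanity-check sketch (assuming a source install of this branch):

```python
from transformers import AutoConfig, NystromformerConfig

# "nystromformer" now resolves through CONFIG_MAPPING_NAMES.
config = AutoConfig.for_model("nystromformer")
assert isinstance(config, NystromformerConfig)
print(config.model_type)
```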
7 changes: 7 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -28,6 +28,7 @@
MODEL_MAPPING_NAMES = OrderedDict(
[
# Base model mapping
("nystromformer", "NystromformerModel"),
("imagegpt", "ImageGPTModel"),
("qdqbert", "QDQBertModel"),
("fnet", "FNetModel"),
@@ -150,6 +151,7 @@
MODEL_WITH_LM_HEAD_MAPPING_NAMES = OrderedDict(
[
# Model with LM heads mapping
("nystromformer", "NystromformerForMaskedLM"),
("qdqbert", "QDQBertForMaskedLM"),
("fnet", "FNetForMaskedLM"),
("gptj", "GPTJForCausalLM"),
@@ -277,6 +279,7 @@
MODEL_FOR_MASKED_LM_MAPPING_NAMES = OrderedDict(
[
# Model for Masked LM mapping
("nystromformer", "NystromformerForMaskedLM"),
("perceiver", "PerceiverForMaskedLM"),
("qdqbert", "QDQBertForMaskedLM"),
("fnet", "FNetForMaskedLM"),
@@ -349,6 +352,7 @@
MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
[
# Model for Sequence Classification mapping
("nystromformer", "NystromformerForSequenceClassification"),
("perceiver", "PerceiverForSequenceClassification"),
("qdqbert", "QDQBertForSequenceClassification"),
("fnet", "FNetForSequenceClassification"),
@@ -396,6 +400,7 @@
MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES = OrderedDict(
[
# Model for Question Answering mapping
("nystromformer", "NystromformerForQuestionAnswering"),
("qdqbert", "QDQBertForQuestionAnswering"),
("fnet", "FNetForQuestionAnswering"),
("gptj", "GPTJForQuestionAnswering"),
@@ -444,6 +449,7 @@
MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
[
# Model for Token Classification mapping
("nystromformer", "NystromformerForTokenClassification"),
("qdqbert", "QDQBertForTokenClassification"),
("fnet", "FNetForTokenClassification"),
("layoutlmv2", "LayoutLMv2ForTokenClassification"),
@@ -479,6 +485,7 @@
MODEL_FOR_MULTIPLE_CHOICE_MAPPING_NAMES = OrderedDict(
[
# Model for Multiple Choice mapping
("nystromformer", "NystromformerForMultipleChoice"),
("qdqbert", "QDQBertForMultipleChoice"),
("fnet", "FNetForMultipleChoice"),
("rembert", "RemBertForMultipleChoice"),
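
These entries register the new classes with the task-specific `AutoModel*` factories. A short illustration of what
that enables (a sketch with a randomly initialized model, again assuming a source install of this branch):

```python
from transformers import AutoModel, AutoModelForMaskedLM, NystromformerConfig

config = NystromformerConfig()  # default, randomly initialized sizes
base = AutoModel.from_config(config)  # resolves to NystromformerModel
mlm = AutoModelForMaskedLM.from_config(config)  # resolves to NystromformerForMaskedLM
print(type(base).__name__, type(mlm).__name__)
```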
62 changes: 62 additions & 0 deletions src/transformers/models/nystromformer/__init__.py
@@ -0,0 +1,62 @@
# flake8: noqa
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.

# Copyright 2021 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

# rely on isort to merge the imports
from ...file_utils import _LazyModule, is_tokenizers_available, is_torch_available


_import_structure = {
    "configuration_nystromformer": ["NYSTROMFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP", "NystromformerConfig"],
}

if is_torch_available():
    _import_structure["modeling_nystromformer"] = [
        "NYSTROMFORMER_PRETRAINED_MODEL_ARCHIVE_LIST",
        "NystromformerForMaskedLM",
        "NystromformerForMultipleChoice",
        "NystromformerForQuestionAnswering",
        "NystromformerForSequenceClassification",
        "NystromformerForTokenClassification",
        "NystromformerLayer",
        "NystromformerModel",
        "NystromformerPreTrainedModel",
    ]


if TYPE_CHECKING:
    from .configuration_nystromformer import NYSTROMFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, NystromformerConfig

    if is_torch_available():
        from .modeling_nystromformer import (
            NYSTROMFORMER_PRETRAINED_MODEL_ARCHIVE_LIST,
            NystromformerForMaskedLM,
            NystromformerForMultipleChoice,
            NystromformerForQuestionAnswering,
            NystromformerForSequenceClassification,
            NystromformerForTokenClassification,
            NystromformerLayer,
            NystromformerModel,
            NystromformerPreTrainedModel,
        )


else:
    import sys

    sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure)
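
The file follows the library's lazy-import pattern: at runtime the module is replaced by a `_LazyModule`, so the names
in `_import_structure` are importable from `transformers.models.nystromformer`, but `modeling_nystromformer` (and hence
torch) is only loaded on first access. A small sketch of that behavior (module paths as in this PR; torch assumed
installed):

```python
import sys

import transformers.models.nystromformer as nys

modeling = "transformers.models.nystromformer.modeling_nystromformer"

_ = nys.NystromformerConfig  # pulls in configuration_nystromformer only
print(modeling in sys.modules)  # expected: False

_ = nys.NystromformerModel  # first access triggers the lazy modeling import
print(modeling in sys.modules)  # expected: True
```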