
Commit f307068

sgugger authored and EduardoPach committed
Document check copies (huggingface#25291)
* Document check copies better and add tests
* Include header in check for copies
* Manual fixes
* Try autofix
* Fixes
* Clean tests
* Finalize doc
* Remove debug print
* More fixes
1 parent 7b7192b commit f307068

File tree: 51 files changed (+380, -164 lines)


docs/source/en/add_new_model.md

Lines changed: 1 addition & 1 deletion
@@ -101,7 +101,7 @@ own regarding how code should be written :-)
 1. The forward pass of your model should be fully written in the modeling file while being fully independent of other
    models in the library. If you want to reuse a block from another model, copy the code and paste it with a
    `# Copied from` comment on top (see [here](https://github.com/huggingface/transformers/blob/v4.17.0/src/transformers/models/roberta/modeling_roberta.py#L160)
-   for a good example).
+   for a good example and [there](pr_checks#check-copies) for more documentation on Copied from).
 2. The code should be fully understandable, even by a non-native English speaker. This means you should pick
    descriptive variable names and avoid abbreviations. As an example, `activation` is preferred to `act`.
    One-letter variable names are strongly discouraged unless it's an index in a for loop.
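
For readers skimming the diff, here is what the `# Copied from` convention referenced above looks like in practice. This is a hypothetical excerpt (the `MyNewModel` names and the file they would live in are invented for illustration); the copied body mirrors `BertSelfOutput` from `modeling_bert.py`, and only the class name changes, which is what the replacement syntax documented in `pr_checks.md` below expresses.

```py
# Hypothetical excerpt from a new model's modeling_mynewmodel.py, reusing Bert's self-output block.
# The body is copied verbatim from BertSelfOutput; only the class name differs.
import torch
from torch import nn


# Copied from transformers.models.bert.modeling_bert.BertSelfOutput with Bert->MyNewModel
class MyNewModelSelfOutput(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(self, hidden_states: torch.Tensor, input_tensor: torch.Tensor) -> torch.Tensor:
        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.LayerNorm(hidden_states + input_tensor)
        return hidden_states
```

With the comment in place, `utils/check_copies.py` can verify that the block never silently drifts from the Bert original.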

docs/source/en/pr_checks.md

Lines changed: 55 additions & 0 deletions
@@ -142,3 +142,58 @@ Additional checks concern PRs that add new models, mainly that:
 - All checkpoints used actually exist on the Hub
 
 -->
+
+### Check copies
+
+Since the Transformers library is very opinionated with respect to model code, and each model should be fully implemented in a single file without relying on other models, we have added a mechanism that checks whether a copy of the code of a layer of a given model stays consistent with the original. This way, when there is a bug fix, we can see all the other impacted models and choose to trickle down the modification or break the copy.
+
+<Tip>
+
+If a file is a full copy of another file, you should register it in the constant `FULL_COPIES` of `utils/check_copies.py`.
+
+</Tip>
+
+This mechanism relies on comments of the form `# Copied from xxx`. The `xxx` should contain the whole path to the class or function that is being copied below. For instance, `RobertaSelfOutput` is a direct copy of the `BertSelfOutput` class, so you can see [here](https://github.com/huggingface/transformers/blob/2bd7a27a671fd1d98059124024f580f8f5c0f3b5/src/transformers/models/roberta/modeling_roberta.py#L289) that it has the comment:
+
+```py
+# Copied from transformers.models.bert.modeling_bert.BertSelfOutput
+```
+
+Note that instead of applying this to a whole class, you can apply it to the relevant methods that are copied. For instance, [here](https://github.com/huggingface/transformers/blob/2bd7a27a671fd1d98059124024f580f8f5c0f3b5/src/transformers/models/roberta/modeling_roberta.py#L598) you can see how `RobertaPreTrainedModel._init_weights` is copied from the same method in `BertPreTrainedModel` with the comment:
+
+```py
+# Copied from transformers.models.bert.modeling_bert.BertPreTrainedModel._init_weights
+```
+
+Sometimes the copy is exactly the same except for names: for instance, in `RobertaAttention`, we use `RobertaSelfAttention` instead of `BertSelfAttention`, but other than that, the code is exactly the same. This is why `# Copied from` supports simple string replacements with the following syntax: `Copied from xxx with foo->bar`. This means the code is copied with all instances of `foo` being replaced by `bar`. You can see how it is used [here](https://github.com/huggingface/transformers/blob/2bd7a27a671fd1d98059124024f580f8f5c0f3b5/src/transformers/models/roberta/modeling_roberta.py#L304C1-L304C86) in `RobertaAttention` with the comment:
+
+```py
+# Copied from transformers.models.bert.modeling_bert.BertAttention with Bert->Roberta
+```
+
+Note that there shouldn't be any spaces around the arrow (unless the space is part of the pattern to replace, of course).
+
+You can add several patterns separated by a comma. For instance, here `CamembertForMaskedLM` is a direct copy of `RobertaForMaskedLM` with two replacements: `Roberta` to `Camembert` and `ROBERTA` to `CAMEMBERT`. You can see [here](https://github.com/huggingface/transformers/blob/15082a9dc6950ecae63a0d3e5060b2fc7f15050a/src/transformers/models/camembert/modeling_camembert.py#L929) that this is done with the comment:
+
+```py
+# Copied from transformers.models.roberta.modeling_roberta.RobertaForMaskedLM with Roberta->Camembert, ROBERTA->CAMEMBERT
+```
+
+If the order matters (because one of the replacements might conflict with a previous one), the replacements are executed from left to right.
+
+<Tip>
+
+If the replacements change the formatting (if you replace a short name with a very long name, for instance), the copy is checked after applying the auto-formatter.
+
+</Tip>
+
+When the patterns are just different casings of the same replacement (with uppercased and lowercased variants), another option is to add `all-casing`. [Here](https://github.com/huggingface/transformers/blob/15082a9dc6950ecae63a0d3e5060b2fc7f15050a/src/transformers/models/mobilebert/modeling_mobilebert.py#L1237) is an example in `MobileBertForSequenceClassification` with the comment:
+
+```py
+# Copied from transformers.models.bert.modeling_bert.BertForSequenceClassification with Bert->MobileBert all-casing
+```
+
+In this case, the code is copied from `BertForSequenceClassification` by replacing:
+- `Bert` by `MobileBert` (for instance when using `MobileBertModel` in the init)
+- `bert` by `mobilebert` (for instance when defining `self.mobilebert`)
+- `BERT` by `MOBILEBERT` (in the constant `MOBILEBERT_INPUTS_DOCSTRING`)
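
To make the replacement rules in the new section above concrete, here is a minimal sketch of how the `foo->bar` and `all-casing` substitutions behave. It is only an illustration of the described semantics, not the actual logic in `utils/check_copies.py`, and the helper name is invented.

```py
def apply_patterns(code: str, patterns: str, all_casing: bool = False) -> str:
    """Apply comma-separated `foo->bar` patterns to copied code, left to right.

    Illustrative sketch only: the real check lives in utils/check_copies.py and
    also re-runs the auto-formatter before comparing the copy to its source.
    """
    for pattern in patterns.split(","):
        source, target = [part.strip() for part in pattern.split("->")]
        replacements = [(source, target)]
        if all_casing:
            # `all-casing` also covers the lowercased and uppercased variants.
            replacements += [(source.lower(), target.lower()), (source.upper(), target.upper())]
        for src, tgt in replacements:
            code = code.replace(src, tgt)
    return code


snippet = "class BertForSequenceClassification(BertPreTrainedModel):\n    config_class = BertConfig  # see BERT_INPUTS_DOCSTRING"
print(apply_patterns(snippet, "Bert->MobileBert", all_casing=True))
# class MobileBertForSequenceClassification(MobileBertPreTrainedModel):
#     config_class = MobileBertConfig  # see MOBILEBERT_INPUTS_DOCSTRING
```

Applying the patterns strictly left to right matches the ordering rule called out above; the checker then compares the transformed, formatted copy against the code below the `# Copied from` comment, and a stale copy can be overwritten with the repository's fix-copies tooling (e.g. `make fix-copies`).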

src/transformers/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -1169,6 +1169,7 @@
         "BartForSequenceClassification",
         "BartModel",
         "BartPretrainedModel",
+        "BartPreTrainedModel",
         "PretrainedBartModel",
     ]
 )
@@ -5093,6 +5094,7 @@
         BartForQuestionAnswering,
         BartForSequenceClassification,
         BartModel,
+        BartPreTrainedModel,
         BartPretrainedModel,
         PretrainedBartModel,
     )

src/transformers/models/albert/modeling_flax_albert.py

Lines changed: 0 additions & 1 deletion
@@ -173,7 +173,6 @@ def setup(self):
         self.LayerNorm = nn.LayerNorm(epsilon=self.config.layer_norm_eps, dtype=self.dtype)
         self.dropout = nn.Dropout(rate=self.config.hidden_dropout_prob)
 
-    # Copied from transformers.models.bert.modeling_flax_bert.FlaxBertEmbeddings.__call__
     def __call__(self, input_ids, token_type_ids, position_ids, deterministic: bool = True):
         # Embed
         inputs_embeds = self.word_embeddings(input_ids.astype("i4"))

src/transformers/models/albert/tokenization_albert.py

Lines changed: 2 additions & 2 deletions
@@ -183,10 +183,10 @@ def __init__(
         self.sp_model.Load(vocab_file)
 
     @property
-    def vocab_size(self):
+    def vocab_size(self) -> int:
         return len(self.sp_model)
 
-    def get_vocab(self):
+    def get_vocab(self) -> Dict[str, int]:
         vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
         vocab.update(self.added_tokens_encoder)
         return vocab

src/transformers/models/align/modeling_align.py

Lines changed: 1 addition & 1 deletion
@@ -286,7 +286,7 @@ def align_loss(similarity: torch.Tensor) -> torch.Tensor:
     return (caption_loss + image_loss) / 2.0
 
 
-# Copied from transformers.models.efficientnet.modeling_efficientnet.round_filters with EfficientNet -> AlignVision
+# Copied from transformers.models.efficientnet.modeling_efficientnet.round_filters with EfficientNet->AlignVision
 def round_filters(config: AlignVisionConfig, num_channels: int):
     r"""
     Round number of filters based on depth multiplier.

src/transformers/models/bart/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -49,6 +49,7 @@
         "BartForQuestionAnswering",
         "BartForSequenceClassification",
         "BartModel",
+        "BartPreTrainedModel",
         "BartPretrainedModel",
         "PretrainedBartModel",
     ]
@@ -107,6 +108,7 @@
         BartForQuestionAnswering,
         BartForSequenceClassification,
         BartModel,
+        BartPreTrainedModel,
         BartPretrainedModel,
         PretrainedBartModel,
     )

src/transformers/models/bart/modeling_bart.py

Lines changed: 19 additions & 11 deletions
@@ -502,7 +502,7 @@ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
         return hidden_states
 
 
-class BartPretrainedModel(PreTrainedModel):
+class BartPreTrainedModel(PreTrainedModel):
     config_class = BartConfig
     base_model_prefix = "model"
     supports_gradient_checkpointing = True
@@ -536,10 +536,18 @@ def dummy_inputs(self):
         return dummy_inputs
 
 
-class PretrainedBartModel(BartPretrainedModel):
+class PretrainedBartModel(BartPreTrainedModel):
     def __init_subclass__(self):
         warnings.warn(
-            "The class `PretrainedBartModel` has been depreciated, please use `BartPretrainedModel` instead.",
+            "The class `PretrainedBartModel` has been depreciated, please use `BartPreTrainedModel` instead.",
+            FutureWarning,
+        )
+
+
+class BartPretrainedModel(BartPreTrainedModel):
+    def __init_subclass__(self):
+        warnings.warn(
+            "The class `PretrainedBartModel` has been depreciated, please use `BartPreTrainedModel` instead.",
             FutureWarning,
         )
 
@@ -700,7 +708,7 @@ def __init_subclass__(self):
 """
 
 
-class BartEncoder(BartPretrainedModel):
+class BartEncoder(BartPreTrainedModel):
     """
     Transformer encoder consisting of *config.encoder_layers* self attention layers. Each layer is a
     [`BartEncoderLayer`].
@@ -882,7 +890,7 @@ def custom_forward(*inputs):
                 )
 
 
-class BartDecoder(BartPretrainedModel):
+class BartDecoder(BartPreTrainedModel):
     """
     Transformer decoder consisting of *config.decoder_layers* layers. Each layer is a [`BartDecoderLayer`]
 
@@ -1169,7 +1177,7 @@ def custom_forward(*inputs):
     "The bare BART Model outputting raw hidden-states without any specific head on top.",
     BART_START_DOCSTRING,
 )
-class BartModel(BartPretrainedModel):
+class BartModel(BartPreTrainedModel):
     _tied_weights_keys = ["encoder.embed_tokens.weight", "decoder.embed_tokens.weight"]
 
     def __init__(self, config: BartConfig):
@@ -1296,7 +1304,7 @@ def forward(
 @add_start_docstrings(
     "The BART Model with a language modeling head. Can be used for summarization.", BART_START_DOCSTRING
 )
-class BartForConditionalGeneration(BartPretrainedModel):
+class BartForConditionalGeneration(BartPreTrainedModel):
     base_model_prefix = "model"
     _tied_weights_keys = ["encoder.embed_tokens.weight", "decoder.embed_tokens.weight", "lm_head.weight"]
     _keys_to_ignore_on_load_missing = ["final_logits_bias"]
@@ -1471,7 +1479,7 @@ def _reorder_cache(past_key_values, beam_idx):
     """,
     BART_START_DOCSTRING,
 )
-class BartForSequenceClassification(BartPretrainedModel):
+class BartForSequenceClassification(BartPreTrainedModel):
     _tied_weights_keys = ["encoder.embed_tokens.weight", "decoder.embed_tokens.weight"]
 
     def __init__(self, config: BartConfig, **kwargs):
@@ -1601,7 +1609,7 @@ def forward(
     """,
     BART_START_DOCSTRING,
 )
-class BartForQuestionAnswering(BartPretrainedModel):
+class BartForQuestionAnswering(BartPreTrainedModel):
     _tied_weights_keys = ["encoder.embed_tokens.weight", "decoder.embed_tokens.weight"]
 
     def __init__(self, config):
@@ -1719,7 +1727,7 @@ def forward(
         )
 
 
-class BartDecoderWrapper(BartPretrainedModel):
+class BartDecoderWrapper(BartPreTrainedModel):
     """
     This wrapper class is a helper class to correctly load pretrained checkpoints when the causal language model is
     used in combination with the [`EncoderDecoderModel`] framework.
@@ -1739,7 +1747,7 @@ def forward(self, *args, **kwargs):
     """,
     BART_START_DOCSTRING,
 )
-class BartForCausalLM(BartPretrainedModel):
+class BartForCausalLM(BartPreTrainedModel):
     _tied_weights_keys = ["lm_head.weight"]
 
     def __init__(self, config):

src/transformers/models/bit/modeling_bit.py

Lines changed: 1 addition & 1 deletion
@@ -300,7 +300,7 @@ def forward(self, pixel_values: Tensor) -> Tensor:
 
 
 # Copied from transformers.models.convnext.modeling_convnext.drop_path
-def drop_path(input, drop_prob: float = 0.0, training: bool = False):
+def drop_path(input: torch.Tensor, drop_prob: float = 0.0, training: bool = False) -> torch.Tensor:
     """
     Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
 

src/transformers/models/blenderbot/modeling_flax_blenderbot.py

Lines changed: 1 addition & 2 deletions
@@ -22,7 +22,6 @@
 import flax.linen as nn
 import jax
 import jax.numpy as jnp
-import numpy as np
 from flax.core.frozen_dict import FrozenDict, freeze, unfreeze
 from flax.linen import combine_masks, make_causal_mask
 from flax.linen.attention import dot_product_attention_weights
@@ -205,7 +204,7 @@
 
 
 # Copied from transformers.models.bart.modeling_flax_bart.shift_tokens_right
-def shift_tokens_right(input_ids: np.array, pad_token_id: int, decoder_start_token_id: int) -> np.ndarray:
+def shift_tokens_right(input_ids: jnp.array, pad_token_id: int, decoder_start_token_id: int) -> jnp.ndarray:
     """
     Shift input ids one token to the right.
     """
