Add BartModel #2745

Merged: 168 commits merged into huggingface:master on Feb 20, 2020
Conversation

sshleifer
Contributor

@sshleifer sshleifer commented Feb 5, 2020

This ports BART, a "sequence-to-sequence model trained with denoising as pretraining objective", from https://github.com/pytorch/fairseq/tree/master/examples/bart.
The decoder is left-to-right and the encoder is bidirectional. As such, the code only uses a causal attention mask in the decoder.
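For readers unfamiliar with the term, a causal (future-masking) mask simply blocks each position from attending to later positions. A minimal PyTorch sketch of such an additive mask, for illustration only (not the exact helper used in this PR):

import torch

def make_causal_mask(seq_len: int) -> torch.Tensor:
    # Additive mask: 0.0 where attention is allowed, -inf where it is blocked,
    # so position i can only attend to positions 0..i (no future tokens).
    mask = torch.full((seq_len, seq_len), float("-inf"))
    return torch.triu(mask, diagonal=1)

print(make_causal_mask(4))
# row i keeps 0.0 up to column i and -inf for every later column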

TODO:


  • Docstrings
  • More comments for code readers

Future PRs

  • example with correct pretraining objective
  • BartForSummarization.from_pretrained('bart-large-cnn')

@LysandreJik LysandreJik self-requested a review February 18, 2020 17:07
Member

@LysandreJik LysandreJik left a comment

I'm re-reviewing this to add a few comments related to the documentation and what should be updated for this model to be correctly displayed in the docs.

Left a few comments at the appropriate places; you will have to adapt them for the three models (base, masked LM, and sequence classification).

Comment on lines 63 to 81
Inputs:
    **input_ids**: ``torch.LongTensor`` of shape ``(batch_size, sequence_length)``:
        Indices of input sequence tokens in the vocabulary. Use BartTokenizer.encode to produce them.
        Padding will be ignored by default should you provide it.
        Indices can be obtained using :class:`transformers.BartTokenizer.encode(text)`.
        Also see :func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
    **attention_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(batch_size, sequence_length)``:
        Mask to avoid performing attention on padding token indices in the encoder inputs.
        Default: a mask will be created that ignores config.pad_token_id.
        Mask values selected in ``[0, 1]``:
        ``1`` for tokens that are NOT MASKED, ``0`` for MASKED tokens.
    **decoder_input_ids**: (`optional`) ``torch.LongTensor`` of shape ``(batch_size, sequence_length)``:
        Only used for translation and summarization. Otherwise, use the default, which shifts the encoder's
        input_ids to the right.
    **decoder_attention_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(batch_size, sequence_length)``:
        Default behavior: ignore pad tokens and future tokens.
        See diagram 1 in the paper for more info on the default strategy.

    Read `prepare_bart_inputs` for more information on the default behavior.
Member


We've switched to a more uniform format, where you would have to

  1. rename the section from "Inputs" to "Args" for readthedocs/sphinx (our doc generator) to understand it
  2. link to the glossary
  3. If possible, re-use docstrings as similar as possible to those of the other models. Using different docstrings with different vocabulary is bound to confuse users.
  4. (Optional) The glossary currently doesn't contain any information related to the seq2seq models. It would be great if it did, but it is a lengthy process, so it might be a better idea to do it once BART is wrapped up. Let me know if this is something that would be interesting for you.

You can check an example in the BERT file.
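For illustration, the renamed section could look roughly like this in the Args style (the wording and glossary links are a sketch modeled on the other models' docstrings at the time, not the final merged text):

BART_INPUTS_DOCSTRING = r"""
    Args:
        input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`):
            Indices of input sequence tokens in the vocabulary.
            Indices can be obtained using :class:`transformers.BartTokenizer`.
            See :func:`transformers.PreTrainedTokenizer.encode` for details.

            `What are input IDs? <../glossary.html#input-ids>`__
        attention_mask (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
            Mask to avoid performing attention on padding token indices.
            Mask values selected in ``[0, 1]``: ``1`` for tokens that are NOT MASKED, ``0`` for MASKED tokens.

            `What are attention masks? <../glossary.html#attention-mask>`__
"""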

Comment on lines 826 to 830
@add_start_docstrings(
    "The bare BART Model outputting raw hidden-states without any specific head on top.",
    BART_START_DOCSTRING,
    BART_INPUTS_DOCSTRING,
)
Member


We now only link to the start docstring in the add_start_docstrings decorator.
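Concretely, the class-level decorator would then carry only the start docstring, roughly like this (a sketch of the convention with a stub docstring, not the exact merged code):

from transformers import PreTrainedModel
from transformers.file_utils import add_start_docstrings

BART_START_DOCSTRING = r"""Model-level documentation shared by all BART variants goes here."""

@add_start_docstrings(
    "The bare BART Model outputting raw hidden-states without any specific head on top.",
    BART_START_DOCSTRING,
)
class BartModel(PreTrainedModel):
    ...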

def get_output_embeddings(self):
    return _make_linear_from_emb(self.shared)

def forward(
Member


We now link to the inputs in the forward method, cf. BERT file
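For example, roughly (a sketch that assumes the add_start_docstrings_to_callable helper the BERT file used around this time; the signature is trimmed for brevity):

from transformers import PreTrainedModel
from transformers.file_utils import add_start_docstrings_to_callable  # helper name as in the BERT file (assumption)

BART_INPUTS_DOCSTRING = r"""Per-call input documentation (input_ids, attention_mask, ...) goes here."""

class BartModel(PreTrainedModel):
    @add_start_docstrings_to_callable(BART_INPUTS_DOCSTRING)
    def forward(self, input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None):
        ...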

Comment on lines 897 to 926
r"""
**lm_labels**: (`optional`) ``torch.LongTensor`` of shape ``(batch_size, sequence_length)``:
Labels for computing the masked language modeling loss.
Indices should either be in ``[0, ..., config.vocab_size]`` or -100 (see ``input_ids`` docstring).
Tokens with indices set to ``-100`` are ignored (masked), the loss is only computed for the tokens with labels
in ``[0, ..., config.vocab_size]``.

Outputs: `Tuple` comprising various elements depending on the configuration (config) and inputs:
**loss**: (`optional`, returned when ``lm_labels`` is provided) ``torch.FloatTensor`` of shape ``(1,)``:
Masked language modeling loss.
**prediction_scores**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, config.vocab_size)``
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
**hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
of shape ``(batch_size, sequence_length, hidden_size)``:
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
**attentions**: (`optional`, returned when ``config.output_attentions=True``)
list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Examples::

tokenizer = BartTokenizer.from_pretrained('bart-large')
model = BartForMaskedLM.from_pretrained('bart-large')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute")).unsqueeze(0) # Batch size 1
outputs = model(input_ids=input_ids, lm_labels=input_ids)
loss, prediction_scores = outputs[:2]

"""
base_model_prefix = "model"
Member


Additional inputs/outputs not detailed in the START/INPUTS docstrings are now added to the forward method as well, cf. BertForPreTraining.

Member


Also, mind how the format changed from "attentions: (optional, returned when config.output_attentions=True)" to "attentions (:obj:tuple(torch.FloatTensor), optional, returned when config.output_attentions=True):".

I believe you can copy and paste most of it.
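For reference, an attentions entry in that newer format would read roughly as follows (illustrative wording modeled on the library's other docstrings at the time, not the exact merged text):

        attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_attentions=True``):
            Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
            :obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
            Attention weights after the softmax, used to compute the weighted average in the self-attention heads.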

@sshleifer sshleifer merged commit 53ce385 into huggingface:master Feb 20, 2020
@sshleifer sshleifer deleted the bart branch February 20, 2020 23:11
jplu pushed a commit to jplu/transformers that referenced this pull request Mar 25, 2020
* Results same as fairseq
* Wrote a ton of tests
* Struggled with api signatures
* added some docs
self.dropout = dropout

# Classifier stuff
self.classif_dropout = classifier_dropout
Contributor


Won't this name mismatch cause the value saved by save_pretrained() to not be loaded back into the config by the from_pretrained() method?
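For illustration, a minimal round-trip check along these lines would surface it (the save path is made up, and whether the value survives is exactly what is being asked):

from transformers import BartConfig

config = BartConfig(classifier_dropout=0.3)        # __init__ kwarg is `classifier_dropout`
config.save_pretrained("/tmp/bart-config-check")   # config.json is written from attribute names, e.g. `classif_dropout`
reloaded = BartConfig.from_pretrained("/tmp/bart-config-check")
print(reloaded.classif_dropout)                    # does 0.3 survive the save/load round trip?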

Contributor Author


I have no clue what problem you are trying to describe. Please file an issue with a pasteable code snippet that has a different output than you expected.

Contributor


ok filed #7591

Development

Successfully merging this pull request may close these issues.

🌟 BART
8 participants