Add BartModel #2745

Merged
merged 168 commits into from
Feb 20, 2020
Changes from 1 commit
Commits
168 commits
b5c20db
3 new files
sshleifer Jan 23, 2020
8420dc5
Lots of fairseq copy paste
sshleifer Jan 24, 2020
22ccda1
typo idiocy
sshleifer Jan 24, 2020
03d2cf3
Copy paste code that we know we wont use
sshleifer Jan 24, 2020
d99326e
before consider Roberta way
sshleifer Jan 24, 2020
43c7e21
add tokenization: identical to Roberta
sshleifer Jan 25, 2020
24fb639
register in configuration auto
sshleifer Jan 25, 2020
61409b4
mid consolidation of fairseq heirarchy
sshleifer Jan 26, 2020
0b79f39
Forward works, but shapes are wrong
sshleifer Jan 27, 2020
dcf2b88
copy pasted tests
sshleifer Jan 27, 2020
92e487f
matching state dict after upgrade
sshleifer Jan 30, 2020
e0c54ed
Merge remote-tracking branch 'upstream/master' into bart
sshleifer Jan 31, 2020
3871a7a
rm typo
sshleifer Jan 31, 2020
dbe83c9
del maybe layernorm
sshleifer Feb 2, 2020
2373e8a
Delete more maybe_layer_norm
sshleifer Feb 2, 2020
0dda528
Moved code round
sshleifer Feb 2, 2020
69327e4
fixed base tests, some notimpl for attention module
sshleifer Feb 3, 2020
d630887
config cleanup
sshleifer Feb 3, 2020
3cbc6ca
Some test cleanup
sshleifer Feb 4, 2020
51ab277
Merge branch 'master' into bart
sshleifer Feb 4, 2020
9e694a7
fixed attn weights shape failure with big copy paste
sshleifer Feb 4, 2020
f355e36
passing hidden_states shape test
sshleifer Feb 4, 2020
3971d97
initializer_factor, passing more tests
sshleifer Feb 5, 2020
f42997f
Merge remote-tracking branch 'upstream/master' into bart
sshleifer Feb 5, 2020
56c4744
utests pass
sshleifer Feb 5, 2020
26656a0
del unused file
sshleifer Feb 5, 2020
4d77a7c
whitespace
sshleifer Feb 5, 2020
0ce724a
Merge remote-tracking branch 'upstream/master' into bart
sshleifer Feb 5, 2020
38e057f
trailing comma
sshleifer Feb 5, 2020
a48f89e
make style, quality
sshleifer Feb 5, 2020
2ad6e7b
remove fairseq dep
sshleifer Feb 5, 2020
a772509
Undo error type change
sshleifer Feb 5, 2020
831fd14
whitespace
sshleifer Feb 5, 2020
be62f89
fix fstring
sshleifer Feb 5, 2020
6726d33
black
sshleifer Feb 5, 2020
c0e9510
type hinting only in comments for py35
sshleifer Feb 5, 2020
28bcf61
more style
sshleifer Feb 5, 2020
1f0b885
isort
sshleifer Feb 5, 2020
6aea2b8
fix NameError
sshleifer Feb 5, 2020
4e7279c
del methods
sshleifer Feb 5, 2020
28345b4
del methods
sshleifer Feb 5, 2020
effa170
F.gelu
sshleifer Feb 5, 2020
8c7df3a
small
sshleifer Feb 5, 2020
cee5051
style
sshleifer Feb 5, 2020
586098d
Working conversion script
sshleifer Feb 6, 2020
67b02c6
cleaning
sshleifer Feb 6, 2020
3811209
test init more directly
sshleifer Feb 6, 2020
28c977b
more variance checks
sshleifer Feb 6, 2020
b79509d
hardcoding expected results
sshleifer Feb 6, 2020
5bc3081
undo stupid change
sshleifer Feb 6, 2020
df6edc3
idiot
sshleifer Feb 6, 2020
5eaade8
delete torch version
sshleifer Feb 6, 2020
edc492e
cleanup, passing
sshleifer Feb 6, 2020
7c090b0
cleanup, passing
sshleifer Feb 6, 2020
1d6cde6
passing
sshleifer Feb 6, 2020
1c06538
Style
sshleifer Feb 6, 2020
73cad04
more deletion
sshleifer Feb 6, 2020
a68c20e
cleanup style, passing
sshleifer Feb 6, 2020
5d1bc99
resize_embeddings test passing
sshleifer Feb 6, 2020
c23a07b
AutoTokenizer support
sshleifer Feb 7, 2020
67ef42f
one file
sshleifer Feb 7, 2020
7a4a6e2
Fix class ordering
sshleifer Feb 7, 2020
f80ce45
conversion broken
sshleifer Feb 7, 2020
42e061b
some old changes
sshleifer Feb 7, 2020
28b1f80
conversion scripts work
sshleifer Feb 7, 2020
4e008e6
fix s3 linking
sshleifer Feb 7, 2020
60bd737
no cnn model
sshleifer Feb 7, 2020
a4edf2e
no cnn
sshleifer Feb 7, 2020
e1d106d
One kwarg for encoder_decoder_attention
sshleifer Feb 8, 2020
a9b979f
cleanup
sshleifer Feb 8, 2020
4b97345
BROKEN
sshleifer Feb 8, 2020
ed642cc
half fixed
sshleifer Feb 8, 2020
87ddeae
Merge remote-tracking branch 'upstream/master' into bart
sshleifer Feb 8, 2020
4628b7d
hoist split_kwargs
sshleifer Feb 9, 2020
a653c78
split out testing BartForSequenceClassification
sshleifer Feb 9, 2020
ab594b4
cleanup
sshleifer Feb 9, 2020
73f49a6
lmhead test passing
sshleifer Feb 9, 2020
f7d88db
calc loss in SeqClassification model
sshleifer Feb 9, 2020
8f04dd5
Fix newlines
sshleifer Feb 9, 2020
459aeaf
ci
sshleifer Feb 9, 2020
9ecee5b
comment public API
sshleifer Feb 10, 2020
aadf762
comments
sshleifer Feb 10, 2020
bac8348
reverted API changes
sshleifer Feb 10, 2020
66310db
style
sshleifer Feb 10, 2020
3f03344
isort
sshleifer Feb 10, 2020
808bbd5
Revert "isort"
sshleifer Feb 10, 2020
92b5f6e
some cleanup
sshleifer Feb 10, 2020
21ac214
Merge remote-tracking branch 'upstream/master' into bart
sshleifer Feb 10, 2020
2196cc2
Sty works here
sshleifer Feb 10, 2020
960af22
cleanup
sshleifer Feb 10, 2020
a812adc
fix slow tests
sshleifer Feb 10, 2020
a8a7839
long
sshleifer Feb 10, 2020
49f60d7
cleanup
sshleifer Feb 10, 2020
376a358
cleanup
sshleifer Feb 10, 2020
8ecdd0d
rename BartForMaskedLM
sshleifer Feb 10, 2020
537af62
use masked loss
sshleifer Feb 10, 2020
02b56df
Merge remote-tracking branch 'upstream/master' into bart
sshleifer Feb 10, 2020
765c98a
Merge remote-tracking branch 'upstream/master' into bart
sshleifer Feb 11, 2020
4e1a5e0
Factor in RobertaTokenizer changes
sshleifer Feb 12, 2020
4339102
delete reorder_ functions
sshleifer Feb 12, 2020
3ce6c1e
pop ignore keys
sshleifer Feb 12, 2020
fd3d991
Fix S3 URLs
sshleifer Feb 12, 2020
b22b368
Conform to t5 API
sshleifer Feb 12, 2020
e5c3485
no head passing
sshleifer Feb 12, 2020
e2827b1
mnli passing
sshleifer Feb 13, 2020
4d49735
Generate works, other stuff broken
sshleifer Feb 13, 2020
ac1657b
Only test inputs embeds fails
sshleifer Feb 13, 2020
2a1260a
caching might work
sshleifer Feb 13, 2020
afbfdeb
Merge remote-tracking branch 'upstream/master' into bart
sshleifer Feb 13, 2020
82877e7
Merge remote-tracking branch 'upstream/master' into bart
sshleifer Feb 13, 2020
8252075
Naming changes, tests pass besides embeds
sshleifer Feb 13, 2020
6bacd55
Dont support inputs embeds
sshleifer Feb 13, 2020
71c345f
New signatures, mnli passing
sshleifer Feb 13, 2020
67a4cee
MNLI PASSING, still two masks
sshleifer Feb 13, 2020
6fd50b3
Padding test passes
sshleifer Feb 13, 2020
db3bc84
One combined attn mask
sshleifer Feb 14, 2020
264f6d1
temp revert
sshleifer Feb 14, 2020
ba25b7a
Passing shape test
sshleifer Feb 14, 2020
8f1e8b4
Style
sshleifer Feb 14, 2020
6124967
passing
sshleifer Feb 14, 2020
c01e719
cleanup
sshleifer Feb 14, 2020
5dfc207
test_shift_tokens_right
sshleifer Feb 14, 2020
dafdac8
Move public API to bottom of file
sshleifer Feb 16, 2020
40f7f79
cleanup return types
sshleifer Feb 16, 2020
e7ea674
Share create_position_ids_from_input_ids with roberta
sshleifer Feb 16, 2020
36e1adc
Initialize SequenceClassification correctly
sshleifer Feb 16, 2020
de2ced0
working. About to hoist inputs
sshleifer Feb 16, 2020
8b5bb52
tests pass with new API
sshleifer Feb 16, 2020
c2973d4
py35 compat: type hint in comment
sshleifer Feb 17, 2020
dbe0f4e
more require_torch
sshleifer Feb 17, 2020
a42ac9c
Fix merge conflict
sshleifer Feb 17, 2020
5faa0dd
Redo cached_states rename
sshleifer Feb 18, 2020
6a08f84
Make masks if user doesnt supply. Passing.
sshleifer Feb 18, 2020
c439e19
style
sshleifer Feb 18, 2020
6205ba6
Delete epically slow test
sshleifer Feb 18, 2020
cda9ced
style
sshleifer Feb 18, 2020
f3b4f21
start docs
sshleifer Feb 18, 2020
85c3b77
More docs
sshleifer Feb 18, 2020
5292ab3
style
sshleifer Feb 18, 2020
e2353c3
style
sshleifer Feb 18, 2020
16d2e2e
more docs
sshleifer Feb 18, 2020
2ede7ab
sty
sshleifer Feb 18, 2020
cb425f3
passing
sshleifer Feb 18, 2020
360db12
passing
sshleifer Feb 18, 2020
3c6f62d
style
sshleifer Feb 18, 2020
2d69571
passing
sshleifer Feb 18, 2020
de98500
some attention cleanup
sshleifer Feb 18, 2020
35d421b
docstrings
sshleifer Feb 18, 2020
9e66bbc
More coverage
sshleifer Feb 18, 2020
d546db4
More test coverage (test_chg branch)
sshleifer Feb 18, 2020
3a37397
kill dead
sshleifer Feb 18, 2020
9b97322
Failing tokenizer test
sshleifer Feb 19, 2020
0f2819c
some docs
sshleifer Feb 19, 2020
12b83b9
Docs work, but are innacurate
sshleifer Feb 19, 2020
12becba
newlne
sshleifer Feb 20, 2020
77578ac
merge upstream
sshleifer Feb 20, 2020
5990cfe
Style
sshleifer Feb 20, 2020
6cff072
Fix decoder_attention_mask test
sshleifer Feb 20, 2020
e032d06
Adopt roberta behavior
sshleifer Feb 20, 2020
5592784
lower tolerance tests
sshleifer Feb 20, 2020
0e0b9b1
Tests passing
sshleifer Feb 20, 2020
4a4723e
Delete input_prep test, its trivial
sshleifer Feb 20, 2020
4a212a2
test passing in mask
sshleifer Feb 20, 2020
086b17a
more coverage
sshleifer Feb 20, 2020
feaf207
Merge remote-tracking branch 'upstream/master' into bart
sshleifer Feb 20, 2020
2c8225a
Docs accurate
sshleifer Feb 20, 2020
300df06
revert generation change
sshleifer Feb 20, 2020
6db143e
improved docs
sshleifer Feb 20, 2020
Docs work, but are innacurate
sshleifer committed Feb 19, 2020
commit 12b83b9575d5b1e204beb98a3c8f89363d200142
3 changes: 2 additions & 1 deletion docs/source/index.rst
@@ -99,4 +99,5 @@ The library currently contains PyTorch and Tensorflow implementations, pre-train
model_doc/camembert
model_doc/albert
model_doc/xlmroberta
model_doc/flaubert
model_doc/flaubert
model_doc/bart
7 changes: 7 additions & 0 deletions docs/source/pretrained_models.rst
@@ -275,6 +275,13 @@ For a list that includes community-uploaded models, refer to `https://huggingfac
| | | | FlauBERT large architecture |
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Bart | ``bart-large`` | | 12-layer, 1024-hidden, 16-heads, 406M parameters |
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/bart>`_) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bart-large-mnli`` | | Adds a 2 layer classification head with 1 million parameters |
| | | | bart-large base architecture with a classification head |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
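
For illustration, a minimal sketch of loading the two checkpoints listed above, using the short identifiers as they appear in this PR; later releases of the library prefix these names with ``facebook/``, so treat the exact strings as tied to this point in time::

    from transformers import BartTokenizer, BartModel, BartForSequenceClassification

    tokenizer = BartTokenizer.from_pretrained('bart-large')

    # 12-layer encoder and decoder, 1024-hidden, 16-heads (~406M parameters)
    base_model = BartModel.from_pretrained('bart-large')

    # Same base architecture plus the 2-layer classification head, fine-tuned on MNLI
    mnli_model = BartForSequenceClassification.from_pretrained('bart-large-mnli')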


.. <https://huggingface.co/transformers/examples.html>`__
3 changes: 3 additions & 0 deletions src/transformers/configuration_bart.py
@@ -31,6 +31,9 @@


class BartConfig(PretrainedConfig):
r"""
Configuration class for Bart. Parameters are renamed from the fairseq implementation
"""
model_type = "bart"
pretrained_config_archive_map = BART_PRETRAINED_CONFIG_ARCHIVE_MAP

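To make the renamed-parameters note concrete, a small sketch of constructing a config; the keyword names shown (``d_model``, ``encoder_layers``, ``decoder_layers``) are an assumption about the fairseq-to-transformers renaming and may not match this exact commit::

    from transformers import BartConfig

    # Default configuration (values roughly matching bart-large)
    config = BartConfig()

    # Assumed renamed hyper-parameters (e.g. fairseq's encoder_embed_dim -> d_model);
    # illustrative only, not authoritative for commit 12b83b9.
    small_config = BartConfig(d_model=256, encoder_layers=3, decoder_layers=3)
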
169 changes: 79 additions & 90 deletions src/transformers/modeling_bart.py
@@ -36,50 +36,42 @@
}

BART_START_DOCSTRING = r"""
"BART is a sequence to sequence model which uses a standard Transformer based Translation architecture.

This model is a PyTorch `torch.nn.Module`_ sub-class. Use it as a regular PyTorch Module and
refer to the PyTorch documentation for all matter related to general usage and behavior.

.. _`Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer`:
https://arxiv.org/abs/1910.10683

.. _`torch.nn.Module`:
https://pytorch.org/docs/stable/nn.html#module
Paper: BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
https://arxiv.org/abs/1910.13461
Authors: Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer
(Submitted on 29 Oct 2019)
Code Ported from https://github.com/pytorch/fairseq/tree/master/examples/bart
An encoder decoder transformer pre-trained in a text-to-text denoising generative setting.
'BART is an encoder-decoder transformer pre-trained in a text-to-text denoising generative setting.'
This model is a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`_ sub-class. Use it as a regular PyTorch Module and
refer to the PyTorch documentation for all matters related to general usage and behavior.

`Paper <https://arxiv.org/abs/1910.13461>`_: BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Authors: Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer
(Submitted on 29 Oct 2019)
Code Ported from https://github.com/pytorch/fairseq/tree/master/examples/bart

Parameters:
config (:class:`~transformers.BartConfig`): Model configuration class with all the parameters of the model.
Initializing with a config file does not load the weights associated with the model, only the configuration.
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.

"""

BART_INPUTS_DOCSTRING = r"""
Inputs:
**input_ids**: ``torch.LongTensor`` of shape ``(batch_size, sequence_length)``:
Indices of input sequence tokens in the vocabulary. Use BartTokenizer.encode to produce them.
Args:
input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`):
Indices of input sequence tokens in the vocabulary. Use BartTokenizer.encode to produce them.
Padding will be ignored by default should you provide it.
Indices can be obtained using :class:`transformers.BartTokenizer.encode(text)`.
Also see :func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
**attention_mask**: (`optional`) ``torch.FloatTensor`` of shape ``(batch_size, sequence_length)``:
Mask to avoid performing attention on padding token indices in the encoder inputs.
Default: a mask will be created that ignore config.pad_token_id

attention_mask (:obj:`torch.Tensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
Warning: this parameter is different from other attention_mask parameters and should be used with caution.
Mask to avoid performing attention on padding token indices in ``input_ids``.
Mask values selected in ``[0, 1]``:
``1`` for tokens that are NOT MASKED, ``0`` for MASKED tokens.
**decoder_input_ids**: (`optional`) ``torch.FloatTensor`` of shape ``(batch_size, sequence_length)``:
decoder_input_ids: (:obj:`torch.LongTensor` of shape :obj:`(batch_size, target_sequence_length)`, `optional`, defaults to :obj:`None`):
Only used for translation and summarization. Otherwise the default is used, which shifts the encoder's input_ids to the right.
**decoder_attention_mask** `optional`) ``torch.FloatTensor`` of shape ``(batch_size, sequence_length)``:
default behavior ignore pad tokens and future tokens.
decoder_attention_mask (:obj:`torch.Tensor` of shape :obj:`(batch_size, target_sequence_length)`, `optional`, defaults to :obj:`None`):
Default behavior (if :obj:`None` is passed) is to ignore pad tokens and future tokens.
See diagram 1 in the paper for more info on the default strategy

read `prepare_bart_inputs` for more information on the default behavior.

"""
LARGE_NEGATIVE = -1e4
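
Since the inputs docstring above is still being reworked in this commit, here is a hedged usage sketch of the documented arguments; it assumes the ``bart-large`` identifier used elsewhere in this diff and the final keyword names, so treat it as illustrative rather than the exact API at this commit::

    import torch
    from transformers import BartTokenizer, BartModel

    tokenizer = BartTokenizer.from_pretrained('bart-large')
    model = BartModel.from_pretrained('bart-large')

    input_ids = torch.tensor([tokenizer.encode("Hello, my dog is cute")])  # (1, seq_len)
    # 1 for real tokens, 0 for padding (no padding in this single sentence)
    attention_mask = (input_ids != model.config.pad_token_id).long()

    # If decoder_input_ids is omitted, the model shifts the encoder input_ids to the right by default
    outputs = model(input_ids, attention_mask=attention_mask)
    last_hidden_state = outputs[0]  # (1, seq_len, hidden_size)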

@@ -841,7 +833,6 @@ def _filter_out_falsey_values(tup) -> Tuple:
"The bare BART Model outputting raw hidden-states without any specific head on top.", BART_START_DOCSTRING,
)
class BartModel(PretrainedBartModel):
""""""

def __init__(self, config: BartConfig):
super().__init__(config)
@@ -856,15 +847,6 @@ def __init__(self, config: BartConfig):

self.init_weights()

def get_input_embeddings(self):
return self.shared

def set_input_embeddings(self, value):
self.shared = value

def get_output_embeddings(self):
return _make_linear_from_emb(self.shared)

@add_start_docstrings_to_callable(BART_INPUTS_DOCSTRING)
def forward(
Review comment (Member): We now link to the inputs in the forward method, cf. BERT file

self,
@@ -902,6 +884,17 @@ def forward(
encoder_outputs = _filter_out_falsey_values(encoder_outputs) # type: tuple
return decoder_outputs + encoder_outputs
Review comment (Contributor): For language generation we would need the following variables from decoder_outputs + encoder_outputs: the variable output from decoder_outputs[0], which will be transformed to logits by the lm_head in BartForMaskedLM; the variable cached_states from decoder_outputs[2] for faster decoding; the variable encoder_hidden_states from encoder_outputs[0]; and the variable inputs, which shouldn't change and is only needed for the first decoding step and afterwards by the prepare_bart_inputs fn.
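
As a rough companion to that comment, a sketch of a single greedy decoding step built on the masked-LM head; the class name ``BartForMaskedLM`` follows this PR (it was later renamed ``BartForConditionalGeneration``), and the roles of ``cached_states`` and ``encoder_hidden_states`` are taken from the comment itself, so this is illustrative only::

    import torch
    from transformers import BartTokenizer, BartForMaskedLM

    tokenizer = BartTokenizer.from_pretrained('bart-large')
    model = BartForMaskedLM.from_pretrained('bart-large')

    input_ids = torch.tensor([tokenizer.encode("My friends are cool but they eat too many carbs.")])
    outputs = model(input_ids)
    logits = outputs[0]                            # decoder_outputs[0] after the lm_head projection
    next_token = logits[:, -1, :].argmax(dim=-1)   # greedy choice for the next decoding step
    # Per the comment above, cached_states and encoder_hidden_states (later tuple entries)
    # would be carried over between steps so the encoder only runs once.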


def get_input_embeddings(self):
return self.shared

def set_input_embeddings(self, value):
self.shared = value

def get_output_embeddings(self):
return _make_linear_from_emb(self.shared)




@add_start_docstrings(
"The bare BART Model with a language modeling head", BART_START_DOCSTRING,
@@ -927,29 +920,32 @@ def forward(
**unused
):
r"""
**lm_labels**: (`optional`) ``torch.LongTensor`` of shape ``(batch_size, sequence_length)``:
Labels for computing the masked language modeling loss.
Indices should either be in ``[0, ..., config.vocab_size]`` or -100 (see ``input_ids`` docstring).
Tokens with indices set to ``-100`` are ignored (masked), the loss is only computed for the tokens
with labels
in ``[0, ..., config.vocab_size]``.

Outputs: `Tuple` comprising various elements depending on the configuration (config) and inputs:
**loss**: (`optional`, returned when ``lm_labels`` is provided) ``torch.FloatTensor`` of shape ``(1,)``:
Masked language modeling loss.
**prediction_scores**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, config.vocab_size)``
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
**hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
of shape ``(batch_size, sequence_length, hidden_size)``:
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
**attentions**: (`optional`, returned when ``config.output_attentions=True``)
list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads,
sequence_length, sequence_length)``:
Attentions weights after the attention softmax, used to compute the weighted average in the
self-attention heads.
masked_lm_labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
Labels for computing the masked language modeling loss.
Indices should either be in ``[0, ..., config.vocab_size]`` or -100 (see ``input_ids`` docstring).
Tokens with indices set to ``-100`` are ignored (masked), the loss is only computed for the tokens
with labels
in ``[0, ..., config.vocab_size]``.

Returns:
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.BartConfig`) and inputs:
masked_lm_loss (`optional`, returned when ``masked_lm_labels`` is provided) ``torch.FloatTensor`` of shape ``(1,)``:
Masked language modeling loss.
prediction_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`)
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_hidden_states=True``):
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
of shape :obj:`(batch_size, sequence_length, hidden_size)`.

Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_attentions=True``):
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.

Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
heads.

Examples::

tokenizer = BartTokenizer.from_pretrained('bart-large')
model = BartForMaskedLM.from_pretrained('bart-large')
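
The diff truncates the docstring example at this point; purely as an illustration (not the elided continuation), a sketch of a full forward pass that computes the masked-LM loss manually, so it does not depend on whether this commit spells the argument ``lm_labels`` or ``masked_lm_labels``::

    import torch
    import torch.nn.functional as F
    from transformers import BartTokenizer, BartForMaskedLM

    tokenizer = BartTokenizer.from_pretrained('bart-large')
    model = BartForMaskedLM.from_pretrained('bart-large')

    input_ids = torch.tensor([tokenizer.encode("Hello, my dog is cute")])
    prediction_scores = model(input_ids)[0]        # (1, seq_len, vocab_size)

    # Reconstruction-style loss against the inputs themselves; index -100 is ignored,
    # matching the labels convention described above.
    loss = F.cross_entropy(
        prediction_scores.view(-1, prediction_scores.size(-1)),
        input_ids.view(-1),
        ignore_index=-100,
    )
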
@@ -1008,46 +1004,39 @@ def forward(
labels=None,
):
r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
Labels for computing the sequence classification/regression loss.
Indices should be in :obj:`[0, ..., config.num_labels - 1]`.
If :obj:`config.num_labels == 1` a regression loss is computed (Mean-Square loss),
If :obj:`config.num_labels > 1` a classification loss is computed (Cross-Entropy).
Returns:
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (
:class:`~transformers.BartConfig`) and inputs:
loss (:obj:`torch.FloatTensor` of shape :obj:`(1,)`, `optional`, returned when :obj:`label` is
provided):
Classification loss.
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
Labels for computing the sequence classification/regression loss.
Indices should be in :obj:`[0, ..., config.num_labels - 1]`.
If :obj:`config.num_labels > 1` a classification loss is computed (Cross-Entropy).

Returns:
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.BartConfig`) and inputs:
loss (:obj:`torch.FloatTensor` of shape :obj:`(1,)`, `optional`, returned when :obj:`label` is provided):
Classification loss (cross entropy)
logits (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.num_labels)`):
Classification (or regression if config.num_labels==1) scores (before SoftMax).
hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when
``config.output_hidden_states=True``):
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of
each layer)
hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_hidden_states=True``):
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when
``config.output_attentions=True``):
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.

attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_attentions=True``):
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape :obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
Attentions weights after the attention softmax, used to compute the weighted average in the
self-attention
heads.

Examples::

from transformers import BartTokenizer, BartForSequenceClassification
import torch
from transformers import BartTokenizer, BartForSequenceClassification
import torch

tokenizer = BartTokenizer.from_pretrained('bart-large')
model = BartForSequenceClassification.from_pretrained('bart-large')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute",
add_special_tokens=True)).unsqueeze(0) # Batch size 1
labels = torch.tensor([1]).unsqueeze(0) # Batch size 1
outputs = model(input_ids, labels=labels)
loss, logits = outputs[:2]
tokenizer = BartTokenizer.from_pretrained('bart-large')
model = BartForSequenceClassification.from_pretrained('bart-large')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute",
add_special_tokens=True)).unsqueeze(0) # Batch size 1
labels = torch.tensor([1]).unsqueeze(0) # Batch size 1
outputs = model(input_ids, labels=labels)
loss, logits = outputs[:2]

"""
outputs = self.model.forward(
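
To round out the docstring example above, a hedged sketch of using the MNLI-finetuned checkpoint from the pretrained-models table; the premise/hypothesis pair-encoding convention and the label mapping read from the checkpoint config are assumptions, not something this diff specifies::

    import torch
    from transformers import BartTokenizer, BartForSequenceClassification

    tokenizer = BartTokenizer.from_pretrained('bart-large-mnli')
    model = BartForSequenceClassification.from_pretrained('bart-large-mnli')

    premise = "BART is an encoder-decoder transformer."
    hypothesis = "BART has both an encoder and a decoder."
    input_ids = torch.tensor([tokenizer.encode(premise, hypothesis)])  # assumed pair-encoding convention

    logits = model(input_ids)[0]                   # (1, num_labels)
    predicted = logits.argmax(dim=-1).item()
    print(model.config.id2label.get(predicted, predicted))  # label names come from the checkpoint config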