Add ModelTesterMixin, UI improvement #1
Conversation
```rst
LXMERT
----------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~
```
This adds LXMERT to the documentation. I've left the Overview
blank for now.
```python
self.num_hidden_layers = {
    "vision": r_layers,
    "cross_encoder": x_layers,
    "language": l_layers
}
```
This is very different from what we usually do, but as it's the first multi-transformer architecture we can have a bit more freedom. I think this approach makes sense, but we'll have to discuss it with other team members before the final merge into `master`, if we decide to merge this.
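To make the discussion concrete, here is a minimal sketch of the idea with a hypothetical config class (the class name, argument names, and default layer counts are illustrative, not the final LxmertConfig API):

```python
# Hypothetical sketch: group per-modality depths into one dictionary,
# mirroring the num_hidden_layers structure shown in the diff above.
class MultiModalConfig:
    def __init__(self, l_layers=9, x_layers=5, r_layers=5):
        self.num_hidden_layers = {
            "vision": r_layers,
            "cross_encoder": x_layers,
            "language": l_layers,
        }


config = MultiModalConfig()
# Downstream code indexes by modality instead of juggling three separate attributes.
print(config.num_hidden_layers["language"])  # 9
```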
```python
_CONFIG_FOR_DOC = "LxmertConfig"
_TOKENIZER_FOR_DOC = "LxmertTokenizer"
```
Our documentation format has changed slightly; I've updated it here.
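For illustration only, here is a hypothetical sketch (not the library's actual helper) of how module-level constants like these typically end up referenced from docstrings:

```python
# Hypothetical illustration: module-level constants interpolated into a
# docstring by a small decorator, so the docs reference the right classes.
_CONFIG_FOR_DOC = "LxmertConfig"
_TOKENIZER_FOR_DOC = "LxmertTokenizer"


def add_doc_references(fn):
    """Fill {config}/{tokenizer} placeholders in a docstring (illustrative only)."""
    fn.__doc__ = (fn.__doc__ or "").format(
        config=_CONFIG_FOR_DOC, tokenizer=_TOKENIZER_FOR_DOC
    )
    return fn


@add_doc_references
def forward_example():
    """Code samples in the docs reference :class:`~transformers.{config}`
    and :class:`~transformers.{tokenizer}`."""


print(forward_example.__doc__)
```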
```python
return gelu(x)


@dataclass
class LxmertModelOutput(ModelOutput):
```
Moved the model outputs into this file, as they're model-specific. Renamed and re-ordered the parameters.
```python
    language_output (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
        Sequence of hidden-states at the output of the last layer of the language encoder.
    vision_output (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
        Sequence of hidden-states at the output of the last layer of the visual encoder.
    pooled_output (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, hidden_size)`):
        Last layer hidden-state of the first token of the sequence (classification, CLS, token)
        further processed by a Linear layer and a Tanh activation function. The Linear
    language_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
        Tuple of :obj:`torch.FloatTensor` (one for input features + one for the output of each cross-modality layer)
        of shape :obj:`(batch_size, sequence_length, hidden_size)`.
    vision_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
        Tuple of :obj:`torch.FloatTensor` (one for input features + one for the output of each cross-modality layer)
        of shape :obj:`(batch_size, sequence_length, hidden_size)`.
    language_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
        Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
        :obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
        Attention weights after the attention softmax, used to compute the weighted average in the self-attention
        heads.
    vision_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
        Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
        :obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
        Attention weights after the attention softmax, used to compute the weighted average in the self-attention
        heads.
    cross_encoder_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
        Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
        :obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
        Attention weights after the attention softmax, used to compute the weighted average in the self-attention
        heads.
    """
```
Several things we have to respect with these model outputs:
- There should be absolutely no difference in behavior between using the traditional tuple output and this model output when going through `.to_tuple()`. This means that `model(**inputs) == model(**inputs, return_dict=True).to_tuple()`; we're adding a test for this to the common tests here (see the sketch after this list).
- Model outputs should therefore have the same order as the regular tuple outputs.
- I've re-ordered it as such: CATEGORY_1[language, vision, misc], CATEGORY_2[language, vision, misc], etc.
- The documented args should have the same order as the args defined in the model output.
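A minimal sketch of that invariant, using a toy stand-in for the real `ModelOutput` base class (field names follow the docstring above; the `to_tuple` shown is illustrative, not the library implementation):

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple

import torch


@dataclass
class ToyLxmertOutput:
    # Field order must match the positional order of the legacy tuple output:
    # CATEGORY_1[language, vision, misc], CATEGORY_2[language, vision, misc], ...
    language_output: Optional[torch.FloatTensor] = None
    vision_output: Optional[torch.FloatTensor] = None
    pooled_output: Optional[torch.FloatTensor] = None

    def to_tuple(self) -> Tuple[torch.Tensor, ...]:
        # Drop unset fields so the tuple matches what the model would have returned.
        return tuple(getattr(self, f.name) for f in fields(self) if getattr(self, f.name) is not None)


# The invariant the common test enforces (conceptually):
#   model(**inputs) == model(**inputs, return_dict=True).to_tuple()
out = ToyLxmertOutput(
    language_output=torch.zeros(1, 4, 8),
    vision_output=torch.zeros(1, 2, 8),
    pooled_output=torch.zeros(1, 8),
)
legacy_tuple = (out.language_output, out.vision_output, out.pooled_output)
assert all(torch.equal(a, b) for a, b in zip(legacy_tuple, out.to_tuple()))
```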
```diff
- total_loss = 0.0
+ total_loss = torch.tensor(0.0, device=device)
```
We deal only in tensor (or tuple-of-tensor) outputs, so the running loss starts out as a tensor rather than a Python float.
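A small sketch of the consequence (function and variable names are illustrative): starting from a tensor keeps the returned loss a tensor even when no loss term gets added.

```python
import torch


def total_pretraining_loss(loss_terms, device="cpu"):
    # Starting from a tensor (not a Python float) guarantees the return value
    # is always a torch.Tensor, keeping tuple/ModelOutput contents uniform.
    total_loss = torch.tensor(0.0, device=device)
    for term in loss_terms:
        total_loss = total_loss + term
    return total_loss


print(type(total_pretraining_loss([])))                   # <class 'torch.Tensor'>
print(type(total_pretraining_loss([torch.tensor(1.5)])))  # <class 'torch.Tensor'>
```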
```diff
- self.loss = CrossEntropyLoss(ignore_index=-100)
+ self.loss = CrossEntropyLoss()
```
The `ignore_index` is -100 by default, so passing it explicitly is redundant.
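A quick sanity check of the PyTorch default being relied on here:

```python
import torch
from torch import nn

logits = torch.randn(3, 5)           # 3 positions, 5 classes
labels = torch.tensor([1, -100, 3])  # -100 marks a position to ignore

# Both losses are identical: nn.CrossEntropyLoss already ignores -100 by default.
explicit = nn.CrossEntropyLoss(ignore_index=-100)(logits, labels)
default = nn.CrossEntropyLoss()(logits, labels)
assert torch.allclose(explicit, default)
```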
```diff
- self.parent.assertEqual(result.last_hidden_state_l.shape, (self.batch_size, self.seq_length, self.hidden_size))
+ self.parent.assertEqual(result.language_output.shape, (self.batch_size, self.seq_length, self.hidden_size))
  self.parent.assertEqual(
-     result.last_hidden_state_v.shape, (self.batch_size, self.num_visual_features, self.hidden_size)
+     result.vision_output.shape, (self.batch_size, self.num_visual_features, self.hidden_size)
  )
- self.parent.assertEqual(result.pooled_output_x_encoder.shape, (self.batch_size, self.hidden_size))
+ self.parent.assertEqual(result.pooled_output.shape, (self.batch_size, self.hidden_size))
```
I hope you'll agree with me that this output is cleaner and simpler for users to understand :)
```diff
- class LxmertModelTest(unittest.TestCase):
+ class LxmertModelTest(ModelTesterMixin, unittest.TestCase):
```
boom!
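For readers unfamiliar with the pattern, a toy, self-contained sketch of what the mixin buys us (names here are hypothetical; the real `ModelTesterMixin` lives in the common test file and carries many more tests):

```python
# Toy illustration of the mixin pattern: common tests live on the mixin and run
# automatically for every test class that lists it as a base.
import unittest


class ToyTesterMixin:
    all_model_classes = ()

    def test_classes_are_registered(self):
        # A "common test" every model test class inherits for free.
        self.assertGreater(len(self.all_model_classes), 0)


class ToyModelTest(ToyTesterMixin, unittest.TestCase):
    all_model_classes = (dict, list)  # stand-ins for real model classes


if __name__ == "__main__":
    unittest.main()
```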
```python
model = LxmertModel.from_pretrained(model_name)
self.assertIsNotNone(model)


def test_attention_outputs(self):
```
Needed to re-implement that test here, as well as the hidden-states test, since the behavior is different from other single-transformer models.
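As a hedged sketch of what the re-implemented test has to cover (attribute names taken from the output documented above; the exact assertions in the PR may differ), the model exposes three attention tuples instead of a single `attentions` tuple:

```python
def check_attention_outputs(test_case, outputs, config):
    # Three attention tuples instead of the usual single `attentions`.
    for name in ("language_attentions", "vision_attentions", "cross_encoder_attentions"):
        attentions = getattr(outputs, name)
        test_case.assertIsInstance(attentions, tuple)
        # Every entry is (batch_size, num_heads, query_length, key_length).
        for layer_attention in attentions:
            test_case.assertEqual(layer_attention.dim(), 4)
            test_case.assertEqual(layer_attention.shape[1], config.num_attention_heads)
```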
Commit: neFLOs calculation, logging, and reloading (#1)
- testing distributed consecutive batches
- fixed AttributeError from DataParallel
- removed verbosity
- rotate with use_mtime=True
- removed print
- fixed interaction with gradient accumulation
- indent formatting
- distributed neflo counting
- fixed typo
- fixed typo
- mean distributed losses
- exporting log history
- moved a few functions
- floating_point_ops clarification for transformers with parameter-reuse
- code quality
- double import
- made flo estimation more task-agnostic
- only logging flos if computed
- code quality
- unused import
- Update src/transformers/trainer.py
- Update src/transformers/modeling_utils.py
- Sylvain review
- Update src/transformers/modeling_utils.py
- black

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
This PR aims to add the `ModelTesterMixin` mixin to LXMERT, to ensure that it behaves correctly. In doing so there have been a few bugfixes, but also a slight refactor of LXMERT's UI. This affects only the user-facing methods of the user-facing models (`LxmertModel`, `LxmertForPretraining`, `LxmertForQuestionAnswering`).

Some things to note:
- The noteworthy changes are called out in the review comments for an easier review.
- Only PyTorch for now; if you agree with these changes I'll do TensorFlow as well.