Conversation

@LysandreJik (Collaborator) commented on Aug 13, 2020

This PR aims to add the ModelTesterMixin mixin to LXMERT, to ensure that it behaves correctly. In doing so, there have been a few bug fixes, as well as a slight refactor of LXMERT's UI. This affects only the user-facing methods of the user-facing models (LxmertModel, LxmertForPretraining, LxmertForQuestionAnswering).

Some things to note:

  • The UI is extremely important. It needs to align with the rest of the library, or users will be lost and will either open issues on a regular basis to understand it, or simply not use the model at all.
  • The UI is the most important component of the model, as it cannot be easily changed over time. Once the UI is set, introducing any changes to it results in breaking changes, which are a tremendous pain for users. The internals of the models may change over time, however, as long as the resulting behavior doesn't change.

The noteworthy changes are detailed in comments below for an easier review.

Only PyTorch for now; if you agree with these changes I'll do TensorFlow as well.

@LysandreJik requested a review from eltoto1219 on August 13, 2020 at 07:14
Comment on lines +1 to +8
LXMERT
----------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~




This adds LXMERT to the documentation. I've left the Overview blank for now.

Comment on lines +179 to +183
self.num_hidden_layers = {
"vision": r_layers,
"cross_encoder": x_layers,
"language": l_layers
}

This is very different from what we usually do, but as it's the first multi-transformer architecture, we can have a bit more freedom. I think this approach makes sense, but we'll have to discuss it with other team members before the final merge into master, if we decide to merge this.
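For illustration, here's a minimal usage sketch of the per-module layer counts. The constructor argument names come from the diff above; the concrete values and the top-level LxmertConfig import are assumptions:

from transformers import LxmertConfig

# Hypothetical values; l_layers / x_layers / r_layers follow the diff above.
config = LxmertConfig(l_layers=9, x_layers=5, r_layers=5)

# Each sub-encoder's depth is looked up by name instead of a single integer:
print(config.num_hidden_layers["language"])       # 9
print(config.num_hidden_layers["cross_encoder"])  # 5
print(config.num_hidden_layers["vision"])         # 5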

Comment on lines +37 to +39
_CONFIG_FOR_DOC = "LxmertConfig"
_TOKENIZER_FOR_DOC = "LxmertTokenizer"


Our documentation format has slightly changed; I've updated it here.

return gelu(x)

@dataclass
class LxmertModelOutput(ModelOutput):

Moved the model outputs into this file, as they're model-specific. Renamed and re-ordered the parameters.
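To make the review easier, here's a minimal sketch of the pattern these outputs follow. The field names mirror the documented args below; the ModelOutput import path is an assumption about where the base class lives:

from dataclasses import dataclass
from typing import Optional, Tuple

import torch

from transformers.file_utils import ModelOutput  # import path assumed


@dataclass
class LxmertModelOutput(ModelOutput):
    # Field declaration order is the order .to_tuple() follows, which is why
    # the ordering rules discussed below matter; None fields are dropped.
    language_output: Optional[torch.FloatTensor] = None
    vision_output: Optional[torch.FloatTensor] = None
    pooled_output: Optional[torch.FloatTensor] = None
    language_hidden_states: Optional[Tuple[torch.FloatTensor]] = None
    vision_hidden_states: Optional[Tuple[torch.FloatTensor]] = None
    language_attentions: Optional[Tuple[torch.FloatTensor]] = None
    vision_attentions: Optional[Tuple[torch.FloatTensor]] = None
    cross_encoder_attentions: Optional[Tuple[torch.FloatTensor]] = None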

Comment on lines +61 to +89
language_output (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
Sequence of hidden-states at the output of the last layer of the language encoder.
vision_output (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
Sequence of hidden-states at the output of the last layer of the visual encoder.
pooled_output (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, hidden_size)`):
Last layer hidden-state of the first token of the sequence (classification, CLS, token)
further processed by a Linear layer and a Tanh activation function. The Linear
language_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
Tuple of :obj:`torch.FloatTensor` (one for input features + one for the output of each cross-modality layer)
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
vision_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
Tuple of :obj:`torch.FloatTensor` (one for input features + one for the output of each cross-modality layer)
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
language_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
Attention weights after the attention softmax, used to compute the weighted average in the self-attention
heads.
vision_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
Attention weights after the attention softmax, used to compute the weighted average in the self-attention
heads.
cross_encoder_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
Attention weights after the attention softmax, used to compute the weighted average in the self-attention
heads.
"""

Several things we have to respect with these model outputs:

  • There should be absolutely no difference in behavior between the traditional tuple output and this model output when using .to_tuple(). This means that:
    model(**inputs) == model(**inputs, return_dict=True).to_tuple()
    We're adding a test for this to the common tests (see the sketch right after this list).
  • Model outputs should therefore have the same order as the regular tuple outputs.
  • I've re-ordered them as follows: CATEGORY_1[language, vision, misc], CATEGORY_2[language, vision, misc], etc.
  • The documented args should have the same order as the args defined in the model output.
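Here's a rough sketch of that equivalence check (model and inputs are placeholders, the comparison is deliberately simplified, and return_dict is assumed to default to False):

import torch


def assert_tuple_and_return_dict_match(model, inputs):
    model.eval()
    with torch.no_grad():
        tuple_outputs = model(**inputs)                              # plain tuple output
        dict_outputs = model(**inputs, return_dict=True).to_tuple()

    assert len(tuple_outputs) == len(dict_outputs)
    for tuple_value, dict_value in zip(tuple_outputs, dict_outputs):
        if isinstance(tuple_value, (tuple, list)):
            # Nested outputs such as per-layer hidden states or attentions.
            for t, d in zip(tuple_value, dict_value):
                assert torch.allclose(t, d)
        else:
            assert torch.allclose(tuple_value, dict_value)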

Comment on lines -998 to +1122
-    total_loss = 0.0
+    total_loss = torch.tensor(0.0, device=device)

We deal only in tensors, or tuples of tensors, as outputs.
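A toy illustration (not the PR's code) of one way the difference shows up: if every optional loss term is skipped, a float-initialized accumulator would leak a plain Python float into the output tuple, while a tensor-initialized one keeps the tensor-only contract.

import torch

device = torch.device("cpu")

# Float accumulator: with no loss terms added, a bare Python float would end
# up as the first element of the output tuple.
total_loss_float = 0.0

# Tensor accumulator: the returned loss is a torch.Tensor no matter how many
# terms were added to it.
total_loss_tensor = torch.tensor(0.0, device=device)

assert not isinstance(total_loss_float, torch.Tensor)
assert isinstance(total_loss_tensor, torch.Tensor)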

Comment on lines -1072 to +1190
-        self.loss = CrossEntropyLoss(ignore_index=-100)
+        self.loss = CrossEntropyLoss()

The ignore_index is -100 by default, so passing it explicitly was redundant.
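A quick sanity check (not part of the diff) that the two constructions behave identically:

import torch
from torch.nn import CrossEntropyLoss

logits = torch.randn(4, 10)
labels = torch.tensor([1, -100, 3, -100])  # -100 positions are ignored either way

default_loss = CrossEntropyLoss()(logits, labels)
explicit_loss = CrossEntropyLoss(ignore_index=-100)(logits, labels)
assert torch.allclose(default_loss, explicit_loss)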

Comment on lines -249 to +258
-        self.parent.assertEqual(result.last_hidden_state_l.shape, (self.batch_size, self.seq_length, self.hidden_size))
+        self.parent.assertEqual(result.language_output.shape, (self.batch_size, self.seq_length, self.hidden_size))
         self.parent.assertEqual(
-            result.last_hidden_state_v.shape, (self.batch_size, self.num_visual_features, self.hidden_size)
+            result.vision_output.shape, (self.batch_size, self.num_visual_features, self.hidden_size)
         )
-        self.parent.assertEqual(result.pooled_output_x_encoder.shape, (self.batch_size, self.hidden_size))
+        self.parent.assertEqual(result.pooled_output.shape, (self.batch_size, self.hidden_size))

I hope you'll agree with me that this output is cleaner and simpler for users to understand :)

Comment on lines -521 to +516
-class LxmertModelTest(unittest.TestCase):
+class LxmertModelTest(ModelTesterMixin, unittest.TestCase):

boom!

model = LxmertModel.from_pretrained(model_name)
self.assertIsNotNone(model)

def test_attention_outputs(self):

Needed to re-implement that test here, as well as the hidden-states test, since the behavior is different from other, single-transformer models.
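For reference, a rough sketch of what the re-implemented check verifies (model, inputs and config are placeholders; the per-modality attribute names follow the model output documented above, and the config attribute names are assumptions):

import torch


def check_lxmert_attention_outputs(model, inputs, config):
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs, output_attentions=True, return_dict=True)

    # One attention tensor per layer, per modality.
    assert len(outputs.language_attentions) == config.l_layers
    assert len(outputs.vision_attentions) == config.r_layers
    assert len(outputs.cross_encoder_attentions) == config.x_layers

    # Each tensor has the usual (batch_size, num_heads, query_len, key_len) shape.
    for attn in outputs.language_attentions:
        assert attn.dim() == 4
        assert attn.size(1) == config.num_attention_heads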

@eltoto1219 merged this pull request into lxmert_model on Aug 13, 2020
eltoto1219 pushed a commit that referenced this pull request on Sep 11, 2020
* neFLOs calculation, logging, and reloading (#1)

* testing distributed consecutive batches

* fixed AttributeError from DataParallel

* removed verbosity

* rotate with use_mtime=True

* removed print

* fixed interaction with gradient accumulation

* indent formatting

* distributed neflo counting

* fixed typo

* fixed typo

* mean distributed losses

* exporting log history

* moved a few functions

* floating_point_ops clarification for transformers with parameter-reuse

* code quality

* double import

* made flo estimation more task-agnostic

* only logging flos if computed

* code quality

* unused import

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Sylvain review

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* black

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>