Skip to content

[GroundingDino] Fix grounding dino loss 🚨 #31828

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 59 commits into from
Feb 27, 2025

Conversation

EduardoPach
Copy link
Contributor

@EduardoPach EduardoPach commented Jul 7, 2024

What does this PR do?

Fixes #31434

As the original repo doesn't provide the loss implementation I'm using the one implemented here as a baseline since it was mentioned by the original repo, on this issue IDEA-Research/GroundingDINO#241, as a reliable source if one wants to train a GroundingDino model

TODO:

  • Test GroundingDinoMatcher and GroundingDinoLoss are working properly

Explanation of the Issue and Solution

So the issue was that GroundingDinoLoss and GroundingDinoHungarianMatcher were just a copy from DeformableDetr which is used for closed-set object detection (i.e. a fixed set of categories). Whereas in GroundingDino there's no limited amount of categories and the output logits are d_model dimensional where the first seq_len elements have a specified value and the subsequent are nan. The main differences are:

  1. class_labels are associated with the text prompt used
  2. The logits are asscoaited with the tokens of the text so it's not necessarily 1-to-1

For instance if an image with bounding boxes with fishes and jellyfishes using a prompt "fish. jellyfish." fish should have class_label 0 assigned to it and jellyfish should have 1 assigned. If the position of jellyfish and fish in the prompt swapped then the class_labels would swap as well. Moreover, jellyfish is represented by two tokens ([20919, 7529]) and fish by one token ([3869]) therefore we need to select the appropriate logits for each class.

As the original implementation doesn't provide the training loop or the loss implementation, but does recommend other implementations for training GroundingDino on this issue IDEA-Research/GroundingDINO#241, I took as baseline the implementation from Open-GroundingDino as it supports both visual grounding and object detection and they've trained their own GroundingDino using their code base achieving good performance.

Things added in this PR are:

  • build_label_maps which generates a list of torch.Tensor with lenght batch_size mapping each category to its corresponding tokens based on the input_ids
  • build_text_mask just expand the attention_mask to select the appropriate tokens when computing GroundingDino.loss_labels
  • Added enc_topk_proposals, encoder_logits and encoder_pred_boxes to GroundingDinoModelOutput and GroundingDinoObjectDetectionOutput to compute first stage loss
  • Added class_loss_coefficient (with correct default value) and class_loss_reduction to GroundingDinoConfig. class_loss_reduction was added because in sigmoid_focal_loss from the baseline implementation they reduced loss_ce with a simple sum, but that makes the losses imbalanced most of the time and in the original implementation they do have a sigmoid_focal_loss implemented, but using mean reduction, therefore I made I decided to make it configurable and use the sum one for testing reasons
  • Modifications to GroundingDinoLoss and GroundingDinoHungarianMatcher

Also added a new integration test called test_grounding_dino_loss where I compare the loss obtained from 2 sample images with the baseline implementation from Open-GroundingDino.

c.c. @amyeroberts

@EduardoPach EduardoPach changed the title WIP - [GroundingDino] Fix grounding dino loss [GroundingDino] Fix grounding dino loss Jul 14, 2024
@EduardoPach
Copy link
Contributor Author

@amyeroberts FYI for some reason, when testing locally, test_cross_attention_mask is failing on this branch, but when I tested using the main branch it was also failing (locally)

@EduardoPach
Copy link
Contributor Author

c.c. @amyeroberts

Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing!

Overall looks good. A breaking change, so we should append the PR title with 🚨, but I think this is acceptable as it's aligning ourselves with the recommended loss calculation.

Comment on lines 155 to 157
input_ids = torch.tensor([101, 3869, 1012, 11420, 1012, 1012, 102])
input_ids = input_ids.unsqueeze(0).expand(self.batch_size, -1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why switch to hard coded input ids?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise tests that use labels and, therefore, compute a loss would complain as build_label_maps (here) would return None

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you can add a comment to explain this

@SangbumChoi
Copy link
Contributor

@EduardoPach Thanks for the working this loss. Just sharing more well developed code for finetuning GroundingDINO https://github.com/open-mmlab/mmdetection/blob/main/configs/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365.py

@EduardoPach EduardoPach requested a review from amyeroberts July 23, 2024 11:30
@EduardoPach EduardoPach changed the title [GroundingDino] Fix grounding dino loss [GroundingDino] Fix grounding dino loss 🚨 Jul 23, 2024
@EduardoPach
Copy link
Contributor Author

c.c. @amyeroberts

@EduardoPach
Copy link
Contributor Author

Maybe @NielsRogge could have a look?

Copy link
Contributor

@NielsRogge NielsRogge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing!

Comment on lines 155 to 157
input_ids = torch.tensor([101, 3869, 1012, 11420, 1012, 1012, 102])
input_ids = input_ids.unsqueeze(0).expand(self.batch_size, -1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you can add a comment to explain this

@EduardoPach
Copy link
Contributor Author

Cough cough, c.c @amyeroberts

Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the continued work on this!

Main comments are about the precision and clarity of language in the docstrings and comments. It's important to write messages such that someone new to the code can understand.

Copy link
Contributor

@SangbumChoi SangbumChoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good. Let me check also in this PR #32483 for stable training convergence.

def sigmoid_focal_loss(inputs, targets, num_boxes, alpha: float = 0.25, gamma: float = 2):
# Similar to the one used in `DeformableDetr` but we pass `num_queries`, as `logits` are flattened
# due to masked selection, and support different `reduction` modes.
def sigmoid_focal_loss(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/longzw1997/Open-GroundingDino/blob/main/models/GroundingDINO/utils.py#L138

Even though there are customization in this code, I like the current version of sigmoid_focal_loss 👍🏼

@SangbumChoi
Copy link
Contributor

Screenshot 2024-08-23 at 9 54 40 PM
This is the result of current commit

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@EduardoPach
Copy link
Contributor Author

c.c. @amyeroberts

@@ -741,3 +780,53 @@ def test_cross_attention_mask(self):
self.assertTrue(torch.allclose(outputs1.logits, outputs_batched.logits[:1], atol=1e-3))
# For some reason 12 elements are > 1e-3, but the rest are fine
self.assertTrue(torch.allclose(outputs2.logits, outputs_batched.logits[1:], atol=1.8e-3))

def test_grounding_dino_loss(self):
ds = load_dataset("EduardoPacheco/aquarium-sample", split="train")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should move this to huggingface?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, wdyt @amyeroberts

Copy link
Member

@qubvel qubvel Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this, it is great to have it tested!
Do we actually need to load the whole dataset here? Can't we copy annotations and upload one image somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qubvel by upload one image somewhere you mean to a hf dataset or to the tests fixtures?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, no, maybe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ydshieh can you please help, do we have any place for test assets?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EduardoPach @qubvel I think we can do like this

# We will verify our results on an image of cute cats

Copy link
Collaborator

@ydshieh ydshieh Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi. If IIRC, that dataset contains only 2 examples, right?

The easiest way is to have it as

hf-internal-testing/aquarium-sample

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qubvel I added you to hf-internal-testing (you have to accept the invitation) then you can create a copy of the above dataset.

Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing and all the work iterating on this!

Just two things before we're ready to merge:

  • This convo
  • Slow model tests -- Could you push an empty commit with the message [run_slow] grounding_dino?

@EduardoPach
Copy link
Contributor Author

Had to travel on the weekend I was supposed to finish this 😅 , should probably be able to give the final push this weekend. We can coordinate on discord if you want @SangbumChoi

@qubvel
Copy link
Member

qubvel commented Nov 25, 2024

Hi @EduardoPach, thanks for the update! Is it ready or do you have something to finish?

@EduardoPach
Copy link
Contributor Author

EduardoPach commented Nov 25, 2024

Hi @EduardoPach, thanks for the update! Is it ready or do you have something to finish?

Hey @qubvel, I'll take a final look into the test issues, and it will be ready. I should be able to do that this weekend.

@SangbumChoi
Copy link
Contributor

@EduardoPach Any future plan for working this? Otherwise I can take a look

@EduardoPach
Copy link
Contributor Author

@EduardoPach Any future plan for working this? Otherwise I can take a look

@SangbumChoi probably not in the near future. If you have the bandwidth I'd appreciate that 😄

@SangbumChoi
Copy link
Contributor

SangbumChoi commented Feb 19, 2025

@qubvel @NielsRogge Hi team long time no see. I have finished this PR can you confirm the changes? Failure is unrelated
cc. @EduardoPach

@qubvel
Copy link
Member

qubvel commented Feb 19, 2025

Hi @SangbumChoi! Thanks a lot for pushing this forward! I will review this week 👍

@qubvel qubvel self-requested a review February 25, 2025 11:18
Copy link
Member

@qubvel qubvel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @SangbumChoi! I left a few minor comments

@SangbumChoi
Copy link
Contributor

@qubvel After the merge I can continue work #32483 if this seems useful to HF team!

Copy link
Member

@qubvel qubvel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing it @SangbumChoi! It would be super helpful for the community to continue working on the fine-tuning example 🤗

@qubvel
Copy link
Member

qubvel commented Feb 27, 2025

run-slow: grounding_dino

Copy link
Contributor

This comment contains run-slow, running the specified jobs: This comment contains run-slow, running the specified jobs:

models: ['models/grounding_dino']
quantizations: [] ...

@qubvel qubvel merged commit 222505c into huggingface:main Feb 27, 2025
23 of 24 checks passed
@EduardoPach
Copy link
Contributor Author

Amazing! Thanks for the final push @SangbumChoi!

garrett361 pushed a commit to garrett361/transformers that referenced this pull request Mar 4, 2025
* Starting to fix GroundingDinoLoss and GroundingDinoHungarianMatcher

* More updates

* More updates

* fixed: GroundingDinoLoss

* fixed: failing tests

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/grounding_dino/test_modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Addressed comments

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>

* add: cardinality loss and make box loss as copy from

* change: default for reduction loss is sum

* fix: vectorized generate fake box

* fix copies

* Addressed comments

* addressed comments

* addressed one-hot

* Update tests/models/grounding_dino/test_modeling_grounding_dino.py

Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>

* Addressed comments

* fixed test

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

* Update tests/models/grounding_dino/test_modeling_grounding_dino.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Starting to fix GroundingDinoLoss and GroundingDinoHungarianMatcher

* More updates

* More updates

* fixed: GroundingDinoLoss

* add: cardinality loss and make box loss as copy from

* fix copies

* Revert "Update tests/models/grounding_dino/test_modeling_grounding_dino.py"

This reverts commit aa74c4c57c430e54cc74c414d6269edb65c73e83.

* [run-slow] groundigdino

* remove nestedtensor

* [run-slow] groundig_dino

* [run-slow] grounding_dino

* [run-slow] grounding_dino

* [run-slow] grounding_dino

* check

* check

* add: enconder intermediate outputs to ImageLoss forward

* add: GroundingDinoForObjectDetectionLoss in the loss directory

* make style

* fix the loss function

* remove class_reduction since it sum is default

* remove class_reduction

* Update src/transformers/loss/loss_grounding_dino.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* simple fix

* Update src/transformers/loss/loss_grounding_dino.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* minor fix

* Update src/transformers/loss/loss_for_object_detection.py

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: sangbumchoi <danielsejong55@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
garrett361 pushed a commit to garrett361/transformers that referenced this pull request Mar 4, 2025
* Starting to fix GroundingDinoLoss and GroundingDinoHungarianMatcher

* More updates

* More updates

* fixed: GroundingDinoLoss

* fixed: failing tests

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/grounding_dino/test_modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Addressed comments

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>

* add: cardinality loss and make box loss as copy from

* change: default for reduction loss is sum

* fix: vectorized generate fake box

* fix copies

* Addressed comments

* addressed comments

* addressed one-hot

* Update tests/models/grounding_dino/test_modeling_grounding_dino.py

Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>

* Addressed comments

* fixed test

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

* Update tests/models/grounding_dino/test_modeling_grounding_dino.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Starting to fix GroundingDinoLoss and GroundingDinoHungarianMatcher

* More updates

* More updates

* fixed: GroundingDinoLoss

* add: cardinality loss and make box loss as copy from

* fix copies

* Revert "Update tests/models/grounding_dino/test_modeling_grounding_dino.py"

This reverts commit aa74c4c57c430e54cc74c414d6269edb65c73e83.

* [run-slow] groundigdino

* remove nestedtensor

* [run-slow] groundig_dino

* [run-slow] grounding_dino

* [run-slow] grounding_dino

* [run-slow] grounding_dino

* check

* check

* add: enconder intermediate outputs to ImageLoss forward

* add: GroundingDinoForObjectDetectionLoss in the loss directory

* make style

* fix the loss function

* remove class_reduction since it sum is default

* remove class_reduction

* Update src/transformers/loss/loss_grounding_dino.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* simple fix

* Update src/transformers/loss/loss_grounding_dino.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* minor fix

* Update src/transformers/loss/loss_for_object_detection.py

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: sangbumchoi <danielsejong55@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GroundingDino - Loss calculation exceptions
9 participants