-
Notifications
You must be signed in to change notification settings - Fork 29.8k
[GroundingDino] Fix grounding dino loss 🚨 #31828
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GroundingDino] Fix grounding dino loss 🚨 #31828
Conversation
src/transformers/models/grounding_dino/modeling_grounding_dino.py
Outdated
Show resolved
Hide resolved
src/transformers/models/grounding_dino/modeling_grounding_dino.py
Outdated
Show resolved
Hide resolved
@amyeroberts FYI for some reason, when testing locally, |
c.c. @amyeroberts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing!
Overall looks good. A breaking change, so we should append the PR title with 🚨, but I think this is acceptable as it's aligning ourselves with the recommended loss calculation.
input_ids = torch.tensor([101, 3869, 1012, 11420, 1012, 1012, 102]) | ||
input_ids = input_ids.unsqueeze(0).expand(self.batch_size, -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why switch to hard coded input ids?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise tests that use labels
and, therefore, compute a loss would complain as build_label_maps
(here) would return None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you can add a comment to explain this
src/transformers/models/grounding_dino/modeling_grounding_dino.py
Outdated
Show resolved
Hide resolved
src/transformers/models/grounding_dino/modeling_grounding_dino.py
Outdated
Show resolved
Hide resolved
src/transformers/models/grounding_dino/modeling_grounding_dino.py
Outdated
Show resolved
Hide resolved
src/transformers/models/grounding_dino/modeling_grounding_dino.py
Outdated
Show resolved
Hide resolved
src/transformers/models/grounding_dino/modeling_grounding_dino.py
Outdated
Show resolved
Hide resolved
@EduardoPach Thanks for the working this loss. Just sharing more well developed code for finetuning GroundingDINO https://github.com/open-mmlab/mmdetection/blob/main/configs/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365.py |
src/transformers/models/grounding_dino/modeling_grounding_dino.py
Outdated
Show resolved
Hide resolved
c.c. @amyeroberts |
Maybe @NielsRogge could have a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing!
src/transformers/models/grounding_dino/configuration_grounding_dino.py
Outdated
Show resolved
Hide resolved
input_ids = torch.tensor([101, 3869, 1012, 11420, 1012, 1012, 102]) | ||
input_ids = input_ids.unsqueeze(0).expand(self.batch_size, -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you can add a comment to explain this
Cough cough, c.c @amyeroberts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the continued work on this!
Main comments are about the precision and clarity of language in the docstrings and comments. It's important to write messages such that someone new to the code can understand.
src/transformers/models/grounding_dino/modeling_grounding_dino.py
Outdated
Show resolved
Hide resolved
src/transformers/models/grounding_dino/modeling_grounding_dino.py
Outdated
Show resolved
Hide resolved
src/transformers/models/grounding_dino/modeling_grounding_dino.py
Outdated
Show resolved
Hide resolved
src/transformers/models/grounding_dino/modeling_grounding_dino.py
Outdated
Show resolved
Hide resolved
src/transformers/models/grounding_dino/modeling_grounding_dino.py
Outdated
Show resolved
Hide resolved
src/transformers/models/grounding_dino/modeling_grounding_dino.py
Outdated
Show resolved
Hide resolved
src/transformers/models/grounding_dino/modeling_grounding_dino.py
Outdated
Show resolved
Hide resolved
src/transformers/models/grounding_dino/modeling_grounding_dino.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it looks good. Let me check also in this PR #32483 for stable training convergence.
def sigmoid_focal_loss(inputs, targets, num_boxes, alpha: float = 0.25, gamma: float = 2): | ||
# Similar to the one used in `DeformableDetr` but we pass `num_queries`, as `logits` are flattened | ||
# due to masked selection, and support different `reduction` modes. | ||
def sigmoid_focal_loss( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/longzw1997/Open-GroundingDino/blob/main/models/GroundingDINO/utils.py#L138
Even though there are customization in this code, I like the current version of sigmoid_focal_loss
👍🏼
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
c.c. @amyeroberts |
@@ -741,3 +780,53 @@ def test_cross_attention_mask(self): | |||
self.assertTrue(torch.allclose(outputs1.logits, outputs_batched.logits[:1], atol=1e-3)) | |||
# For some reason 12 elements are > 1e-3, but the rest are fine | |||
self.assertTrue(torch.allclose(outputs2.logits, outputs_batched.logits[1:], atol=1.8e-3)) | |||
|
|||
def test_grounding_dino_loss(self): | |||
ds = load_dataset("EduardoPacheco/aquarium-sample", split="train") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should move this to huggingface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, wdyt @amyeroberts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this, it is great to have it tested!
Do we actually need to load the whole dataset here? Can't we copy annotations and upload one image somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qubvel by upload one image somewhere you mean to a hf dataset or to the tests fixtures?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, no, maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ydshieh can you please help, do we have any place for test assets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@EduardoPach @qubvel I think we can do like this
# We will verify our results on an image of cute cats |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi. If IIRC, that dataset contains only 2 examples, right?
The easiest way is to have it as
hf-internal-testing/aquarium-sample
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qubvel I added you to hf-internal-testing
(you have to accept the invitation) then you can create a copy of the above dataset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing and all the work iterating on this!
Just two things before we're ready to merge:
- This convo
- Slow model tests -- Could you push an empty commit with the message
[run_slow] grounding_dino
?
Had to travel on the weekend I was supposed to finish this 😅 , should probably be able to give the final push this weekend. We can coordinate on discord if you want @SangbumChoi |
Hi @EduardoPach, thanks for the update! Is it ready or do you have something to finish? |
Hey @qubvel, I'll take a final look into the test issues, and it will be ready. I should be able to do that this weekend. |
@EduardoPach Any future plan for working this? Otherwise I can take a look |
@SangbumChoi probably not in the near future. If you have the bandwidth I'd appreciate that 😄 |
@qubvel @NielsRogge Hi team long time no see. I have finished this PR can you confirm the changes? Failure is unrelated |
Hi @SangbumChoi! Thanks a lot for pushing this forward! I will review this week 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update @SangbumChoi! I left a few minor comments
src/transformers/models/grounding_dino/modeling_grounding_dino.py
Outdated
Show resolved
Hide resolved
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
…ach/transformers into eduardo/fix-groundingdino-loss
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing it @SangbumChoi! It would be super helpful for the community to continue working on the fine-tuning example 🤗
run-slow: grounding_dino |
This comment contains run-slow, running the specified jobs: This comment contains run-slow, running the specified jobs: models: ['models/grounding_dino'] |
Amazing! Thanks for the final push @SangbumChoi! |
* Starting to fix GroundingDinoLoss and GroundingDinoHungarianMatcher * More updates * More updates * fixed: GroundingDinoLoss * fixed: failing tests * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/grounding_dino/test_modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Addressed comments * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> * add: cardinality loss and make box loss as copy from * change: default for reduction loss is sum * fix: vectorized generate fake box * fix copies * Addressed comments * addressed comments * addressed one-hot * Update tests/models/grounding_dino/test_modeling_grounding_dino.py Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> * Addressed comments * fixed test * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py * Update tests/models/grounding_dino/test_modeling_grounding_dino.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Starting to fix GroundingDinoLoss and GroundingDinoHungarianMatcher * More updates * More updates * fixed: GroundingDinoLoss * add: cardinality loss and make box loss as copy from * fix copies * Revert "Update tests/models/grounding_dino/test_modeling_grounding_dino.py" This reverts commit aa74c4c57c430e54cc74c414d6269edb65c73e83. * [run-slow] groundigdino * remove nestedtensor * [run-slow] groundig_dino * [run-slow] grounding_dino * [run-slow] grounding_dino * [run-slow] grounding_dino * check * check * add: enconder intermediate outputs to ImageLoss forward * add: GroundingDinoForObjectDetectionLoss in the loss directory * make style * fix the loss function * remove class_reduction since it sum is default * remove class_reduction * Update src/transformers/loss/loss_grounding_dino.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * simple fix * Update src/transformers/loss/loss_grounding_dino.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * minor fix * Update src/transformers/loss/loss_for_object_detection.py --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> Co-authored-by: sangbumchoi <danielsejong55@gmail.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Starting to fix GroundingDinoLoss and GroundingDinoHungarianMatcher * More updates * More updates * fixed: GroundingDinoLoss * fixed: failing tests * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/grounding_dino/test_modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Addressed comments * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> * add: cardinality loss and make box loss as copy from * change: default for reduction loss is sum * fix: vectorized generate fake box * fix copies * Addressed comments * addressed comments * addressed one-hot * Update tests/models/grounding_dino/test_modeling_grounding_dino.py Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> * Addressed comments * fixed test * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py * Update tests/models/grounding_dino/test_modeling_grounding_dino.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Starting to fix GroundingDinoLoss and GroundingDinoHungarianMatcher * More updates * More updates * fixed: GroundingDinoLoss * add: cardinality loss and make box loss as copy from * fix copies * Revert "Update tests/models/grounding_dino/test_modeling_grounding_dino.py" This reverts commit aa74c4c57c430e54cc74c414d6269edb65c73e83. * [run-slow] groundigdino * remove nestedtensor * [run-slow] groundig_dino * [run-slow] grounding_dino * [run-slow] grounding_dino * [run-slow] grounding_dino * check * check * add: enconder intermediate outputs to ImageLoss forward * add: GroundingDinoForObjectDetectionLoss in the loss directory * make style * fix the loss function * remove class_reduction since it sum is default * remove class_reduction * Update src/transformers/loss/loss_grounding_dino.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * simple fix * Update src/transformers/loss/loss_grounding_dino.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * minor fix * Update src/transformers/loss/loss_for_object_detection.py --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> Co-authored-by: sangbumchoi <danielsejong55@gmail.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
What does this PR do?
Fixes #31434
As the original repo doesn't provide the loss implementation I'm using the one implemented here as a baseline since it was mentioned by the original repo, on this issue IDEA-Research/GroundingDINO#241, as a reliable source if one wants to train a
GroundingDino
modelTODO:
GroundingDinoMatcher
andGroundingDinoLoss
are working properlyExplanation of the Issue and Solution
So the issue was that
GroundingDinoLoss
andGroundingDinoHungarianMatcher
were just a copy fromDeformableDetr
which is used for closed-set object detection (i.e. a fixed set of categories). Whereas inGroundingDino
there's no limited amount of categories and the output logits ared_model
dimensional where the firstseq_len
elements have a specified value and the subsequent arenan
. The main differences are:class_labels
are associated with the text prompt usedFor instance if an image with bounding boxes with fishes and jellyfishes using a prompt
"fish. jellyfish."
fish should haveclass_label
0 assigned to it and jellyfish should have 1 assigned. If the position of jellyfish and fish in the prompt swapped then theclass_label
s would swap as well. Moreover, jellyfish is represented by two tokens ([20919, 7529]
) and fish by one token ([3869]
) therefore we need to select the appropriate logits for each class.As the original implementation doesn't provide the training loop or the loss implementation, but does recommend other implementations for training
GroundingDino
on this issue IDEA-Research/GroundingDINO#241, I took as baseline the implementation from Open-GroundingDino as it supports both visual grounding and object detection and they've trained their ownGroundingDino
using their code base achieving good performance.Things added in this PR are:
build_label_maps
which generates a list oftorch.Tensor
with lenghtbatch_size
mapping each category to its corresponding tokens based on theinput_ids
build_text_mask
just expand theattention_mask
to select the appropriate tokens when computingGroundingDino.loss_labels
enc_topk_proposals
,encoder_logits
andencoder_pred_boxes
toGroundingDinoModelOutput
andGroundingDinoObjectDetectionOutput
to compute first stage lossclass_loss_coefficient
(with correct default value) andclass_loss_reduction
toGroundingDinoConfig
.class_loss_reduction
was added because insigmoid_focal_loss
from the baseline implementation they reducedloss_ce
with a simple sum, but that makes the losses imbalanced most of the time and in the original implementation they do have asigmoid_focal_loss
implemented, but usingmean
reduction, therefore I made I decided to make it configurable and use thesum
one for testing reasonsGroundingDinoLoss
andGroundingDinoHungarianMatcher
Also added a new integration test called
test_grounding_dino_loss
where I compare the loss obtained from 2 sample images with the baseline implementation fromOpen-GroundingDino
.c.c. @amyeroberts