Adding grounding dino #26087

Merged 274 commits on Apr 11, 2024

Changes from 4 commits

274 commits
6e37211
Fixed typo when converting weigths to GroundingDINO vision backbone
EduardoPach Sep 18, 2023
0db05e0
Final modifications on modeling
EduardoPach Sep 20, 2023
a1eba2e
Removed unnecessary class
EduardoPach Sep 20, 2023
9cf7c3a
Fixed convert structure
EduardoPach Sep 20, 2023
9c55b24
Added image processing
EduardoPach Sep 24, 2023
ae570bb
make fixup partially completed
EduardoPach Sep 24, 2023
1f6475f
Now text_backbone_config has its own class
EduardoPach Oct 6, 2023
d763e04
Modified convert script
EduardoPach Oct 6, 2023
04022d4
Removed unnecessary config attribute
EduardoPach Oct 6, 2023
938f805
Added new function to generate sub sentence mask
EduardoPach Oct 13, 2023
6f08b04
Renamed parameters with gamma in the name as it's currently not allowed
EduardoPach Oct 13, 2023
7666253
Removed tokenization and image_processing scripts since we'll map fro…
EduardoPach Oct 13, 2023
046e0c5
Fixed some issues with configuration
EduardoPach Oct 13, 2023
70b248d
Just some modifications on conversion script
EduardoPach Oct 13, 2023
3bc92b7
Other modifications
EduardoPach Oct 13, 2023
4cae0ca
Copied deformable detr
EduardoPach Aug 22, 2023
149b462
First commit
EduardoPach Aug 23, 2023
92c31bf
Added bert to model
EduardoPach Aug 27, 2023
8f0a755
Bert validated
EduardoPach Aug 30, 2023
fb1c55c
Created Text and Fusion layers for Encoder
EduardoPach Aug 31, 2023
86131af
Adapted Encoder layer
EduardoPach Aug 31, 2023
8ad3226
Fixed typos
EduardoPach Sep 1, 2023
21e3fa2
Adjusted Encoder
EduardoPach Sep 4, 2023
5ddfa38
Converted encoder to hf
EduardoPach Sep 4, 2023
0512f7a
Modified Decoder Layer
EduardoPach Sep 5, 2023
d2cd35f
Modified main decoder class
EduardoPach Sep 6, 2023
cb2ad7f
Removed copy comments
EduardoPach Sep 6, 2023
eaf958d
Fixed forward from GroundingDINOModel and GroundingDINODecoder
EduardoPach Sep 11, 2023
88d07b3
Added all necessary layers, configurations and forward logic up to Gr…
EduardoPach Sep 12, 2023
f17bd3d
Added all layers to convertion
EduardoPach Sep 12, 2023
dcd1990
Fixed outputs for GroundingDINOModel and GroundingDINOForObjectDetection
EduardoPach Sep 12, 2023
39a161c
Fixed mask input to encoders and fixed nn.MultiheadAttention batch fi…
EduardoPach Sep 13, 2023
5ec72fb
Fixed forward from GroundingDINOTextEnhancerLayer
EduardoPach Sep 13, 2023
086f68a
Fixed output bug with GroundingDINODeformableLayer
EduardoPach Sep 13, 2023
f75cda2
Fixed bugs that prevent GroundingDINOForObjectDetection to run forwar…
EduardoPach Sep 15, 2023
8dbed3d
Fixed attentions to be passed correctly
EduardoPach Sep 18, 2023
a2af172
Passing temperature arg when creating Sine position embedding
EduardoPach Sep 18, 2023
759fc14
Removed copy comments
EduardoPach Sep 18, 2023
5196373
Added temperature argument for position embedding
EduardoPach Sep 18, 2023
900cff4
Fixed typo when converting weigths to GroundingDINO vision backbone
EduardoPach Sep 18, 2023
f23a54a
Final modifications on modeling
EduardoPach Sep 20, 2023
3090b2c
Removed unnecessary class
EduardoPach Sep 20, 2023
5c19e75
Fixed convert structure
EduardoPach Sep 20, 2023
aec2f68
Added image processing
EduardoPach Sep 24, 2023
b7a79cd
make fixup partially completed
EduardoPach Sep 24, 2023
685f1d6
Now text_backbone_config has its own class
EduardoPach Oct 6, 2023
d6e88fc
Modified convert script
EduardoPach Oct 6, 2023
0242e57
Removed unnecessary config attribute
EduardoPach Oct 6, 2023
af06c85
Added new function to generate sub sentence mask
EduardoPach Oct 13, 2023
43c0ce5
Renamed parameters with gamma in the name as it's currently not allowed
EduardoPach Oct 13, 2023
2bb7b70
Removed tokenization and image_processing scripts since we'll map fro…
EduardoPach Oct 13, 2023
98f3840
Fixed some issues with configuration
EduardoPach Oct 13, 2023
703eeff
Just some modifications on conversion script
EduardoPach Oct 13, 2023
c1c1467
Other modifications
EduardoPach Oct 13, 2023
bfb8829
Fix style
NielsRogge Oct 14, 2023
587589e
Improve fixup
NielsRogge Oct 14, 2023
f683611
Improve conversion script
NielsRogge Oct 14, 2023
3a0c742
Improve conversion script
NielsRogge Oct 14, 2023
6115547
Add GroundingDINOProcessor
NielsRogge Oct 14, 2023
cc1788f
More improvements
NielsRogge Oct 14, 2023
a6dea4a
Return token type ids
NielsRogge Oct 14, 2023
ae6e110
something
EduardoPach Oct 14, 2023
9fba8c2
Fix more tests
NielsRogge Oct 15, 2023
684a0bb
More improvements
NielsRogge Oct 15, 2023
3b2d576
More cleanup
NielsRogge Oct 15, 2023
88e5d02
More improvements
NielsRogge Oct 15, 2023
55390d1
Merge branch 'adding-grounding-dino' of https://github.com/EduardoPac…
EduardoPach Oct 16, 2023
8bae1bd
Fixed tests, improved modeling and config
EduardoPach Oct 16, 2023
f343f78
More improvements and fixing tests
EduardoPach Oct 17, 2023
033d903
Improved tests and modeling
EduardoPach Oct 18, 2023
baed29a
Improved tests and added image processor
EduardoPach Oct 21, 2023
50c5f67
Improved tests inference
EduardoPach Oct 22, 2023
d2922e1
More improvements
EduardoPach Oct 23, 2023
891c34d
More test improvements
EduardoPach Oct 26, 2023
eccaec9
Fixed last test
EduardoPach Oct 26, 2023
f32be01
Improved docstrings and comments
EduardoPach Oct 26, 2023
1c657e2
Fix style
NielsRogge Oct 27, 2023
1202ce8
Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
EduardoPach Nov 1, 2023
d62dd11
Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
EduardoPach Nov 1, 2023
bbf873b
Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
EduardoPach Nov 1, 2023
c69b8a2
Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
EduardoPach Nov 1, 2023
274752c
Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
EduardoPach Nov 1, 2023
91373e0
Better naming
EduardoPach Nov 1, 2023
4945883
Better naming
EduardoPach Nov 1, 2023
5882f5f
Added Copied statement
EduardoPach Nov 1, 2023
c96a1a1
Added Copied statement
EduardoPach Nov 1, 2023
558ad87
Moved param init from GroundingDINOBiMultiHeadAttention
EduardoPach Nov 1, 2023
5c32bdc
Better naming
EduardoPach Nov 1, 2023
c561087
Fixing clamp style
EduardoPach Nov 1, 2023
07d4c62
Better naming
EduardoPach Nov 2, 2023
ba37183
Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
EduardoPach Nov 2, 2023
c746e1d
Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
EduardoPach Nov 2, 2023
07b260d
Update src/transformers/models/grounding_dino/configuration_grounding…
EduardoPach Nov 2, 2023
898e072
Update src/transformers/models/grounding_dino/convert_grounding_dino_…
EduardoPach Nov 2, 2023
34b36a3
Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
EduardoPach Nov 2, 2023
e14d6ae
Improving conversion script
EduardoPach Nov 2, 2023
f867e50
Improved config
EduardoPach Nov 2, 2023
fc105be
Improved naming
EduardoPach Nov 2, 2023
ed1176e
Improved naming again
EduardoPach Nov 2, 2023
ef5c90f
Improved grouding-dino.md
EduardoPach Nov 2, 2023
b2fd868
Moved grounding dino to multimodal
EduardoPach Nov 2, 2023
c23497c
Update src/transformers/models/grounding_dino/convert_grounding_dino_…
EduardoPach Nov 3, 2023
a729a38
Fixed docstrings and style
EduardoPach Nov 3, 2023
aafcc34
Fix docstrings
NielsRogge Nov 13, 2023
e4bad9b
Remove timm attributes
NielsRogge Nov 13, 2023
e48d411
Reorder imports
NielsRogge Nov 13, 2023
a7f026f
More improvements
NielsRogge Nov 13, 2023
1930b2a
Add Grounding DINO to pipeline
NielsRogge Nov 13, 2023
6ac265c
Remove model from check_repo
NielsRogge Nov 13, 2023
93b8609
Added grounded post_process to GroundingDINOProcessor
EduardoPach Nov 14, 2023
6461389
Fixed style
EduardoPach Nov 14, 2023
e35f1c9
Fixed GroundingDINOTextPrenetConfig docstrings
EduardoPach Nov 14, 2023
695ffa5
Aligned inputs.keys() when both image and text are passed with model_…
EduardoPach Nov 16, 2023
7d16d7f
Added tests for GroundingDINOImageProcessor and GroundingDINOProcessor
EduardoPach Nov 16, 2023
98321e3
Testing post_process_grounded_object_detection from GroundingDINOProc…
EduardoPach Nov 16, 2023
3da62df
Fixed order
EduardoPach Nov 18, 2023
6be9a68
Marked test with require_torch
EduardoPach Nov 18, 2023
cc1ee60
Temporarily changed repo_id
EduardoPach Nov 18, 2023
8cf167e
More improvements
EduardoPach Nov 18, 2023
27edb8e
Merge remote-tracking branch 'upstream/main' into adding-grounding-dino
EduardoPach Nov 20, 2023
2927c13
Fix style
NielsRogge Nov 20, 2023
42ee6bc
Final improvements
EduardoPach Nov 20, 2023
85acfbc
Merge branch 'adding-grounding-dino' of https://github.com/EduardoPac…
EduardoPach Nov 20, 2023
e2b48b0
Improve annotators
NielsRogge Nov 23, 2023
5e1f0d9
Fix style
NielsRogge Nov 23, 2023
c9a8440
Add is_torch_available
NielsRogge Nov 23, 2023
f954f4b
Remove type hints
NielsRogge Nov 23, 2023
2eb2a98
vocab_tokens as one liner
EduardoPach Dec 8, 2023
625123a
Removed print statements
EduardoPach Dec 8, 2023
4553ad1
Renamed GroundingDINOTextPrenetConfig to GroundingDINOTextConfig
EduardoPach Dec 8, 2023
3b6b2c2
remove unnecessary comments
EduardoPach Dec 8, 2023
afb2649
Removed unnecessary tests on conversion script
EduardoPach Dec 8, 2023
4fdaf42
Renamed GroundingDINO to camel case GroundingDino
EduardoPach Dec 8, 2023
559de31
Fixed GroundingDinoProcessor docstrings
EduardoPach Dec 8, 2023
fef983e
loading MSDA kernels in the modeling file
EduardoPach Dec 8, 2023
fbf82be
Fix merge
NielsRogge Dec 11, 2023
9994ee0
Fix copies
NielsRogge Dec 11, 2023
14c839d
Replace nn.multiheadattention
NielsRogge Jan 31, 2024
5a6f258
Replace nn.multiheadattention
NielsRogge Feb 1, 2024
9fa83da
Fixed inputs for GroundingDinoMultiheadAttention & order of modules
Feb 4, 2024
06ba0ec
Fixed processing to avoid messing with inputs
Feb 4, 2024
9cda12e
Added more tips for GroundingDino
Feb 4, 2024
bde2c6a
Make style
Feb 4, 2024
01c382e
Chaning name to align with SAM
Feb 4, 2024
5d1f0e7
Replace final nn.multiheadattention
NielsRogge Feb 4, 2024
339915f
Fix model tests
NielsRogge Feb 4, 2024
1bb4886
Update year, remove GenerationTesterMixin
NielsRogge Feb 4, 2024
4bb58d3
Address comments
NielsRogge Feb 4, 2024
2c5d4ea
Address more comments
NielsRogge Feb 4, 2024
f21162c
Rename TextPrenet to TextModel
NielsRogge Feb 4, 2024
48f1734
Rename hidden_states
NielsRogge Feb 4, 2024
d3f45c3
Address more comments
NielsRogge Feb 4, 2024
3134d39
Address more comments
NielsRogge Feb 4, 2024
1485264
Address comment
NielsRogge Feb 4, 2024
c918fca
Merge branch 'adding-grounding-dino' of https://github.com/EduardoPac…
Feb 4, 2024
fc2251e
Merge branch 'adding-grounding-dino' of https://github.com/EduardoPac…
Feb 4, 2024
36c64be
Address more comments
NielsRogge Feb 4, 2024
8f338dd
Address merge
NielsRogge Feb 4, 2024
a46c4f0
Address comment
NielsRogge Feb 5, 2024
a8a6bea
Address comment
NielsRogge Feb 5, 2024
28686ec
Address comment
NielsRogge Feb 5, 2024
a3330ac
Make style
Feb 5, 2024
e9a45cb
Merge branch 'adding-grounding-dino' of https://github.com/EduardoPac…
Feb 5, 2024
21a1b4b
Added layer norm eps to layer norms
Feb 5, 2024
7292639
Address more comments
NielsRogge Feb 5, 2024
6e51931
More fixes
Feb 5, 2024
d5481bb
Fixed equivalence
Feb 5, 2024
1fcf142
Merge remote-tracking branch 'upstream/main' into adding-grounding-dino
NielsRogge Feb 5, 2024
098a59d
Make fixup
NielsRogge Feb 5, 2024
e005007
Remove print statements
NielsRogge Feb 5, 2024
daa29dc
Address comments
NielsRogge Feb 5, 2024
7d4c763
Address comments
NielsRogge Feb 7, 2024
f52dd2d
Address comments
NielsRogge Feb 7, 2024
afb5c6e
Address comments
NielsRogge Feb 8, 2024
4a88014
Address comments
NielsRogge Feb 8, 2024
34e37b4
Address comments
NielsRogge Feb 9, 2024
4854862
Add comment
NielsRogge Feb 9, 2024
c9fcadd
Address comment
NielsRogge Feb 9, 2024
580ce27
Fix merge
NielsRogge Feb 9, 2024
6366302
Remove overwriting of test
NielsRogge Feb 10, 2024
b5b1f1b
Fix bbox_embed
NielsRogge Feb 10, 2024
9faa6b4
Improve decoder_bbox_embed_share
NielsRogge Feb 10, 2024
e7761f7
Simplify outputs
NielsRogge Feb 10, 2024
09ae5c1
Updated post_process_grounded_object_detection
Feb 10, 2024
6036be6
Merge remote-tracking branch 'upstream/main' into adding-grounding-dino
Feb 12, 2024
9700e22
Renamed sources to feature_maps
Feb 13, 2024
66ebb6d
Improved tests for Grounding Dino ImageProcessor and Processor
Feb 15, 2024
1f4ffae
Merge remote-tracking branch 'upstream/main' into adding-grounding-dino
Feb 15, 2024
b27e7fb
Fixed test requirements and imports
Feb 15, 2024
17387df
Fixed image_processing
Feb 15, 2024
eed03aa
Fixed processor tests
Feb 15, 2024
3e8772a
Fixed imports for image processing tests
Feb 15, 2024
1682c0a
Merge remote-tracking branch 'upstream/main' into adding-grounding-dino
Feb 15, 2024
c549574
Fix copies
Feb 15, 2024
fdf7e82
Merge remote-tracking branch 'upstream/main' into adding-grounding-dino
EduardoPach Feb 17, 2024
126dd83
Updated modeling
EduardoPach Feb 17, 2024
d24335b
Fix style
EduardoPach Feb 17, 2024
7d6bd5b
Merge remote-tracking branch 'upstream/main' into adding-grounding-dino
EduardoPach Feb 21, 2024
0b8f4e8
Moved functions to correct position
EduardoPach Feb 21, 2024
eafd39f
Fixed copy issues
EduardoPach Feb 21, 2024
fb1e202
Update src/transformers/models/deformable_detr/modeling_deformable_de…
EduardoPach Feb 24, 2024
516de4a
Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
EduardoPach Feb 24, 2024
0ae3c5d
Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
EduardoPach Feb 24, 2024
7255f13
Keeping consistency custom cuda kernels for MSDA
EduardoPach Feb 24, 2024
2fb611a
Make GroundingDinoProcessor logic clearer
EduardoPach Feb 24, 2024
83004b7
Merge remote-tracking branch 'upstream/main' into adding-grounding-dino
EduardoPach Feb 24, 2024
c7a4ef0
Updated Grounding DINO checkpoints
EduardoPach Feb 24, 2024
baa1959
Merge remote-tracking branch 'upstream/main' into adding-grounding-dino
EduardoPach Feb 26, 2024
03137fd
Changed tests to correct structure
EduardoPach Mar 1, 2024
3ee2d78
Updated gpu-cpu equivalence test
EduardoPach Mar 4, 2024
8361ffc
Merge remote-tracking branch 'upstream/main' into adding-grounding-dino
EduardoPach Mar 4, 2024
fcfad83
fix copies
EduardoPach Mar 4, 2024
ed7a71e
Merge remote-tracking branch 'upstream/main' into adding-grounding-dino
EduardoPach Mar 5, 2024
fe7cd12
Update src/transformers/models/grounding_dino/processing_grounding_di…
EduardoPach Mar 9, 2024
ebf136f
Update src/transformers/models/grounding_dino/processing_grounding_di…
EduardoPach Mar 9, 2024
8728db6
Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
EduardoPach Mar 9, 2024
1be93d6
Update src/transformers/models/grounding_dino/configuration_grounding…
EduardoPach Mar 9, 2024
0c99ac7
Fixed erros and style
EduardoPach Mar 10, 2024
538e88f
Fix copies
EduardoPach Mar 10, 2024
18d3d63
Removed inheritance from PreTrainedModel from GroundingDinoTextModel
EduardoPach Mar 10, 2024
b4735d5
Fixed GroundingDinoTextModel
EduardoPach Mar 10, 2024
1cf5cf0
Fixed type of default backbone config
EduardoPach Mar 10, 2024
88c0467
Fixed missing methods for GroundingDinoTextModel and Added timm suppo…
EduardoPach Mar 10, 2024
2d95044
Addressed comments
EduardoPach Mar 10, 2024
710c1be
Addressed batched image processing tests
EduardoPach Mar 10, 2024
06a59b2
Addressed zero shot test comment
EduardoPach Mar 10, 2024
2de4e15
Addressed tip comment
EduardoPach Mar 10, 2024
0780569
Merge remote-tracking branch 'upstream/main' into adding-grounding-dino
EduardoPach Mar 11, 2024
4b9c9ad
Removed GroundingDinoTextModel from check_repo
EduardoPach Mar 22, 2024
4df56a4
Removed inplace masking
EduardoPach Mar 22, 2024
0e0ae3c
Addressed comments
EduardoPach Mar 22, 2024
e8222f3
Addressed comments
EduardoPach Mar 22, 2024
6cab49a
Addressed comments
EduardoPach Mar 22, 2024
8012f13
Merge remote-tracking branch 'upstream/main' into adding-grounding-dino
EduardoPach Mar 22, 2024
37b272f
Fix copies
EduardoPach Mar 22, 2024
d6966ce
Fixing timm test
EduardoPach Mar 23, 2024
1a94461
Fixed batching equivalence test
EduardoPach Mar 23, 2024
a584f65
Update docs/source/en/model_doc/grounding-dino.md
EduardoPach Mar 24, 2024
a9dfee3
Update docs/source/en/model_doc/grounding-dino.md
EduardoPach Mar 24, 2024
6f13fbb
Update docs/source/en/model_doc/grounding-dino.md
EduardoPach Mar 24, 2024
2f845b0
Merge remote-tracking branch 'upstream/main' into adding-grounding-dino
EduardoPach Apr 4, 2024
a1e9ff0
Addressed more comments
EduardoPach Apr 8, 2024
38a2e97
Added a new comment
EduardoPach Apr 9, 2024
e9633b4
Reduced image size
EduardoPach Apr 9, 2024
89e070f
Addressed more comments
EduardoPach Apr 9, 2024
a961ab7
Nits
EduardoPach Apr 10, 2024
6c2a617
Merge remote-tracking branch 'upstream/main' into adding-grounding-dino
EduardoPach Apr 10, 2024
f945c7a
Nits
EduardoPach Apr 10, 2024
b0891ca
Changed the way text_config is initialized
EduardoPach Apr 10, 2024
c630a9c
Update src/transformers/models/grounding_dino/processing_grounding_di…
EduardoPach Apr 10, 2024
44 changes: 17 additions & 27 deletions src/transformers/models/grounding_dino/modeling_grounding_dino.py
@@ -375,17 +375,6 @@ class GroundingDinoObjectDetectionOutput(ModelOutput):
enc_outputs_coord_logits: Optional[torch.FloatTensor] = None


def _get_clones(module, num_copies):
return nn.ModuleList([copy.deepcopy(module) for i in range(num_copies)])


def inverse_sigmoid(x, eps=1e-5):
x = x.clamp(min=0, max=1)
x1 = x.clamp(min=eps)
x2 = (1 - x).clamp(min=eps)
return torch.log(x1 / x2)


# Copied from transformers.models.detr.modeling_detr.DetrFrozenBatchNorm2d with Detr->GroundingDino
class GroundingDinoFrozenBatchNorm2d(nn.Module):
"""
@@ -516,10 +505,10 @@ class GroundingDinoSinePositionEmbedding(nn.Module):
need paper, generalized to work on images.
"""

def __init__(self, embedding_dim=64, temperature=10000):
def __init__(self, config):
super().__init__()
self.embedding_dim = embedding_dim
self.temperature = temperature
self.embedding_dim = config.d_model // 2
self.temperature = config.positional_embedding_temperature
self.scale = 2 * math.pi

def forward(self, pixel_values, pixel_mask):
@@ -540,14 +529,15 @@ def forward(self, pixel_values, pixel_mask):
return pos


# Copied from transformers.models.detr.modeling_detr.DetrLearnedPositionEmbedding
class GroundingDinoLearnedPositionEmbedding(nn.Module):
"""
This module learns positional embeddings up to a fixed maximum size.
"""

def __init__(self, embedding_dim=256):
def __init__(self, config):
super().__init__()

embedding_dim = config.d_model // 2
self.row_embeddings = nn.Embedding(50, embedding_dim)
self.column_embeddings = nn.Embedding(50, embedding_dim)

@@ -565,12 +555,10 @@ def forward(self, pixel_values, pixel_mask=None):


def build_position_encoding(config):
n_steps = config.d_model // 2
if config.position_embedding_type == "sine":
# TODO find a better way of exposing other arguments
position_embedding = GroundingDinoSinePositionEmbedding(n_steps, config.positional_embedding_temperature)
position_embedding = GroundingDinoSinePositionEmbedding(config)
elif config.position_embedding_type == "learned":
position_embedding = GroundingDinoLearnedPositionEmbedding(n_steps)
position_embedding = GroundingDinoLearnedPositionEmbedding(config)
else:
raise ValueError(f"Not supported {config.position_embedding_type}")

@@ -1735,7 +1723,7 @@ def __init__(self, config: GroundingDinoConfig):
@staticmethod
def get_reference_points(spatial_shapes, valid_ratios, device):
"""
Get reference points for each feature map. Used in decoder.
Get reference points for each feature map.

Args:
spatial_shapes (`torch.LongTensor` of shape `(num_feature_levels, 2)`):
@@ -1932,10 +1920,11 @@ def get_proposal_pos_embed(self, proposals: torch.FloatTensor) -> torch.FloatTen
pos_x = torch.stack((pos_x[:, :, 0::2].sin(), pos_x[:, :, 1::2].cos()), dim=3).flatten(2)
pos_y = torch.stack((pos_y[:, :, 0::2].sin(), pos_y[:, :, 1::2].cos()), dim=3).flatten(2)

if proposals.size(-1) == 2:
num_coordinates = proposals.size(-1)
if num_coordinates == 2:
# batch_size, num_queries, num_pos_feats * 2
pos = torch.cat((pos_y, pos_x), dim=2)
elif proposals.size(-1) == 4:
elif num_coordinates == 4:
w_embed = proposals[:, :, 2] * scale
pos_w = w_embed[:, :, None] / dim_t
# batch_size, num_queries, num_pos_feats
@@ -2083,15 +2072,15 @@ def custom_forward(*inputs):
if self.bbox_embed is not None:
tmp = self.bbox_embed[idx](hidden_states)
if reference_points.shape[-1] == 4:
new_reference_points = tmp + inverse_sigmoid(reference_points)
new_reference_points = tmp + torch.special.logit(reference_points, eps=1e-5)
new_reference_points = new_reference_points.sigmoid()
else:
if reference_points.shape[-1] != 2:
raise ValueError(
f"Reference points' last dimension must be of size 2, but is {reference_points.shape[-1]}"
)
new_reference_points = tmp
new_reference_points[..., :2] = tmp[..., :2] + inverse_sigmoid(reference_points)
new_reference_points[..., :2] = tmp[..., :2] + torch.special.logit(reference_points, eps=1e-5)
new_reference_points = new_reference_points.sigmoid()
reference_points = new_reference_points.detach()
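Note: the inverse_sigmoid helper removed at the top of this file is replaced by torch.special.logit, which applies an effectively identical eps-clamped log-odds transform. A minimal sketch comparing the two (the removed helper is reproduced here only for the comparison):

import torch

def inverse_sigmoid(x, eps=1e-5):
    # Helper removed in this diff, reproduced for comparison only.
    x = x.clamp(min=0, max=1)
    x1 = x.clamp(min=eps)
    x2 = (1 - x).clamp(min=eps)
    return torch.log(x1 / x2)

# Reference points live in (0, 1) after a sigmoid; keep them away from the clamp boundaries.
reference_points = torch.rand(2, 10, 4).clamp(1e-4, 1 - 1e-4)
old = inverse_sigmoid(reference_points, eps=1e-5)
new = torch.special.logit(reference_points, eps=1e-5)
print(torch.allclose(old, new))  # True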

@@ -2140,6 +2129,7 @@ def custom_forward(*inputs):
)


# these correspond to [CLS], [SEP], . and ?
SPECIAL_TOKENS = [101, 102, 1012, 1029]
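A quick check of the IDs listed in the comment above, assuming the standard bert-base-uncased tokenizer used for the text backbone (a sketch, not part of the diff):

from transformers import AutoTokenizer

# Assumes the bert-base-uncased vocabulary.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.convert_tokens_to_ids(["[CLS]", "[SEP]", ".", "?"]))
# [101, 102, 1012, 1029]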


@@ -2621,7 +2611,7 @@ def __init__(self, config: GroundingDinoConfig):
if config.decoder_bbox_embed_share:
self.bbox_embed = nn.ModuleList([_bbox_embed for _ in range(config.decoder_layers)])
else:
self.bbox_embed = _get_clones(_bbox_embed, config.decoder_layers)
self.bbox_embed = nn.ModuleList(_bbox_embed, config.decoder_layers)
self.class_embed = nn.ModuleList([_class_embed for _ in range(config.decoder_layers)])
Collaborator:

This is pretty dirty and suboptimal. If we're "sharing" then we're creating an unnecessary memory burden: we can just define the class once and call it n times in a loop.

If we're not sharing then we're using a copy hack to paper over not instantiating the class properly at the start

Contributor Author:

Makes sense. Not sure if it is worth looking at DeformableDetr then, since this came from add model-like deformable-detr.

Collaborator:

Ha! Well, you can tell I didn't review for that model ;)

Nevertheless, two wrongs don't make a right, let's do it better for this one.

Contributor Author:

Now that I'm about to fix this, I have a question. What do you mean by "memory burden" in the case where we share the layer? At first I thought you were saying that instead of doing:

self.bbox_embed = nn.ModuleList([_bbox_embed for _ in range(...)])
...
def forward(...):
    ...
    for bbox in self.bbox_embed:
        x = bbox(x)

to do this:

self.bbox_embed = ...

def forward(...):
    ...
    for _ in range(config.decoder_layers):
        x = self.bbox_embed(x)

But if that is the case I would imagine that it doesn't matter, right? Memory usage should be the same since there's only one reference to parameters. Consider this example:

import sys

from torch import nn

num_layers = 5

layer = nn.Linear(100, 100)
layers = nn.ModuleList([layer for _  in range(num_layers)])
layers_2 = nn.ModuleList([nn.Linear(100, 100) for _ in range(num_layers)])

print(sys.getsizeof(list(layer.parameters()))) # 120
print(sys.getsizeof(list(layers.parameters()))) # 120
print(sys.getsizeof(list(layers_2.parameters()))) # 184

def print_named_parameters(m):
    for name, param in m.named_parameters():
        print(name, param.shape)

print_named_parameters(layer)
# weight torch.Size([100, 100])
# bias torch.Size([100])
print_named_parameters(layers)
# 0.weight torch.Size([100, 100])
# 0.bias torch.Size([100])
print_named_parameters(layers_2)
# 0.weight torch.Size([100, 100])
# 0.bias torch.Size([100])
# 1.weight torch.Size([100, 100])
# 1.bias torch.Size([100])
# 2.weight torch.Size([100, 100])
# 2.bias torch.Size([100])
# 3.weight torch.Size([100, 100])
# 3.bias torch.Size([100])
# 4.weight torch.Size([100, 100])
# 4.bias torch.Size([100])

Let me know if I overlooked anything else

Collaborator:

What I meant is that, if the embedding is really shared, then we don't need a list at all, i.e. instead of:

self.bbox_embed = nn.ModuleList([_bbox_embed for _ in range(...)])

we can just have

self.bbox_embed = _bbox_embed

which we call n times.

With regards to my previous comment:

If we're not sharing then we're using a copy hack to paper over not instantiating the class properly at the start

I was trying to say, if these parameters are really shared, then this list comprehension / copying isn't needed. Or, they are not shared, in which case we shouldn't be using the deep copy logic that was in _get_clones at all, but instead properly creating the classes.

Contributor Author:

Right now we aren't using _get_clones anymore, but I would still favor using the ModuleList when parameters are shared: it stays consistent with the non-shared case and doesn't add complexity to forward. Moreover, since there's no clear disadvantage to using ModuleList, I would leave it as it is.
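For reference, a minimal sketch of the two constructions being discussed, using a plain nn.Linear as a stand-in for the actual bbox head (the names here are illustrative only, not the model's):

from torch import nn

decoder_layers = 6

# Shared: every entry references the same module object, so there is a single
# set of parameters no matter how long the list is.
shared_head = nn.Linear(256, 4)
shared = nn.ModuleList([shared_head for _ in range(decoder_layers)])

# Independent: each entry is freshly initialized with its own parameters
# (what the removed _get_clones achieved via copy.deepcopy).
independent = nn.ModuleList([nn.Linear(256, 4) for _ in range(decoder_layers)])

print(sum(p.numel() for p in shared.parameters()))       # 1028 (one head)
print(sum(p.numel() for p in independent.parameters()))  # 6168 (six heads)

Indexing shared[i] inside the decoder loop therefore behaves exactly like calling the single shared head, which is the consistency argument made above.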

# hack implementation for two-stage
self.model.decoder.bbox_embed = self.bbox_embed
@@ -2726,7 +2716,7 @@ def forward(
reference = init_reference_points
else:
reference = inter_references_points[:, level - 1]
reference = inverse_sigmoid(reference)
reference = torch.special.logit(reference, eps=1e-5)
outputs_class = self.class_embed[level](
vision_hidden_state=hidden_states[:, level],
text_hidden_state=enc_text_hidden_state,
1 change: 1 addition & 0 deletions utils/check_repo.py
@@ -69,6 +69,7 @@
"Pop2PianoStack",
"SwitchTransformersStack",
"TFDPRSpanPredictor",
"GroundingDinoTextModel",
"MaskFormerSwinModel",
"MaskFormerSwinPreTrainedModel",
"BridgeTowerTextModel",