
Commit c1e139c

Authored by Namangarg110, Ubuntu, stevhliu, EduardoPach, and amyeroberts
Adding hiera (huggingface#30356)
* Initialized structure
* Updated variable names
* Added Config class, basic HF setup, convert_to_hf
* Fixed convert function, added hiera to HF files, initialized test files
* Better naming for x in forward pass
* Moved utils to hiera
* Changed hiera -> hiera_model
* Fixed integration into transformers
* Fix: convert checkpoint
* Added documentation for hiera
* Added docstrings to models, Transformers-based changes
* make style and quality
* Integration & block tests running
* Fixed bugs
* Removed timm dependency
* Added HieraBlock
* Fixed model name
* Added tests for HieraModel, HieraBlock
* Fixed imports
* Fixed quality & copies
* Updated docs/source/en/model_doc/hiera.md, src/transformers/models/hiera/configuration_hiera.py, and src/transformers/models/hiera/modeling_hiera.py (Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>)
* Fixed formatting
* Code quality & import differences
* Quality and repo-consistency fixes
* Fixed no-torch error
* Docstring fixes
* Fixed example usage
* Resolved issues in modeling_hiera
* Removed Hiera MAE
* Added test and resolved bug
* Finished conversion script and model forward working
* Resolved all issues
* Improved tests and nits
* Improved HieraForMaskedImageModeling
* Fixed docstrings of outputs
* More fixes and improvements
* Updated conversion script
* Fixed attention outputs test
* All tests green
* Removed unnecessary file
* Contribution attribution
* Resolved review comments
* Updated model repo id and fixed bugs
* Removed loss print
* Updated docstrings
* Fixed style
* Fixed num_heads in config
* Removed unnecessary video checkpoint related code in the conversion script
* Changed atol in conversion script
* Fix copies
* Fixed typos
* Converted conv_nd -> nn.Module
* Removed video complexities
* Made torch fx compatible
* Made sure image processor is correct
* Fixed interpolate test
* Noise directly as torch
* Removed unnecessary attr
* Added return_dict
* Updated src/transformers/models/hiera/modeling_hiera.py and src/transformers/models/hiera/__init__.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* Updated checkpoints
* [run_slow] hiera
* Fixed device mismatch
* Fixed GPU tests

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-29-50.us-east-2.compute.internal>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Eduardo Pacheco <eduardo.pach@hotmail.com>
Co-authored-by: Eduardo Pacheco <69953243+EduardoPach@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
1 parent 574e68d commit c1e139c

18 files changed: +2,945 −0 lines

docs/source/en/_toctree.yml (+4)

@@ -603,6 +603,8 @@
       title: FocalNet
     - local: model_doc/glpn
       title: GLPN
+    - local: model_doc/hiera
+      title: Hiera
     - local: model_doc/imagegpt
       title: ImageGPT
     - local: model_doc/levit
@@ -680,6 +682,8 @@
       title: CLAP
     - local: model_doc/encodec
       title: EnCodec
+    - local: model_doc/hiera
+      title: Hiera
     - local: model_doc/hubert
       title: Hubert
     - local: model_doc/mctct

docs/source/en/index.md (+1)

@@ -159,6 +159,7 @@ Flax), PyTorch, and/or TensorFlow.
 | [Grounding DINO](model_doc/grounding-dino) ||||
 | [GroupViT](model_doc/groupvit) ||||
 | [HerBERT](model_doc/herbert) ||||
+| [Hiera](model_doc/hiera) ||||
 | [Hubert](model_doc/hubert) ||||
 | [I-BERT](model_doc/ibert) ||||
 | [IDEFICS](model_doc/idefics) ||||

docs/source/en/model_doc/hiera.md (+48, new file)

<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Hiera

## Overview

Hiera was proposed in [Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles](https://arxiv.org/abs/2306.00989) by Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, and Christoph Feichtenhofer.

The paper introduces Hiera, a hierarchical vision transformer that simplifies the architecture of modern hierarchical vision transformers by removing unnecessary components without compromising accuracy or efficiency. Unlike traditional transformers that add complex vision-specific components to improve supervised classification performance, Hiera demonstrates that such additions, often termed "bells-and-whistles," are not essential for high accuracy. By leveraging a strong visual pretext task (MAE) for pretraining, Hiera retains simplicity and achieves superior accuracy and speed in both inference and training across various image and video recognition tasks. The approach suggests that the spatial biases required for vision tasks can be learned effectively through proper pretraining, eliminating the need for added architectural complexity.

The abstract from the paper is the following:

*Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance. While these components lead to effective accuracies and attractive FLOP counts, the added complexity actually makes these transformers slower than their vanilla ViT counterparts. In this paper, we argue that this additional bulk is unnecessary. By pretraining with a strong visual pretext task (MAE), we can strip out all the bells-and-whistles from a state-of-the-art multi-stage vision transformer without losing accuracy. In the process, we create Hiera, an extremely simple hierarchical vision transformer that is more accurate than previous models while being significantly faster both at inference and during training. We evaluate Hiera on a variety of tasks for image and video recognition. Our code and models are available at https://github.com/facebookresearch/hiera.*

This model was a joint contribution by [EduardoPacheco](https://huggingface.co/EduardoPacheco) and [namangarg110](https://huggingface.co/namangarg110). The original code can be found [here](https://github.com/facebookresearch/hiera).

## HieraConfig

[[autodoc]] HieraConfig

## HieraModel

[[autodoc]] HieraModel
    - forward

## HieraForPreTraining

[[autodoc]] HieraForPreTraining
    - forward

## HieraForImageClassification

[[autodoc]] HieraForImageClassification
    - forward

src/transformers/__init__.py (+18)

@@ -462,6 +462,7 @@
         "GroupViTVisionConfig",
     ],
     "models.herbert": ["HerbertTokenizer"],
+    "models.hiera": ["HieraConfig"],
     "models.hubert": ["HubertConfig"],
     "models.ibert": ["IBertConfig"],
     "models.idefics": ["IdeficsConfig"],
@@ -2285,6 +2286,15 @@
             "GroupViTVisionModel",
         ]
     )
+    _import_structure["models.hiera"].extend(
+        [
+            "HieraBackbone",
+            "HieraForImageClassification",
+            "HieraForPreTraining",
+            "HieraModel",
+            "HieraPreTrainedModel",
+        ]
+    )
     _import_structure["models.hubert"].extend(
         [
             "HubertForCTC",
@@ -5112,6 +5122,7 @@
         GroupViTVisionConfig,
     )
     from .models.herbert import HerbertTokenizer
+    from .models.hiera import HieraConfig
     from .models.hubert import HubertConfig
     from .models.ibert import IBertConfig
     from .models.idefics import (
@@ -6795,6 +6806,13 @@
         GroupViTTextModel,
         GroupViTVisionModel,
     )
+    from .models.hiera import (
+        HieraBackbone,
+        HieraForImageClassification,
+        HieraForPreTraining,
+        HieraModel,
+        HieraPreTrainedModel,
+    )
     from .models.hubert import (
         HubertForCTC,
         HubertForSequenceClassification,

src/transformers/models/__init__.py (+1)

@@ -105,6 +105,7 @@
     grounding_dino,
     groupvit,
     herbert,
+    hiera,
     hubert,
     ibert,
     idefics,

src/transformers/models/auto/configuration_auto.py (+2)

@@ -122,6 +122,7 @@
         ("graphormer", "GraphormerConfig"),
         ("grounding-dino", "GroundingDinoConfig"),
         ("groupvit", "GroupViTConfig"),
+        ("hiera", "HieraConfig"),
         ("hubert", "HubertConfig"),
         ("ibert", "IBertConfig"),
         ("idefics", "IdeficsConfig"),
@@ -403,6 +404,7 @@
         ("grounding-dino", "Grounding DINO"),
         ("groupvit", "GroupViT"),
         ("herbert", "HerBERT"),
+        ("hiera", "Hiera"),
         ("hubert", "Hubert"),
         ("ibert", "I-BERT"),
         ("idefics", "IDEFICS"),
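Both hunks above follow the same registry pattern: an ordered mapping from a `model_type` string ("hiera") to a class name or display name, which the auto classes consult by key. A minimal, self-contained sketch of that lookup pattern (the entries and helper below are illustrative stand-ins, not the real transformers internals):

```python
from collections import OrderedDict

# Illustrative stand-in for the model_type -> config-class registry;
# only the lookup-by-key pattern matches the real code.
CONFIG_MAPPING_NAMES = OrderedDict(
    [
        ("groupvit", "GroupViTConfig"),
        ("hiera", "HieraConfig"),
        ("hubert", "HubertConfig"),
    ]
)


def config_class_for(model_type: str) -> str:
    # AutoConfig-style lookup: unknown model types raise a clear error
    # rather than silently falling through.
    if model_type not in CONFIG_MAPPING_NAMES:
        raise ValueError(f"Unrecognized model type: {model_type!r}")
    return CONFIG_MAPPING_NAMES[model_type]


print(config_class_for("hiera"))  # prints "HieraConfig"
```

Registering "hiera" in this mapping is what lets `AutoConfig.from_pretrained` resolve a Hiera checkpoint to `HieraConfig` without any Hiera-specific code at the call site.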

src/transformers/models/auto/image_processing_auto.py (+1)

@@ -85,6 +85,7 @@
         ("glpn", ("GLPNImageProcessor",)),
         ("grounding-dino", ("GroundingDinoImageProcessor",)),
         ("groupvit", ("CLIPImageProcessor",)),
+        ("hiera", ("BitImageProcessor",)),
         ("idefics", ("IdeficsImageProcessor",)),
         ("idefics2", ("Idefics2ImageProcessor",)),
         ("imagegpt", ("ImageGPTImageProcessor",)),

src/transformers/models/auto/modeling_auto.py (+5)

@@ -119,6 +119,7 @@
         ("graphormer", "GraphormerModel"),
         ("grounding-dino", "GroundingDinoModel"),
         ("groupvit", "GroupViTModel"),
+        ("hiera", "HieraModel"),
         ("hubert", "HubertModel"),
         ("ibert", "IBertModel"),
         ("idefics", "IdeficsModel"),
@@ -295,6 +296,7 @@
         ("gpt2", "GPT2LMHeadModel"),
         ("gpt_bigcode", "GPTBigCodeForCausalLM"),
         ("gptsan-japanese", "GPTSanJapaneseForConditionalGeneration"),
+        ("hiera", "HieraForPreTraining"),
         ("ibert", "IBertForMaskedLM"),
         ("idefics", "IdeficsForVisionText2Text"),
         ("idefics2", "Idefics2ForConditionalGeneration"),
@@ -535,6 +537,7 @@
         ("efficientnet", "EfficientNetModel"),
         ("focalnet", "FocalNetModel"),
         ("glpn", "GLPNModel"),
+        ("hiera", "HieraModel"),
         ("imagegpt", "ImageGPTModel"),
         ("levit", "LevitModel"),
         ("mobilenet_v1", "MobileNetV1Model"),
@@ -610,6 +613,7 @@
         ),
         ("efficientnet", "EfficientNetForImageClassification"),
         ("focalnet", "FocalNetForImageClassification"),
+        ("hiera", "HieraForImageClassification"),
         ("imagegpt", "ImageGPTForImageClassification"),
         (
             "levit",
@@ -1258,6 +1262,7 @@
         ("dinat", "DinatBackbone"),
         ("dinov2", "Dinov2Backbone"),
         ("focalnet", "FocalNetBackbone"),
+        ("hiera", "HieraBackbone"),
         ("maskformer-swin", "MaskFormerSwinBackbone"),
         ("nat", "NatBackbone"),
         ("pvt_v2", "PvtV2Backbone"),
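Each hunk above registers "hiera" in a different task-specific mapping (base model, pretraining, image classification, backbone). A hedged sketch of how such per-task mappings resolve a model class from a config's `model_type` (the classes and helper below are toy stand-ins, not the actual auto-class machinery):

```python
# Toy per-task registries; only the dispatch-by-model_type pattern
# mirrors the real transformers auto classes.
MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES = {
    "focalnet": "FocalNetForImageClassification",
    "hiera": "HieraForImageClassification",
}
MODEL_FOR_BACKBONE_MAPPING_NAMES = {
    "focalnet": "FocalNetBackbone",
    "hiera": "HieraBackbone",
}


class DummyConfig:
    """Minimal stand-in for a config object carrying only model_type."""

    def __init__(self, model_type: str):
        self.model_type = model_type


def resolve(config: DummyConfig, mapping: dict) -> str:
    # The same config resolves to a different class per task mapping.
    try:
        return mapping[config.model_type]
    except KeyError:
        raise ValueError(f"No entry for model type {config.model_type!r}") from None


cfg = DummyConfig("hiera")
print(resolve(cfg, MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES))  # HieraForImageClassification
print(resolve(cfg, MODEL_FOR_BACKBONE_MAPPING_NAMES))  # HieraBackbone
```

This is why one PR touches five mappings: each `AutoModelFor...` entry point needs its own "hiera" row to know which Hiera class to instantiate.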
src/transformers/models/hiera/__init__.py (+59, new file)

# Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import (
    OptionalDependencyNotAvailable,
    _LazyModule,
    is_torch_available,
)


_import_structure = {"configuration_hiera": ["HieraConfig"]}

try:
    if not is_torch_available():
        raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
    pass
else:
    _import_structure["modeling_hiera"] = [
        "HieraForImageClassification",
        "HieraForPreTraining",
        "HieraBackbone",
        "HieraModel",
        "HieraPreTrainedModel",
    ]

if TYPE_CHECKING:
    from .configuration_hiera import HieraConfig

    try:
        if not is_torch_available():
            raise OptionalDependencyNotAvailable()
    except OptionalDependencyNotAvailable:
        pass
    else:
        from .modeling_hiera import (
            HieraBackbone,
            HieraForImageClassification,
            HieraForPreTraining,
            HieraModel,
            HieraPreTrainedModel,
        )

else:
    import sys

    sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
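The `sys.modules` swap above is the standard transformers lazy-import idiom: static type checkers follow the real imports under `TYPE_CHECKING`, while at runtime the module is replaced by a `_LazyModule` that defers each submodule import until an attribute is first accessed. A self-contained sketch of the idea, assuming a simplified `LazyModule` that records rather than performs the import (this is not the real `_LazyModule` implementation):

```python
import sys
import types

# Public names are declared up front, keyed by the submodule defining them.
_import_structure = {
    "configuration_hiera": ["HieraConfig"],
    "modeling_hiera": ["HieraModel", "HieraForImageClassification"],
}


class LazyModule(types.ModuleType):
    """Toy lazy module: resolves exported names on first attribute access."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Invert the structure: exported name -> defining submodule.
        self._name_to_submodule = {
            attr: submod for submod, attrs in import_structure.items() for attr in attrs
        }

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. on first access.
        if name in self._name_to_submodule:
            # The real _LazyModule would importlib.import_module the submodule
            # here; this sketch just returns the dotted path it would load.
            return f"{self.__name__}.{self._name_to_submodule[name]}.{name}"
        raise AttributeError(f"module {self.__name__!r} has no attribute {name!r}")


lazy = LazyModule("transformers.models.hiera", _import_structure)
sys.modules[lazy.__name__] = lazy  # mirrors the sys.modules swap above
print(lazy.HieraConfig)
```

The payoff is that `import transformers` stays fast even with hundreds of models: torch-heavy modeling files like `modeling_hiera.py` are only imported when something actually touches `HieraModel`.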
