Commit 6f05572

Add support for ConvNeXT (V1+V2) models (#428)
* Add support for `convnext` and `convnextv2` models
* Fix typo
1 parent 3da3841 commit 6f05572

5 files changed: +93 -0 lines changed

README.md

Lines changed: 2 additions & 0 deletions
@@ -271,6 +271,8 @@ You can refine your search by selecting the task you're interested in (e.g., [te
 1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
 1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
 1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (from MetaAI) released with the paper [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) by Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.
+1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
+1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
 1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
 1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
 1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.

docs/snippets/6_supported-models.snippet

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@
 1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
 1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
 1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (from MetaAI) released with the paper [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) by Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.
+1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
+1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
 1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
 1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
 1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.

scripts/supported_models.py

Lines changed: 39 additions & 0 deletions
@@ -140,6 +140,45 @@
         'Salesforce/codegen-350M-multi',
         'Salesforce/codegen-350M-nl',
     ],
+    'convnext':[
+        # Image classification
+        'facebook/convnext-tiny-224',
+        'facebook/convnext-small-224',
+        'facebook/convnext-base-224',
+        'facebook/convnext-base-224-22k',
+        'facebook/convnext-base-224-22k-1k',
+        'facebook/convnext-base-384',
+        'facebook/convnext-base-384-22k-1k',
+        'facebook/convnext-large-224',
+        'facebook/convnext-large-224-22k',
+        'facebook/convnext-large-224-22k-1k',
+        'facebook/convnext-large-384',
+        'facebook/convnext-large-384-22k-1k',
+        'facebook/convnext-xlarge-224-22k',
+        'facebook/convnext-xlarge-224-22k-1k',
+        'facebook/convnext-xlarge-384-22k-1k',
+    ],
+    'convnextv2':[
+        # Image classification
+        'facebook/convnextv2-atto-1k-224',
+        'facebook/convnextv2-femto-1k-224',
+        'facebook/convnextv2-pico-1k-224',
+        'facebook/convnextv2-tiny-1k-224',
+        'facebook/convnextv2-tiny-22k-384',
+        'facebook/convnextv2-tiny-22k-224',
+        'facebook/convnextv2-nano-1k-224',
+        'facebook/convnextv2-nano-22k-384',
+        'facebook/convnextv2-base-22k-224',
+        'facebook/convnextv2-base-1k-224',
+        'facebook/convnextv2-base-22k-384',
+        'facebook/convnextv2-large-22k-224',
+        'facebook/convnextv2-large-1k-224',
+        'facebook/convnextv2-large-22k-384',
+        # 'facebook/convnextv2-huge-22k-512',
+        # 'facebook/convnextv2-huge-1k-224',
+        # 'facebook/convnextv2-huge-22k-384',
+        # 'facebook/convnextv2-nano-22k-224',
+    ],
     'deberta': [
         # Zero-shot classification
         'cross-encoder/nli-deberta-base',

src/models.js

Lines changed: 48 additions & 0 deletions
@@ -3545,6 +3545,50 @@ export class DonutSwinPreTrainedModel extends PreTrainedModel { }
 export class DonutSwinModel extends DonutSwinPreTrainedModel { }
 //////////////////////////////////////////////////
 
+
+//////////////////////////////////////////////////
+export class ConvNextPreTrainedModel extends PreTrainedModel { }
+
+/**
+ * The bare ConvNext model outputting raw features without any specific head on top.
+ */
+export class ConvNextModel extends ConvNextPreTrainedModel { }
+
+/**
+ * ConvNext Model with an image classification head on top (a linear layer on top of the pooled features), e.g. for ImageNet.
+ */
+export class ConvNextForImageClassification extends ConvNextPreTrainedModel {
+    /**
+     * @param {any} model_inputs
+     */
+    async _call(model_inputs) {
+        return new SequenceClassifierOutput(await super._call(model_inputs));
+    }
+}
+//////////////////////////////////////////////////
+
+
+//////////////////////////////////////////////////
+export class ConvNextV2PreTrainedModel extends PreTrainedModel { }
+
+/**
+ * The bare ConvNextV2 model outputting raw features without any specific head on top.
+ */
+export class ConvNextV2Model extends ConvNextV2PreTrainedModel { }
+
+/**
+ * ConvNextV2 Model with an image classification head on top (a linear layer on top of the pooled features), e.g. for ImageNet.
+ */
+export class ConvNextV2ForImageClassification extends ConvNextV2PreTrainedModel {
+    /**
+     * @param {any} model_inputs
+     */
+    async _call(model_inputs) {
+        return new SequenceClassifierOutput(await super._call(model_inputs));
+    }
+}
+//////////////////////////////////////////////////
+
 //////////////////////////////////////////////////
 export class YolosPreTrainedModel extends PreTrainedModel { }
 export class YolosModel extends YolosPreTrainedModel { }
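
The two `ForImageClassification` classes in this hunk differ from their bare models only in `_call`, which wraps the parent's raw result in `SequenceClassifierOutput`. The pattern can be sketched in isolation as below. This is a minimal illustration, not library code: the `SequenceClassifierOutput` here is a simplified stand-in that only exposes `logits`, and `FakePreTrainedModel`, `FakeConvNextForImageClassification`, `argmax`, and `classify` are hypothetical names with made-up scores.

```javascript
// Simplified stand-in for the library's output wrapper (assumption:
// it only needs to expose a `logits` field for this illustration).
class SequenceClassifierOutput {
    constructor({ logits }) {
        this.logits = logits;
    }
}

// Hypothetical base class. A real model would run inference here;
// this one returns fixed, made-up scores plus an unrelated field.
class FakePreTrainedModel {
    async _call(model_inputs) {
        return { logits: [0.1, 2.5, -1.0], extra: 'dropped by the wrapper' };
    }
}

// Same shape as the classification classes in the hunk: delegate to the
// parent, then wrap the raw record in a typed output object.
class FakeConvNextForImageClassification extends FakePreTrainedModel {
    async _call(model_inputs) {
        return new SequenceClassifierOutput(await super._call(model_inputs));
    }
}

// Index of the highest score, i.e. the predicted class id.
function argmax(values) {
    return values.reduce((best, v, i) => (v > values[best] ? i : best), 0);
}

// Hypothetical helper: run the model and return the predicted class id.
async function classify(model, inputs) {
    const output = await model._call(inputs);
    return argmax(output.logits);
}
```

Callers then read `output.logits` regardless of which architecture produced it, which is presumably why the vision classes reuse the sequence-classification output type.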
@@ -4114,6 +4158,8 @@ const MODEL_MAPPING_NAMES_ENCODER_ONLY = new Map([
     ['owlvit', ['OwlViTModel', OwlViTModel]],
     ['beit', ['BeitModel', BeitModel]],
     ['deit', ['DeiTModel', DeiTModel]],
+    ['convnext', ['ConvNextModel', ConvNextModel]],
+    ['convnextv2', ['ConvNextV2Model', ConvNextV2Model]],
     ['resnet', ['ResNetModel', ResNetModel]],
     ['swin', ['SwinModel', SwinModel]],
     ['swin2sr', ['Swin2SRModel', Swin2SRModel]],
@@ -4266,6 +4312,8 @@ const MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES = new Map([
     ['mobilevit', ['MobileViTForImageClassification', MobileViTForImageClassification]],
     ['beit', ['BeitForImageClassification', BeitForImageClassification]],
     ['deit', ['DeiTForImageClassification', DeiTForImageClassification]],
+    ['convnext', ['ConvNextForImageClassification', ConvNextForImageClassification]],
+    ['convnextv2', ['ConvNextV2ForImageClassification', ConvNextV2ForImageClassification]],
     ['resnet', ['ResNetForImageClassification', ResNetForImageClassification]],
     ['swin', ['SwinForImageClassification', SwinForImageClassification]],
 ]);
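
The mapping entries above pair a `model_type` string with a `[class name, class]` tuple. A rough sketch of how such a `Map` can drive class selection is given below. The lookup helper `resolveModelClass` and the empty stand-in classes are hypothetical; the commit only shows the map entries, so the dispatch logic here is an assumption about how they are consumed.

```javascript
// Hypothetical empty stand-ins for the classes registered in the diff.
class ConvNextForImageClassification {}
class ConvNextV2ForImageClassification {}

// Same layout as MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES:
// model_type -> [class name, class].
const IMAGE_CLASSIFICATION_MAPPING = new Map([
    ['convnext', ['ConvNextForImageClassification', ConvNextForImageClassification]],
    ['convnextv2', ['ConvNextV2ForImageClassification', ConvNextV2ForImageClassification]],
]);

// Hypothetical helper: pick the class registered for a config's `model_type`.
function resolveModelClass(config, mapping) {
    const entry = mapping.get(config.model_type);
    if (!entry) {
        throw new Error(`Unsupported model type: ${config.model_type}`);
    }
    const [, modelClass] = entry; // skip the name, keep the constructor
    return modelClass;
}
```

Under this reading, registering a new architecture for a task only needs one added line per map, which matches the two-line additions in each hunk.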

src/processors.js

Lines changed: 2 additions & 0 deletions
@@ -592,6 +592,7 @@ export class DPTFeatureExtractor extends ImageFeatureExtractor { }
 export class GLPNFeatureExtractor extends ImageFeatureExtractor { }
 export class CLIPFeatureExtractor extends ImageFeatureExtractor { }
 export class ConvNextFeatureExtractor extends ImageFeatureExtractor { }
+export class ConvNextImageProcessor extends ConvNextFeatureExtractor { }  // NOTE extends ConvNextFeatureExtractor
 export class ViTFeatureExtractor extends ImageFeatureExtractor { }
 export class MobileViTFeatureExtractor extends ImageFeatureExtractor { }
 export class OwlViTFeatureExtractor extends ImageFeatureExtractor {
@@ -1645,6 +1646,7 @@ export class AutoProcessor {
         OwlViTFeatureExtractor,
         CLIPFeatureExtractor,
         ConvNextFeatureExtractor,
+        ConvNextImageProcessor,
         DPTFeatureExtractor,
         GLPNFeatureExtractor,
         BeitFeatureExtractor,
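
`ConvNextImageProcessor` is declared as an empty subclass of `ConvNextFeatureExtractor` and added to the `AutoProcessor` list, so a config that names either class ends up with the same behaviour. The sketch below illustrates that idea under stated assumptions: the registry is keyed by class name, and the lookup reads `image_processor_type` and falls back to `feature_extractor_type`. `PROCESSOR_CLASSES` and `createProcessor` are hypothetical names, and the classes are empty stand-ins for the real ones in `src/processors.js`.

```javascript
// Hypothetical stand-ins mirroring the inheritance chain in the diff.
class ImageFeatureExtractor {
    constructor(config) {
        this.config = config;
    }
}
class ConvNextFeatureExtractor extends ImageFeatureExtractor {}
class ConvNextImageProcessor extends ConvNextFeatureExtractor {}

// Assumed registry keyed by class name (object property shorthand).
const PROCESSOR_CLASSES = { ConvNextFeatureExtractor, ConvNextImageProcessor };

// Hypothetical lookup: resolve the class named in a preprocessor config.
function createProcessor(preprocessorConfig) {
    const key = preprocessorConfig.image_processor_type
        ?? preprocessorConfig.feature_extractor_type;
    const ProcessorClass = PROCESSOR_CLASSES[key];
    if (!ProcessorClass) {
        throw new Error(`Unknown processor type: ${key}`);
    }
    return new ProcessorClass(preprocessorConfig);
}
```

Because the new class inherits everything, both names resolve to objects that pass an `instanceof ConvNextFeatureExtractor` check, and no preprocessing logic is duplicated.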
