Add detailed ConvBERT model card with usage, architecture, and refere… #38470

Open
wants to merge 1 commit into base: main
125 changes: 125 additions & 0 deletions src/transformers/models/convbert/modelcard.md
@@ -0,0 +1,125 @@
<!-- ConvBERT model card -->
Member

You don't have to remove this

Suggested change
<!-- ConvBERT model card -->
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->


# ConvBERT

<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
</div>
Comment on lines +5 to +9
Member

Missing the TensorFlow badge and this should go above # ConvBERT


---

## Model Overview
Comment on lines +11 to +13
Member

Suggested change
---
## Model Overview


ConvBERT is a lightweight and efficient NLP transformer model introduced by YituTech. It improves on the classic BERT architecture by incorporating **span-based dynamic convolutions** into the self-attention mechanism. This hybrid approach enables ConvBERT to model both local and global dependencies more effectively while reducing the computational cost.
Member

Suggested change
ConvBERT is a lightweight and efficient NLP transformer model introduced by YituTech. It improves on the classic BERT architecture by incorporating **span-based dynamic convolutions** into the self-attention mechanism. This hybrid approach enables ConvBERT to model both local and global dependencies more effectively while reducing the computational cost.
[ConvBERT](https://huggingface.co/papers/2008.02496) incorporates a mixed attention block that makes it more efficient than [BERT](./bert). Attention is costly because it models global word relationships. This is inefficient because some heads only learn local word relationships. ConvBERT replaces some of the attention heads with a convolution head to handle this. The result of this new mixed attention design is a more lightweight model with lower training costs without compromising performance.
Instead of using attention heads everywhere to model global word relationships, ConvBERT also includes convolution heads to model local word relationships.


The model performs exceptionally well on tasks such as **text classification**, **question answering**, and **sequence labeling**, making it suitable for deployment in real-time or edge environments. ConvBERT offers performance comparable to or better than BERT, but with fewer parameters and lower latency.

**Authors**: YituTech (Research team)
**Contributors**: Hugging Face community
**Visual Example**: *(image placeholder)*

---

## Model Details

**Architecture**: ConvBERT is based on the Transformer encoder, similar to BERT, but introduces **span-based dynamic convolution** within its layers. Some self-attention heads are replaced with convolutional filters that dynamically select input spans, improving the modeling of local contexts.
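
As a rough illustration (not part of the original card), the mixed attention design is exposed through `ConvBertConfig`: `head_ratio` controls how many self-attention heads are traded for convolution heads, and `conv_kernel_size` sets the span of the dynamic convolution. The values below mirror the library defaults; check the config of the checkpoint you actually use.

```python
from transformers import ConvBertConfig, ConvBertModel

# Sketch of the mixed attention configuration; values match the documented
# defaults rather than any specific checkpoint.
config = ConvBertConfig(
    num_attention_heads=12,
    head_ratio=2,        # reduce attention heads by this ratio; the rest use convolution
    conv_kernel_size=9,  # span width of the dynamic convolution
)
model = ConvBertModel(config)  # randomly initialized, for illustration only
print(model.config.head_ratio, model.config.conv_kernel_size)
```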

**Training Objective**: ConvBERT uses the same masked language modeling (MLM) objective as BERT but is trained with an improved token masking strategy.
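
As a quick sanity check of the MLM objective (an illustrative addition, assuming the pretrained checkpoint ships a usable masked-LM head), the model can be probed with a `fill-mask` pipeline:

```python
from transformers import pipeline

# If the checkpoint lacks masked-LM head weights, transformers will warn that
# the head is newly initialized and the predictions will not be meaningful.
fill_mask = pipeline("fill-mask", model="YituTech/conv-bert-base")
print(fill_mask("ConvBERT was pretrained with a masked language modeling [MASK]."))
```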

**Datasets Used**: ConvBERT is pre-trained on a combination of Wikipedia and BooksCorpus — the same corpora used for BERT pretraining.

**Pretraining Details**:
- MLM with whole-word masking
- Smaller model sizes (fewer parameters than RoBERTa or BERT-Large)
- Mixed attention/convolution blocks for speed

**Training Frameworks**:
- The architecture enables teacher-student knowledge distillation during fine-tuning for downstream tasks.
- No explicit teacher-student training is reported for the pretraining phase.

---

## Intended Use Cases

ConvBERT is designed for a variety of **NLP tasks**, including but not limited to:

- Sentiment Analysis
- Named Entity Recognition (NER)
- Question Answering
- Text Classification

The model is suitable for both **zero-shot inference** (using pipelines) and **fine-tuning** for specific downstream tasks. It is especially recommended when compute efficiency or real-time inference is important.
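
A minimal fine-tuning sketch is shown below; the dataset, sample sizes, and hyperparameters are illustrative assumptions rather than values from the ConvBERT paper.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("YituTech/conv-bert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "YituTech/conv-bert-base", num_labels=2
)

# Small slice of a generic sentiment dataset, just to keep the sketch fast.
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="convbert-sentiment",
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=dataset["test"].select(range(1000)),
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```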

---

## Limitations and Warnings

- ConvBERT may not perform as well as larger models like RoBERTa-Large on some high-resource benchmarks.
- The model inherits any **biases present in the BooksCorpus and Wikipedia**, such as social, gender, and geographic biases.
- Not suitable for tasks requiring reasoning over long documents unless specially fine-tuned.

Always evaluate model performance in your own application before production use.

---

## How to Use

You can use ConvBERT either through the Hugging Face `pipeline` API or directly with `AutoModel`:

### Using `pipeline`

```python
from transformers import pipeline

# Note: the base checkpoint has no fine-tuned classification head, so the
# pipeline initializes one randomly; use a fine-tuned checkpoint for real labels.
classifier = pipeline("text-classification", model="YituTech/conv-bert-base")
print(classifier("ConvBERT is compact and powerful."))
```

### Using `AutoModel`

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("YituTech/conv-bert-base")
model = AutoModelForSequenceClassification.from_pretrained("YituTech/conv-bert-base")
inputs = tokenizer("ConvBERT balances speed and accuracy.", return_tensors="pt")
outputs = model(**inputs)
```
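
As a small follow-up (not in the original card), the raw `outputs.logits` can be turned into probabilities. With the untuned base checkpoint the classification head is randomly initialized, so these scores only become meaningful after fine-tuning.

```python
import torch

# Softmax over the class dimension turns logits into probabilities.
probs = torch.softmax(outputs.logits, dim=-1)
print(probs)
```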

### CLI Usage

```bash
# Print environment information (useful when reporting issues)
transformers-cli env
# Pre-download the checkpoint into the local cache
transformers-cli download YituTech/conv-bert-base
```

---

## Performance Metrics

ConvBERT outperforms BERT on the GLUE benchmark and performs comparably to RoBERTa-base while being faster.

- GLUE score: ~79.3 (ConvBERT) vs ~77.6 (BERT)
- SQuAD v1.1 F1: ~93.4
- Parameters: ~110M

---

## References and Resources

- Paper: https://arxiv.org/abs/2008.02496
- GitHub: https://github.com/yitu-opensource/ConvBERT
- Model on HF: https://huggingface.co/YituTech/conv-bert-base

### Citation

```
@article{jiang2020convbert,
title={ConvBERT: Improving BERT with Span-based Dynamic Convolution},
author={Jiang, Zi-Hang and Yu, Weihao and Zhou, Daquan and Chen, Yunpeng and Feng, Jiashi and Yan, Shuicheng},
journal={arXiv preprint arXiv:2008.02496},
year={2020}
}
```