Add detailed ConvBERT model card with usage, architecture, and refere… #38470

Open
wants to merge 1 commit into base: main
125 changes: 125 additions & 0 deletions src/transformers/models/convbert/modelcard.md
@@ -0,0 +1,125 @@
<!-- ConvBERT model card -->
Member

You don't have to remove this

Suggested change
<!-- ConvBERT model card -->
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->


# ConvBERT

<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
</div>
Comment on lines +5 to +9
Member

Missing the TensorFlow badge and this should go above # ConvBERT


---

## Model Overview
Comment on lines +11 to +13
Member

Suggested change
---
## Model Overview


ConvBERT is a lightweight and efficient NLP transformer model introduced by YituTech. It improves on the classic BERT architecture by incorporating **span-based dynamic convolutions** into the self-attention mechanism. This hybrid approach enables ConvBERT to model both local and global dependencies more effectively while reducing the computational cost.
Member

Suggested change
ConvBERT is a lightweight and efficient NLP transformer model introduced by YituTech. It improves on the classic BERT architecture by incorporating **span-based dynamic convolutions** into the self-attention mechanism. This hybrid approach enables ConvBERT to model both local and global dependencies more effectively while reducing the computational cost.
[ConvBERT](https://huggingface.co/papers/2008.02496) incorporates a mixed attention block that makes it more efficient than [BERT](./bert). Attention is costly because it models global word relationships. This is inefficient because some heads only learn local word relationships. ConvBERT replaces some of the attention heads with a convolution head to handle this. The result of this new mixed attention design is a more lightweight model with lower training costs without compromising performance.
Instead of using attention heads everywhere to model global word relationships, ConvBERT also includes convolution heads to model local word relationships.


The model performs exceptionally well on tasks such as **text classification**, **question answering**, and **sequence labeling**, making it suitable for deployment in real-time or edge environments. ConvBERT offers performance comparable to or better than BERT, but with fewer parameters and lower latency.

**Authors**: YituTech (Research team)
**Contributors**: Hugging Face community
**Visual Example**: *(image placeholder)*

---

## Model Details

**Architecture**: ConvBERT is based on the Transformer encoder, similar to BERT, but introduces **span-based dynamic convolution** within its layers. Some self-attention heads are replaced with convolutional filters that dynamically select input spans, improving the modeling of local contexts.
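
As a rough illustration (not part of the original card), the mixed attention design is exposed through `ConvBertConfig`: `head_ratio` controls how many self-attention heads are traded for convolution heads, and `conv_kernel_size` sets the span of the dynamic convolution. The values below mirror the library defaults; check the config of the checkpoint you actually use.

```python
from transformers import ConvBertConfig, ConvBertModel

# Sketch of the mixed attention configuration; values match the documented
# defaults rather than any specific checkpoint.
config = ConvBertConfig(
    num_attention_heads=12,
    head_ratio=2,        # reduce attention heads by this ratio; the rest use convolution
    conv_kernel_size=9,  # span width of the dynamic convolution
)
model = ConvBertModel(config)  # randomly initialized, for illustration only
print(model.config.head_ratio, model.config.conv_kernel_size)
```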

**Training Objective**: ConvBERT uses the same masked language modeling (MLM) objective as BERT but is trained with an improved token masking strategy.
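
As a quick sanity check of the MLM objective (an illustrative addition, assuming the pretrained checkpoint ships a usable masked-LM head), the model can be probed with a `fill-mask` pipeline:

```python
from transformers import pipeline

# If the checkpoint lacks masked-LM head weights, transformers will warn that
# the head is newly initialized and the predictions will not be meaningful.
fill_mask = pipeline("fill-mask", model="YituTech/conv-bert-base")
print(fill_mask("ConvBERT was pretrained with a masked language modeling [MASK]."))
```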

**Datasets Used**: ConvBERT is pre-trained on a combination of Wikipedia and BooksCorpus — the same corpora used for BERT pretraining.

**Pretraining Details**:
- MLM with whole-word masking
- Smaller model sizes (fewer parameters than RoBERTa or BERT-Large)
- Mixed attention/convolution blocks for speed

**Training Frameworks**:
- The architecture enables teacher-student knowledge distillation during fine-tuning for downstream tasks.
- No explicit teacher-student training is reported for the pretraining phase.

---

## Intended Use Cases

ConvBERT is designed for a variety of **NLP tasks**, including but not limited to:

- Sentiment Analysis
- Named Entity Recognition (NER)
- Question Answering
- Text Classification

The model is suitable for both **zero-shot inference** (using pipelines) and **fine-tuning** for specific downstream tasks. It is especially recommended when compute efficiency or real-time inference is important.
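
A minimal fine-tuning sketch is shown below; the dataset, sample sizes, and hyperparameters are illustrative assumptions rather than values from the ConvBERT paper.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("YituTech/conv-bert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "YituTech/conv-bert-base", num_labels=2
)

# Small slice of a generic sentiment dataset, just to keep the sketch fast.
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="convbert-sentiment",
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=dataset["test"].select(range(1000)),
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```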

---

## Limitations and Warnings

- ConvBERT may not perform as well as larger models like RoBERTa-Large on some high-resource benchmarks.
- The model inherits any **biases present in the BooksCorpus and Wikipedia**, such as social, gender, and geographic biases.
- Not suitable for tasks requiring reasoning over long documents unless specially fine-tuned.

Always evaluate model performance in your own application before production use.

---

## How to Use

You can use ConvBERT either through the Hugging Face `pipeline` API or directly with `AutoModel`:

### Using `pipeline`

```python
from transformers import pipeline

# Note: the base checkpoint has no fine-tuned classification head, so the
# pipeline initializes one randomly; use a fine-tuned checkpoint for real labels.
classifier = pipeline("text-classification", model="YituTech/conv-bert-base")
print(classifier("ConvBERT is compact and powerful."))
```

### Using `AutoModel`

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("YituTech/conv-bert-base")
model = AutoModelForSequenceClassification.from_pretrained("YituTech/conv-bert-base")
inputs = tokenizer("ConvBERT balances speed and accuracy.", return_tensors="pt")
outputs = model(**inputs)
```
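
As a small follow-up (not in the original card), the raw `outputs.logits` can be turned into probabilities. With the untuned base checkpoint the classification head is randomly initialized, so these scores only become meaningful after fine-tuning.

```python
import torch

# Softmax over the class dimension turns logits into probabilities.
probs = torch.softmax(outputs.logits, dim=-1)
print(probs)
```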

### CLI Usage

```bash
# Print environment information (useful when reporting issues)
transformers-cli env
# Pre-download the checkpoint into the local cache
transformers-cli download YituTech/conv-bert-base
```

---

## Performance Metrics

ConvBERT outperforms BERT on the GLUE benchmark and performs comparably to RoBERTa-base while being faster.

- GLUE score: ~79.3 (ConvBERT) vs ~77.6 (BERT)
- SQuAD v1.1 F1: ~93.4
- Parameters: ~110M

---

## References and Resources

- Paper: https://arxiv.org/abs/2008.02496
- GitHub: https://github.com/yitu-opensource/ConvBERT
- Model on HF: https://huggingface.co/YituTech/conv-bert-base

### Citation

```
@article{jiang2020convbert,
title={ConvBERT: Improving BERT with Span-based Dynamic Convolution},
author={Jiang, Zi-Hang and Yu, Weihao and Zhou, Daquan and Chen, Yunpeng and Feng, Jiashi and Yan, Shuicheng},
journal={arXiv preprint arXiv:2008.02496},
year={2020}
}
```