Updated CamemBERT model card to new standardized format (#39227)
* Updated CamemBERT model card to new standardized format
* Applied review suggestions for CamemBERT: restored API refs, added examples, badges, and attribution
* Updated CamemBERT usage examples, quantization, badges, and format
* Updated CamemBERT badges
* Fixed CLI Section
[CamemBERT](https://huggingface.co/papers/1911.03894) is a language model based on [RoBERTa](./roberta), but trained specifically on French text from the OSCAR dataset, making it more effective for French language tasks.
What sets CamemBERT apart is that it was trained on a large, high-quality collection of French text (138GB) rather than a mix of many languages, which helps it capture French better than many multilingual models.
Common applications of CamemBERT include masked language modeling (fill-mask prediction), text classification (sentiment analysis), token classification (entity recognition), and sentence pair classification (entailment tasks).
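For tasks beyond fill-mask, CamemBERT is typically paired with a task-specific head and a fine-tuned checkpoint. The sketch below shows what a sentiment analysis call could look like with [`Pipeline`]; the checkpoint name is a placeholder, substitute any CamemBERT model fine-tuned for French sentiment classification.

```python
from transformers import pipeline

# Placeholder checkpoint: use any CamemBERT model fine-tuned for French sentiment analysis.
classifier = pipeline("text-classification", model="your-username/camembert-base-french-sentiment")
classifier("Ce fromage est absolument délicieux !")
```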
You can find all the original CamemBERT checkpoints under the [ALMAnaCH](https://huggingface.co/almanach/models?search=camembert) organization.
> [!TIP]
> This model was contributed by the [ALMAnaCH (Inria)](https://huggingface.co/almanach) team. The original code can be found [here](https://camembert-model.fr/).
>
> Click on the CamemBERT models in the right sidebar for more examples of how to apply CamemBERT to different NLP tasks.
The examples below demonstrate how to predict the `<mask>` token with [`Pipeline`], [`AutoModel`], and from the command line.
pipeline("Le camembert est un délicieux fromage <mask>.")
```
</hfoption>
<hfoptionid="AutoModel">
```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModelForMaskedLM.from_pretrained("camembert-base", torch_dtype="auto", device_map="auto", attn_implementation="sdpa")
inputs = tokenizer("Le camembert est un délicieux fromage <mask>.", return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model(**inputs)

# Decode the highest-scoring prediction for the masked position
masked_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
predicted_token_id = outputs.logits[0, masked_index].argmax(dim=-1)
print(tokenizer.decode(predicted_token_id))
```

</hfoption>
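<hfoption id="transformers CLI">

The command-line route mentioned above can use the `transformers` CLI's `run` command; the flags below are a sketch and may vary with your installed version.

```bash
echo -e "Le camembert est un délicieux fromage <mask>." | transformers run --task fill-mask --model camembert-base --device 0
```

</hfoption>
</hfoptions>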
Quantization reduces the memory burden of large models by representing weights in lower precision. Refer to the [Quantization](../quantization/overview) overview for available options.
The example below uses [bitsandbytes](../quantization/bitsandbytes) quantization to quantize the weights to 8-bits.
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, BitsAndBytesConfig

# Load the weights in 8-bit with bitsandbytes
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModelForMaskedLM.from_pretrained(
    "camembert-base",
    quantization_config=quantization_config,
    device_map="auto",
)
```