[CamemBERT](https://huggingface.co/papers/1911.03894) is a language model based on [RoBERTa](./roberta), but trained specifically on French text from the OSCAR dataset, making it more effective for French language tasks.

What sets CamemBERT apart is that it learned from a huge, high-quality collection of French data, as opposed to mixing lots of languages. This helps it understand French better than many multilingual models.

Common applications of CamemBERT include masked language modeling (fill-mask prediction), text classification (sentiment analysis), token classification (named entity recognition), and sentence pair classification (entailment tasks).
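
Token classification, for example, runs through the same [`Pipeline`] API; a minimal sketch, assuming a CamemBERT checkpoint fine-tuned for French NER (the model name below is illustrative):

```python
from transformers import pipeline

# named entity recognition with a CamemBERT-based checkpoint
ner = pipeline("token-classification", model="Jean-Baptiste/camembert-ner", aggregation_strategy="simple")
print(ner("Camille travaille chez Inria à Paris."))
```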

You can find all the original CamemBERT checkpoints under the [ALMAnaCH](https://huggingface.co/almanach/models?search=camembert) organization.

> [!TIP]
> This model was contributed by the [ALMAnaCH (Inria)](https://huggingface.co/almanach) team.
>
> Click on the CamemBERT models in the right sidebar for more examples of how to apply CamemBERT to different NLP tasks.

The examples below demonstrate how to predict the `<mask>` token with [`Pipeline`], [`AutoModel`], and from the command line.
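
<hfoptions id="usage">
<hfoption id="Pipeline">

A minimal sketch with [`Pipeline`]; the example sentence is illustrative and any CamemBERT checkpoint with a masked language modeling head works.

```python
from transformers import pipeline

# fill-mask pipeline with the base CamemBERT checkpoint
fill_mask = pipeline("fill-mask", model="almanach/camembert-base")
print(fill_mask("Le camembert est un délicieux fromage <mask>."))
```

</hfoption>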
print(f"The predicted token is: {predicted_token}")
48
78
```
49
79
50
80
</hfoption>
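
<hfoption id="transformers CLI">

A command-line sketch, assuming a transformers version that ships the `transformers run` command; the checkpoint name is illustrative.

```bash
echo -e "Le camembert est un délicieux fromage <mask>." | transformers run --task fill-mask --model almanach/camembert-base --device 0
```

</hfoption>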
</hfoptions>

Quantization reduces the memory burden of large models by representing weights in lower precision. Refer to the [Quantization](../quantization/overview) overview for available options.

The example below uses [bitsandbytes](../quantization/bitsandbytes) quantization to quantize the weights to 8-bits.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, BitsAndBytesConfig

# quantize the weights to 8-bit precision with bitsandbytes
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

# illustrative checkpoint; any CamemBERT variant works
model = AutoModelForMaskedLM.from_pretrained(
    "almanach/camembert-large",
    quantization_config=quantization_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("almanach/camembert-large")
```

## CamembertConfig

[[autodoc]] CamembertConfig

## CamembertTokenizer

[[autodoc]] CamembertTokenizer

## CamembertTokenizerFast

[[autodoc]] CamembertTokenizerFast

## CamembertModel

[[autodoc]] CamembertModel

## CamembertForMaskedLM

[[autodoc]] CamembertForMaskedLM

## CamembertForSequenceClassification

[[autodoc]] CamembertForSequenceClassification

## CamembertForMultipleChoice

[[autodoc]] CamembertForMultipleChoice