6 changes: 3 additions & 3 deletions docs/source/de/quicktour.md
@@ -156,7 +156,7 @@ Die [`pipeline`] kann jedes Modell aus dem [Model Hub](https://huggingface.co/mo

<frameworkcontent>
<pt>
-Use the [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and it's associated tokenizer (more on an `AutoClass` below):
+Use the [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and its associated tokenizer (more on an `AutoClass` below):

```py
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification
@@ -166,7 +166,7 @@ Use the [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the
```
</pt>
<tf>
-Use the [`TFAutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and it's associated tokenizer (more on an `TFAutoClass` below):
+Use the [`TFAutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and its associated tokenizer (more on an `TFAutoClass` below):

```py
>>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
@@ -222,7 +222,7 @@ Anschließend wandelt der Tokenizer die Token in Zahlen um, um einen Tensor als
Der Tokenizer gibt ein Wörterbuch zurück, das Folgendes enthält:

* [input_ids](./glossary#input-ids): numerische Repräsentationen Ihrer Token.
-* [atttention_mask](.glossary#attention-mask): gibt an, welche Token beachtet werden sollen.
+* [attention_mask](.glossary#attention-mask): gibt an, welche Token beachtet werden sollen.

Genau wie die [`pipeline`] akzeptiert der Tokenizer eine Liste von Eingaben. Darüber hinaus kann der Tokenizer den Text auch auffüllen und kürzen, um einen Stapel mit einheitlicher Länge zurückzugeben:

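For context, the loading and tokenization pattern these quicktour hunks touch looks roughly like the following sketch; the checkpoint name and example sentences are illustrative, not taken from the diff:

```py
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative checkpoint, not taken from the diff
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# The tokenizer returns a dict with input_ids and attention_mask; padding and
# truncation produce a batch of uniform length, as the last hunk describes
batch = tokenizer(
    ["Wir freuen uns sehr.", "Wir hoffen, dass Sie es nicht hassen."],
    padding=True,
    truncation=True,
    return_tensors="pt",
)
print(batch["input_ids"])       # numeric representations of the tokens
print(batch["attention_mask"])  # 1 = attend to this token, 0 = padding
```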
6 changes: 3 additions & 3 deletions docs/source/en/cache_explanation.md
@@ -9,7 +9,7 @@ Unless required by applicable law or agreed to in writing, software distributed
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

-⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
+⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
@@ -62,7 +62,7 @@ for _ in range(max_new_tokens):
# Greedily sample one next token
next_token_ids = outputs.logits[:, -1:].argmax(-1)
generated_ids = torch.cat([generated_ids, next_token_ids], dim=-1)
-# Prepare inputs for the next generation step by leaaving unprocessed tokens, in our case we have only one new token
+# Prepare inputs for the next generation step by leaving unprocessed tokens, in our case we have only one new token
# and expanding attn mask for the new token, as explained above
attention_mask = inputs["attention_mask"]
attention_mask = torch.cat([attention_mask, attention_mask.new_ones((attention_mask.shape[0], 1))], dim=-1)
@@ -88,7 +88,7 @@ model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", to
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)

# `return_dict_in_generate=True` is required to return the cache and `return_legacy_cache` forces the returned cache
-# in the the legacy format
+# in the legacy format
generation_outputs = model.generate(**inputs, return_dict_in_generate=True, return_legacy_cache=True, max_new_tokens=5)

cache = DynamicCache.from_legacy_cache(generation_outputs.past_key_values)
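The first hunk above edits a comment inside a manual cache-aware generation loop. A minimal self-contained sketch of that loop, assuming an illustrative gpt2 checkpoint:

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Hello, my name is", return_tensors="pt")
input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]
generated_ids = input_ids
past_key_values = None
max_new_tokens = 5

for _ in range(max_new_tokens):
    outputs = model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        past_key_values=past_key_values,
        use_cache=True,
    )
    past_key_values = outputs.past_key_values
    # Greedily sample one next token
    next_token_ids = outputs.logits[:, -1:].argmax(-1)
    generated_ids = torch.cat([generated_ids, next_token_ids], dim=-1)
    # With a KV cache, only the one unprocessed token is fed on the next step,
    # and the attention mask grows by one position for it
    input_ids = next_token_ids
    attention_mask = torch.cat(
        [attention_mask, attention_mask.new_ones((attention_mask.shape[0], 1))], dim=-1
    )

print(tokenizer.decode(generated_ids[0]))
```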
8 changes: 4 additions & 4 deletions docs/source/en/chat_templating_multimodal.md
@@ -9,7 +9,7 @@ Unless required by applicable law or agreed to in writing, software distributed
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

-⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
+⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
@@ -18,7 +18,7 @@ rendered properly in your Markdown viewer.

Multimodal model chat templates expect a similar [template](./chat_templating) as text-only models. It needs `messages` that includes a dictionary of the `role` and `content`.

-Multimodal templates are included in the [Processor](./processors) class and requires an additional `type` key for specifying whether the included content is an image, video, or text.
+Multimodal templates are included in the [Processor](./processors) class and require an additional `type` key for specifying whether the included content is an image, video, or text.

This guide will show you how to format chat templates for multimodal models as well as some best practices for configuring the template

@@ -109,7 +109,7 @@ These inputs are now ready to be used in [`~GenerationMixin.generate`].

Some vision models also support video inputs. The message format is very similar to the format for [image inputs](#image-inputs).

- The content `"type"` should be `"video"` to indicate the the content is a video.
- The content `"type"` should be `"video"` to indicate the content is a video.
- For videos, it can be a link to the video (`"url"`) or it could be a file path (`"path"`). Videos loaded from a URL can only be decoded with [PyAV](https://pyav.basswood-io.com/docs/stable/) or [Decord](https://github.com/dmlc/decord).

> [!WARNING]
@@ -141,7 +141,7 @@ Pass `messages` to [`~ProcessorMixin.apply_chat_template`] to tokenize the input

The `video_load_backend` parameter refers to a specific framework to load a video. It supports [PyAV](https://pyav.basswood-io.com/docs/stable/), [Decord](https://github.com/dmlc/decord), [OpenCV](https://github.com/opencv/opencv), and [torchvision](https://pytorch.org/vision/stable/index.html).

-The examples below uses Decord as the backend because it is a bit faster than PyAV.
+The examples below use Decord as the backend because it is a bit faster than PyAV.

<hfoptions id="sampling">
<hfoption id="fixed number of frames">
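For orientation, the `type` key convention these hunks describe looks like this in practice; the URLs, file path, and the commented processor call are illustrative, not taken from the diff:

```py
# Each content entry declares its type: "image", "video", or "text"
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},
            {"type": "video", "path": "/videos/clip.mp4"},
            {"type": "text", "text": "Describe the image and the video."},
        ],
    }
]

# A multimodal Processor's apply_chat_template would then tokenize the text and
# fetch/decode the media in one call, e.g. (sketch, not tied to a specific model):
# inputs = processor.apply_chat_template(
#     messages, add_generation_prompt=True, tokenize=True,
#     return_dict=True, return_tensors="pt",
# )
```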
2 changes: 1 addition & 1 deletion docs/source/en/custom_models.md
@@ -131,7 +131,7 @@ class ResnetModel(PreTrainedModel):
</hfoption>
<hfoption id="ResnetModelForImageClassification">

-The `forward` method needs to be rewrittten to calculate the loss for each logit if labels are available. Otherwise, the ResNet model class is the same.
+The `forward` method needs to be rewritten to calculate the loss for each logit if labels are available. Otherwise, the ResNet model class is the same.

> [!TIP]
> Add `config_class` to the model class to enable [AutoClass](#autoclass-support) support.
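The hunk refers to the guide's `ResnetModelForImageClassification`; the rewritten `forward` it describes follows this pattern — a sketch with illustrative names, assuming the wrapped backbone returns logits of shape `(batch, num_labels)`:

```py
import torch
from torch import nn

class ClassifierWithOptionalLoss(nn.Module):
    """Minimal sketch of the pattern the hunk describes (names are illustrative)."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.model = backbone  # e.g. a ResNet returning (batch, num_labels) logits

    def forward(self, tensor, labels=None):
        logits = self.model(tensor)
        if labels is not None:
            # Only compute the classification loss when labels are supplied
            loss = nn.functional.cross_entropy(logits, labels)
            return {"loss": loss, "logits": logits}
        return {"logits": logits}
```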
4 changes: 2 additions & 2 deletions docs/source/en/gpu_selection.md
@@ -9,7 +9,7 @@ Unless required by applicable law or agreed to in writing, software distributed
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

-⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
+⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
@@ -56,7 +56,7 @@ deepspeed --num_gpus 2 trainer-program.py ...

### Order of GPUs

-To select specific GPUs to use and their order, configure the the `CUDA_VISIBLE_DEVICES` environment variable. It is easiest to set the environment variable in `~/bashrc` or another startup config file. `CUDA_VISIBLE_DEVICES` is used to map which GPUs are used. For example, if there are 4 GPUs (0, 1, 2, 3) and you only want to run GPUs 0 and 2:
+To select specific GPUs to use and their order, configure the `CUDA_VISIBLE_DEVICES` environment variable. It is easiest to set the environment variable in `~/bashrc` or another startup config file. `CUDA_VISIBLE_DEVICES` is used to map which GPUs are used. For example, if there are 4 GPUs (0, 1, 2, 3) and you only want to run GPUs 0 and 2:

```bash
CUDA_VISIBLE_DEVICES=0,2 torchrun trainer-program.py ...
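The ordering behavior the hunk describes can also be checked from Python. A small sketch, assuming a machine with at least three GPUs; the variable must be set before CUDA is first initialized:

```py
import os

# Expose physical GPUs 2 and 0 to the process, in that order:
# inside the program, cuda:0 is physical GPU 2 and cuda:1 is physical GPU 0
os.environ["CUDA_VISIBLE_DEVICES"] = "2,0"

import torch  # import after setting the variable so CUDA sees the mapping

print(torch.cuda.device_count())      # 2
print(torch.cuda.get_device_name(0))  # name of physical GPU 2
```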
2 changes: 1 addition & 1 deletion docs/source/es/quicktour.md
@@ -220,7 +220,7 @@ Pasa tu texto al tokenizador:
El tokenizador devolverá un diccionario conteniendo:

* [input_ids](./glossary#input-ids): representaciones numéricas de los tokens.
-* [atttention_mask](.glossary#attention-mask): indica cuáles tokens deben ser atendidos.
+* [attention_mask](.glossary#attention-mask): indica cuáles tokens deben ser atendidos.

Como con el [`pipeline`], el tokenizador aceptará una lista de inputs. Además, el tokenizador también puede rellenar (pad, en inglés) y truncar el texto para devolver un lote (batch, en inglés) de longitud uniforme:

2 changes: 1 addition & 1 deletion docs/source/it/perf_infer_cpu.md
@@ -23,7 +23,7 @@ Abbiamo integrato di recente `BetterTransformer` per fare inferenza più rapidam

## PyTorch JIT-mode (TorchScript)

-TorchScript è un modo di creare modelli serializzabili e ottimizzabili da codice PyTorch. Ogni programmma TorchScript può esere salvato da un processo Python e caricato in un processo dove non ci sono dipendenze Python.
+TorchScript è un modo di creare modelli serializzabili e ottimizzabili da codice PyTorch. Ogni programma TorchScript può esere salvato da un processo Python e caricato in un processo dove non ci sono dipendenze Python.
Comparandolo con l'eager mode di default, jit mode in PyTorch normalmente fornisce prestazioni migliori per l'inferenza del modello da parte di metodologie di ottimizzazione come la operator fusion.

Per una prima introduzione a TorchScript, vedi la Introduction to [PyTorch TorchScript tutorial](https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html#tracing-modules).
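The edited paragraph says (in English): "TorchScript is a way to create serializable and optimizable models from PyTorch code. Every TorchScript program can be saved from a Python process and loaded in a process with no Python dependencies." A minimal sketch of that flow, with an illustrative checkpoint:

```py
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
# torchscript=True configures the model so it can be traced cleanly
model = AutoModelForSequenceClassification.from_pretrained(model_name, torchscript=True)
model.eval()

inputs = tokenizer("Hello, world", return_tensors="pt")
traced = torch.jit.trace(model, (inputs["input_ids"], inputs["attention_mask"]))
torch.jit.save(traced, "traced_model.pt")  # loadable later, e.g. from C++ via libtorch
```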
2 changes: 1 addition & 1 deletion docs/source/pt/quicktour.md
@@ -222,7 +222,7 @@ Passe o texto para o tokenizer:
O tokenizer retornará um dicionário contendo:

* [input_ids](./glossary#input-ids): representações numéricas de seus tokens.
-* [atttention_mask](.glossary#attention-mask): indica quais tokens devem ser atendidos.
+* [attention_mask](.glossary#attention-mask): indica quais tokens devem ser atendidos.

Assim como o [`pipeline`], o tokenizer aceitará uma lista de entradas. Além disso, o tokenizer também pode preencher e truncar o texto para retornar um lote com comprimento uniforme:

2 changes: 1 addition & 1 deletion src/transformers/commands/add_new_model_like.py
@@ -918,7 +918,7 @@ def add_model_to_main_init(
new_model_patterns (`ModelPatterns`): The patterns for the new model.
frameworks (`List[str]`, *optional*):
If specified, only the models implemented in those frameworks will be added.
-with_processsing (`bool`, *optional*, defaults to `True`):
+with_processing (`bool`, *optional*, defaults to `True`):
Whether the tokenizer/feature extractor/processor of the model should also be added to the init or not.
"""
with open(TRANSFORMERS_PATH / "__init__.py", "r", encoding="utf-8") as f:
2 changes: 1 addition & 1 deletion src/transformers/image_utils.py
@@ -94,7 +94,7 @@
list["np.ndarray"],
list["torch.Tensor"],
list[list["PIL.Image.Image"]],
list[list["np.ndarrray"]],
list[list["np.ndarray"]],
list[list["torch.Tensor"]],
] # noqa

2 changes: 1 addition & 1 deletion src/transformers/models/align/processing_align.py
@@ -83,7 +83,7 @@ def __call__(
arguments to BertTokenizerFast's [`~BertTokenizerFast.__call__`] if `text` is not `None` to encode
the text. To prepare the image(s), this method forwards the `images` arguments to
EfficientNetImageProcessor's [`~EfficientNetImageProcessor.__call__`] if `images` is not `None`. Please refer
-to the doctsring of the above two methods for more information.
+to the docstring of the above two methods for more information.

Args:
images (`PIL.Image.Image`, `np.ndarray`, `torch.Tensor`, `List[PIL.Image.Image]`, `List[np.ndarray]`, `List[torch.Tensor]`):
2 changes: 1 addition & 1 deletion src/transformers/models/altclip/processing_altclip.py
@@ -68,7 +68,7 @@ def __call__(
Main method to prepare for the model one or several sequences(s) and image(s). This method forwards the `text`
and `kwargs` arguments to XLMRobertaTokenizerFast's [`~XLMRobertaTokenizerFast.__call__`] if `text` is not
`None` to encode the text. To prepare the image(s), this method forwards the `images` and `kwrags` arguments to
-CLIPImageProcessor's [`~CLIPImageProcessor.__call__`] if `images` is not `None`. Please refer to the doctsring
+CLIPImageProcessor's [`~CLIPImageProcessor.__call__`] if `images` is not `None`. Please refer to the docstring
of the above two methods for more information.

Args:
2 changes: 1 addition & 1 deletion src/transformers/models/auto/modeling_flax_auto.py
@@ -123,7 +123,7 @@

FLAX_MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
[
-# Model for Image-classsification
+# Model for Image-classification
("beit", "FlaxBeitForImageClassification"),
("dinov2", "FlaxDinov2ForImageClassification"),
("regnet", "FlaxRegNetForImageClassification"),
4 changes: 2 additions & 2 deletions src/transformers/models/bamba/configuration_bamba.py
@@ -39,7 +39,7 @@ class BambaConfig(PretrainedConfig):
`inputs_ids` passed when calling [`BambaModel`]
tie_word_embeddings (`bool`, *optional*, defaults to `False`):
Whether the model's input and output word embeddings should be tied. Note that this is only relevant if the
-model has a output word embedding layer.
+model has an output word embedding layer.
hidden_size (`int`, *optional*, defaults to 4096):
Dimension of the hidden representations.
intermediate_size (`int`, *optional*, defaults to 14336):
@@ -85,7 +85,7 @@ class BambaConfig(PretrainedConfig):
mamba_n_heads (`int`, *optional*, defaults to 128):
The number of mamba heads used in the v2 implementation.
mamba_d_head (`int`, *optional*, defaults to `"auto"`):
-Head embeddding dimension size
+Head embedding dimension size
mamba_n_groups (`int`, *optional*, defaults to 1):
The number of the mamba groups used in the v2 implementation.
mamba_d_state (`int`, *optional*, defaults to 256):
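Since these hunks only touch the `BambaConfig` docstring, a brief sketch of how the documented parameters surface; the values shown simply mirror the defaults stated in the docstring above:

```py
from transformers import BambaConfig

# Defaults shown are the ones documented in the hunks above
config = BambaConfig(
    tie_word_embeddings=False,  # relevant only if the model has an output word embedding layer
    mamba_n_heads=128,          # number of mamba heads in the v2 implementation
    mamba_d_head="auto",        # head embedding dimension size
)
print(config.mamba_n_heads)
```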
6 changes: 3 additions & 3 deletions src/transformers/models/bark/convert_suno_to_hf.py
@@ -190,12 +190,12 @@ def load_model(pytorch_dump_folder_path, use_small=False, model_type="text"):
output_new_model = output_new_model_total.logits[:, [-1], :]

else:
-prediction_codeboook_channel = 3
+prediction_codebook_channel = 3
n_codes_total = 8
vec = torch.randint(256, (batch_size, sequence_length, n_codes_total), dtype=torch.int)

-output_new_model_total = model(prediction_codeboook_channel, vec)
-output_old_model = bark_model(prediction_codeboook_channel, vec)
+output_new_model_total = model(prediction_codebook_channel, vec)
+output_old_model = bark_model(prediction_codebook_channel, vec)

output_new_model = output_new_model_total.logits

2 changes: 1 addition & 1 deletion src/transformers/models/chameleon/processing_chameleon.py
@@ -87,7 +87,7 @@ def __call__(
Main method to prepare for the model one or several sequences(s) and image(s). This method forwards the `text`
and `kwargs` arguments to LlamaTokenizerFast's [`~LlamaTokenizerFast.__call__`] if `text` is not `None` to encode
the text. To prepare the image(s), this method forwards the `images` and `kwrags` arguments to
-CLIPImageProcessor's [`~CLIPImageProcessor.__call__`] if `images` is not `None`. Please refer to the doctsring
+CLIPImageProcessor's [`~CLIPImageProcessor.__call__`] if `images` is not `None`. Please refer to the docstring
of the above two methods for more information.

Args:
(file path not shown in the captured diff)
@@ -78,7 +78,7 @@ def __call__(
Main method to prepare for the model one or several sequences(s) and image(s). This method forwards the `text`
and `kwargs` arguments to BertTokenizerFast's [`~BertTokenizerFast.__call__`] if `text` is not `None` to encode
the text. To prepare the image(s), this method forwards the `images` and `kwrags` arguments to
-CLIPImageProcessor's [`~CLIPImageProcessor.__call__`] if `images` is not `None`. Please refer to the doctsring
+CLIPImageProcessor's [`~CLIPImageProcessor.__call__`] if `images` is not `None`. Please refer to the docstring
of the above two methods for more information.

Args:
2 changes: 1 addition & 1 deletion src/transformers/models/clap/processing_clap.py
@@ -46,7 +46,7 @@ def __call__(self, text=None, audios=None, return_tensors=None, **kwargs):
and `kwargs` arguments to RobertaTokenizerFast's [`~RobertaTokenizerFast.__call__`] if `text` is not `None` to
encode the text. To prepare the audio(s), this method forwards the `audios` and `kwrags` arguments to
ClapFeatureExtractor's [`~ClapFeatureExtractor.__call__`] if `audios` is not `None`. Please refer to the
-doctsring of the above two methods for more information.
+docstring of the above two methods for more information.

Args:
text (`str`, `List[str]`, `List[List[str]]`):
2 changes: 1 addition & 1 deletion src/transformers/models/clip/processing_clip.py
@@ -63,7 +63,7 @@ def __call__(self, text=None, images=None, return_tensors=None, **kwargs):
Main method to prepare for the model one or several sequences(s) and image(s). This method forwards the `text`
and `kwargs` arguments to CLIPTokenizerFast's [`~CLIPTokenizerFast.__call__`] if `text` is not `None` to encode
the text. To prepare the image(s), this method forwards the `images` and `kwrags` arguments to
-CLIPImageProcessor's [`~CLIPImageProcessor.__call__`] if `images` is not `None`. Please refer to the doctsring
+CLIPImageProcessor's [`~CLIPImageProcessor.__call__`] if `images` is not `None`. Please refer to the docstring
of the above two methods for more information.

Args:
2 changes: 1 addition & 1 deletion src/transformers/models/clipseg/processing_clipseg.py
@@ -63,7 +63,7 @@ def __call__(self, text=None, images=None, visual_prompt=None, return_tensors=No
Main method to prepare for the model one or several sequences(s) and image(s). This method forwards the `text`
and `kwargs` arguments to CLIPTokenizerFast's [`~CLIPTokenizerFast.__call__`] if `text` is not `None` to encode
the text. To prepare the image(s), this method forwards the `images` and `kwrags` arguments to
-ViTImageProcessor's [`~ViTImageProcessor.__call__`] if `images` is not `None`. Please refer to the doctsring of
+ViTImageProcessor's [`~ViTImageProcessor.__call__`] if `images` is not `None`. Please refer to the docstring of
the above two methods for more information.

Args:
2 changes: 1 addition & 1 deletion src/transformers/models/clvp/processing_clvp.py
@@ -48,7 +48,7 @@ def __init__(self, feature_extractor, tokenizer):
def __call__(self, *args, **kwargs):
"""
Forwards the `audio` and `sampling_rate` arguments to [`~ClvpFeatureExtractor.__call__`] and the `text`
-argument to [`~ClvpTokenizer.__call__`]. Please refer to the doctsring of the above two methods for more
+argument to [`~ClvpTokenizer.__call__`]. Please refer to the docstring of the above two methods for more
information.
"""

6 changes: 3 additions & 3 deletions src/transformers/models/colpali/modular_colpali.py
@@ -100,11 +100,11 @@ def __call__(
wrapper around the PaliGemmaProcessor's [`~PaliGemmaProcessor.__call__`] method adapted for the ColPali model. It cannot process
both text and images at the same time.

-When preparing the the text(s), this method forwards the `text` and `kwargs` arguments to LlamaTokenizerFast's
+When preparing the text(s), this method forwards the `text` and `kwargs` arguments to LlamaTokenizerFast's
[`~LlamaTokenizerFast.__call__`].
-When preparing the the image(s), this method forwards the `images` and `kwargs` arguments to SiglipImageProcessor's
+When preparing the image(s), this method forwards the `images` and `kwargs` arguments to SiglipImageProcessor's
[`~SiglipImageProcessor.__call__`].
-Please refer to the doctsring of the above two methods for more information.
+Please refer to the docstring of the above two methods for more information.

Args:
images (`PIL.Image.Image`, `np.ndarray`, `torch.Tensor`, `List[PIL.Image.Image]`, `List[np.ndarray]`, `List[torch.Tensor]`):
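The processor docstring fixes above (ALIGN, AltCLIP, Chameleon, CLAP, CLIP, CLIPSeg, CLVP, ColPali) all describe the same `__call__` contract: `text` is forwarded to the tokenizer and `images` (or `audios`) to the image processor or feature extractor. A sketch of that contract with CLIP, using a common docs example image:

```py
import requests
from PIL import Image
from transformers import CLIPProcessor

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # common docs example image
image = Image.open(requests.get(url, stream=True).raw)

# Text is forwarded to the tokenizer, images to the image processor
inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image,
    return_tensors="pt",
    padding=True,
)
print(inputs.keys())  # input_ids, attention_mask, pixel_values
```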