
Commit 04addbc

Merge branch 'main' into remove-constrained-bs
2 parents e800c78 + becab2c commit 04addbc

216 files changed, +3779 −2126 lines changed


.github/workflows/collated-reports.yml

Lines changed: 0 additions & 6 deletions
@@ -41,9 +41,3 @@ jobs:
           --job ${{ inputs.job }} \
           --report-repo-id ${{ inputs.report_repo_id }} \
           --gpu-name ${{ inputs.gpu_name }}
-
-      - name: Upload collated reports
-        uses: actions/upload-artifact@v4
-        with:
-          name: collated_reports_${{ env.CI_SHA }}.json
-          path: collated_reports_${{ env.CI_SHA }}.json

.github/workflows/push-important-models.yml

Lines changed: 1 addition & 0 deletions
@@ -145,6 +145,7 @@ jobs:
     name: Model CI
     uses: ./.github/workflows/self-scheduled.yml
     needs: get_modified_models
+    if: needs.get_modified_models.outputs.matrix != '' && needs.get_modified_models.outputs.matrix != '[]'
     with:
       job: run_models_gpu
       slack_report_channel: "#transformers-ci-push"

.github/workflows/self-scheduled.yml

Lines changed: 1 addition & 1 deletion
@@ -515,7 +515,7 @@ jobs:
       run_quantization_torch_gpu,
       run_extract_warnings
     ]
-    if: ${{ always() }}
+    if: always() && !cancelled()
     uses: ./.github/workflows/slack-report.yml
     with:
       job: ${{ inputs.job }}

.github/workflows/slack-report.yml

Lines changed: 1 addition & 1 deletion
@@ -93,7 +93,7 @@ jobs:
             python utils/notification_service.py "${{ inputs.quantization_matrix }}"
           else
             python utils/notification_service.py "${{ inputs.folder_slices }}"
-        fi
+          fi
 
         # Upload complete failure tables, as they might be big and only truncated versions could be sent to Slack.
         - name: Failure table artifacts

docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -373,6 +373,8 @@
   - sections:
     - local: model_doc/albert
       title: ALBERT
+    - local: model_doc/apertus
+      title: Apertus
     - local: model_doc/arcee
       title: Arcee
     - local: model_doc/bamba

docs/source/en/cache_explanation.md

Lines changed: 3 additions & 2 deletions
@@ -15,6 +15,7 @@ rendered properly in your Markdown viewer.
 -->
 
 # Caching
+
 Imagine you're having a conversation with someone, and instead of remembering what they previously said, they have to start from scratch every time you respond. This would be slow and inefficient, right?
 
 You can extend this analogy to transformer models. Autoregressive model generation can be slow because it makes a prediction one token at a time. Each new prediction is dependent on all the previous context.

@@ -107,7 +108,7 @@ model_id = "meta-llama/Llama-2-7b-chat-hf"
 model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16, device_map=device)
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 
-past_key_values = DynamicCache()
+past_key_values = DynamicCache(config=model.config)
 messages = [{"role": "user", "content": "Hello, what's your name."}]
 inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt", return_dict=True).to(model.device)
 

@@ -138,7 +139,7 @@ The cache position tracks where to insert new tokens in the attention cache. It
 Cache position is used internally for two purposes:
 
 1. Selecting new tokens to process in the input sequence and ensuring only tokens that haven’t been cached yet are passed to the model's `forward`.
-2. Storing key/value pairs at the correct positions in the cache. This is especially important for fixed-size caches, like [`StaticCache`], that pre-allocates a specific cache length.
+2. Storing key/value pairs at the correct positions in the cache. This is especially important for fixed-size caches, that pre-allocates a specific cache length.
 
 The generation loop usually takes care of the cache position, but if you're writing a custom generation method, it is important that cache positions are accurate since they are used to write and read key/value states into fixed slots.
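To make the two doc changes above concrete, here is a minimal sketch (not part of the commit) of a custom generation loop that initializes the cache with `DynamicCache(config=model.config)` and advances an explicit `cache_position`. The loop length, greedy token selection, and variable names are illustrative assumptions.

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

model_id = "meta-llama/Llama-2-7b-chat-hf"  # same checkpoint as the doc example
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Hello, what's your name?", return_tensors="pt").to(model.device)
generated = inputs.input_ids

# Cache is now initialized from the model config, as in the updated example above.
past_key_values = DynamicCache(config=model.config)
# cache_position marks which slots in the cache the current tokens occupy.
cache_position = torch.arange(generated.shape[1], device=model.device)
next_input = generated

with torch.no_grad():
    for _ in range(20):  # illustrative fixed budget of new tokens
        outputs = model(
            input_ids=next_input,
            past_key_values=past_key_values,
            cache_position=cache_position,
            use_cache=True,
        )
        next_token = outputs.logits[:, -1].argmax(dim=-1, keepdim=True)  # greedy pick
        generated = torch.cat([generated, next_token], dim=-1)
        next_input = next_token                    # only the uncached token is fed next
        cache_position = cache_position[-1:] + 1   # advance to the next cache slot

print(tokenizer.decode(generated[0], skip_special_tokens=True))
```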

docs/source/en/gguf.md

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@ Add the `gguf_file` parameter to [`~PreTrainedModel.from_pretrained`] to specify
 
 ```py
 # pip install gguf
+import torch
 from transformers import AutoTokenizer, AutoModelForCausalLM
 
 model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
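For context, a minimal sketch (not from this commit) of how the snippet above typically continues, loading both the tokenizer and the model from a GGUF file; the exact `.gguf` filename here is a placeholder assumption.

```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
gguf_file = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"  # placeholder quantized file name

# `gguf_file` tells from_pretrained which GGUF file inside the repo to dequantize and load.
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file, dtype=torch.bfloat16)
```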

docs/source/en/kv_cache.md

Lines changed: 1 addition & 1 deletion
@@ -227,7 +227,7 @@ tokenizer = AutoTokenizer.from_pretrained(model_id)
 
 user_prompts = ["Hello, what's your name?", "Btw, yesterday I was on a rock concert."]
 
-past_key_values = DynamicCache()
+past_key_values = DynamicCache(config=model.config)
 
 messages = []
 for prompt in user_prompts:
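The hunk ends at the loop header, so here is a hedged sketch of how such a multi-turn loop commonly continues, reusing the single `DynamicCache` across turns via `generate(past_key_values=...)`. The loop body shown is an illustration, not the file's actual code.

```py
for prompt in user_prompts:
    messages.append({"role": "user", "content": prompt})
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
    ).to(model.device)
    # Reuse the same cache across turns so earlier turns are not re-processed.
    outputs = model.generate(**inputs, past_key_values=past_key_values, max_new_tokens=64)
    reply = tokenizer.decode(outputs[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    messages.append({"role": "assistant", "content": reply})
```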
docs/source/en/model_doc/apertus.md

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
+<!--Copyright 2025 The HuggingFace Team and the Swiss AI Initiative. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+
+⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
+rendered properly in your Markdown viewer.
+
+-->
+
+<div style="float: right;">
+    <div class="flex flex-wrap space-x-1">
+        <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
+        <img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
+        <img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
+        <img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
+    </div>
+</div>
+
+# Apertus
+
+[Apertus](https://www.swiss-ai.org) is a family of large language models from the Swiss AI Initiative.
+
+> [!TIP]
+> Coming soon
+
+The example below demonstrates how to generate text with [`Pipeline`] or the [`AutoModel`], and from the command line.
+
+<hfoptions id="usage">
+<hfoption id="Pipeline">
+
+```py
+import torch
+from transformers import pipeline
+
+pipeline = pipeline(
+    task="text-generation",
+    model="swiss-ai/Apertus-8B",
+    dtype=torch.bfloat16,
+    device=0
+)
+pipeline("Plants create energy through a process known as")
+```
+
+</hfoption>
+<hfoption id="AutoModel">
+
+```py
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained(
+    "swiss-ai/Apertus-8B",
+)
+model = AutoModelForCausalLM.from_pretrained(
+    "swiss-ai/Apertus-8B",
+    dtype=torch.bfloat16,
+    device_map="auto",
+    attn_implementation="sdpa"
+)
+input_ids = tokenizer("Plants create energy through a process known as", return_tensors="pt").to("cuda")
+
+output = model.generate(**input_ids)
+print(tokenizer.decode(output[0], skip_special_tokens=True))
+```
+
+</hfoption>
+<hfoption id="transformers CLI">
+
+```bash
+echo -e "Plants create energy through a process known as" | transformers run --task text-generation --model swiss-ai/Apertus-8B --device 0
+```
+
+</hfoption>
+</hfoptions>
+
+## ApertusConfig
+
+[[autodoc]] ApertusConfig
+
+## ApertusModel
+
+[[autodoc]] ApertusModel
+    - forward
+
+## ApertusForCausalLM
+
+[[autodoc]] ApertusForCausalLM
+    - forward
+
+## ApertusForTokenClassification
+
+[[autodoc]] ApertusForTokenClassification
+    - forward

docs/source/en/model_doc/efficientloftr.md

Lines changed: 4 additions & 3 deletions
@@ -45,7 +45,7 @@ results = keypoint_matcher([url_0, url_1], threshold=0.9)
 print(results[0])
 # {'keypoint_image_0': {'x': ..., 'y': ...}, 'keypoint_image_1': {'x': ..., 'y': ...}, 'score': ...}
 ```
-<hfoption id="AutoModel">
+</hfoption>
 <hfoption id="AutoModel">
 
 ```py

@@ -65,7 +65,7 @@ processor = AutoImageProcessor.from_pretrained("zju-community/efficientloftr")
 model = AutoModelForKeypointMatching.from_pretrained("zju-community/efficientloftr")
 
 inputs = processor(images, return_tensors="pt")
-with torch.no_grad():
+with torch.inference_mode():
     outputs = model(**inputs)
 
 # Post-process to get keypoints and matches

@@ -92,7 +92,8 @@ processed_outputs = processor.post_process_keypoint_matching(outputs, image_size
 # EfficientLoFTR requires pairs of images
 images = [image1, image2]
 inputs = processor(images, return_tensors="pt")
-outputs = model(**inputs)
+with torch.inference_mode():
+    outputs = model(**inputs)
 
 # Extract matching information
 keypoints = outputs.keypoints # Keypoints in both images
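As a follow-up to the `torch.inference_mode()` changes above, here is a hedged sketch of the post-processing step referenced in the surrounding context lines; the `image_sizes` format and the output keys are assumptions about the keypoint-matching API, not part of this diff.

```py
import torch

# Assumes `processor`, `model`, and a PIL image pair `images = [image1, image2]` from the doc example.
inputs = processor(images, return_tensors="pt")
with torch.inference_mode():
    outputs = model(**inputs)

# Assumed format: one (height, width) tuple per image in each pair.
image_sizes = [[(img.height, img.width) for img in images]]
processed = processor.post_process_keypoint_matching(outputs, image_sizes, threshold=0.2)
for pair in processed:
    # Assumed keys: matched keypoints in each image plus per-match confidence scores.
    print(pair["keypoints0"].shape, pair["keypoints1"].shape, pair["matching_scores"].shape)
```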
