`docs/source/en/model_doc/whisper.md` (3 additions, 2 deletions)
````diff
@@ -72,7 +72,7 @@ Here is a step-by-step guide to transcribing an audio sample using a pre-trained
 ' Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel.'
 ```
 
-Whisper is compatible with the following optimisations:
+Whisper is compatible with the following optimisations for both short and long-form generation:
 - [PyTorch Scaled Dot Product Attention (SDPA)](../perf_infer_gpu_one#pytorch-scaled-dot-product-attention): flash attention and memory-efficient attention kernels. Enabled by default for `torch>=2.1.1`.
 - [Flash Attention 2](../perf_infer_gpu_one#flashattention-2): improved implementation of flash attention through better parallelism and work partitioning.
 - [torch.compile](../llm_optims#static-kv-cache-and-torchcompile): JIT-compile the forward pass to dispatch to efficient fused kernels.
````
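Each of these backends is selected when the model is loaded. As a minimal sketch (assuming a CUDA device, the `openai/whisper-large-v3` checkpoint, and the `flash-attn` package installed for the Flash Attention 2 case), opting into an attention implementation might look like this:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

model_id = "openai/whisper-large-v3"  # checkpoint chosen for illustration

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.float16,                # half precision is required for Flash Attention 2
    attn_implementation="flash_attention_2",  # or "sdpa", the default on torch>=2.1.1
).to("cuda")
```

With `attn_implementation` omitted, SDPA is used automatically on recent PyTorch versions, so the explicit flag mainly matters when switching to Flash Attention 2 or back to `eager`.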
```diff
@@ -101,7 +101,8 @@ As an example, the following code snippet enables SDPA and `torch.compile` for up
 ... ).input_features
 
 >>> # Compile the forward pass
->>> _ = model.generate(input_features)
+>>> for _ in range(2):
+>>>     model.generate(input_features)
 
 >>> # Generate token ids using compiled graph (fast!)
```
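For context, a minimal end-to-end sketch of the SDPA plus `torch.compile` path that this snippet belongs to (assuming the `openai/whisper-tiny.en` checkpoint, a CUDA device, and the dummy LibriSpeech sample used in the surrounding guide) could look like this:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

model_id = "openai/whisper-tiny.en"  # small checkpoint chosen for illustration

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch.float16, attn_implementation="sdpa"
).to("cuda")

# A static KV cache lets torch.compile trace generation without constant recompilation
model.generation_config.cache_implementation = "static"
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
input_features = processor(
    ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt"
).input_features.to("cuda", dtype=torch.float16)

# Warm-up: the first calls trigger compilation and are slow
for _ in range(2):
    model.generate(input_features)

# Subsequent calls dispatch to the compiled graph (fast)
predicted_ids = model.generate(input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True))
```

The two warm-up calls in the loop absorb the compilation cost, so later `generate` calls run against the cached compiled graph.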