
Commit 8af3af1

Squashed commit remove-constrastive-search
1 parent f690a2a commit 8af3af1

26 files changed with 107 additions and 1109 deletions

docs/source/en/generation_strategies.md

Lines changed: 0 additions & 23 deletions
````diff
@@ -225,29 +225,6 @@ outputs = model.generate(**inputs, assistant_model=assistant_model, tokenizer=to
 tokenizer.batch_decode(outputs, skip_special_tokens=True)
 ['Alice and Bob are sitting in a bar. Alice is drinking a beer and Bob is drinking a']
 ```
-
-### Contrastive search
-
-[Contrastive search](https://huggingface.co/papers/2202.06417) is a decoding strategy that aims to reduce repetition even while generating longer sequences. This strategy compares how similar a generated token is against previous tokens, and if they're more similar, a penalty is applied.
-
-Enable contrastive search with the `penalty_alpha` and `top_k` parameters. The `penalty_alpha` manages the penalty applied and `top_k` is the number of most likely tokens to return.
-
-```py
-import torch
-from transformers import AutoModelForCausalLM, AutoTokenizer, infer_device
-
-device = infer_device()
-
-tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
-inputs = tokenizer("Hugging Face is an open-source company", return_tensors="pt").to(device)
-
-model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", dtype=torch.float16).to(device)
-# explicitly set to 100 because Llama2 generation length is 4096
-outputs = model.generate(**inputs, max_new_tokens=100, penalty_alpha=0.6, top_k=4)
-tokenizer.batch_decode(outputs, skip_special_tokens=True)
-'Hugging Face is an open-source company that provides a platform for building and deploying AI models.\nHugging Face is an open-source company that provides a platform for building and deploying AI models. The platform allows developers to build and deploy AI models, as well as collaborate with other developers.\nHugging Face was founded in 2019 by Thibault Wittemberg and Clément Delangue. The company is based in Paris, France.\nHugging Face has'
-```
-
 ### Diverse beam search
 
 [Diverse beam search](https://hf.co/papers/1610.02424) is a variant of beam search that produces more diverse output candidates to choose from. This strategy measures the dissimilarity of sequences and a penalty is applied if sequences are too similar. To avoid high computation costs, the number of beams is divided into groups.
````
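The removed contrastive search section describes the penalty only qualitatively. As a rough illustration, here is a minimal pure-Python sketch of one decoding step, assuming the scoring rule from the linked paper: among the `top_k` most likely candidates, pick the token that maximizes `(1 - penalty_alpha) * p(v) - penalty_alpha * max_j cos(h_v, h_j)`, where `h_v` is the candidate's hidden state and `h_j` are the hidden states of earlier tokens. The helper names (`cosine`, `contrastive_step`) and all toy numbers are hypothetical; this is not the transformers implementation, which computes the candidate hidden states with a real forward pass.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def contrastive_step(
    probs: dict[int, float],
    candidate_hidden: dict[int, Sequence[float]],
    context_hidden: list[Sequence[float]],
    penalty_alpha: float,
    top_k: int,
) -> int:
    """Hypothetical single decoding step of contrastive search (toy sketch)."""
    # Candidate set: the `top_k` most likely next tokens.
    candidates = sorted(probs, key=probs.get, reverse=True)[:top_k]

    def score(token: int) -> float:
        # Degeneration penalty: highest similarity to any previous token's hidden state.
        penalty = max(cosine(candidate_hidden[token], h) for h in context_hidden)
        # Trade model confidence off against the penalty.
        return (1 - penalty_alpha) * probs[token] - penalty_alpha * penalty

    return max(candidates, key=score)


# Toy data: token 0 is the most likely, but its hidden state repeats the context exactly.
context_hidden = [[1.0, 0.0]]
probs = {0: 0.5, 1: 0.3, 2: 0.2}
candidate_hidden = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.6, 0.8]}

print(contrastive_step(probs, candidate_hidden, context_hidden, penalty_alpha=0.0, top_k=4))  # 0
print(contrastive_step(probs, candidate_hidden, context_hidden, penalty_alpha=0.6, top_k=4))  # 1
```

With `penalty_alpha=0.0` the score reduces to the model probability, so the step behaves like greedy search; raising `penalty_alpha` lets a less likely but less repetitive token win.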

docs/source/ja/generation_strategies.md

Lines changed: 0 additions & 23 deletions
````diff
@@ -168,29 +168,6 @@ An increasing sequence: one, two, three, four, five, six, seven, eight, nine, te
 ['I look forward to seeing you all again!\n\n\n\n\n\n\n\n\n\n\n']
 ```
 
-### Contrastive search
-
-The contrastive search decoding strategy was proposed in the 2022 paper [A Contrastive Framework for Neural Text Generation](https://huggingface.co/papers/2202.06417).
-It has shown excellent results in generating long outputs that are non-repetitive yet coherent. To learn how contrastive search works, see [this blog post](https://huggingface.co/blog/introducing-csearch).
-The two main parameters that enable and control the behavior of contrastive search are `penalty_alpha` and `top_k`:
-
-```python
->>> from transformers import AutoTokenizer, AutoModelForCausalLM
-
->>> checkpoint = "openai-community/gpt2-large"
->>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
->>> model = AutoModelForCausalLM.from_pretrained(checkpoint)
-
->>> prompt = "Hugging Face Company is"
->>> inputs = tokenizer(prompt, return_tensors="pt")
-
->>> outputs = model.generate(**inputs, penalty_alpha=0.6, top_k=4, max_new_tokens=100)
->>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
-['Hugging Face Company is a family owned and operated business. We pride ourselves on being the best
-in the business and our customer service is second to none.\n\nIf you have any questions about our
-products or services, feel free to contact us at any time. We look forward to hearing from you!']
-```
-
 ### Multinomial sampling
 
 Unlike greedy search, which always selects the highest-probability token as the next token, multinomial sampling (also called ancestral sampling) randomly selects the next token based on the probability distribution over the entire vocabulary provided by the model. Every token with a non-zero probability has a chance of being selected, which reduces the risk of repetition.
````
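The multinomial sampling paragraph in the Japanese docs contrasts sampling with greedy search. A minimal stdlib sketch, assuming a made-up next-token distribution (the token strings, probabilities, and helper names are hypothetical): greedy search always returns the argmax, while `random.choices` draws a token in proportion to its probability, so every token with non-zero probability can be picked and a zero-probability token never is.

```python
import random
from collections import Counter

# Hypothetical next-token distribution produced by a model.
next_token_probs = {"the": 0.5, "a": 0.3, "one": 0.2, "zebra": 0.0}


def greedy_pick(probs: dict[str, float]) -> str:
    """Greedy search: always take the single most likely token."""
    return max(probs, key=probs.get)


def multinomial_pick(probs: dict[str, float], rng: random.Random) -> str:
    """Multinomial (ancestral) sampling: draw a token proportionally to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so the draws are reproducible
samples = Counter(multinomial_pick(next_token_probs, rng) for _ in range(10_000))
print(greedy_pick(next_token_probs))  # "the", on every call
print(samples)  # mostly "the", but "a" and "one" also appear; "zebra" never does
```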

docs/source/ko/generation_strategies.md

Lines changed: 1 addition & 22 deletions
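````diff
@@ -68,7 +68,7 @@ GenerationConfig {
 - `max_new_tokens`: The maximum number of tokens to generate, that is, the size of the output sequence excluding the tokens in the prompt. Instead of using the output length as the stopping criterion, you can also choose to stop generation when the whole generation exceeds a certain amount of time. To learn more, check [`StoppingCriteria`].
 - `num_beams`: Specifying a number of beams greater than 1 switches from greedy search to beam search. This strategy evaluates several hypotheses at each time step and eventually chooses the hypothesis with the highest probability for the entire sequence. This has the advantage of identifying high-probability sequences that greedy search would have ignored because their initial tokens have low probability.
 - `do_sample`: Setting this parameter to `True` enables decoding strategies such as multinomial sampling, beam-search multinomial sampling, Top-K sampling, and Top-p sampling. These strategies select the next token from the probability distribution over the entire vocabulary, with strategy-specific adjustments.
-- `num_return_sequences`: The number of sequence candidates to return for each input. This option is only available for decoding strategies that support multiple sequence candidates, such as variants of beam search and sampling. Decoding strategies like greedy search and contrastive search return a single output sequence.
+- `num_return_sequences`: The number of sequence candidates to return for each input. This option is only available for decoding strategies that support multiple sequence candidates, such as variants of beam search and sampling. Decoding strategies like greedy search return a single output sequence.
 
 ## Save a custom decoding strategy with your model[[save-a-custom-decoding-strategy-with-your-model]]
 
````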
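The `num_beams` entry in the Korean docs says beam search can recover high-probability sequences that greedy search misses because their first token is less likely. A minimal stdlib sketch on a hypothetical two-step toy model shows this; `TOY_MODEL` and `beam_search` are illustrative names, not transformers APIs, and the real implementation also handles EOS, length penalties, and batching. With `num_beams=1` the procedure is exactly greedy search.

```python
import math

# Hypothetical two-step "model": next-token probabilities given the prefix so far.
TOY_MODEL: dict[tuple[str, ...], dict[str, float]] = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"X": 0.3, "Y": 0.3, "Z": 0.4},
    ("B",): {"X": 0.9, "Y": 0.1},
}


def beam_search(num_beams: int, steps: int) -> tuple[tuple[str, ...], float]:
    """Keep the `num_beams` highest-scoring prefixes at every step (scores are summed log-probs)."""
    beams: list[tuple[tuple[str, ...], float]] = [((), 0.0)]
    for _ in range(steps):
        # Expand every surviving hypothesis by every possible next token.
        candidates = [
            (prefix + (token,), score + math.log(p))
            for prefix, score in beams
            for token, p in TOY_MODEL[prefix].items()
        ]
        # Prune back to the best `num_beams` hypotheses.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    best_prefix, best_score = beams[0]
    return best_prefix, math.exp(best_score)


print(beam_search(num_beams=1, steps=2))  # prefix ('A', 'Z'), probability ≈ 0.24 (greedy)
print(beam_search(num_beams=2, steps=2))  # prefix ('B', 'X'), probability ≈ 0.36
```

Greedy search commits to "A" because it is the more likely first token, while two beams keep "B" alive long enough to find the better full sequence.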
````diff
@@ -165,27 +165,6 @@ An increasing sequence: one, two, three, four, five, six, seven, eight, nine, te
 ['I look forward to seeing you all again!\n\n\n\n\n\n\n\n\n\n\n']
 ```
 
-### Contrastive search[[contrastive-search]]
-
-The contrastive search decoding strategy, proposed in the 2022 paper [A Contrastive Framework for Neural Text Generation](https://huggingface.co/papers/2202.06417), has shown excellent results in generating long outputs that are non-repetitive yet coherent. To learn how contrastive search works, check out [this blog post](https://huggingface.co/blog/introducing-csearch). The two main parameters that enable and control the behavior of contrastive search are `penalty_alpha` and `top_k`:
-
-```python
->>> from transformers import AutoTokenizer, AutoModelForCausalLM
-
->>> checkpoint = "openai-community/gpt2-large"
->>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
->>> model = AutoModelForCausalLM.from_pretrained(checkpoint)
-
->>> prompt = "Hugging Face Company is"
->>> inputs = tokenizer(prompt, return_tensors="pt")
-
->>> outputs = model.generate(**inputs, penalty_alpha=0.6, top_k=4, max_new_tokens=100)
->>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
-['Hugging Face Company is a family owned and operated business. We pride ourselves on being the best
-in the business and our customer service is second to none.\n\nIf you have any questions about our
-products or services, feel free to contact us at any time. We look forward to hearing from you!']
-```
-
 ### Multinomial sampling[[multinomial-sampling]]
 
 Unlike greedy search, which always selects the token with the highest probability as the next token, multinomial sampling (also called ancestral sampling) randomly selects the next token based on the probability distribution over the entire vocabulary provided by the model. Every token with a non-zero probability has a chance of being selected, which reduces the risk of repetition.
````

examples/pytorch/text-generation/run_generation_contrastive_search.py

Lines changed: 0 additions & 146 deletions
This file was deleted.

src/transformers/cache_utils.py

Lines changed: 3 additions & 3 deletions
````diff
@@ -1358,7 +1358,7 @@ def check_dynamic_cache(self, method: str):
     def crop(self, maximum_length: int):
         """
         Crop the past key values up to a new `maximum_length` in terms of tokens. `maximum_length` can also be
-        negative to remove `maximum_length` tokens. This is used in assisted decoding and contrastive search.
+        negative to remove `maximum_length` tokens. This is used in assisted decoding and contrastive search (on the Hub).
         """
         self.check_dynamic_cache(self.crop.__name__)
         self.self_attention_cache.crop(maximum_length)
````
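The `crop` docstring packs two behaviors into one argument: a non-negative `maximum_length` is a target length, while a negative one removes that many tokens from the end. A minimal sketch of those semantics, assuming a plain list stands in for the per-layer key/value tensors (`crop_positions` is a hypothetical helper, not the cache code). In assisted decoding, this kind of crop is what discards cache entries for draft tokens the main model rejected.

```python
def crop_positions(cached: list[str], maximum_length: int) -> list[str]:
    """Hypothetical list-based model of `crop`: keep at most `maximum_length` cached positions."""
    if maximum_length < 0:
        # Negative values mean "remove abs(maximum_length) tokens from the end".
        maximum_length = len(cached) - abs(maximum_length)
    if len(cached) <= maximum_length:
        # Cropping never grows the cache; nothing to do.
        return cached
    return cached[: max(maximum_length, 0)]


positions = [f"t{i}" for i in range(10)]  # a cache holding 10 tokens
print(len(crop_positions(positions, 6)))    # 6
print(len(crop_positions(positions, -3)))   # 7
print(len(crop_positions(positions, 20)))   # 10
```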
````diff
@@ -1378,13 +1378,13 @@ def batch_split(self, full_batch_size: int, split_size: int) -> "list[EncoderDec
         return out
 
     def batch_repeat_interleave(self, repeats: int):
-        """Repeat the cache `repeats` times in the batch dimension. Used in contrastive search."""
+        """Repeat the cache `repeats` times in the batch dimension. Used in contrastive search (on the Hub)."""
         self.check_dynamic_cache(self.batch_repeat_interleave.__name__)
         self.self_attention_cache.batch_repeat_interleave(repeats)
         self.cross_attention_cache.batch_repeat_interleave(repeats)
 
     def batch_select_indices(self, indices: torch.Tensor):
-        """Only keep the `indices` in the batch dimension of the cache. Used in contrastive search."""
+        """Only keep the `indices` in the batch dimension of the cache. Used in contrastive search (on the Hub)."""
         self.check_dynamic_cache(self.batch_select_indices.__name__)
         self.self_attention_cache.batch_select_indices(indices)
         self.cross_attention_cache.batch_select_indices(indices)
````
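The two docstrings above still tie `batch_repeat_interleave` and `batch_select_indices` to contrastive search. A rough sketch of why a decoder needs both, assuming each list entry stands in for one batch row of the key/value cache: each row is repeated `top_k` times so all candidates can be scored in one forward pass, and then only the winning candidate's row is kept per prompt. The helpers `repeat_rows` and `select_rows`, the row labels, and the chosen winners are all hypothetical; this is not the `EncoderDecoderCache` code.

```python
def repeat_rows(rows: list[str], repeats: int) -> list[str]:
    """Repeat each batch row `repeats` times consecutively (interleaved, like repeat_interleave on dim 0)."""
    return [row for row in rows for _ in range(repeats)]


def select_rows(rows: list[str], indices: list[int]) -> list[str]:
    """Keep only the rows at `indices`, in that order."""
    return [rows[i] for i in indices]


top_k = 3
batch = ["prompt0", "prompt1"]

# One cache row per (prompt, candidate) pair: batch_size * top_k rows.
expanded = repeat_rows(batch, top_k)
# Pretend a forward pass appended candidate k to row b * top_k + k.
scored = [f"{row}+cand{i % top_k}" for i, row in enumerate(expanded)]

# Hypothetical winners: candidate 2 for prompt0, candidate 0 for prompt1.
winners = [2, 0]
flat_indices = [b * top_k + k for b, k in enumerate(winners)]
kept = select_rows(scored, flat_indices)
print(kept)  # ['prompt0+cand2', 'prompt1+cand0']
```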
