Remove ConversationalPipeline and Conversation object (#31165)

* Remove ConversationalPipeline and Conversation object, as they have been deprecated for some time and are due for removal
* Update not-doctested.txt
* Fix JA and ZH docs
* Fix JA and ZH docs some more
* Fix JA and ZH docs some more
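
For callers migrating off the removed classes, the doc changes in this commit point to passing a plain list of `{"role", "content"}` dicts to the `"text-generation"` pipeline and reading the last entry of `generated_text`. The sketch below illustrates that pattern without downloading a model. `fake_chat_pipe` and `last_message` are hypothetical names introduced here, not part of transformers, and the assumed result shape (a list holding one dict with a `generated_text` list) should be checked against the installed version.

```python
from typing import Any, Dict, List, Union

Message = Dict[str, str]


def fake_chat_pipe(messages: List[Message], **kwargs: Any) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for pipeline("text-generation", model_name).

    Assumption: a chat-style call returns the input messages with the
    assistant reply appended under the "generated_text" key.
    """
    reply = {"role": "assistant", "content": "None at all, matey."}
    return [{"generated_text": list(messages) + [reply]}]


def last_message(output: Union[Dict[str, Any], List[Dict[str, Any]]]) -> Message:
    """Return the final chat message from a text-generation result.

    Accepts either a single result dict or a list with one result dict,
    since the exact wrapping is an assumption in this sketch.
    """
    if isinstance(output, list):
        output = output[0]
    return output["generated_text"][-1]


# Previously this would have been wrapped in a Conversation object;
# now the chat is just a list of role/content dicts.
messages = [
    {"role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate"},
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]

result = fake_chat_pipe(messages, max_new_tokens=256)
print(last_message(result))
```

With a real pipeline, `fake_chat_pipe` would be replaced by the object returned from `pipeline("text-generation", ...)`; the message list and the extraction step stay the same.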
huggingface · Jun 17, 2024 · 4412651
1 parent f8e6f3b
Showing 46 changed files with 30 additions and 914 deletions.
diff --git a/docs/source/en/main_classes/pipelines.md b/docs/source/en/main_classes/pipelines.md
@@ -386,14 +386,6 @@ Pipelines available for computer vision tasks include the following.
 
 Pipelines available for natural language processing tasks include the following.
 
-### ConversationalPipeline
-
-[[autodoc]] Conversation
-
-[[autodoc]] ConversationalPipeline
-    - __call__
-    - all
-
 ### FillMaskPipeline
 
 [[autodoc]] FillMaskPipeline

diff --git a/docs/source/ja/chat_templating.md b/docs/source/ja/chat_templating.md
@@ -180,16 +180,16 @@ tokenizer.chat_template = template  # Set the new template
 tokenizer.push_to_hub("model_name")  # Upload your new template to the Hub!
 ```
 
-[`~PreTrainedTokenizer.apply_chat_template`] メソッドは、あなたのチャットテンプレートを使用するために [`ConversationalPipeline`] クラスによって呼び出されます。
-したがって、正しいチャットテンプレートを設定すると、あなたのモデルは自動的に [`ConversationalPipeline`] と互換性があるようになります。
+[`~PreTrainedTokenizer.apply_chat_template`] メソッドは、あなたのチャットテンプレートを使用するために `TextGenerationPipeline` クラスによって呼び出されます。
+したがって、正しいチャットテンプレートを設定すると、あなたのモデルは自動的に [`TextGenerationPipeline`] と互換性があるようになります。
 
 
 ## What are "default" templates?
 
 チャットテンプレートの導入前に、チャットの処理はモデルクラスレベルでハードコードされていました。
 後方互換性のために、このクラス固有の処理をデフォルトテンプレートとして保持し、クラスレベルで設定されています。
 モデルにチャットテンプレートが設定されていない場合、ただしモデルクラスのデフォルトテンプレートがある場合、
-`ConversationalPipeline`クラスや`apply_chat_template`などのメソッドはクラステンプレートを使用します。
+`TextGenerationPipeline`クラスや`apply_chat_template`などのメソッドはクラステンプレートを使用します。
 トークナイザのデフォルトのチャットテンプレートを確認するには、`tokenizer.default_chat_template`属性をチェックしてください。
 
 これは、後方互換性のために純粋に行っていることで、既存のワークフローを壊さないようにしています。
@@ -233,7 +233,7 @@ I'm doing great!<|im_end|>
 ```
 
 「ユーザー」、「システム」、および「アシスタント」の役割は、チャットの標準です。
-特に、[`ConversationalPipeline`]との連携をスムーズに行う場合には、これらの役割を使用することをお勧めします。ただし、これらの役割に制約はありません。テンプレートは非常に柔軟で、任意の文字列を役割として使用できます。
+特に、`TextGenerationPipeline`との連携をスムーズに行う場合には、これらの役割を使用することをお勧めします。ただし、これらの役割に制約はありません。テンプレートは非常に柔軟で、任意の文字列を役割として使用できます。
 
 ## I want to use chat templates! How should I get started?
 
@@ -242,7 +242,7 @@ I'm doing great!<|im_end|>
 この属性を適切に設定できるように[プルリクエスト](https://huggingface.co/docs/hub/repositories-pull-requests-discussions)を開いてください。
 
 一度属性が設定されれば、それで完了です！ `tokenizer.apply_chat_template`は、そのモデルに対して正しく動作するようになります。これは、
-`ConversationalPipeline`などの場所でも自動的にサポートされます。
+`TextGenerationPipeline` などの場所でも自動的にサポートされます。
 
 モデルがこの属性を持つことを確認することで、オープンソースモデルの全コミュニティがそのフルパワーを使用できるようになります。
 フォーマットの不一致はこの分野に悩み続け、パフォーマンスに黙って影響を与えてきました。それを終わらせる時が来ました！

diff --git a/docs/source/ja/main_classes/pipelines.md b/docs/source/ja/main_classes/pipelines.md
@@ -388,14 +388,6 @@ my_pipeline = pipeline(model="xxxx", pipeline_class=MyPipeline)
 
 自然言語処理タスクに使用できるパイプラインには次のものがあります。
 
-### ConversationalPipeline
-
-[[autodoc]] Conversation
-
-[[autodoc]] ConversationalPipeline
-    - __call__
-    - all
-
 ### FillMaskPipeline
 
 [[autodoc]] FillMaskPipeline

diff --git a/docs/source/zh/chat_templating.md b/docs/source/zh/chat_templating.md
@@ -117,30 +117,27 @@ Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopte
 
 ## 有自动化的聊天`pipeline`吗？
 
-有的，[`ConversationalPipeline`]。这个`pipeline`的设计是为了方便使用聊天模型。让我们再试一次 Zephyr 的例子，但这次使用`pipeline`：
+有的，[`TextGenerationPipeline`]。这个`pipeline`的设计是为了方便使用聊天模型。让我们再试一次 Zephyr 的例子，但这次使用`pipeline`：
 
 ```python
 from transformers import pipeline
 
-pipe = pipeline("conversational", "HuggingFaceH4/zephyr-7b-beta")
+pipe = pipeline("text-generation", "HuggingFaceH4/zephyr-7b-beta")
 messages = [
     {
         "role": "system",
         "content": "You are a friendly chatbot who always responds in the style of a pirate",
     },
     {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
 ]
-print(pipe(messages))
+print(pipe(messages, max_new_tokens=256)['generated_text'][-1])
 ```
 
 ```text
-Conversation id: 76d886a0-74bd-454e-9804-0467041a63dc
-system: You are a friendly chatbot who always responds in the style of a pirate
-user: How many helicopters can a human eat in one sitting?
-assistant: Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopters are not food, they are flying machines. Food is meant to be eaten, like a hearty plate o' grog, a savory bowl o' stew, or a delicious loaf o' bread. But helicopters, they be for transportin' and movin' around, not for eatin'. So, I'd say none, me hearties. None at all.
+{'role': 'assistant', 'content': "Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopters are not food, they are flying machines. Food is meant to be eaten, like a hearty plate o' grog, a savory bowl o' stew, or a delicious loaf o' bread. But helicopters, they be for transportin' and movin' around, not for eatin'. So, I'd say none, me hearties. None at all."}
 ```
 
-[`ConversationalPipeline`]将负责处理所有的`tokenized`并调用`apply_chat_template`，一旦模型有了聊天模板，您只需要初始化pipeline并传递消息列表！
+[`TextGenerationPipeline`]将负责处理所有的`tokenized`并调用`apply_chat_template`，一旦模型有了聊天模板，您只需要初始化pipeline并传递消息列表！
 
 ## 什么是"generation prompts"?
 
@@ -317,12 +314,12 @@ tokenizer.chat_template = template  # Set the new template
 tokenizer.push_to_hub("model_name")  # Upload your new template to the Hub!
 ```
 
-由于[`~PreTrainedTokenizer.apply_chat_template`]方法是由[`ConversationalPipeline`]类调用，
-因此一旦你设置了聊天模板，您的模型将自动与[`ConversationalPipeline`]兼容。
+由于[`~PreTrainedTokenizer.apply_chat_template`]方法是由[`TextGenerationPipeline`]类调用，
+因此一旦你设置了聊天模板，您的模型将自动与[`TextGenerationPipeline`]兼容。
 ### “默认”模板是什么？
 
 在引入聊天模板（chat_template）之前，聊天prompt是在模型中通过硬编码处理的。为了向前兼容，我们保留了这种硬编码处理聊天prompt的方法。
-如果一个模型没有设置聊天模板，但其模型有默认模板，`ConversationalPipeline`类和`apply_chat_template`等方法将使用该模型的聊天模板。
+如果一个模型没有设置聊天模板，但其模型有默认模板，`TextGenerationPipeline`类和`apply_chat_template`等方法将使用该模型的聊天模板。
 您可以通过检查`tokenizer.default_chat_template`属性来查找`tokenizer`的默认模板。
 
 这是我们纯粹为了向前兼容性而做的事情，以避免破坏任何现有的工作流程。即使默认的聊天模板适用于您的模型，
@@ -367,7 +364,7 @@ How are you?<|im_end|>
 I'm doing great!<|im_end|>
 ```
 
-`user`，`system`和`assistant`是对话助手模型的标准角色，如果您的模型要与[`ConversationalPipeline`]兼容，我们建议你使用这些角色。
+`user`，`system`和`assistant`是对话助手模型的标准角色，如果您的模型要与[`TextGenerationPipeline`]兼容，我们建议你使用这些角色。
 但您可以不局限于这些角色，模板非常灵活，任何字符串都可以成为角色。
 
 ### 如何添加聊天模板？
@@ -378,7 +375,7 @@ I'm doing great!<|im_end|>
 请发起一个[pull request](https://huggingface.co/docs/hub/repositories-pull-requests-discussions)，以便正确设置该属性！
 
 一旦属性设置完成，就完成了！`tokenizer.apply_chat_template`现在将在该模型中正常工作，
-这意味着它也会自动支持在诸如`ConversationalPipeline`的地方！
+这意味着它也会自动支持在诸如`TextGenerationPipeline`的地方！
 
 通过确保模型具有这一属性，我们可以确保整个社区都能充分利用开源模型的全部功能。
 格式不匹配已经困扰这个领域并悄悄地损害了性能太久了，是时候结束它们了！

diff --git a/docs/source/zh/main_classes/pipelines.md b/docs/source/zh/main_classes/pipelines.md
@@ -362,14 +362,6 @@ my_pipeline = pipeline(model="xxxx", pipeline_class=MyPipeline)
 
 可用于自然语言处理任务的pipeline包括以下几种。
 
-### ConversationalPipeline
-
-[[autodoc]] Conversation
-
-[[autodoc]] ConversationalPipeline
-    - __call__
-    - all
-
 ### FillMaskPipeline
 
 [[autodoc]] FillMaskPipeline

diff --git a/src/transformers/__init__.py b/src/transformers/__init__.py
@@ -799,8 +799,6 @@
     "pipelines": [
         "AudioClassificationPipeline",
         "AutomaticSpeechRecognitionPipeline",
-        "Conversation",
-        "ConversationalPipeline",
         "CsvPipelineDataFormat",
         "DepthEstimationPipeline",
         "DocumentQuestionAnsweringPipeline",
@@ -5428,8 +5426,6 @@
     from .pipelines import (
         AudioClassificationPipeline,
         AutomaticSpeechRecognitionPipeline,
-        Conversation,
-        ConversationalPipeline,
         CsvPipelineDataFormat,
         DepthEstimationPipeline,
         DocumentQuestionAnsweringPipeline,

diff --git a/src/transformers/models/cohere/tokenization_cohere_fast.py b/src/transformers/models/cohere/tokenization_cohere_fast.py
@@ -20,7 +20,6 @@
 
 from tokenizers import processors
 
-from ...pipelines.conversational import Conversation
 from ...tokenization_utils_base import BatchEncoding
 from ...tokenization_utils_fast import PreTrainedTokenizerFast
 from ...utils import logging
@@ -413,7 +412,7 @@ def default_chat_template(self):
 
     def apply_tool_use_template(
         self,
-        conversation: Union[List[Dict[str, str]], "Conversation"],
+        conversation: Union[List[Dict[str, str]]],
         tools: List[Dict],
         **kwargs,
     ) -> Union[str, List[int]]:
@@ -424,13 +423,13 @@ def apply_tool_use_template(
 
         Conceptually, this works in the same way as `apply_chat_format`, but takes an additional `tools` parameter.
 
-        Converts a Conversation object or a list of dictionaries with `"role"` and `"content"` keys and a list of available
+        Converts a chat in the form of a list of dictionaries with `"role"` and `"content"` keys and a list of available
         tools for the model to use into a prompt string, or a list of token ids.
         This method will use the tokenizer's `default_tool_use_template` template specified at the class level.
         You can override the default template using the `tool_use_template` kwarg but the quality of your results may decrease.
 
         Args:
-            conversation (Union[List[Dict[str, str]], "Conversation"]): A Conversation object or list of dicts
+            conversation (Union[List[Dict[str, str]]]): A list of dicts
                 with "role" and "content" keys, representing the chat history so far.
             tools (List[Dict]): a list of tools to render into the prompt for the model to choose from.
                 See an example at the bottom of the docstring.
@@ -568,7 +567,7 @@ def directly_answer() -> List[Dict]:
 
     def apply_grounded_generation_template(
         self,
-        conversation: Union[List[Dict[str, str]], "Conversation"],
+        conversation: Union[List[Dict[str, str]]],
         documents: List[Dict],
         citation_mode: Literal["fast", "accurate"] = "accurate",
         **kwargs,
@@ -580,13 +579,13 @@ def apply_grounded_generation_template(
         Conceptually, this works in the same way as `apply_chat_format`, but takes additional `documents`
         and parameter `citation_mode` parameters.
 
-        Converts a Conversation object or a list of dictionaries with `"role"` and `"content"` keys and a list of
+        Converts a list of dictionaries with `"role"` and `"content"` keys and a list of
         documents for the model to ground its response on into a prompt string, or a list of token ids.
         This method will use the tokenizer's `grounded_generation_template` template specified at the class level.
         You can override the default template using the `grounded_generation_template` kwarg but the quality of your results may decrease.
 
         Args:
-            conversation (Union[List[Dict[str, str]], "Conversation"]): A Conversation object or list of dicts
+            conversation (Union[List[Dict[str, str]]]): A list of dicts
                 with "role" and "content" keys, representing the chat history so far.
             documents (List[Dict[str, str]): A list of dicts, representing documents or tool outputs to ground your
                 generation on. A document is a semistructured dict, wiht a string to string mapping. Common fields are

diff --git a/src/transformers/models/idefics2/processing_idefics2.py b/src/transformers/models/idefics2/processing_idefics2.py
@@ -26,7 +26,6 @@
 
 
 if TYPE_CHECKING:
-    from ...pipelines.conversational import Conversation
     from ...tokenization_utils_base import PreTokenizedInput
 
 
@@ -255,7 +254,7 @@ def model_input_names(self):
 
     def apply_chat_template(
         self,
-        conversation: Union[List[Dict[str, str]], "Conversation"],
+        conversation: Union[List[Dict[str, str]]],
         chat_template: Optional[str] = None,
         tokenize: bool = False,
         **kwargs,
@@ -269,7 +268,7 @@ def apply_chat_template(
         tokens to the sequence length or adding the surrounding tokens e.g. <fake_image_token>.
 
         Args:
-            conversation (`Union[List[Dict, str, str], "Conversation"]`):
+            conversation (`Union[List[Dict, str, str]]`):
                 The conversation to format.
             chat_template (`Optional[str]`, *optional*):
                 The Jinja template to use for formatting the conversation. If not provided, the default chat template

diff --git a/src/transformers/pipelines/__init__.py b/src/transformers/pipelines/__init__.py
@@ -58,7 +58,6 @@
     get_default_model_and_revision,
     infer_framework_load_model,
 )
-from .conversational import Conversation, ConversationalPipeline
 from .depth_estimation import DepthEstimationPipeline
 from .document_question_answering import DocumentQuestionAnsweringPipeline
 from .feature_extraction import FeatureExtractionPipeline
@@ -340,15 +339,6 @@
         },
         "type": "multimodal",
     },
-    "conversational": {
-        "impl": ConversationalPipeline,
-        "tf": (TFAutoModelForSeq2SeqLM, TFAutoModelForCausalLM) if is_tf_available() else (),
-        "pt": (AutoModelForSeq2SeqLM, AutoModelForCausalLM) if is_torch_available() else (),
-        "default": {
-            "model": {"pt": ("microsoft/DialoGPT-medium", "8bada3b"), "tf": ("microsoft/DialoGPT-medium", "8bada3b")}
-        },
-        "type": "text",
-    },
     "image-classification": {
         "impl": ImageClassificationPipeline,
         "tf": (TFAutoModelForImageClassification,) if is_tf_available() else (),
@@ -593,7 +583,6 @@ def pipeline(
 
             - `"audio-classification"`: will return a [`AudioClassificationPipeline`].
             - `"automatic-speech-recognition"`: will return a [`AutomaticSpeechRecognitionPipeline`].
-            - `"conversational"`: will return a [`ConversationalPipeline`].
             - `"depth-estimation"`: will return a [`DepthEstimationPipeline`].
             - `"document-question-answering"`: will return a [`DocumentQuestionAnsweringPipeline`].
             - `"feature-extraction"`: will return a [`FeatureExtractionPipeline`].