feat(llms)!: Add default and call options in VertexAI and ChatVertexAI (#155)

Before, if you wanted to change any model option (e.g. temperature), you needed to create a new instance of the model wrapper. Now there is a `defaultOptions` parameter that lets you configure the default options, and you can always override them when calling the model via the `options` parameter.

Example:
```dart
final chatModel = ChatVertexAI(
  authHttpClient: authClient,
  project: 'your-project-id',
  defaultOptions: ChatVertexAIOptions(
    temperature: 0.9,
  ),
);
final result = await chatModel(
  [ChatMessage.human('Hello')],
  options: ChatVertexAIOptions(
    temperature: 0.5,
  ),
);
```

This PR also adds support for two new options:

- `stopSequences`: Specifies a list of strings that tells the model to stop generating text if one of the strings is encountered in the response. If a string appears multiple times in the response, then the response truncates where it's first encountered. The strings are case-sensitive.
- `candidateCount`: The number of response variations to return.
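
For illustration, a minimal sketch using both new options (the stop string and candidate count values are arbitrary, and `authClient` is assumed to be an authenticated HTTP client as in the example above):
```dart
final chatModel = ChatVertexAI(
  authHttpClient: authClient,
  project: 'your-project-id',
  defaultOptions: const ChatVertexAIOptions(
    stopSequences: ['Observation:'],
    candidateCount: 3,
  ),
);
// Generation stops as soon as 'Observation:' appears in the response,
// and the API is asked for 3 response variations.
final result = await chatModel([ChatMessage.human('Tell me a joke')]);
print(result);
```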
davidmigloz committed Sep 5, 2023
1 parent 2d21c16 commit fe1b12e
Showing 7 changed files with 379 additions and 150 deletions.
@@ -17,7 +17,9 @@ Future<void> _example1() async {
final openai = VertexAI(
authHttpClient: await _getAuthHttpClient(),
project: _getProjectId(),
temperature: 0.9,
defaultOptions: const VertexAIOptions(
temperature: 0.9,
),
);
final result = await openai('Tell me a joke');
print(result);
@@ -29,7 +31,9 @@ Future<void> _example2() async {
final chat = ChatVertexAI(
authHttpClient: await _getAuthHttpClient(),
project: _getProjectId(),
temperature: 0,
defaultOptions: const ChatVertexAIOptions(
temperature: 0,
),
);

while (true) {
71 changes: 71 additions & 0 deletions packages/langchain_google/lib/src/chat_models/models/models.dart
@@ -6,9 +6,80 @@ import 'package:langchain/langchain.dart';
class ChatVertexAIOptions extends ChatModelOptions {
/// {@macro chat_vertex_ai_options}
const ChatVertexAIOptions({
this.maxOutputTokens = 1024,
this.temperature = 0.2,
this.topP = 0.95,
this.topK = 40,
this.stopSequences = const [],
this.candidateCount = 1,
this.examples,
});

/// Maximum number of tokens that can be generated in the response. A token
/// is approximately four characters. 100 tokens correspond to roughly
/// 60-80 words.
///
/// Specify a lower value for shorter responses and a higher value for longer
/// responses.
///
/// Range: `[1, 1024]`
final int maxOutputTokens;

/// The temperature is used for sampling during response generation, which
/// occurs when topP and topK are applied. Temperature controls the degree of
/// randomness in token selection. Lower temperatures are good for prompts
/// that require a more deterministic and less open-ended or creative
/// response, while higher temperatures can lead to more diverse or creative
/// results. A temperature of 0 is deterministic, meaning that the highest
/// probability response is always selected.
///
/// For most use cases, try starting with a temperature of 0.2. If the model
/// returns a response that's too generic, too short, or the model gives a
/// fallback response, try increasing the temperature.
///
/// Range: `[0.0, 1.0]`
final double temperature;

/// Top-P changes how the model selects tokens for output. Tokens are
/// selected from the most (see top-K) to least probable until the sum of
/// their probabilities equals the top-P value. For example, if tokens A, B,
/// and C have a probability of 0.3, 0.2, and 0.1 and the top-P value is 0.5,
/// then the model will select either A or B as the next token by using
/// temperature and excludes C as a candidate.
///
/// Specify a lower value for less random responses and a higher value for
/// more random responses.
///
/// Range: `[0.0, 1.0]`
final double topP;

/// Top-K changes how the model selects tokens for output. A top-K of 1 means
/// the next selected token is the most probable among all tokens in the
/// model's vocabulary (also called greedy decoding), while a top-K of 3
/// means that the next token is selected from among the three most probable
/// tokens by using temperature.
///
/// For each token selection step, the top-K tokens with the highest
/// probabilities are sampled. Then tokens are further filtered based on
/// top-P with the final token selected using temperature sampling.
///
/// Specify a lower value for less random responses and a higher value for
/// more random responses.
///
/// Range: `[1, 40]`
final int topK;

/// Specifies a list of strings that tells the model to stop generating text
/// if one of the strings is encountered in the response. If a string appears
/// multiple times in the response, then the response truncates where it's
/// first encountered. The strings are case-sensitive.
final List<String> stopSequences;

/// The number of response variations to return.
///
/// Range: `[1, 8]`
final int candidateCount;

/// List of messages to the model to learn how to respond to the conversation.
final List<ChatExample>? examples;
}
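
To make the interaction between `topK`, `topP`, and `temperature` documented above more concrete, here is a minimal conceptual sketch of that sampling cascade. It is not the Vertex AI implementation, only an illustration of the order in which the three filters are described as applying:
```dart
import 'dart:math';

/// Conceptual sketch (not the actual Vertex AI implementation) of the
/// documented sampling cascade: keep the topK most probable tokens, then keep
/// the smallest prefix of them whose probabilities sum to topP, then sample
/// from that nucleus using temperature.
String sampleToken(
  Map<String, double> probs, {
  int topK = 40,
  double topP = 0.95,
  double temperature = 0.2,
  Random? rng,
}) {
  rng ??= Random();
  // 1. Top-K: keep the K most probable tokens.
  final sorted = probs.entries.toList()
    ..sort((a, b) => b.value.compareTo(a.value));
  final candidates = sorted.take(topK);
  // 2. Top-P: keep the smallest prefix whose cumulative probability >= topP.
  final nucleus = <MapEntry<String, double>>[];
  var cumulative = 0.0;
  for (final entry in candidates) {
    nucleus.add(entry);
    cumulative += entry.value;
    if (cumulative >= topP) break;
  }
  // 3. Temperature: a temperature of 0 degenerates to greedy selection of the
  //    most probable remaining token; otherwise rescale and sample.
  if (temperature == 0) return nucleus.first.key;
  final weights = nucleus
      .map((e) => exp(log(e.value) / temperature))
      .toList(growable: false);
  final total = weights.reduce((a, b) => a + b);
  var threshold = rng.nextDouble() * total;
  for (var i = 0; i < weights.length; i++) {
    threshold -= weights[i];
    if (threshold <= 0) return nucleus[i].key;
  }
  return nucleus.last.key;
}

void main() {
  // Mirrors the doc comment example: with topP = 0.5 and probabilities
  // A = 0.3, B = 0.2, C = 0.1, only A or B can be selected; C is excluded.
  final token = sampleToken(
    {'A': 0.3, 'B': 0.2, 'C': 0.1},
    topP: 0.5,
    temperature: 0.8,
  );
  print(token); // 'A' or 'B'
}
```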
99 changes: 37 additions & 62 deletions packages/langchain_google/lib/src/chat_models/vertex_ai.dart
@@ -83,6 +83,31 @@ import 'models/models.dart';
/// The previous list of models may not be exhaustive or up-to-date. Check out
/// the [Vertex AI documentation](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/models)
/// for the latest list of available models.
///
/// ### Model options
///
/// You can define default options to use when calling the model (e.g.
/// temperature, stop sequences, etc.) using the [defaultOptions] parameter.
///
/// The default options can be overridden when calling the model using the
/// `options` parameter.
///
/// Example:
/// ```dart
/// final chatModel = ChatVertexAI(
/// authHttpClient: authClient,
/// project: 'your-project-id',
/// defaultOptions: ChatVertexAIOptions(
/// temperature: 0.9,
/// ),
/// );
/// final result = await chatModel(
/// [ChatMessage.human('Hello')],
/// options: ChatVertexAIOptions(
/// temperature: 0.5,
/// ),
/// );
/// ```
/// {@endtemplate}
class ChatVertexAI extends BaseChatModel<ChatVertexAIOptions> {
/// {@macro chat_vertex_ai}
@@ -93,10 +118,7 @@ class ChatVertexAI extends BaseChatModel<ChatVertexAIOptions> {
final String rootUrl = 'https://us-central1-aiplatform.googleapis.com/',
this.publisher = 'google',
this.model = 'chat-bison',
this.maxOutputTokens = 1024,
this.temperature = 0.2,
this.topP = 0.95,
this.topK = 40,
this.defaultOptions = const ChatVertexAIOptions(),
}) : client = VertexAIGenAIClient(
authHttpClient: authHttpClient,
project: project,
@@ -123,59 +145,8 @@ class ChatVertexAI extends BaseChatModel<ChatVertexAIOptions> {
/// https://cloud.google.com/vertex-ai/docs/generative-ai/learn/models
final String model;

/// Maximum number of tokens that can be generated in the response. A token
/// is approximately four characters. 100 tokens correspond to roughly
/// 60-80 words.
///
/// Specify a lower value for shorter responses and a higher value for longer
/// responses.
///
/// Range: `[1, 1024]`
final int maxOutputTokens;

/// The temperature is used for sampling during response generation, which
/// occurs when topP and topK are applied. Temperature controls the degree of
/// randomness in token selection. Lower temperatures are good for prompts
/// that require a more deterministic and less open-ended or creative
/// response, while higher temperatures can lead to more diverse or creative
/// results. A temperature of 0 is deterministic, meaning that the highest
/// probability response is always selected.
///
/// For most use cases, try starting with a temperature of 0.2. If the model
/// returns a response that's too generic, too short, or the model gives a
/// fallback response, try increasing the temperature.
///
/// Range: `[0.0, 1.0]`
final double temperature;

/// Top-P changes how the model selects tokens for output. Tokens are
/// selected from the most (see top-K) to least probable until the sum of
/// their probabilities equals the top-P value. For example, if tokens A, B,
/// and C have a probability of 0.3, 0.2, and 0.1 and the top-P value is 0.5,
/// then the model will select either A or B as the next token by using
/// temperature and excludes C as a candidate.
///
/// Specify a lower value for less random responses and a higher value for
/// more random responses.
///
/// Range: `[0.0, 1.0]`
final double topP;

/// Top-K changes how the model selects tokens for output. A top-K of 1 means
/// the next selected token is the most probable among all tokens in the
/// model's vocabulary (also called greedy decoding), while a top-K of 3
/// means that the next token is selected from among the three most probable
/// tokens by using temperature.
///
/// For each token selection step, the top-K tokens with the highest
/// probabilities are sampled. Then tokens are further filtered based on
/// top-P with the final token selected using temperature sampling.
///
/// Specify a lower value for less random responses and a higher value for
/// more random responses.
///
/// Range: `[1, 40]`
final int topK;
/// The default options to use when calling the model.
final ChatVertexAIOptions defaultOptions;

/// Scope required for Vertex AI API calls.
static const cloudPlatformScope = VertexAIGenAIClient.cloudPlatformScope;
@@ -198,7 +169,7 @@ class ChatVertexAI extends BaseChatModel<ChatVertexAIOptions> {
vertexMessages.add(message.toVertexAIChatMessage());
}
}
final examples = options?.examples
final examples = (options?.examples ?? defaultOptions.examples)
?.map((final e) => e.toVertexAIChatExample())
.toList(growable: false);

@@ -209,10 +180,14 @@ class ChatVertexAI extends BaseChatModel<ChatVertexAIOptions> {
publisher: publisher,
model: model,
parameters: VertexAITextChatModelRequestParams(
maxOutputTokens: maxOutputTokens,
temperature: temperature,
topP: topP,
topK: topK,
maxOutputTokens:
options?.maxOutputTokens ?? defaultOptions.maxOutputTokens,
temperature: options?.temperature ?? defaultOptions.temperature,
topP: options?.topP ?? defaultOptions.topP,
topK: options?.topK ?? defaultOptions.topK,
stopSequences: options?.stopSequences ?? defaultOptions.stopSequences,
candidateCount:
options?.candidateCount ?? defaultOptions.candidateCount,
),
);
return result.toChatResult(model);
74 changes: 73 additions & 1 deletion packages/langchain_google/lib/src/llms/models/models.dart
@@ -5,5 +5,77 @@ import 'package:langchain/langchain.dart';
/// {@endtemplate}
class VertexAIOptions extends LLMOptions {
/// {@macro vertex_ai_options}
const VertexAIOptions();
const VertexAIOptions({
this.maxOutputTokens = 1024,
this.temperature = 0.2,
this.topP = 0.95,
this.topK = 40,
this.stopSequences = const [],
this.candidateCount = 1,
});

/// Maximum number of tokens that can be generated in the response. A token
/// is approximately four characters. 100 tokens correspond to roughly
/// 60-80 words.
///
/// Specify a lower value for shorter responses and a higher value for longer
/// responses.
///
/// Range: `[1, 1024]`
final int maxOutputTokens;

/// The temperature is used for sampling during response generation, which
/// occurs when topP and topK are applied. Temperature controls the degree of
/// randomness in token selection. Lower temperatures are good for prompts
/// that require a more deterministic and less open-ended or creative
/// response, while higher temperatures can lead to more diverse or creative
/// results. A temperature of 0 is deterministic, meaning that the highest
/// probability response is always selected.
///
/// For most use cases, try starting with a temperature of 0.2. If the model
/// returns a response that's too generic, too short, or the model gives a
/// fallback response, try increasing the temperature.
///
/// Range: `[0.0, 1.0]`
final double temperature;

/// Top-P changes how the model selects tokens for output. Tokens are
/// selected from the most (see top-K) to least probable until the sum of
/// their probabilities equals the top-P value. For example, if tokens A, B,
/// and C have a probability of 0.3, 0.2, and 0.1 and the top-P value is 0.5,
/// then the model will select either A or B as the next token by using
/// temperature and excludes C as a candidate.
///
/// Specify a lower value for less random responses and a higher value for
/// more random responses.
///
/// Range: `[0.0, 1.0]`
final double topP;

/// Top-K changes how the model selects tokens for output. A top-K of 1 means
/// the next selected token is the most probable among all tokens in the
/// model's vocabulary (also called greedy decoding), while a top-K of 3
/// means that the next token is selected from among the three most probable
/// tokens by using temperature.
///
/// For each token selection step, the top-K tokens with the highest
/// probabilities are sampled. Then tokens are further filtered based on
/// top-P with the final token selected using temperature sampling.
///
/// Specify a lower value for less random responses and a higher value for
/// more random responses.
///
/// Range: `[1, 40]`
final int topK;

/// Specifies a list of strings that tells the model to stop generating text
/// if one of the strings is encountered in the response. If a string appears
/// multiple times in the response, then the response truncates where it's
/// first encountered. The strings are case-sensitive.
final List<String> stopSequences;

/// The number of response variations to return.
///
/// Range: `[1, 8]`
final int candidateCount;
}
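
The same pattern applies to the plain-text `VertexAI` LLM. A brief sketch, assuming the same auth client and placeholder project ID as in the examples above, and assuming the LLM call accepts the `options` parameter introduced by this PR:
```dart
final llm = VertexAI(
  authHttpClient: authClient,
  project: 'your-project-id',
  defaultOptions: const VertexAIOptions(
    temperature: 0.2,
    maxOutputTokens: 256,
  ),
);
// Uses the defaults configured above.
final joke = await llm('Tell me a joke');
// Overrides the defaults for this call only.
final story = await llm(
  'Tell me a very short story',
  options: const VertexAIOptions(
    temperature: 0.9,
    stopSequences: ['The end'],
  ),
);
print(joke);
print(story);
```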