feat(llms)!: Add default and call options in VertexAI and ChatVertexAI (#155)

Before, if you wanted to change any model option (e.g. temperature), you needed to create a new instance of the model wrapper. Now there is a `defaultOptions` parameter that lets you configure the default options, and you can always override them when calling the model via the `options` parameter.

Example:
```dart
final chatModel = ChatVertexAI(
  authHttpClient: authClient,
  project: 'your-project-id',
  defaultOptions: ChatVertexAIOptions(
    temperature: 0.9,
  ),
);
final result = await chatModel(
  [ChatMessage.human('Hello')],
  options: ChatVertexAIOptions(
    temperature: 0.5,
  ),
);
```

This PR also adds support for two new options:

- `stopSequences`: Specifies a list of strings that tells the model to stop generating text if one of the strings is encountered in the response. If a string appears multiple times in the response, then the response truncates where it's first encountered. The strings are case-sensitive.
- `candidateCount`: The number of response variations to return.
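
For illustration, a minimal sketch using both new options (the stop string and candidate count values are arbitrary, and `authClient` is assumed to be an authenticated HTTP client as in the example above):
```dart
final chatModel = ChatVertexAI(
  authHttpClient: authClient,
  project: 'your-project-id',
  defaultOptions: const ChatVertexAIOptions(
    stopSequences: ['Observation:'],
    candidateCount: 3,
  ),
);
// Generation stops as soon as 'Observation:' appears in the response,
// and the API is asked for 3 response variations.
final result = await chatModel([ChatMessage.human('Tell me a joke')]);
print(result);
```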
davidmigloz committed Sep 5, 2023
1 parent 2d21c16 commit fe1b12e
Showing 7 changed files with 379 additions and 150 deletions.
@@ -17,7 +17,9 @@ Future<void> _example1() async {
final openai = VertexAI(
authHttpClient: await _getAuthHttpClient(),
project: _getProjectId(),
temperature: 0.9,
defaultOptions: const VertexAIOptions(
temperature: 0.9,
),
);
final result = await openai('Tell me a joke');
print(result);
@@ -29,7 +31,9 @@ Future<void> _example2() async {
final chat = ChatVertexAI(
authHttpClient: await _getAuthHttpClient(),
project: _getProjectId(),
temperature: 0,
defaultOptions: const ChatVertexAIOptions(
temperature: 0,
),
);

while (true) {
71 changes: 71 additions & 0 deletions packages/langchain_google/lib/src/chat_models/models/models.dart
@@ -6,9 +6,80 @@ import 'package:langchain/langchain.dart';
class ChatVertexAIOptions extends ChatModelOptions {
/// {@macro chat_vertex_ai_options}
const ChatVertexAIOptions({
this.maxOutputTokens = 1024,
this.temperature = 0.2,
this.topP = 0.95,
this.topK = 40,
this.stopSequences = const [],
this.candidateCount = 1,
this.examples,
});

/// Maximum number of tokens that can be generated in the response. A token
/// is approximately four characters. 100 tokens correspond to roughly
/// 60-80 words.
///
/// Specify a lower value for shorter responses and a higher value for longer
/// responses.
///
/// Range: `[1, 1024]`
final int maxOutputTokens;

/// The temperature is used for sampling during response generation, which
/// occurs when topP and topK are applied. Temperature controls the degree of
/// randomness in token selection. Lower temperatures are good for prompts
/// that require a more deterministic and less open-ended or creative
/// response, while higher temperatures can lead to more diverse or creative
/// results. A temperature of 0 is deterministic, meaning that the highest
/// probability response is always selected.
///
/// For most use cases, try starting with a temperature of 0.2. If the model
/// returns a response that's too generic, too short, or the model gives a
/// fallback response, try increasing the temperature.
///
/// Range: `[0.0, 1.0]`
final double temperature;

/// Top-P changes how the model selects tokens for output. Tokens are
/// selected from the most (see top-K) to least probable until the sum of
/// their probabilities equals the top-P value. For example, if tokens A, B,
/// and C have a probability of 0.3, 0.2, and 0.1 and the top-P value is 0.5,
/// then the model will select either A or B as the next token by using
/// temperature and excludes C as a candidate.
///
/// Specify a lower value for less random responses and a higher value for
/// more random responses.
///
/// Range: `[0.0, 1.0]`
final double topP;

/// Top-K changes how the model selects tokens for output. A top-K of 1 means
/// the next selected token is the most probable among all tokens in the
/// model's vocabulary (also called greedy decoding), while a top-K of 3
/// means that the next token is selected from among the three most probable
/// tokens by using temperature.
///
/// For each token selection step, the top-K tokens with the highest
/// probabilities are sampled. Then tokens are further filtered based on
/// top-P with the final token selected using temperature sampling.
///
/// Specify a lower value for less random responses and a higher value for
/// more random responses.
///
/// Range: `[1, 40]`
final int topK;

/// Specifies a list of strings that tells the model to stop generating text
/// if one of the strings is encountered in the response. If a string appears
/// multiple times in the response, then the response truncates where it's
/// first encountered. The strings are case-sensitive.
final List<String> stopSequences;

/// The number of response variations to return.
///
/// Range: `[1, 8]`
final int candidateCount;

/// List of messages to the model to learn how to respond to the conversation.
final List<ChatExample>? examples;
}
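
To make the interaction between `topK`, `topP`, and `temperature` documented above more concrete, here is a minimal conceptual sketch of that sampling cascade. It is not the Vertex AI implementation, only an illustration of the order in which the three filters are described as applying:
```dart
import 'dart:math';

/// Conceptual sketch (not the actual Vertex AI implementation) of the
/// documented sampling cascade: keep the topK most probable tokens, then keep
/// the smallest prefix of them whose probabilities sum to topP, then sample
/// from that nucleus using temperature.
String sampleToken(
  Map<String, double> probs, {
  int topK = 40,
  double topP = 0.95,
  double temperature = 0.2,
  Random? rng,
}) {
  rng ??= Random();
  // 1. Top-K: keep the K most probable tokens.
  final sorted = probs.entries.toList()
    ..sort((a, b) => b.value.compareTo(a.value));
  final candidates = sorted.take(topK);
  // 2. Top-P: keep the smallest prefix whose cumulative probability >= topP.
  final nucleus = <MapEntry<String, double>>[];
  var cumulative = 0.0;
  for (final entry in candidates) {
    nucleus.add(entry);
    cumulative += entry.value;
    if (cumulative >= topP) break;
  }
  // 3. Temperature: a temperature of 0 degenerates to greedy selection of the
  //    most probable remaining token; otherwise rescale and sample.
  if (temperature == 0) return nucleus.first.key;
  final weights = nucleus
      .map((e) => exp(log(e.value) / temperature))
      .toList(growable: false);
  final total = weights.reduce((a, b) => a + b);
  var threshold = rng.nextDouble() * total;
  for (var i = 0; i < weights.length; i++) {
    threshold -= weights[i];
    if (threshold <= 0) return nucleus[i].key;
  }
  return nucleus.last.key;
}

void main() {
  // Mirrors the doc comment example: with topP = 0.5 and probabilities
  // A = 0.3, B = 0.2, C = 0.1, only A or B can be selected; C is excluded.
  final token = sampleToken(
    {'A': 0.3, 'B': 0.2, 'C': 0.1},
    topP: 0.5,
    temperature: 0.8,
  );
  print(token); // 'A' or 'B'
}
```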
99 changes: 37 additions & 62 deletions packages/langchain_google/lib/src/chat_models/vertex_ai.dart
@@ -83,6 +83,31 @@ import 'models/models.dart';
/// The previous list of models may not be exhaustive or up-to-date. Check out
/// the [Vertex AI documentation](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/models)
/// for the latest list of available models.
///
/// ### Model options
///
/// You can define default options to use when calling the model (e.g.
/// temperature, stop sequences, etc.) using the [defaultOptions] parameter.
///
/// The default options can be overridden when calling the model using the
/// `options` parameter.
///
/// Example:
/// ```dart
/// final chatModel = ChatVertexAI(
/// authHttpClient: authClient,
/// project: 'your-project-id',
/// defaultOptions: ChatVertexAIOptions(
/// temperature: 0.9,
/// ),
/// );
/// final result = await chatModel(
/// [ChatMessage.human('Hello')],
/// options: ChatVertexAIOptions(
/// temperature: 0.5,
/// ),
/// );
/// ```
/// {@endtemplate}
class ChatVertexAI extends BaseChatModel<ChatVertexAIOptions> {
/// {@macro chat_vertex_ai}
@@ -93,10 +118,7 @@ class ChatVertexAI extends BaseChatModel<ChatVertexAIOptions> {
final String rootUrl = 'https://us-central1-aiplatform.googleapis.com/',
this.publisher = 'google',
this.model = 'chat-bison',
this.maxOutputTokens = 1024,
this.temperature = 0.2,
this.topP = 0.95,
this.topK = 40,
this.defaultOptions = const ChatVertexAIOptions(),
}) : client = VertexAIGenAIClient(
authHttpClient: authHttpClient,
project: project,
@@ -123,59 +145,8 @@ class ChatVertexAI extends BaseChatModel<ChatVertexAIOptions> {
/// https://cloud.google.com/vertex-ai/docs/generative-ai/learn/models
final String model;

/// Maximum number of tokens that can be generated in the response. A token
/// is approximately four characters. 100 tokens correspond to roughly
/// 60-80 words.
///
/// Specify a lower value for shorter responses and a higher value for longer
/// responses.
///
/// Range: `[1, 1024]`
final int maxOutputTokens;

/// The temperature is used for sampling during response generation, which
/// occurs when topP and topK are applied. Temperature controls the degree of
/// randomness in token selection. Lower temperatures are good for prompts
/// that require a more deterministic and less open-ended or creative
/// response, while higher temperatures can lead to more diverse or creative
/// results. A temperature of 0 is deterministic, meaning that the highest
/// probability response is always selected.
///
/// For most use cases, try starting with a temperature of 0.2. If the model
/// returns a response that's too generic, too short, or the model gives a
/// fallback response, try increasing the temperature.
///
/// Range: `[0.0, 1.0]`
final double temperature;

/// Top-P changes how the model selects tokens for output. Tokens are
/// selected from the most (see top-K) to least probable until the sum of
/// their probabilities equals the top-P value. For example, if tokens A, B,
/// and C have a probability of 0.3, 0.2, and 0.1 and the top-P value is 0.5,
/// then the model will select either A or B as the next token by using
/// temperature and excludes C as a candidate.
///
/// Specify a lower value for less random responses and a higher value for
/// more random responses.
///
/// Range: `[0.0, 1.0]`
final double topP;

/// Top-K changes how the model selects tokens for output. A top-K of 1 means
/// the next selected token is the most probable among all tokens in the
/// model's vocabulary (also called greedy decoding), while a top-K of 3
/// means that the next token is selected from among the three most probable
/// tokens by using temperature.
///
/// For each token selection step, the top-K tokens with the highest
/// probabilities are sampled. Then tokens are further filtered based on
/// top-P with the final token selected using temperature sampling.
///
/// Specify a lower value for less random responses and a higher value for
/// more random responses.
///
/// Range: `[1, 40]`
final int topK;
/// The default options to use when calling the model.
final ChatVertexAIOptions defaultOptions;

/// Scope required for Vertex AI API calls.
static const cloudPlatformScope = VertexAIGenAIClient.cloudPlatformScope;
@@ -198,7 +169,7 @@ class ChatVertexAI extends BaseChatModel<ChatVertexAIOptions> {
vertexMessages.add(message.toVertexAIChatMessage());
}
}
final examples = options?.examples
final examples = (options?.examples ?? defaultOptions.examples)
?.map((final e) => e.toVertexAIChatExample())
.toList(growable: false);

@@ -209,10 +180,14 @@ class ChatVertexAI extends BaseChatModel<ChatVertexAIOptions> {
publisher: publisher,
model: model,
parameters: VertexAITextChatModelRequestParams(
maxOutputTokens: maxOutputTokens,
temperature: temperature,
topP: topP,
topK: topK,
maxOutputTokens:
options?.maxOutputTokens ?? defaultOptions.maxOutputTokens,
temperature: options?.temperature ?? defaultOptions.temperature,
topP: options?.topP ?? defaultOptions.topP,
topK: options?.topK ?? defaultOptions.topK,
stopSequences: options?.stopSequences ?? defaultOptions.stopSequences,
candidateCount:
options?.candidateCount ?? defaultOptions.candidateCount,
),
);
return result.toChatResult(model);
74 changes: 73 additions & 1 deletion packages/langchain_google/lib/src/llms/models/models.dart
@@ -5,5 +5,77 @@ import 'package:langchain/langchain.dart';
/// {@endtemplate}
class VertexAIOptions extends LLMOptions {
/// {@macro vertex_ai_options}
const VertexAIOptions();
const VertexAIOptions({
this.maxOutputTokens = 1024,
this.temperature = 0.2,
this.topP = 0.95,
this.topK = 40,
this.stopSequences = const [],
this.candidateCount = 1,
});

/// Maximum number of tokens that can be generated in the response. A token
/// is approximately four characters. 100 tokens correspond to roughly
/// 60-80 words.
///
/// Specify a lower value for shorter responses and a higher value for longer
/// responses.
///
/// Range: `[1, 1024]`
final int maxOutputTokens;

/// The temperature is used for sampling during response generation, which
/// occurs when topP and topK are applied. Temperature controls the degree of
/// randomness in token selection. Lower temperatures are good for prompts
/// that require a more deterministic and less open-ended or creative
/// response, while higher temperatures can lead to more diverse or creative
/// results. A temperature of 0 is deterministic, meaning that the highest
/// probability response is always selected.
///
/// For most use cases, try starting with a temperature of 0.2. If the model
/// returns a response that's too generic, too short, or the model gives a
/// fallback response, try increasing the temperature.
///
/// Range: `[0.0, 1.0]`
final double temperature;

/// Top-P changes how the model selects tokens for output. Tokens are
/// selected from the most (see top-K) to least probable until the sum of
/// their probabilities equals the top-P value. For example, if tokens A, B,
/// and C have a probability of 0.3, 0.2, and 0.1 and the top-P value is 0.5,
/// then the model will select either A or B as the next token by using
/// temperature and excludes C as a candidate.
///
/// Specify a lower value for less random responses and a higher value for
/// more random responses.
///
/// Range: `[0.0, 1.0]`
final double topP;

/// Top-K changes how the model selects tokens for output. A top-K of 1 means
/// the next selected token is the most probable among all tokens in the
/// model's vocabulary (also called greedy decoding), while a top-K of 3
/// means that the next token is selected from among the three most probable
/// tokens by using temperature.
///
/// For each token selection step, the top-K tokens with the highest
/// probabilities are sampled. Then tokens are further filtered based on
/// top-P with the final token selected using temperature sampling.
///
/// Specify a lower value for less random responses and a higher value for
/// more random responses.
///
/// Range: `[1, 40]`
final int topK;

/// Specifies a list of strings that tells the model to stop generating text
/// if one of the strings is encountered in the response. If a string appears
/// multiple times in the response, then the response truncates where it's
/// first encountered. The strings are case-sensitive.
final List<String> stopSequences;

/// The number of response variations to return.
///
/// Range: `[1, 8]`
final int candidateCount;
}
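
The same pattern applies to the plain-text `VertexAI` LLM. A brief sketch, assuming the same auth client and placeholder project ID as in the examples above, and assuming the LLM call accepts the `options` parameter introduced by this PR:
```dart
final llm = VertexAI(
  authHttpClient: authClient,
  project: 'your-project-id',
  defaultOptions: const VertexAIOptions(
    temperature: 0.2,
    maxOutputTokens: 256,
  ),
);
// Uses the defaults configured above.
final joke = await llm('Tell me a joke');
// Overrides the defaults for this call only.
final story = await llm(
  'Tell me a very short story',
  options: const VertexAIOptions(
    temperature: 0.9,
    stopSequences: ['The end'],
  ),
);
print(joke);
print(story);
```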