Gemini Server Behavior Change Catch - llm model glitch #17

@huan

Description


[src/assisted-mode/gemini-helper.test.ts > callDeterministicGemini > Determinism - JSON Mode > should return identical JSON for same prompt (JSON mode): extension/functions/src/assisted-mode/gemini-helper.test.ts#L90](https://github.com/ShipFail/firegen/commit/57ffcf83e364a7772be4608360046e936fd249cf#annotation_42644863675)
```diff
AssertionError: expected { …(2) } to deeply equal { …(2) }

- Expected
+ Received

  {
    "model": "veo-3.1-fast-generate-preview",
    "reasoning": [
      "The user wants to generate a video of a cat playing piano.",
-     "The request is straightforward and does not require extremely high fidelity or complex scene composition.",
-     "veo-3.1-fast-generate-preview is suitable for generating videos quickly and efficiently, making it a good choice for this type of request where speed is likely a priority.",
+     "The request is straightforward and does not involve complex scenes or specific stylistic requirements.",
+     "veo-3.1-fast-generate-preview is suitable for generating standard video content quickly and efficiently.",
    ],
  }

 ❯ src/assisted-mode/gemini-helper.test.ts:90:21
```
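For context, the determinism check that trips here boils down to calling the helper twice with the same prompt and deep-comparing the two results. A minimal sketch of that pattern, with a hypothetical local stub standing in for the real `callDeterministicGemini` (the live Gemini API is precisely what turned out to be non-deterministic server-side, even with a pinned configuration):

```typescript
// Shape of the JSON-mode response the test compares.
interface ModelChoice {
  model: string;
  reasoning: string[];
}

// Hypothetical stub: the real callDeterministicGemini calls the Gemini
// API with a fixed configuration; this stand-in is deterministic by
// construction so the sketch is self-contained.
async function callDeterministicGemini(prompt: string): Promise<ModelChoice> {
  return {
    model: "veo-3.1-fast-generate-preview",
    reasoning: [`Chosen for: ${prompt}`],
  };
}

// The failing test's core logic: same prompt twice, deep-equal results
// (vitest's toEqual is a deep structural comparison).
async function isDeterministic(prompt: string): Promise<boolean> {
  const a = await callDeterministicGemini(prompt);
  const b = await callDeterministicGemini(prompt);
  return JSON.stringify(a) === JSON.stringify(b);
}
```

A server-side behavior change breaks this kind of test even when nothing in the repository changed, since both calls hit the live model and the diff above shows the two responses drifting in wording while staying semantically equivalent.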

[src/assisted-mode/gemini-helper.test.ts > callDeterministicGemini > Determinism - Text Mode > should return identical results for same prompt (text mode): extension/functions/src/assisted-mode/gemini-helper.test.ts#L42](https://github.com/ShipFail/firegen/commit/57ffcf83e364a7772be4608360046e936fd249cf#annotation_42644863682)
```diff
AssertionError: expected 'The **temperature parameter** in Larg…' to be 'The **temperature parameter** in Larg…' // Object.is equality

- Expected
+ Received

- The **temperature parameter** in Large Language Model (LLM) generation is a crucial hyperparameter that controls the **randomness and creativity** of the model's output. It essentially influences the probability distribution of the next token (word or sub-word) the model chooses.
+ The **temperature parameter** in Large Language Model (LLM) generation is a crucial hyperparameter that controls the **randomness and creativity** of the model's output. It essentially influences how the model chooses the next word (or token) in a sequence.

  Here's a breakdown of its purpose:

- **1. Controlling Randomness and Predictability:**
+ **1. Controlling Randomness and Determinism:**

  *   **Low Temperature (e.g., 0.1 - 0.5):**
      *   **Purpose:** To make the output more **deterministic, focused, and predictable**.
-     *   **How it works:** The model will strongly favor the tokens with the highest probability. It's less likely to deviate from the most "obvious" or common next word.
+     *   **How it works:** When the temperature is low, the LLM assigns higher probabilities to the most likely next tokens. The model will tend to pick the token with the absolute highest probability, making its choices more conservative and less prone to deviation.
      *   **Use Cases:**
          *   **Factual recall and summarization:** When you need accurate and concise information.
-         *   **Code generation:** To produce syntactically correct and functional code.
+         *   **Code generation:** To ensure the generated code is syntactically correct and follows expected patterns.
-         *   **Translation:** To ensure the most direct and accurate translation.
-         *   **Tasks requiring consistency:** Where a predictable and repeatable output is desired.
+         *   **Translation:** To maintain the original meaning and structure as closely as possible.
+         *   **Tasks requiring precision and consistency.**

  *   **High Temperature (e.g., 0.7 - 1.0+):**
      *   **Purpose:** To make the output more **creative, diverse, and surprising**.
-     *   **How it works:** The model is more willing to explore tokens with lower probabilities. This introduces more variation and can lead to unexpected but potentially interesting combinations of words.
+     *   **How it works:** When the temperature is high, the probability distribution over possible next tokens becomes flatter. This means that tokens with lower probabilities have a greater chance of being selected. The model is more willing to explore less common word choices, leading to more varied and sometimes unexpected outputs.
      *   **Use Cases:**
          *   **Creative writing (stories, poems, scripts):** To generate novel ideas and imaginative content.
          *   **Brainstorming and idea generation:** To explore a wider range of possibilities.
-         *   **Generating different stylistic variations:** To produce text in a particular tone or voice.
+         *   **Generating different stylistic variations of text.**
          *   **Conversational AI:** To make chatbots more engaging and less repetitive.

- **2. Shaping the Probability Distribution:**
-
- LLMs work by predicting the probability of each possible next token. The temperature parameter is applied to these probabilities *before* the model samples from them.
-
- Mathematically, the temperature is often used in a softmax function. The standard softmax function is:
-
- $P(token_i) = \frac{exp(logit_i)}{\sum_j exp(logit_j)}$
+ *   **Medium Temperature (e.g., 0.5 - 0.7):**
+     *   **Purpose:** To strike a **balance between predictability and creativity**.
+     *   **How it works:** This range offers a good compromise, allowing for some variation without becoming completely nonsensical.
+     *   **Use Cases:**
+         *   **General text generation:** For tasks where a mix of coherence and some originality is desired.
+         *   **Conten
```
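As background for the temperature discussion quoted in the model output above: temperature divides the logits before the softmax, so low values concentrate probability mass on the top token while high values flatten the distribution. An illustrative sketch of that mechanic (not the actual Gemini sampler):

```typescript
// Temperature-scaled softmax: logits are divided by the temperature
// before normalization. Low T sharpens the distribution toward the
// highest-logit token; high T flattens it.
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const logits = [2.0, 1.0, 0.5];
console.log(softmaxWithTemperature(logits, 0.1)); // near one-hot on index 0
console.log(softmaxWithTemperature(logits, 2.0)); // noticeably flatter
```

Note that even at temperature 0 the Gemini API does not guarantee byte-identical outputs across requests, which is why the text-mode determinism assertion above can break without any client-side change.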
