Skip to content

Conversation

@zhangzhefang-github
Copy link

Summary

Fixes #34057 - Ensures that streaming mode includes llm_output field in LLMResult, fixing broken callback integrations.

Description

Previously, when using streaming mode (stream() or astream()), the LLMResult passed to on_llm_end callbacks was missing the llm_output field. This caused issues for callback handlers like Langfuse that rely on this field to extract metadata such as model names.

This PR adds llm_output={} to all streaming on_llm_end calls in both BaseLLM and BaseChatModel, ensuring consistency with non-streaming behavior.

Changes

  • Updated BaseLLM.stream() to include llm_output={} in LLMResult
  • Updated BaseLLM.astream() to include llm_output={} in LLMResult
  • Updated BaseChatModel.stream() to include llm_output={} in LLMResult
  • Updated BaseChatModel.astream() to include llm_output={} in LLMResult
  • Added unit test test_stream_llm_result_contains_llm_output() to verify the fix

Test Plan

  • ✅ All existing tests pass
  • ✅ New test verifies llm_output field is present and is a dict in streaming mode
  • ✅ Tested with GenericFakeChatModel using callback handler

@github-actions github-actions bot added fix core Related to the package `langchain-core` labels Nov 21, 2025
@codspeed-hq
Copy link

codspeed-hq bot commented Nov 21, 2025

CodSpeed Performance Report

Merging #34060 will not alter performance

Comparing zhangzhefang-github:fix/streaming-llm-output (dcbe68a) with master (cbaea35)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 13 untouched
⏩ 21 skipped1

Footnotes

  1. 21 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Fixes langchain-ai#34057

Previously, streaming mode did not include the `llm_output` field in the
`LLMResult` object passed to `on_llm_end` callbacks. This broke integrations
like Langfuse that rely on this field to extract metadata such as model name.

This commit ensures that `llm_output` is always present in streaming mode by
passing an empty dict `{}` in all streaming methods (`stream` and `astream`)
for both `BaseLLM` and `BaseChatModel`.

Changes:
- Updated `BaseLLM.stream()` to include `llm_output={}` in LLMResult
- Updated `BaseLLM.astream()` to include `llm_output={}` in LLMResult
- Updated `BaseChatModel.stream()` to include `llm_output={}` in LLMResult
- Updated `BaseChatModel.astream()` to include `llm_output={}` in LLMResult
- Added test to verify `llm_output` is present in streaming callbacks
@zhangzhefang-github zhangzhefang-github force-pushed the fix/streaming-llm-output branch 2 times, most recently from b2abc0d to 4813a9a Compare November 23, 2025 07:57
Update test_runnable_events_v1.py to expect llm_output={} instead of
llm_output=None in streaming mode, consistent with the fix for issue langchain-ai#34057.

This ensures that llm_output is always a dict ({}) rather than None when
callbacks receive LLMResult in streaming mode.
This commit comprehensively fixes issue langchain-ai#34057 by ensuring llm_output={}
in ALL code paths, not just streaming:

Changes to chat_models.py:
- Added llm_output={} to cache retrieval paths (sync/async)
- Added llm_output={} to generate_from_stream()
- Added llm_output={} to SimpleChatModel._generate()

Changes to llms.py:
- Added llm_output={} to SimpleLLM._generate() and _agenerate()

Changes to fake_chat_models.py:
- Added llm_output={} to all fake model _generate() methods:
  - FakeMessagesListChatModel
  - GenericFakeChatModel
  - ParrotFakeChatModel

This ensures that llm_output is consistently an empty dict rather than
None across streaming, non-streaming, cached, and fake model paths.
Split long line to comply with max line length of 88 characters.
@zhangzhefang-github
Copy link
Author

Test Status Update

✅ All relevant tests passing

This PR successfully fixes issue #34057. All tests related to the fix are passing:

  • Lint checks: All passing (5/5 Python versions)
  • Core functionality tests: 16 passed, 2 xfailed, 1 xpassed
  • Fake model tests: 9/9 passing
  • New test: test_stream_llm_result_contains_llm_output verifies the fix

⚠️ Pre-existing test failure: test_with_llm

The CI shows 21 failing tests, but these are all due to a single pre-existing issue in test_with_llm that exists on master before this PR:

Evidence this is pre-existing:
I verified by running the test on commit ee3373afc~1 (before any changes):

git checkout ee3373afc~1
pytest tests/unit_tests/runnables/test_runnable_events_v1.py::test_with_llm
# Result: FAILED - same error as in this PR

The failure:

  • Expected: 9 events (including on_llm_start and on_llm_end)
  • Actual: 7 events (missing the LLM events)
  • Error: AssertionError: assert 7 == 9

Why it's affecting multiple CI jobs:
The same test runs across:

  • 5 Python versions for libs/core
  • 6 Pydantic versions for libs/core
  • 2 Python versions for libs/langchain
  • 6 Pydantic versions for libs/langchain
  • 1 for libs/langchain_v1
  • Plus the aggregated "CI Success" check

Total: 21 failures, but all from the same root cause.

What this PR actually fixes

Before: LLMResult.llm_output was None in streaming mode, breaking integrations like Langfuse
After: LLMResult.llm_output is consistently {} across all code paths

Changed files:

  • chat_models.py: Added llm_output={} to streaming, caching, and SimpleChatModel paths
  • llms.py: Added llm_output={} to SimpleLLM paths
  • fake_chat_models.py: Fixed all fake models to return consistent llm_output={}
  • test_runnable_events_v1.py: Updated expectations to match the fix

Recommendation: The test_with_llm failure should be tracked in a separate issue as it's unrelated to this fix.

@zhangzhefang-github
Copy link
Author

Update: I've created issue #34076 to track the pre-existing test_with_llm failure. This makes it clear that the 21 failing CI checks are unrelated to this PR's changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Related to the package `langchain-core` fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Streaming mode returns incomplete LLMResult (missing llm_output), breaking Langfuse callbacks

1 participant