Fix Ollama client streaming issue with stream=True #439
Conversation
Resolves issue SylphAI-Inc#299, where OllamaClient failed with a 'generator' object has no attribute 'raw_response' error when using stream=True.

Changes:
- Modified OllamaClient.parse_chat_completion to return raw generators directly for streaming
- Updated Generator error handling to prevent generator objects in the raw_response field
- Added proper type checking for both sync and async generators
- Updated tests to reflect correct streaming behavior

The fix ensures that streaming generators are handled correctly by the Generator component rather than being incorrectly wrapped in GeneratorOutput.raw_response.
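As a simplified sketch, this first approach amounted to roughly the following (not the literal diff; the GeneratorOutput import path and the non-streaming branch are assumptions, and this shape is revised later in the PR):

from types import AsyncGeneratorType, GeneratorType

from adalflow.core.types import GeneratorOutput

def parse_chat_completion(self, completion):  # method of OllamaClient (sketch)
    # First approach: hand streaming generators back untouched so the caller
    # can iterate over them directly.
    if isinstance(completion, (GeneratorType, AsyncGeneratorType)):
        return completion
    # Non-streaming: wrap the final text in GeneratorOutput as before.
    return GeneratorOutput(data=None, raw_response=completion["message"]["content"])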
adalflow/adalflow/core/generator.py
Outdated
        output = self._post_call(completion)
    except Exception as e:
        log.error(f"Error processing the output: {e}")
        # Check if completion is a generator to avoid placing generator object in raw_response
We almost never change the Generator output: raw_response is for streaming, data is for the final parsed result.
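Concretely, the contract looks roughly like this (illustrative values; assumes GeneratorOutput accepts an iterator in raw_response at runtime, which is what the streaming path here relies on):

from adalflow.core.types import GeneratorOutput

def chunks():
    yield from ["The sky ", "is blue ", "because..."]

# Streaming: the live iterator goes in raw_response; data stays None
# until the stream has been consumed and parsed.
streaming_output = GeneratorOutput(raw_response=chunks(), data=None)

# Finalized: data holds the final parsed result.
final_output = GeneratorOutput(raw_response="The sky is blue because...",
                               data="The sky is blue because...")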
Check the Anthropic client for how to handle streaming. Eventually the standard is to convert to OpenAI's Responses API standard.
try this code:

from adalflow.components.model_client.ollama_client import OllamaClient
from adalflow.core import Generator

stream_generator = Generator(
    model_client=OllamaClient(host="http://localhost:11434"),
    model_kwargs={
        "model": "gpt-oss:20b",
        "stream": True,  # Enable streaming
    },
)

async def test_ollama_streaming():
    # async call with streaming
    output = await stream_generator.acall(prompt_kwargs={"input_str": "Why is the sky blue?"})
    async for chunk in output.raw_response:
        print(chunk["message"]["content"], end="", flush=True)

if __name__ == "__main__":
    import asyncio
    asyncio.run(test_ollama_streaming())
It works for streaming.
Previous implementation broke interface consistency and created architectural problems. Corrected approach:
- OllamaClient consistently returns GeneratorOutput for all cases
- raw_response contains the streaming generator (following the Anthropic client pattern)
- data remains None until the final output is processed
- Removed incorrect type checking from the Generator core component
- Maintains polymorphism across all model clients

This follows the established contract:
- raw_response = streaming chunks/iterator
- data = finalized complete output (processed later)

Fixes maintain full compatibility with the Generator component and preserve all existing functionality (processors, tracking, caching). All tests pass, and integration with the Generator component is verified.
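A sketch of the corrected shape (simplified; the real method also does the full non-streaming parse):

from types import AsyncGeneratorType, GeneratorType

from adalflow.core.types import GeneratorOutput

def parse_chat_completion(self, completion):  # method of OllamaClient (sketch)
    # Always return GeneratorOutput. For streams, the generator itself is
    # stored in raw_response and data stays None until finalized.
    if isinstance(completion, (GeneratorType, AsyncGeneratorType)):
        return GeneratorOutput(data=None, raw_response=completion)
    return GeneratorOutput(data=None, raw_response=completion["message"]["content"])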
@liyin2015 I've updated the implementation based on your feedback. You mentioned that raw_response is for streaming and data is for the final parsed result; the OllamaClient now follows that contract.
check the comments
    except Exception as e:
        log.error(f"Error processing the output processors: {e}")
        output.error = str(e)
        # Check if this is a streaming response (generator/iterator)
This PR does not do much, and we can't force data to None either. data is supposed to be the final complete output, which should be handled in ollama_client: you have to collect the whole stream and save the complete result in this field. You can see an example in https://github.com/SylphAI-Inc/AdalFlow/blob/main/adalflow/adalflow/components/model_client/anthropic_client.py
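A minimal sketch of that pattern (hypothetical helper name; see the linked anthropic_client.py for the real implementation): the client wraps the stream so chunks are still yielded to the caller, and once the stream is exhausted the accumulated text is saved as the complete output.

async def wrap_stream(stream, output):
    # Yield chunks through to the caller while accumulating the full text;
    # when the stream is exhausted, save the complete result in output.data.
    parts = []
    async for chunk in stream:
        parts.append(chunk["message"]["content"])
        yield chunk
    output.data = "".join(parts)

# Usage sketch: output.raw_response = wrap_stream(raw_stream, output)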
Keep the Generator component unchanged.
Summary
Fixes issue #299, where the Ollama client fails with a 'generator' object has no attribute 'raw_response' error when using stream=True.

Correct Implementation
- OllamaClient returns GeneratorOutput for both streaming and non-streaming
- raw_response contains the streaming generator (following the established contract)
- data remains None until the final output is processed by the Generator component

Previous Approach (Fixed)
The initial implementation incorrectly returned raw generators directly, breaking interface consistency and polymorphism across model clients.
Technical Changes
1. OllamaClient (adalflow/components/model_client/ollama_client.py)
2. Generator Core (adalflow/core/generator.py)
3. Test Updates (tests/test_ollama_client.py)

Updated to follow the proper streaming contract:
- parsed.raw_response → streaming iterator
- parsed.data → None (until consumed)
- GeneratorOutput consistency

Test Results
# All tests passing
pytest tests/test_ollama_client.py -v
======================== 10 passed ========================
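For reference, a sketch of the kind of assertion the updated tests make (test name and mock stream are assumptions; parse_chat_completion needs no live server):

from adalflow.components.model_client.ollama_client import OllamaClient
from adalflow.core.types import GeneratorOutput

def test_streaming_returns_generator_in_raw_response():
    # A fake sync stream of Ollama-style chunks.
    def fake_stream():
        yield {"message": {"content": "hello"}}

    client = OllamaClient(host="http://localhost:11434")
    parsed = client.parse_chat_completion(fake_stream())

    assert isinstance(parsed, GeneratorOutput)
    assert parsed.data is None                       # not finalized yet
    assert hasattr(parsed.raw_response, "__iter__")  # streaming iterator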
#299