Description
Please read this first
- Have you read the custom model provider docs, including the 'Common issues' section? Yes
- Have you searched for related issues? Others may have faced similar issues. Yes
Describe the question
(This is my first issue ever! If anything, I am happy to receive advice about the OpenAI Agents SDK and the like, as well as how I should describe and frame the issues I post. I hope to be able to help everyone out!)
I was following this guide on using custom model providers with the OpenAI Agents SDK, and for my model I am using llama-4-scout-17b-16e-instruct via the LiteLLM proxy.
When I try to run the code, I keep getting the following error:
openai.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: Hosted_vllmException - "auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set. Received Model Group=llama-4-scout-17b-16e-instruct\nAvailable Model Group Fallbacks=None', 'type': None, 'param': None, 'code': '400'}}
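As far as I can tell, this error is raised by the vLLM server sitting behind the LiteLLM proxy rather than by the Agents SDK itself: once tools are attached, the request effectively goes out with tool_choice="auto", which that server only accepts when it was started with --enable-auto-tool-choice and a --tool-call-parser. A stripped-down call through litellm directly (a sketch only; it reuses the same MODEL_NAME, API_KEY, and BASE_URL as in my repro script below) exercises the same path:

import litellm

response = litellm.completion(
    model=MODEL_NAME,  # "litellm_proxy/llama-4-scout-17b-16e-instruct"
    api_key=API_KEY,
    api_base=BASE_URL,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    tool_choice="auto",  # explicitly requesting the "auto" behaviour the error complains about
)
print(response)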
I want to keep everything inside a Python script rather than rely on CLI calls. The same code runs fine when I use AWS Bedrock as my model instead, so I doubt the issue lies in the code itself, the API key, or the base URL I am using.
I have also explored running vLLM directly, but since I work with a fairly sizable team I cannot downgrade my Python version to <3.13 (as vLLM requires) without forcing the rest of the team to verify that none of our other features break because of the downgrade.
My question: is it possible to do tool calling and/or use reasoning capabilities with Llama 4 Scout via the LiteLLM proxy in the OpenAI Agents SDK?
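The only SDK-side knob I have found so far is ModelSettings.tool_choice. Something like the sketch below (assuming my SDK version, 0.0.15, supports passing a specific tool name here; get_weather, MODEL_NAME, BASE_URL, and API_KEY are the same as in the repro script below) would name the tool explicitly instead of leaving the choice to "auto":

from agents import Agent, ModelSettings, Runner
from agents.extensions.models.litellm_model import LitellmModel

agent = Agent(
    name="Assistant",
    instructions="You only respond in haikus.",
    model=LitellmModel(model=MODEL_NAME, base_url=BASE_URL, api_key=API_KEY),
    # Name the tool explicitly (or use "required") so the request does not
    # fall back to tool_choice="auto", which the backend rejects.
    model_settings=ModelSettings(tool_choice="get_weather"),
    tools=[get_weather],
)
result = Runner.run_sync(agent, "What's the weather in Tokyo?")
print(result.final_output)

I have not confirmed whether a named tool choice gets past the vLLM restriction, so I would appreciate guidance on whether this is the intended workaround or whether the server-side flags are the only option.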
Debug information
- Agents SDK version: 0.0.15
- Python version: 3.13.3
Repro steps
A minimal Python script that reproduces the issue:
from __future__ import annotations

import asyncio
import os
from pathlib import Path

import litellm
from dotenv import load_dotenv, find_dotenv
from agents import Agent, Runner, function_tool, set_default_openai_api
from agents.extensions.models.litellm_model import LitellmModel

set_default_openai_api('chat_completions')

# get litellm proxy
load_dotenv(Path("../.env"))
API_KEY = os.getenv("API_KEY")
BASE_URL = os.getenv("BASE_URL")
MODEL_NAME = "litellm_proxy/llama-4-scout-17b-16e-instruct"
# MODEL_NAME = "litellm_proxy/bedrock"


@function_tool
def get_weather(city: str):
    print(f"[debug] getting weather for {city}")
    return f"The weather in {city} is sunny."


async def main(model: str, base_url: str, api_key: str):
    agent = Agent(
        name="Assistant",
        instructions="You only respond in haikus.",
        model=LitellmModel(model=model, base_url=base_url, api_key=api_key),
        tools=[get_weather],
    )
    result = await Runner.run(agent, "What's the weather in Tokyo?")
    print(result.final_output)


if __name__ == '__main__':
    try:
        asyncio.run(main(model=MODEL_NAME, base_url=BASE_URL, api_key=API_KEY))
    except Exception as e:
        print(e)
Expected behavior
I expected to see this response (which I got when running with Bedrock):
[debug] getting weather for Tokyo
Sun bathes Tokyo
Warmth embraces the city
Clear skies reign supreme