
Unable to use tool calling when using Llama 4 scout via LiteLLM Proxy #723

Open
@tewenhao

Description


Please read this first

  • Have you read the custom model provider docs, including the 'Common issues' section? Yes
  • Have you searched for related issues? Others may have faced similar issues. Yes

Describe the question

(This is my first ever issue! I am happy to receive advice about the OpenAI Agents SDK and the like, as well as about how I should be describing/framing the issues I post. I hope to be able to help everyone out!)

I was following this guide on using custom model providers with the OpenAI Agents SDK, and for my model I am using llama-4-scout-17b-16e-instruct via the LiteLLM proxy.

When I try to run the code, I keep getting the following error:

openai.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: Hosted_vllmException - "auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set. Received Model Group=llama-4-scout-17b-16e-instruct\nAvailable Model Group Fallbacks=None', 'type': None, 'param': None, 'code': '400'}}

I want to be able to run everything within a script rather than rely on CLI calls. The same code runs fine when I use AWS Bedrock as the model instead, so I doubt the issue is with the code itself, the API key, or the base URL I am using.
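
To double-check that it is not the Agents SDK itself, here is a rough sketch of how I would send an equivalent request with the plain OpenAI client pointed at the same proxy (the get_weather tool schema below is just an illustration of what the SDK sends on my behalf; I assume this hits the same 400):

import os
from openai import OpenAI

# Point the stock OpenAI client at the LiteLLM proxy.
client = OpenAI(base_url=os.getenv("BASE_URL"), api_key=os.getenv("API_KEY"))

# Send a chat completion with a tool definition; when tools are present the
# tool_choice defaults to "auto", which is what the error message complains about.
resp = client.chat.completions.create(
    model="llama-4-scout-17b-16e-instruct",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative schema, mirrors the repro script below
            "description": "Get the weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    tool_choice="auto",
)
print(resp.choices[0].message)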

I have also explored serving the model with vLLM directly, but since I work with a fairly sizable team, I cannot downgrade our Python version below 3.13 (as vLLM requires) without forcing the rest of the team to verify that none of our other features break because of the downgrade.

I want to ask whether it is possible to use tool calling and/or reasoning with Llama 4 Scout via the LiteLLM proxy in the OpenAI Agents SDK.
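
In case it is relevant: since the error seems to be specifically about "auto" tool choice, one thing I considered is pinning tool_choice explicitly via ModelSettings. This is a sketch only, reusing the constants and the get_weather tool from the repro script below; I am assuming the proxy forwards the setting to the backend, and I have not confirmed it avoids the error.

from agents import Agent, ModelSettings
from agents.extensions.models.litellm_model import LitellmModel

# Force the model to call a tool instead of leaving tool selection on "auto".
# Whether the backing deployment accepts this without vLLM's tool-call parser
# flags is an assumption on my part.
agent = Agent(
    name="Assistant",
    instructions="You only respond in haikus.",
    model=LitellmModel(model=MODEL_NAME, base_url=BASE_URL, api_key=API_KEY),
    model_settings=ModelSettings(tool_choice="required"),
    tools=[get_weather],
)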

Debug information

  • Agents SDK version: 0.0.15
  • Python version: 3.13.3

Repro steps

A minimal Python script that reproduces the issue:

from __future__ import annotations

import asyncio
import os
from pathlib import Path

from dotenv import load_dotenv

from agents import Agent, Runner, function_tool, set_default_openai_api
from agents.extensions.models.litellm_model import LitellmModel

# Use the Chat Completions API rather than the Responses API.
set_default_openai_api('chat_completions')

# Load the LiteLLM proxy credentials from the project's .env file.
load_dotenv(Path("../.env"))
API_KEY = os.getenv("API_KEY")
BASE_URL = os.getenv("BASE_URL")
MODEL_NAME = "litellm_proxy/llama-4-scout-17b-16e-instruct"
# MODEL_NAME = "litellm_proxy/bedrock"  # this model group works

@function_tool
def get_weather(city: str) -> str:
    print(f"[debug] getting weather for {city}")
    return f"The weather in {city} is sunny."

async def main(model: str, base_url: str, api_key: str):
    agent = Agent(
        name="Assistant",
        instructions="You only respond in haikus.",
        model=LitellmModel(model=model, base_url=base_url, api_key=api_key),
        tools=[get_weather],
    )

    result = await Runner.run(agent, "What's the weather in Tokyo?")
    print(result.final_output)

if __name__ == '__main__':
    try:
        asyncio.run(main(model=MODEL_NAME, base_url=BASE_URL, api_key=API_KEY))
    except Exception as e:
        print(e)

Expected behavior


I expected to see this response (which I got when running with Bedrock):

[debug] getting weather for Tokyo
Sun bathes Tokyo
Warmth embraces the city
Clear skies reign supreme
