Model responses may violate input schema in tool specifications

## Issue Description

I've noticed a significant issue when using tool specifications with the Claude 3 Sonnet model through the AWS Bedrock Runtime API. The `toolUse` input the model returns sometimes doesn't conform to the `inputSchema` declared in the tool specification.

This raises questions about the reliability and intended use of the tool specification feature. As far as I can tell, the system does not currently verify the model's output against the schema, and most developers likely don't realize this. Better documentation on schema validation and adherence is needed, since both are critical for building robust and predictable applications.

## Key Points of Concern

1. The current behavior suggests a lack of strict enforcement of the input schema, which may lead to unpredictable results and increased complexity in error handling for developers.

2. There is a need for clear documentation and best practices on how to handle and validate responses that may not conform to the specified schema.

## Example

To illustrate this issue, I've implemented a nested tool specification for a "WorkoutPlan" model. This example demonstrates one way in which the model's output can deviate from the specified schema:


```python
import json

import boto3

formatted_tools = [
    {
        "toolSpec": {
            "name": "WorkoutPlan",
            "description": "Model for WorkoutPlan",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "weeks": {
                            "type": "array",
                            "description": "List of 2 workout weeks.",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "days": {
                                        "type": "array",
                                        "description": "List of 7 workout days.",
                                        "items": {
                                            "type": "object",
                                            "properties": {
                                                "description": {
                                                    "type": "string",
                                                    "description": "Description of the workout for this day."
                                                },
                                                "duration": {
                                                    "type": "integer",
                                                    "description": "Duration of the workout in minutes."
                                                }
                                            },
                                            "required": [
                                                "description",
                                                "duration"
                                            ]
                                        }
                                    }
                                },
                                "required": [
                                    "days"
                                ]
                            }
                        }
                    },
                    "required": [
                        "weeks"
                    ]
                }
            }
        }
    }
]



# Define input message
input_text = """
Create a workout program for the below person:

Age: 70
Weight: 240
Gender: Male
Height: 6'2"

I want the program to focus on cardio and weight loss.
"""
messages = [{"role": "user", "content": [{"text": input_text}]}]

bedrock_client = boto3.client(service_name='bedrock-runtime')

# Invoke converse API with tool config
response = bedrock_client.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=messages,
    inferenceConfig={"maxTokens": 4096, "temperature": 0},
    toolConfig={"tools": formatted_tools},
)

assert response['stopReason'] == 'tool_use'

for content in response['output']['message']['content']:
    if 'toolUse' in content:
        tool = content['toolUse']
        print(f"Requesting tool {tool['name']}. Request: {tool['toolUseId']}")
        # The printed input often omits fields marked 'required' (e.g. 'duration').
        print(json.dumps(tool['input'], indent=4))

```

And here's an example output:

```
Requesting tool WorkoutPlan. Request: tooluse_aHTGParNQOiW13Fgf41XvQ
{
    "properties": {
        "weeks": [
            {
                "days": [
                    {
                        "description": "30 minutes low-impact cardio (walking or cycling)",
                        "duration": 30
                    },
                    {
                        "description": "Rest day"
                    },
                    {
                        "description": "30 minutes water aerobics or swimming laps",
                        "duration": 30
                    },
                    {
                        "description": "Light strength training - 2 sets of 10-12 reps of seated exercises targeting major muscle groups",
                        "duration": 45
                    },
                    {
                        "description": "45 minutes low-impact cardio (walking or stationary cycling)",
                        "duration": 45
                    },
                    {
                        "description": "Rest day"
                    },
                    {
                        "description": "Flexibility and balance training (stretching, yoga, tai chi)",
                        "duration": 30
                    }
                ]
            },
            {
                "days": [
                    {
                        "description": "30 minutes low-impact cardio (walking or recumbent cycling)",
                        "duration": 30
                    },
                    {
                        "description": "Rest day"
                    },
                    {
                        "description": "45 minutes water aerobics or lap swimming",
                        "duration": 45
                    },
                    {
                        "description": "Light strength training - 2 sets of 10-12 reps of seated/machine exercises",
                        "duration": 45
                    },
                    {
                        "description": "60 minutes low-impact cardio (walking or stationary cycling)",
                        "duration": 60
                    },
                    {
                        "description": "Rest day"
                    },
                    {
                        "description": "Flexibility and balance training (yoga, stretching)",
                        "duration": 30
                    }
                ]
            }
        ]
    }
}
```

Notice that the output omits fields marked as `required` in the input schema: the four "Rest day" entries have no `duration`. The payload is also wrapped in a top-level `properties` key, so the required `weeks` field is not at the root of the input either.

This is one way the model's response can violate the schema in the tool specification.
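One way to detect this kind of omission on the client is a small recursive check of the `required` lists. The sketch below is an illustration, not part of the Bedrock API: `find_missing_required` is a hypothetical helper, it covers only the `object` / `array` / `required` subset of JSON Schema used by the WorkoutPlan spec, and the schema and input are trimmed-down copies of the ones in the example.

```python
from typing import Any, List


def find_missing_required(schema: dict, value: Any, path: str = "$") -> List[str]:
    """Return the paths of 'required' fields that are absent from value.

    Hypothetical helper: handles only 'object', 'array' and 'required';
    it is not a full JSON Schema validator.
    """
    missing: List[str] = []
    schema_type = schema.get("type")
    if schema_type == "object" and isinstance(value, dict):
        # Report required keys that are absent at this level.
        for key in schema.get("required", []):
            if key not in value:
                missing.append(f"{path}.{key}")
        # Recurse into the properties that are present.
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                missing.extend(
                    find_missing_required(sub_schema, value[key], f"{path}.{key}")
                )
    elif schema_type == "array" and isinstance(value, list):
        item_schema = schema.get("items", {})
        for index, item in enumerate(value):
            missing.extend(
                find_missing_required(item_schema, item, f"{path}[{index}]")
            )
    return missing


# Trimmed-down copies of the WorkoutPlan schema and model output.
day_schema = {
    "type": "object",
    "properties": {"description": {"type": "string"}, "duration": {"type": "integer"}},
    "required": ["description", "duration"],
}
schema = {
    "type": "object",
    "properties": {
        "weeks": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"days": {"type": "array", "items": day_schema}},
                "required": ["days"],
            },
        }
    },
    "required": ["weeks"],
}
tool_input = {
    "weeks": [
        {
            "days": [
                {"description": "30 minutes low-impact cardio", "duration": 30},
                {"description": "Rest day"},
            ]
        }
    ]
}

missing = find_missing_required(schema, tool_input)
print(missing)  # ['$.weeks[0].days[1].duration']
```

Passing `tool['input']` from the response together with the `inputSchema` `json` value would flag each "Rest day" entry. A full JSON Schema validator would also catch type errors, which this sketch ignores.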

## Questions

Is this behavior with the `toolSpec` intended, or is it a limitation in the current implementation? This raises several important questions:

1. Does the model/API fully support nested parameters in tool specifications?
2. What guarantees, if any, does the tool specification provide regarding schema adherence?
3. Is there any server-side validation or retry logic to ensure responses match the input schema?
4. How should developers approach schema validation and error handling when using tool specifications?
5. Is there a need for clearer documentation on how tool specifications work in practice?

The current behavior, where missing parameters can occur in the output, is quite surprising. While improving parameter descriptions can help in specific cases, the broader issue is the lack of clarity around schema guarantees.

It would be valuable to have documentation that explicitly states whether "built-in function calling support" includes any schema guarantees, or if all validation and retry logic needs to be implemented client-side. Users might wrongly assume that the API enforces stricter schema compliance.
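If validation and retries do have to live on the client, the loop might look roughly like the sketch below. It assumes the Converse message shape used in the example (`toolUse` blocks in the assistant message, answered by a `toolResult` block with `status` set to `"error"`). `call_model` and `validate` are hypothetical callables supplied by the caller, and the stub model exists only so the sketch runs without AWS credentials.

```python
from typing import Callable, List


def build_error_tool_result(tool_use_id: str, errors: List[str]) -> dict:
    """Build a user message that reports validation errors for a tool call."""
    return {
        "role": "user",
        "content": [
            {
                "toolResult": {
                    "toolUseId": tool_use_id,
                    "content": [{"text": "Invalid input: " + "; ".join(errors)}],
                    "status": "error",
                }
            }
        ],
    }


def converse_until_valid(
    call_model: Callable[[List[dict]], dict],
    validate: Callable[[dict], List[str]],
    messages: List[dict],
    max_attempts: int = 3,
) -> dict:
    """Call the model until validate() reports no errors or attempts run out.

    call_model(history) returns a Converse-style response dict (hypothetical wrapper).
    validate(tool_input) returns a list of error strings (hypothetical helper).
    Only the first toolUse block of each response is handled.
    """
    history = list(messages)
    for _ in range(max_attempts):
        response = call_model(history)
        assistant_message = response["output"]["message"]
        tool_uses = [c["toolUse"] for c in assistant_message["content"] if "toolUse" in c]
        if not tool_uses:
            raise ValueError("model did not request a tool")
        tool = tool_uses[0]
        errors = validate(tool["input"])
        if not errors:
            return tool["input"]
        # Feed the errors back so the model can correct its next attempt.
        history.append(assistant_message)
        history.append(build_error_tool_result(tool["toolUseId"], errors))
    raise ValueError(f"no schema-conforming tool input after {max_attempts} attempts")


# Stub standing in for bedrock_client.converse: invalid first, valid second.
attempts = []


def stub_model(history: List[dict]) -> dict:
    attempts.append(len(history))
    tool_input = {"description": "Rest day"}
    if len(attempts) > 1:
        tool_input["duration"] = 0
    tool_use = {"toolUseId": f"id{len(attempts)}", "name": "WorkoutPlan", "input": tool_input}
    return {"output": {"message": {"role": "assistant", "content": [{"toolUse": tool_use}]}}}


def require_duration(tool_input: dict) -> List[str]:
    return [] if "duration" in tool_input else ["missing required field 'duration'"]


result = converse_until_valid(
    stub_model, require_duration, [{"role": "user", "content": [{"text": "Plan a rest day"}]}]
)
print(result)
```

Every retry costs another model call, so the attempt cap is a deliberate trade-off between latency and the chance of getting a conforming result.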

While some nondeterminism in parameter values is expected, inconsistency in the output keys is not, and it could cause reliability problems in applications that use this feature.

Your insights on these points would be helpful in clarifying the expected behavior and guiding developers on how to build reliable applications using tool specifications!

**Update:** It looks like [OpenAI recently added support for this](https://openai.com/index/introducing-structured-outputs-in-the-api/). Their announcement distinguishes the previously released "JSON Mode", which gave no guarantees on the output schema, from "Structured Outputs", which includes these guarantees.
