# Enterprise-Grade AI Agent Framework
A modular, scalable framework for building production-ready AI agents on AWS Step Functions. It supports multiple LLM providers, tools written in any programming language, human approval workflows, and comprehensive observability.
## Features

- **Multi-Provider LLM Support**: Anthropic Claude, OpenAI GPT, Google Gemini, Amazon Bedrock, xAI Grok, DeepSeek
- **Unified Rust LLM Service**: High-performance, provider-agnostic LLM interface
- **Modular Architecture**: Shared infrastructure, reusable tools, and independent agent deployments
- **Language-Agnostic Tools**: Build tools in Python, TypeScript, Rust, Go, Java, or any language
- **Human-in-the-Loop**: Built-in approval workflows for sensitive operations
- **Long Content Support**: Handle extensive documents and conversations
- **Enterprise Ready**: Full observability, cost tracking, and security best practices
## Table of Contents

- [Architecture Overview](#architecture-overview)
- [Quick Start](#quick-start)
- [Modular Stack Structure](#modular-stack-structure)
- [Deployment Guide](#deployment-guide)
- [Creating New Agents](#creating-new-agents)
- [Building Tools](#building-tools)
- [LLM Providers](#llm-providers)
- [Monitoring & Observability](#monitoring--observability)
- [Documentation](#documentation)
## Architecture Overview

```mermaid
graph TB
    subgraph SharedLayer["Shared Infrastructure Layer"]
        SharedInfra[Shared Infrastructure Stack]
        AgentRegistry[Agent Registry]
        ToolRegistry[Tool Registry]
        LLMModels[LLM Models Table]
    end

    subgraph LLMLayer["LLM Layer"]
        SharedLLM[Shared LLM Stack<br/>Claude, OpenAI, Gemini]
        UnifiedRust[Unified Rust LLM<br/>High Performance]
    end

    subgraph ToolsLayer["Tools Layer"]
        DBTool[Database Tool]
        MapsTool[Google Maps Tool]
        MSGraphTool[Microsoft Graph Tool]
        WebTool[Web Research Tool]
        CodeTool[Code Execution Tool]
    end

    subgraph AgentLayer["Agent Layer"]
        SQLAgent[SQL Agent]
        ResearchAgent[Research Agent]
        AutomationAgent[Automation Agent]
    end

    AgentLayer --> LLMLayer
    AgentLayer --> ToolsLayer
    LLMLayer --> SharedInfra
    ToolsLayer --> SharedInfra
```
### Agent Execution Flow

```mermaid
stateDiagram-v2
    [*] --> LoadConfig: Start Execution
    LoadConfig --> LoadTools: Load Agent Config
    LoadTools --> CallLLM: Load Tool Definitions
    CallLLM --> UpdateMetrics: LLM Response
    UpdateMetrics --> CheckTools: Record Usage
    CheckTools --> ExecuteTools: Tools Needed
    CheckTools --> Success: No Tools
    ExecuteTools --> CallLLM: Tool Results
    Success --> [*]: Return Response
```
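The loop above can be sketched as plain Python. This is a simplified local simulation, not the framework's actual Step Functions implementation: `call_llm` stubs the LLM Lambda, and `TOOLS` stands in for the Tool Registry.

```python
# Simplified local simulation of the agent execution loop:
# CallLLM -> CheckTools -> ExecuteTools -> CallLLM ... -> Success.
# call_llm and TOOLS are hypothetical stand-ins for illustration only.

def call_llm(messages):
    # Stub: a real implementation would invoke the shared LLM Lambda.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_calls": [{"name": "get_time", "input": {}}]}
    return {"content": "All done.", "tool_calls": []}

TOOLS = {"get_time": lambda _input: "2024-01-01T00:00:00Z"}

def run_agent(user_message):
    # LoadConfig / LoadTools: seed the conversation
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = call_llm(messages)              # CallLLM
        calls = response.get("tool_calls", [])
        if not calls:                              # CheckTools -> Success
            return response["content"]
        for call in calls:                         # ExecuteTools
            result = TOOLS[call["name"]](call["input"])
            messages.append({"role": "tool", "content": result})

print(run_agent("What time is it?"))  # -> All done.
```

The real state machine adds what the stub omits: metrics recording after each LLM call, retries, and optional human-approval activities before tool execution.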
## Quick Start

### Prerequisites

- AWS Account with appropriate permissions
- Python 3.12+
- Node.js 18+ (for CDK)
- AWS CDK CLI (`npm install -g aws-cdk`)
- UV for Python dependency management
### Installation

```bash
# Clone the repository
git clone https://github.com/your-org/step-functions-agent.git
cd step-functions-agent

# Install dependencies using UV
uv pip install -r requirements.txt

# Bootstrap CDK (first time only)
cdk bootstrap

# Deploy shared infrastructure first
cdk deploy SharedInfrastructureStack-prod SharedLLMStack-prod AgentRegistryStack-prod

# Deploy tools
cdk deploy DBInterfaceToolStack-prod GoogleMapsToolStack-prod

# Deploy an agent
cdk deploy SQLAgentUnifiedLLMStack-prod
```
## Modular Stack Structure

### Shared Stacks

| Stack | Purpose | Resources |
|---|---|---|
| `SharedInfrastructureStack` | Core infrastructure | Tool Registry, Content Storage |
| `SharedLLMStack` | Traditional LLM providers | Claude, OpenAI, Gemini Lambda functions |
| `SharedUnifiedRustLLMStack` | High-performance LLM | Rust-based unified provider |
| `AgentRegistryStack` | Agent configurations | DynamoDB table for agents |
### Tool Stacks

| Tool | Stack | Language | Purpose |
|---|---|---|---|
| Database | `DBInterfaceToolStack` | Python | SQL database operations |
| Google Maps | `GoogleMapsToolStack` | TypeScript | Location services |
| Microsoft Graph | `MicrosoftGraphToolStack` | Python | Office 365 integration |
| Web Research | `WebResearchToolStack` | Python | Web scraping and research |
| Code Execution | `E2BToolStack` | Python | Safe code execution |
| Finance | `FinancialToolStack` | Python | Market data analysis |
| CloudWatch | `CloudWatchToolStack` | Python | AWS metrics and logs |
### Agent Stacks

| Agent | Stack | Tools Used | Use Case |
|---|---|---|---|
| SQL Agent | `SQLAgentUnifiedLLMStack` | Database, Code | Data analysis & reporting |
| Research Agent | `WebResearchAgentUnifiedLLMStack` | Web Research, Code | Market research |
| Automation Agent | `TestAutomationRemoteAgentStack` | Local Execute, MS Graph | Enterprise automation |
| Maps Agent | `GoogleMapsAgentUnifiedLLMStack` | Google Maps | Location intelligence |
## Deployment Guide

### Environment Setup

```bash
# Development environment
export ENVIRONMENT=dev
cdk deploy SharedInfrastructureStack-dev

# Production environment
export ENVIRONMENT=prod
cdk deploy SharedInfrastructureStack-prod
```
### Deployment Order

1. **Shared Infrastructure** (once per environment)

   ```bash
   cdk deploy SharedInfrastructureStack-prod
   cdk deploy AgentRegistryStack-prod
   ```

2. **LLM Services** (choose one or both)

   ```bash
   # Traditional multi-provider
   cdk deploy SharedLLMStack-prod
   # OR high-performance unified
   cdk deploy SharedUnifiedRustLLMStack-prod
   ```

3. **Tools** (deploy only what you need)

   ```bash
   cdk deploy DBInterfaceToolStack-prod
   cdk deploy GoogleMapsToolStack-prod
   ```

4. **Agents** (deploy your agents)

   ```bash
   cdk deploy SQLAgentUnifiedLLMStack-prod
   ```
## Creating New Agents

```python
from aws_cdk import Fn

from stacks.agents.modular_base_agent_unified_llm_stack import ModularBaseAgentUnifiedLLMStack


class MyCustomAgentStack(ModularBaseAgentUnifiedLLMStack):
    def __init__(self, scope, construct_id, env_name="prod", **kwargs):
        # Import tool ARNs
        tool1_arn = Fn.import_value(f"Tool1LambdaArn-{env_name}")
        tool2_arn = Fn.import_value(f"Tool2LambdaArn-{env_name}")

        # Define tool configurations
        tool_configs = [
            {
                "tool_name": "tool1",
                "lambda_arn": tool1_arn,
                "requires_activity": False
            },
            {
                "tool_name": "tool2",
                "lambda_arn": tool2_arn,
                "requires_activity": True,
                "activity_type": "human_approval"
            }
        ]

        # System prompt
        system_prompt = "You are a helpful assistant..."

        # Call parent constructor
        super().__init__(
            scope, construct_id,
            agent_name="my-custom-agent",
            unified_llm_arn=Fn.import_value(f"SharedUnifiedRustLLMLambdaArn-{env_name}"),
            tool_configs=tool_configs,
            env_name=env_name,
            system_prompt=system_prompt,
            **kwargs
        )
```
## Building Tools

### Tool Structure

```text
lambda/tools/my-tool/
├── index.py           # Lambda handler
├── requirements.txt   # Dependencies
└── template.yaml      # SAM template (optional)
```
### Tool Interface

```python
def lambda_handler(event, context):
    """
    Standard tool interface.

    Args:
        event: {
            "name": "tool_name",
            "id": "unique_id",
            "input": { ... tool-specific input ... }
        }

    Returns:
        {
            "type": "tool_result",
            "tool_use_id": event["id"],
            "name": event["name"],
            "content": "Tool execution result"
        }
    """
    tool_input = event["input"]

    # Tool logic here
    result = perform_tool_action(tool_input)

    return {
        "type": "tool_result",
        "tool_use_id": event["id"],
        "name": event["name"],
        "content": result
    }
```
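As a concrete instance of this interface, here is a hypothetical minimal tool (the `add_numbers` name and its input fields are illustrative, not part of the framework) that can be invoked locally with a sample event:

```python
# Hypothetical minimal tool implementing the standard interface above:
# adds two numbers. Only the event/result shapes are significant.

def lambda_handler(event, context):
    tool_input = event["input"]
    result = tool_input["a"] + tool_input["b"]
    return {
        "type": "tool_result",
        "tool_use_id": event["id"],
        "name": event["name"],
        "content": str(result),
    }

# Local invocation with a sample event (context is unused here)
event = {"name": "add_numbers", "id": "call_1", "input": {"a": 2, "b": 3}}
print(lambda_handler(event, None))
# -> {'type': 'tool_result', 'tool_use_id': 'call_1',
#     'name': 'add_numbers', 'content': '5'}
```

Because every tool echoes back `tool_use_id` and `name`, the state machine can match results to the LLM's tool calls regardless of the language the tool is written in.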
### Tool Registration

Tools are automatically registered in the Tool Registry when deployed. The registry entry includes:
```json
{
  "tool_name": "my_tool",
  "description": "Tool description for LLM",
  "input_schema": {
    "type": "object",
    "properties": { ... },
    "required": [ ... ]
  },
  "lambda_arn": "arn:aws:lambda:...",
  "created_at": "2024-01-01T00:00:00Z"
}
```
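Before dispatching a call, an agent can sanity-check the LLM's arguments against the entry's `input_schema`. A stdlib-only sketch of the required-field check (a production validator such as `jsonschema` would also enforce property types):

```python
# Minimal required-field check against a registry entry's input_schema.
# Illustrative sketch only; it does not validate types or nested schemas.

def check_required(input_schema, tool_input):
    missing = [k for k in input_schema.get("required", []) if k not in tool_input]
    return missing  # an empty list means the call is well-formed

# Hypothetical registry entry's schema
schema = {
    "type": "object",
    "properties": {"query": {"type": "string"}, "limit": {"type": "integer"}},
    "required": ["query"],
}

print(check_required(schema, {"query": "SELECT 1"}))  # -> []
print(check_required(schema, {"limit": 10}))          # -> ['query']
```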
## LLM Providers

### Supported Providers

| Provider | Models | Stack | Notes |
|---|---|---|---|
| Anthropic | Claude 3.5 Sonnet, Claude 3 Opus | `SharedLLMStack` | Best for complex reasoning |
| OpenAI | GPT-4o, GPT-4o-mini | `SharedLLMStack` | Versatile, good for code |
| Google | Gemini 1.5 Pro, Flash | `SharedLLMStack` | Multimodal capabilities |
| Amazon | Nova Pro, Nova Lite | `SharedUnifiedRustLLMStack` | AWS native, cost-effective |
| xAI | Grok 2, Grok 2 mini | `SharedUnifiedRustLLMStack` | Latest models |
| DeepSeek | DeepSeek V3 | `SharedUnifiedRustLLMStack` | Specialized capabilities |
### Provider Configuration

```bash
# Store API keys in AWS Secrets Manager
aws secretsmanager create-secret \
  --name /ai-agent/llm-secrets/prod \
  --secret-string '{
    "ANTHROPIC_API_KEY": "sk-ant-...",
    "OPENAI_API_KEY": "sk-...",
    "GEMINI_API_KEY": "..."
  }'
```
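An LLM handler can then parse the secret's JSON payload and pick the key for its provider. A sketch of the parsing step only — the Secrets Manager fetch is stubbed with a literal string here, and the provider-to-key mapping is an assumption based on the key names above:

```python
import json

# In a Lambda, secret_string would come from
# secretsmanager.get_secret_value(SecretId="/ai-agent/llm-secrets/prod")["SecretString"].
# A dummy payload stands in for it in this sketch.
secret_string = '{"ANTHROPIC_API_KEY": "sk-ant-test", "OPENAI_API_KEY": "sk-test"}'

# Hypothetical provider -> secret-key mapping, mirroring the keys above
PROVIDER_KEYS = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}

def api_key_for(provider, secret_string):
    secrets = json.loads(secret_string)
    return secrets[PROVIDER_KEYS[provider]]

print(api_key_for("anthropic", secret_string))  # -> sk-ant-test
```

Keeping all provider keys in one secret means a single IAM permission and one cached fetch per cold start, whichever provider an agent selects.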
### Model Selection

Agents can dynamically select models based on the task:

```python
# In agent configuration
self.llm_provider = "anthropic"  # or "openai", "google", etc.
self.llm_model = "claude-3-5-sonnet-20241022"
```
## Monitoring & Observability

### CloudWatch Metrics

- **Token Usage**: Input/output tokens per model
- **Execution Time**: Agent and tool execution duration
- **Error Rates**: Failed executions and retries
- **Cost Tracking**: Estimated costs per execution
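Cost tracking reduces to multiplying token counts by per-model prices. A sketch with illustrative placeholder prices (not actual provider pricing, which changes over time):

```python
# Estimate execution cost from token usage.
# Prices are illustrative placeholders in USD per 1M tokens.
PRICES = {
    "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def estimate_cost(model, input_tokens, output_tokens):
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

cost = estimate_cost("claude-3-5-sonnet", 10_000, 2_000)
print(f"${cost:.4f}")  # -> $0.0600
```

Emitting `input_tokens` and `output_tokens` per model in structured logs is what makes the CloudWatch Insights query below possible.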
### X-Ray Tracing

All Step Functions executions include X-Ray tracing for detailed performance analysis. View traces in the AWS Console under CloudWatch > Service Map.
### Cost Analysis

```sql
-- CloudWatch Insights query for cost analysis
fields @timestamp, agent, model, input_tokens, output_tokens
| stats sum(input_tokens) as total_input,
        sum(output_tokens) as total_output
        by model
```
## Documentation

- Deployment Guide - Detailed deployment instructions
- Architecture Overview - System design and patterns
- Agent Development - Creating custom agents
- Tool Development - Building new tools
- LLM Provider Setup - Configuring providers
- Long Content Support - Handling large documents
- Human Approval Workflows - Adding approval steps
- Activity Testing - Testing remote activities
- Security Best Practices - Security considerations
- Legacy Migration - Migrating from old architecture
- Provider Migration - Switching LLM providers
## Contributing

We welcome contributions! Please see our Contributing Guide for details.
### Development Setup

```bash
# Create virtual environment
uv venv
source .venv/bin/activate

# Install dev dependencies
uv pip install -r requirements-dev.txt

# Run tests
pytest

# Format code
black .
```
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- AWS Step Functions team for the serverless orchestration platform
- Anthropic, OpenAI, and Google for their LLM APIs
- The open-source community for various tools and libraries
## Support

- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Full Documentation
Built with ❤️ using AWS CDK and Step Functions