Enterprise-grade FastAPI application that integrates with Hugging Face models deployed on AWS SageMaker for question-answering tasks. Built with modern Python technologies and a cloud-native architecture.
- Overview
- Features
- Architecture
- Tech Stack
- Quick Start
- Installation
- Configuration
- API Documentation
- Usage Examples
- Model Details
- Testing
- Deployment
- Troubleshooting
- Contributing
- License
FastAPI SageMaker Integration is a production-ready solution that bridges the gap between modern web APIs and AWS SageMaker's powerful machine learning capabilities. This application provides a robust, scalable interface for deploying and consuming Hugging Face models through SageMaker endpoints.
- **Seamless Integration**: Direct connection to AWS SageMaker endpoints
- **High Performance**: FastAPI's async capabilities for optimal throughput
- **AI-Powered**: Advanced question-answering with DistilBERT
- **Enterprise Ready**: Production-grade security and monitoring
- **Batch Processing**: Support for both single and batch predictions
- **Robust Error Handling**: Comprehensive error management and logging
| Feature | Description | Status |
|---|---|---|
| SageMaker Integration | Direct connection to AWS SageMaker endpoints | ✅ Implemented |
| Question-Answering | Advanced QA with DistilBERT model | ✅ Implemented |
| Batch Predictions | Efficient batch processing capabilities | ✅ Implemented |
| Model Information | Real-time model status and metadata | ✅ Implemented |
| Input Validation | Pydantic models for request/response validation | ✅ Implemented |
| CORS Support | Cross-origin resource sharing for web apps | ✅ Implemented |
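The input-validation row above relies on Pydantic models. A rough sketch of what such models look like, with field names following the request format shown later in the API documentation (the project's actual `models.py` may differ):

```python
# Illustrative sketch only; the project's actual models.py may differ.
from typing import Optional

from pydantic import BaseModel


class QAInput(BaseModel):
    """Question plus the context passage to search for the answer."""
    question: str
    context: str


class PredictionRequest(BaseModel):
    """Envelope validated by FastAPI before the call reaches SageMaker."""
    data: QAInput
    request_id: Optional[str] = None
```

FastAPI rejects malformed request bodies automatically (HTTP 422) when a route declares `PredictionRequest` as its body type.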
| Feature | Description | Status |
|---|---|---|
| Async Processing | Non-blocking API operations | ✅ Implemented |
| Comprehensive Logging | Detailed request/response logging | ✅ Implemented |
| AWS Authentication | Secure credential management | ✅ Implemented |
| Health Monitoring | Endpoint health checks and status | ✅ Implemented |
| Error Recovery | Graceful error handling and recovery | ✅ Implemented |
| Performance Metrics | Processing time and performance tracking | ✅ Implemented |
- **Authentication & Authorization**: JWT-based security
- **Advanced Analytics**: Prediction analytics dashboard
- **Model Versioning**: Support for multiple model versions
- **Caching Layer**: Redis caching for improved performance
- **WebSocket Support**: Real-time streaming capabilities
- **Rate Limiting**: API rate limiting and throttling
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Client Apps   │     │   FastAPI App   │     │  AWS SageMaker  │
│  (Web/Mobile)   │────▶│    (Backend)    │────▶│  (ML Endpoint)  │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   HTTP Client   │     │    SageMaker    │     │  Hugging Face   │
│   (REST API)    │     │     Client      │     │   DistilBERT    │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```
1. **Request Processing**: Client Request → FastAPI → Input Validation → SageMaker Client
2. **Model Inference**: Validated Input → SageMaker Endpoint → Hugging Face Model → Prediction
3. **Response Generation**: Model Output → Response Processing → Validation → Client Response
4. **Error Handling**: Any Error → Logging → Error Response → Client Notification
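The flow above can be sketched as a small helper around the SageMaker runtime client. Function and parameter names here are illustrative, not the project's actual layout:

```python
# Minimal sketch of the request-processing flow (illustrative names,
# not the project's actual module layout).
import json


def build_payload(question: str, context: str) -> str:
    """Validate the input and serialize it for the SageMaker endpoint."""
    if not question or not context:
        raise ValueError("both 'question' and 'context' are required")
    return json.dumps({"inputs": {"question": question, "context": context}})


def predict(runtime_client, endpoint_name: str, question: str, context: str) -> dict:
    """Invoke the endpoint and decode the model's JSON response."""
    body = build_payload(question, context)
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=body,
    )
    return json.loads(response["Body"].read().decode("utf-8"))
```

In the real application, `runtime_client` would be a `boto3.client("sagemaker-runtime")` instance, and any exception raised here would be caught, logged, and converted into an error response for the client.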
| Technology | Version | Purpose |
|---|---|---|
| FastAPI | 0.104.1 | High-performance web framework for building APIs |
| Uvicorn | 0.24.0 | ASGI server for production deployment |
| Boto3 | 1.34.0 | AWS SDK for Python |
| Pydantic | 2.5.0 | Data validation and settings management |
| Python-multipart | 0.0.6 | File upload support |
| Requests | 2.31.0 | HTTP library for API calls |
| NumPy | 1.26.0+ | Numerical computing |
| Pandas | Latest | Data manipulation |
| Service | Purpose |
|---|---|
| AWS SageMaker | Machine learning model hosting and inference |
| AWS IAM | Identity and access management |
| AWS CloudWatch | Monitoring and logging |
| AWS S3 | Model artifacts and data storage |
| Technology | Purpose |
|---|---|
| Docker | Containerization for consistent deployment |
| Environment Variables | Secure configuration management |
| Git | Version control and collaboration |
| AWS CLI | Command-line interface for AWS |
```
fastapi_sagemaker/
├── aws/                   # AWS configuration files
├── main.py                # FastAPI application entry point
├── models.py              # Pydantic models for validation
├── sagemaker_client.py    # SageMaker integration client
├── requirements.txt       # Python dependencies
├── env.example            # Environment variables template
├── test_example.py        # Comprehensive test suite
├── .gitignore             # Git ignore rules
└── README.md              # Project documentation
```
Before you begin, ensure you have the following:
- Python 3.8+ - Download Python
- AWS Account - Create AWS Account
- AWS CLI - Install AWS CLI
- Git - Download Git
You'll need the following AWS resources:
- SageMaker Endpoint - Deployed Hugging Face model
- IAM Role - SageMaker execution role with proper permissions
- AWS Credentials - Access key and secret key
- S3 Bucket - For model artifacts (optional)
Clone the repository:

```bash
git clone https://github.com/your-username/fastapi_sagemaker.git
cd fastapi_sagemaker
```

Create and activate a virtual environment:

```bash
python -m venv .venv

# On Windows:
.venv\Scripts\activate
# On macOS/Linux:
source .venv/bin/activate
```

Install Python dependencies:

```bash
pip install -r requirements.txt
```

Set up the environment file:

```bash
# Copy environment template
cp env.example .env
# Edit .env with your configuration
nano .env
```

Configure AWS credentials:

```bash
aws configure
# Or set environment variables
export AWS_ACCESS_KEY_ID=your-access-key
export AWS_SECRET_ACCESS_KEY=your-secret-key
export AWS_DEFAULT_REGION=eu-north-1
```

Run the application:

```bash
# Development mode
uvicorn main:app --reload --host 0.0.0.0 --port 8000

# Production mode
uvicorn main:app --host 0.0.0.0 --port 8000
```

The application is then available at:

- API: http://localhost:8000
- Interactive Docs: http://localhost:8000/docs
- ReDoc Documentation: http://localhost:8000/redoc
- Health Check: http://localhost:8000/health
Edit the `.env` file with your specific configuration:

```bash
# SageMaker Configuration
SAGEMAKER_ENDPOINT_NAME=huggingface-pytorch-inference-2025-06-27-04-56-19-392
AWS_REGION=eu-north-1
MODEL_NAME=distilbert-base-uncased-distilled-squad

# AWS Credentials
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key

# Hugging Face Model Configuration
MODEL_TYPE=question-answering
HF_MODEL_ID=distilbert-base-uncased-distilled-squad
HF_TASK=question-answering

# SageMaker Session Details
SAGEMAKER_BUCKET=your-sagemaker-bucket
SAGEMAKER_ROLE_ARN=your-sagemaker-execution-role

# Application Configuration
DEBUG=True
HOST=0.0.0.0
PORT=8000

# Logging Configuration
LOG_LEVEL=INFO

# CORS Configuration
ALLOWED_ORIGINS=http://localhost:3000,https://your-frontend-domain.com
```

1. **Deploy Hugging Face Model**

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# Create Hugging Face model
huggingface_model = HuggingFaceModel(
    model_data='s3://your-bucket/model.tar.gz',
    role='your-sagemaker-role',
    transformers_version='4.26.0',
    pytorch_version='1.13.1',
    py_version='py39',
)

# Deploy model
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
    endpoint_name='huggingface-pytorch-inference-2025-06-27-04-56-19-392'
)
```

2. **Configure IAM Permissions**

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sagemaker:InvokeEndpoint",
        "sagemaker:DescribeEndpoint"
      ],
      "Resource": "*"
    }
  ]
}
```
Currently, the API uses AWS IAM authentication. In production, implement additional API key authentication.
GET /health

Check application and SageMaker connection status.

Response:

```json
{
  "status": "healthy",
  "sagemaker_configured": true,
  "timestamp": "2024-01-01T00:00:00Z"
}
```

POST /predict
Make a single question-answering prediction.
Request Body:

```json
{
  "data": {
    "question": "Which name is also used to describe the Amazon rainforest in English?",
    "context": "The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America."
  },
  "request_id": "req-123"
}
```

Response:

```json
{
  "prediction": {
    "answer": "Amazonia",
    "score": 0.9540701508522034,
    "start": 201,
    "end": 209
  },
  "model_name": "distilbert-base-uncased-distilled-squad",
  "request_id": "req-123",
  "error": null,
  "timestamp": "2024-01-01T00:00:00Z",
  "processing_time_ms": 150.5
}
```

POST /predict/batch
Make batch question-answering predictions.
Request Body:

```json
[
  {
    "data": {
      "question": "What is machine learning?",
      "context": "Machine learning is a subset of artificial intelligence that enables computers to learn and make decisions from data without being explicitly programmed."
    },
    "request_id": "req-1"
  },
  {
    "data": {
      "question": "Where is the Eiffel Tower located?",
      "context": "The Eiffel Tower is a wrought-iron lattice tower located on the Champ de Mars in Paris, France."
    },
    "request_id": "req-2"
  }
]
```

GET /model/info

Get information about the deployed SageMaker model.

Response:

```json
{
  "model_name": "distilbert-base-uncased-distilled-squad",
  "endpoint_name": "huggingface-pytorch-inference-2025-06-27-04-56-19-392",
  "region": "eu-north-1",
  "status": "InService",
  "instance_type": "ml.m5.large",
  "creation_time": "2024-01-01T00:00:00Z"
}
```

Python client:

```python
import requests

# Single prediction
def predict_question(question, context):
    url = "http://localhost:8000/predict"
    payload = {
        "data": {
            "question": question,
            "context": context
        },
        "request_id": "python-client-1"
    }
    response = requests.post(url, json=payload)
    return response.json()

# Example usage
result = predict_question(
    "What is the capital of France?",
    "Paris is the capital of France and is known for the Eiffel Tower."
)
print(f"Answer: {result['prediction']['answer']}")
print(f"Confidence: {result['prediction']['score']:.2f}")
```

cURL:

```bash
# Health check
curl http://localhost:8000/health

# Single prediction
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "data": {
      "question": "What is AI?",
      "context": "Artificial Intelligence (AI) is the simulation of human intelligence in machines."
    },
    "request_id": "curl-test"
  }'

# Batch prediction
curl -X POST "http://localhost:8000/predict/batch" \
  -H "Content-Type: application/json" \
  -d '[
    {
      "data": {
        "question": "What is Python?",
        "context": "Python is a high-level programming language known for its simplicity."
      },
      "request_id": "batch-1"
    }
  ]'
```

JavaScript client:

```javascript
const axios = require('axios');

async function predictQuestion(question, context) {
  try {
    const response = await axios.post('http://localhost:8000/predict', {
      data: {
        question: question,
        context: context
      },
      request_id: 'js-client-1'
    });
    return response.data;
  } catch (error) {
    console.error('Prediction failed:', error.response.data);
    throw error;
  }
}

// Example usage
predictQuestion(
  'What is machine learning?',
  'Machine learning is a subset of artificial intelligence.'
)
  .then(result => {
    console.log('Answer:', result.prediction.answer);
    console.log('Confidence:', result.prediction.score);
  });
```

This integration is specifically configured for:
| Parameter | Value |
|---|---|
| Model | distilbert-base-uncased-distilled-squad |
| Task | Question-Answering |
| Framework | Hugging Face Transformers |
| Deployment | SageMaker Inference Endpoint |
| Region | eu-north-1 |
| Endpoint | huggingface-pytorch-inference-2025-06-27-04-56-19-392 |
- Question-Answering: Extract answers from context passages
- Confidence Scoring: Probability scores for answer confidence
- Position Tracking: Start and end positions of answers
- Context Understanding: Deep understanding of text context
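Since the `start` and `end` fields returned by the model are character offsets into the submitted context, an answer can always be recovered by slicing. A quick illustration (the prediction values here are made up for the example):

```python
# The model returns character offsets into the original context, so
# slicing the context with start/end reproduces the answer text.
context = "Paris is the capital of France."
prediction = {"answer": "Paris", "score": 0.98, "start": 0, "end": 5}  # example values

answer_span = context[prediction["start"]:prediction["end"]]
print(answer_span)  # -> Paris
```

This is useful for highlighting the answer inside the original passage on the client side.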
The model accepts two input formats.

Simple format:

```json
{
  "data": {
    "question": "Your question here?",
    "context": "The context passage where the answer can be found."
  }
}
```

Hugging Face "inputs" format:

```json
{
  "data": {
    "inputs": {
      "question": "Your question here?",
      "context": "The context passage where the answer can be found."
    }
  }
}
```

Output format:

```json
{
  "prediction": {
    "answer": "The extracted answer text",
    "score": 0.95,
    "start": 10,
    "end": 25
  }
}
```

Run the test suite:

```bash
# Run comprehensive tests
python test_example.py
```

Test the endpoints manually:

```bash
# Test health endpoint
curl http://localhost:8000/health

# Test single prediction
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "data": {
      "question": "What is the capital of France?",
      "context": "Paris is the capital of France."
    }
  }'

# Test batch prediction
curl -X POST "http://localhost:8000/predict/batch" \
  -H "Content-Type: application/json" \
  -d '[
    {
      "data": {
        "question": "What is AI?",
        "context": "Artificial Intelligence is a field of computer science."
      }
    }
  ]'
```

Load testing:

```bash
# Load testing with Apache Bench
ab -n 100 -c 10 -T application/json -p test_data.json http://localhost:8000/predict
```

Or use Python for custom load testing:

```python
import asyncio
import time

import aiohttp

async def load_test():
    async with aiohttp.ClientSession() as session:
        start_time = time.time()
        tasks = []
        for i in range(100):
            task = session.post('http://localhost:8000/predict', json={
                'data': {
                    'question': f'Test question {i}?',
                    'context': 'This is a test context for load testing.'
                }
            })
            tasks.append(task)
        responses = await asyncio.gather(*tasks)
        end_time = time.time()
        print(f'Processed {len(responses)} requests in {end_time - start_time:.2f} seconds')

asyncio.run(load_test())
```

Create a `Dockerfile`:
```dockerfile
FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Build and run:

```bash
# Build image
docker build -t fastapi-sagemaker .

# Run container
docker run -p 8000:8000 --env-file .env fastapi-sagemaker
```

Or use Docker Compose (`docker-compose.yml`):

```yaml
version: '3.8'

services:
  fastapi-sagemaker:
    build: .
    ports:
      - "8000:8000"
    environment:
      - SAGEMAKER_ENDPOINT_NAME=${SAGEMAKER_ENDPOINT_NAME}
      - AWS_REGION=${AWS_REGION}
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
    volumes:
      - ./logs:/app/logs
```

EC2 deployment:

```bash
# Launch EC2 instance
aws ec2 run-instances \
  --image-id ami-0c02fb55956c7d316 \
  --instance-type t3.medium \
  --key-name your-key-pair \
  --security-group-ids sg-12345678

# Install dependencies
sudo yum update -y
sudo yum install python3 pip git -y

# Clone and set up the application
git clone https://github.com/your-username/fastapi_sagemaker.git
cd fastapi_sagemaker
pip3 install -r requirements.txt

# Configure environment
cp env.example .env
# Edit .env with your settings

# Run application
python3 -m uvicorn main:app --host 0.0.0.0 --port 8000
```

ECS deployment:

```bash
# Create ECS cluster
aws ecs create-cluster --cluster-name fastapi-sagemaker

# Create task definition
aws ecs register-task-definition --cli-input-json file://task-definition.json

# Create service
aws ecs create-service \
  --cluster fastapi-sagemaker \
  --service-name fastapi-sagemaker-service \
  --task-definition fastapi-sagemaker:1 \
  --desired-count 2
```

- Environment Variables: Use AWS Systems Manager Parameter Store
- Authentication: Implement API key or JWT authentication
- Rate Limiting: Add rate limiting to prevent abuse
- Monitoring: Set up CloudWatch monitoring and alarms
- HTTPS: Use Application Load Balancer with SSL certificate
- Auto Scaling: Configure auto-scaling based on CPU/memory usage
- Logging: Centralized logging with CloudWatch Logs
- Security: Use VPC and security groups for network isolation
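The rate-limiting item above can be prototyped in-process before introducing an API gateway. A minimal token-bucket sketch (illustrative, not part of this repository):

```python
# Token bucket: allows `rate` requests per second on average, with
# bursts of up to `capacity` requests. Illustrative sketch only.
import time


class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate                  # tokens added per second
        self.capacity = capacity          # maximum burst size
        self.tokens = float(capacity)     # start full
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; False means 'rate limited'."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In a FastAPI app this could back a middleware or dependency keyed by client IP; for multi-instance deployments a shared store (e.g. Redis) would be needed instead.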
**Issue**: `NoCredentialsError: Unable to locate credentials`

```bash
# Solution: Configure AWS credentials
aws configure
# Or set environment variables
export AWS_ACCESS_KEY_ID=your-key
export AWS_SECRET_ACCESS_KEY=your-secret
```

**Issue**: `AccessDenied: User is not authorized to perform: sagemaker:InvokeEndpoint`

Solution: add SageMaker permissions to the IAM user/role:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sagemaker:InvokeEndpoint",
        "sagemaker:DescribeEndpoint"
      ],
      "Resource": "*"
    }
  ]
}
```

**Issue**: `Endpoint not found`

```bash
# Solution: Verify endpoint name and region
aws sagemaker list-endpoints --region eu-north-1

# Check endpoint status
aws sagemaker describe-endpoint --endpoint-name your-endpoint-name
```

**Issue**: `Endpoint not in service`

```bash
# Solution: Wait for the endpoint to be ready
aws sagemaker wait endpoint-in-service --endpoint-name your-endpoint-name
```

**Issue**: `ModuleNotFoundError: No module named 'fastapi'`

```bash
# Solution: Install dependencies
pip install -r requirements.txt
```

**Issue**: `Address already in use`

```bash
# Solution: Use a different port
uvicorn main:app --port 8001
# Or kill the existing process
lsof -ti:8000 | xargs kill -9
```

Enable debug logging:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

Test endpoint connectivity:

```python
import boto3

runtime = boto3.client('sagemaker-runtime', region_name='eu-north-1')
response = runtime.invoke_endpoint(
    EndpointName='your-endpoint-name',
    ContentType='application/json',
    Body='{"inputs": {"question": "test", "context": "test"}}'
)
print(response['Body'].read().decode())
```

Check application logs:

```bash
# Application logs
tail -f logs/app.log

# SageMaker logs
aws logs describe-log-groups --log-group-name-prefix /aws/sagemaker
```

We welcome contributions! Please follow these guidelines:
1. **Fork the Repository**

   ```bash
   git clone https://github.com/your-username/fastapi_sagemaker.git
   cd fastapi_sagemaker
   ```

2. **Create Feature Branch**

   ```bash
   git checkout -b feature/amazing-feature
   ```

3. **Make Changes**
   - Follow PEP 8 for Python code
   - Write tests for new features
   - Update documentation

4. **Test Your Changes**

   ```bash
   python test_example.py
   ```

5. **Commit and Push**

   ```bash
   git commit -m "Add amazing feature"
   git push origin feature/amazing-feature
   ```

6. **Create Pull Request**
   - Open a pull request on GitHub
   - Provide a clear description of changes
   - Include tests and documentation updates
- Code Style: Follow PEP 8 (Python)
- Testing: Write unit tests for new features
- Documentation: Update README and API docs
- Commits: Use conventional commit messages
- Reviews: All PRs require code review
This project is licensed under the MIT License - see the LICENSE file for details.
- FastAPI for the web framework
- AWS SageMaker for ML model hosting
- Hugging Face for pre-trained models
- Pydantic for data validation
- Uvicorn for ASGI server
- Boto3 for AWS SDK
- Documentation: Check this README and API docs
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Discord: Join our Discord server
- Twitter: Follow @FastAPISageMaker
- Blog: Read our blog posts
For enterprise customers, we offer:
- Priority Support: 24/7 technical support
- Custom Development: Tailored features and integrations
- Training: Team training and workshops
- Consulting: Architecture and deployment guidance
Contact us at: enterprise@fastapi-sagemaker.com
Made with ❤️ by the FastAPI SageMaker Team
Empowering AI applications with enterprise-grade SageMaker integration