Enterprise-grade SAP Datasphere integration platform featuring Model Context Protocol (MCP) server for AI assistants, comprehensive metadata synchronization, and intelligent data replication to AWS services.
- 🤖 MCP Server: AI-accessible metadata operations via Model Context Protocol
- 🎯 Production Tested: Successfully syncing 14 real assets with 197K+ records
- 🚀 Three Environments: Dog (Dev), Wolf (Test), Bear (Production) architecture
- 🔄 Intelligent Replication: User-controlled selective data replication to AWS S3 Tables
- 🏗️ Enterprise Architecture: Scalable, robust, production-ready with Apache Iceberg
- 📊 Live Monitoring: Real-time job tracking and system health
- 🧠 AI Integration: Claude, Cursor, and other AI assistants ready
search_metadata- Search assets across Datasphere and AWS Glue with business contextdiscover_spaces- OAuth-enabled discovery of all Datasphere spacesget_asset_details- Detailed asset information with schema and lineagetrigger_sync- Initiate metadata synchronization operationsexplore_data_lineage- Trace data relationships and dependenciesget_sync_status- Monitor synchronization health and performance
- Claude Desktop - Full MCP integration with configuration examples
- Cursor IDE - Native MCP support for development workflows
- Custom AI Tools - Standard MCP protocol for any AI assistant
- User-Controlled Selection - Choose specific assets for replication
- Apache Iceberg Format - ACID transactions and schema evolution
- AWS S3 Tables - Serverless analytics-ready storage
- Real-time Progress - Live monitoring with detailed status updates
- Data Validation - Comprehensive quality checks and business rule validation
- Federation Pattern - Real-time queries from AWS to SAP Datasphere
- Replication Pattern - Data movement to AWS S3 Tables with Glue ETL
- Direct Query Pattern - On-demand access without data movement
# Required
Python 3.10+
SAP Datasphere account with OAuth application
AWS account with Glue and S3 Tables permissions
# Optional for AI Integration
Claude Desktop or Cursor IDE# 1. Clone the repository
git clone https://github.com/MarioDeFelipe/sap-datasphere-mcp.git
cd sap-datasphere-mcp
# 2. Install dependencies
pip install -r requirements.txt
# 3. Configure MCP Server for AI Assistants
python mcp_server_config.py
# 4. Start MCP Server (for AI integration)
python start_mcp_server.py --environment dog
# 5. Start Web Dashboard (for manual management)
python web_dashboard.py- 🤖 MCP Server: Available for Claude Desktop and Cursor IDE
- 🌐 Web Dashboard: http://localhost:8001 (Dog), http://localhost:5000 (Wolf)
- ☁️ Production API: https://krb7735xufadsj233kdnpaabta0eatck.lambda-url.us-east-1.on.aws
- 📚 API Docs: http://localhost:8001/docs
🐕 DOG Environment (Development) 🐺 WOLF Environment (Testing) 🐻 BEAR Environment (Production)
├── FastAPI Web Dashboard ├── FastAPI Application ├── AWS Lambda Serverless
├── Port: 8001 ├── Port: 5000 ├── Auto-scaling
├── Real SAP Integration ├── Production-like Testing ├── Enterprise Monitoring
└── Hot-reload Development └── Performance Benchmarking └── High Availability
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ AI Assistant │◄──►│ MCP Server │◄──►│ SAP Datasphere │
│ (Claude, Cursor)│ │ │ │ (OAuth 2.0) │
└─────────────────┘ │ • Metadata Ops │ └─────────────────┘
│ • Asset Discovery│
│ • Sync Control │ ┌─────────────────┐
│ • Lineage Trace │◄──►│ AWS Services │
└──────────────────┘ │ • S3 Tables │
│ • Glue ETL │
│ • Data Catalog │
└─────────────────┘
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ SAP Datasphere │ │ Glue ETL Jobs │ │ AWS S3 Tables │
│ │ │ │ │ │
│ • OData APIs │───►│ • Spark Engine │───►│ • Apache Iceberg│
│ • OAuth Auth │ │ • Schema Mapping │ │ • ACID Txns │
│ • CSDL Metadata │ │ • Data Transform │ │ • Query Ready │
└─────────────────┘ └──────────────────┘ └─────────────────┘
sap_datasphere_mcp_server.py: Model Context Protocol server for AI integration- OAuth 2.0 authentication with SAP Datasphere
- Unified metadata search across Datasphere and AWS Glue
- Business context preservation and lineage tracking
comprehensive_asset_discovery_and_sync.py: User-controlled selective replication- Apache Iceberg integration with AWS S3 Tables
- Glue ETL jobs for scalable data processing
- Real-time progress monitoring and validation
enhanced_datasphere_connector.py: OAuth 2.0, enhanced API accessenhanced_glue_connector.py: Rich metadata, business context preservationenhanced_metadata_extractor.py: CSDL metadata and business annotations
sync_orchestrator.py: Multi-threaded job processingmetadata_sync_core.py: Core synchronization logicasset_mapper.py: Cross-system asset mapping and transformation
web_dashboard.py: FastAPI server with WebSocket support- Three-environment deployment (Dog/Wolf/Bear)
- Real-time replication monitoring and job management
🤖 AI Assistant Integration → Claude Desktop, Cursor IDE ready
🔍 Metadata Search → Unified search across SAP and AWS
📋 Asset Discovery → OAuth-enabled space and asset discovery
🔄 Sync Management → AI-controlled synchronization operations
📈 Lineage Exploration → Trace data relationships and dependencies
💼 Business Context → Rich metadata with governance information
📊 SAP_SC_FI_T_Products → 2.5M records (replicated to S3 Tables)
📅 Time Dimension Table → 197,136 records
🏷️ Product Categories → 222 records
👥 Customer Data → Multiple tables with business context
📈 Analytical Models → Financial & operational with hierarchies
🏢 Datasphere Spaces → SAP_CONTENT, SAP_SC_FI_AM, SAP_SC_HR_AM
- ⚡ MCP Response Time: Sub-100ms for AI assistant queries
- 🔄 Concurrent Operations: Up to 10 simultaneous MCP requests
- 📈 Replication Throughput: 2.5M records via Glue ETL jobs
- 🛡️ Reliability: 99.9% uptime with auto-recovery and OAuth refresh
Add to your Claude Desktop mcp.json configuration:
{
"mcpServers": {
"sap-datasphere": {
"command": "python",
"args": ["start_mcp_server.py", "--environment", "dog"],
"cwd": "/path/to/sap-datasphere-mcp",
"env": {
"MCP_ENVIRONMENT": "dog",
"SAP_CLIENT_ID": "your_oauth_client_id",
"SAP_CLIENT_SECRET": "your_oauth_client_secret"
}
}
}
}Once configured, you can ask your AI assistant:
"List all SAP Datasphere spaces and their assets"
"Search for tables containing customer data"
"Show me the schema for SAP_SC_FI_T_Products"
"What's the sync status between Datasphere and AWS?"
"Trigger a high-priority sync for financial data assets"
"Explore the data lineage for the sales analytics model"
Add to your Cursor settings for development workflows:
{
"mcp.servers": {
"sap-datasphere": {
"command": ["python", "start_mcp_server.py"],
"args": ["--environment", "dog"],
"env": {
"MCP_ENVIRONMENT": "dog"
}
}
}
}# Configure MCP server for your environment
python mcp_server_config.py
# Available environments:
# - dog: Development (localhost:8001)
# - wolf: Testing (localhost:5000)
# - bear: Production (AWS Lambda){
"base_url": "https://your-tenant.eu20.hcs.cloud.sap",
"client_id": "your-oauth-client-id",
"client_secret": "your-oauth-client-secret",
"token_url": "https://your-tenant.authentication.eu20.hana.ondemand.com/oauth/token",
"redirect_uri": "http://localhost:8080/callback"
}{
"region": "us-east-1",
"s3_tables_bucket": "sap-datasphere-s3-tables",
"glue_database": "sap_datasphere_s3_tables",
"glue_job_role": "GlueServiceRole-SAP-Replication"
}# Example: Replicate SAP_SC_FI_T_Products to S3 Tables
{
"source_asset": "SAP_SC_FI_T_Products",
"target_format": "ICEBERG",
"partition_strategy": "BY_DATE_AND_COMPANY",
"replication_mode": "INCREMENTAL",
"data_validation": true
}search_metadata(query, asset_types, source_systems) # Search across systems
discover_spaces(include_assets, force_refresh) # OAuth space discovery
get_asset_details(asset_id, source_system) # Detailed asset info
get_sync_status(asset_id, detailed) # Sync monitoring
explore_data_lineage(asset_id, direction, max_depth) # Lineage tracing
trigger_sync(asset_ids, priority, dry_run) # Sync controlGET /api/assets # List all discovered assets
POST /api/replicate/start # Start data replication job
GET /api/replicate/status/{job_id} # Get replication progress
GET /api/replicate/logs/{job_id} # Get live replication logs
POST /api/replicate/cancel/{job_id} # Cancel replication jobGET /api/status # System health check
GET /api/metrics # Performance metrics
WS /ws # WebSocket for real-time updates- 🔐 OAuth 2.0: Secure SAP Datasphere authentication
- 🛡️ AWS IAM: Role-based AWS access control
- 🔒 HTTPS/TLS: Encrypted communications
- 📝 Audit Logging: Complete operation audit trails
- 🔑 Token Management: Automatic refresh and rotation
- Natural Language Queries: Ask AI assistants about your data assets
- Intelligent Recommendations: AI-guided integration pattern selection
- Automated Documentation: AI-generated data catalogs and lineage
- Selective Replication: User-controlled data movement to AWS S3 Tables
- Real-time Federation: Direct queries from AWS to SAP Datasphere
- Hybrid Analytics: Unified analytics across SAP and AWS platforms
- Business Context Preservation: Maintain rich metadata across systems
- Automated Classification: AI-powered data classification and tagging
- Compliance Tracking: Complete audit trails and governance workflows
sap-datasphere-mcp/
├── 📁 .kiro/ # Kiro specs and steering rules
│ └── specs/sap-aws-data-sync/ # Comprehensive project specifications
├── 📁 config/ # Configuration files
├── 📁 templates/ # Web UI templates
├── 📁 tests/ # Unit and integration tests
├── 📄 sap_datasphere_mcp_server.py # MCP server for AI integration
├── 📄 start_mcp_server.py # MCP server launcher
├── 📄 comprehensive_asset_discovery_and_sync.py # Data replication engine
├── 📄 enhanced_datasphere_connector.py # OAuth-enabled SAP connector
├── 📄 enhanced_glue_connector.py # Rich metadata AWS connector
├── 📄 web_dashboard.py # Multi-environment web dashboard
├── 📄 sync_orchestrator.py # Job orchestration engine
├── 📄 metadata_sync_core.py # Core synchronization logic
└── 📄 requirements.txt # Dependencies
# MCP Server tests
python test_mcp_server.py --environment dog
# Integration tests with real APIs
python test_enhanced_glue_integration.py
python test_comprehensive_saml2_bearer_validation.py
# Data replication tests
python test_real_asset_discovery.py
python comprehensive_interactive_test.py
# End-to-end validation
python test_sync_orchestrator.py- 🤖 AI Request Tracking: Monitor MCP tool usage and performance
- 📊 OAuth Token Management: Automatic refresh and expiration tracking
- 🔍 Cache Performance: Hit/miss rates and optimization metrics
- 📝 Audit Logs: Complete AI assistant interaction history
- 🔄 Real-time Progress: Live job status with WebSocket updates
- 📈 Throughput Metrics: Records per second and data volume tracking
- 🛡️ Data Validation: Quality checks and business rule compliance
- 🚨 Error Handling: Automatic retry with exponential backoff
- AWS CloudWatch: Native monitoring for Lambda and Glue jobs
- Prometheus: Metrics export for MCP server performance
- Grafana: Custom dashboards for replication and sync metrics
- ELK Stack: Centralized logging for all components
- CSDL Metadata Extraction: Complete OData schema definitions
- Business Context Preservation: Rich annotations and governance information
- Multi-language Support: Global deployment with localized metadata
- Hierarchical Relationships: Preserve analytical model structures
- Apache Iceberg Integration: ACID transactions and schema evolution
- Glue ETL Automation: Spark-based scalable data processing
- Real-time Validation: Comprehensive data quality and business rule checks
- Incremental Synchronization: Efficient change detection and processing
- Natural Language Queries: Ask questions about your data in plain English
- Integration Pattern Recommendations: AI-guided federation vs replication decisions
- Automated Documentation: AI-generated data catalogs and lineage diagrams
- Intelligent Error Resolution: AI-assisted troubleshooting and optimization
We welcome contributions! This project uses Kiro for AI-assisted development.
# Fork and clone the repository
git clone https://github.com/MarioDeFelipe/sap-datasphere-mcp.git
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install development dependencies
pip install -r requirements-dev.txt
# Configure MCP server for development
python mcp_server_config.py
# Run comprehensive tests
python test_mcp_server.py --environment dog- MCP Tools: Add new AI-accessible operations
- Data Connectors: Enhance SAP and AWS integrations
- Replication Patterns: Implement new integration strategies
- AI Agents: Develop specialized data integration assistants
This project is licensed under the MIT License - see the LICENSE file for details.
- Model Context Protocol for enabling AI assistant integration
- SAP Datasphere Team for comprehensive API capabilities
- AWS Glue & S3 Tables Teams for robust analytics infrastructure
- Apache Iceberg Community for ACID-compliant data lake format
- FastAPI Community for the excellent web framework
- Kiro AI Assistant for accelerating development workflows
- 📚 Documentation: MCP Server Guide
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
- 📖 SAP Datasphere Docs: Official Documentation
- 🤖 MCP Protocol: Model Context Protocol
- Enhanced AI Agents: Specialized agents for different integration patterns
- Vector Database Integration: Semantic search across metadata
- Real-time Event Streaming: Live data change notifications
- Advanced Lineage Visualization: Interactive data flow diagrams
- Multi-Cloud Support: Azure Synapse, Google BigQuery integration
- Machine Learning Integration: Predictive data quality and optimization
- Enterprise Governance: Advanced compliance and audit capabilities
- Self-Service Analytics: Business user-friendly data discovery