Transform natural language queries into production-ready Apache Flink SQL jobs with AI-powered validation, deployment, and continuous learning that gets smarter with every successful job.
Flinkenstein is an intelligent streaming data platform that combines:
- Natural Language Processing: Convert business questions to Flink SQL
- AI-Powered Validation: Automatic SQL syntax and schema validation
- Click-to-Deploy: Once validated, push SQL directly to Flink and preview with one click
- Streaming Analytics: Real-time data processing with Apache Flink
- Multi-Domain Data: Retail, Financial Services, and Fraud Detection scenarios
- RAG-Enhanced AI: Context-aware SQL generation using knowledge base
- Continuous Learning: System gets smarter with every successful Flink job
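For example, a question like "Show me all retail orders over $500" might come back as a simple filter query along these lines (an illustrative sketch; the `retail_orders` table and its columns are assumptions, not the shipped schemas):

```sql
-- Hypothetical generated query for "Show me all retail orders
-- over $500"; table and column names are illustrative.
SELECT order_id, customer_id, amount, order_time
FROM retail_orders
WHERE amount > 500;
```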
Flinkenstein includes a comprehensive RAG (Retrieval-Augmented Generation) knowledge base stored in Qdrant that provides:
- Flink SQL Patterns: Common streaming analytics patterns and examples
- Schema Information: Topic structures and data formats for all domains
- Best Practices: Performance tips, windowing strategies, and join patterns
- Error Solutions: Common Flink errors and their resolutions
- Domain Context: Business logic and relationships for retail, fraud, and crossover scenarios
The knowledge base is automatically populated when you run the demo command, ensuring your AI has the context needed to generate accurate, production-ready Flink SQL.
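As an illustration of the kind of windowing pattern it stores (the real entries live in `scripts/populate-knowledge-base.js`), a tumbling-window aggregation might look like this sketch, with assumed table and column names:

```sql
-- Example tumbling-window pattern: orders per customer per
-- 10-minute window. Assumes order_time is an event-time
-- attribute with a watermark; all names are illustrative.
SELECT
  customer_id,
  TUMBLE_START(order_time, INTERVAL '10' MINUTE) AS window_start,
  COUNT(*) AS order_count
FROM retail_orders
GROUP BY
  customer_id,
  TUMBLE(order_time, INTERVAL '10' MINUTE);
```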
Three collection types back this knowledge:

- Comprehensive Flink SQL Knowledge (`flinkenstein-knowledge`)
  - 23 advanced patterns covering windowing, joins, CTEs, and UDFs
  - Complex query patterns for time filtering, multi-topic joins, and aggregations
  - Top 10 Flink error-prevention patterns
  - Advanced use-case patterns (anomaly detection, pattern recognition)
  - Demo flow patterns (Phase 1-3 progression)
- Topic Metadata (`flinkenstein-topic-metadata`)
  - 22 metadata items covering all topics in your system
  - Source topic details: fields, descriptions, use cases, schemas
  - Sink topic information: output patterns, notes
  - Topic categorization: retail, fraud, crossover
  - Schema availability and status
- Cluster-Specific Learning (`flinkenstein-cluster-*`)
  - Dynamically created collections for each session
  - Successful Flink SQL statements with full context
  - Job validation results and sink topic data
  - Environment-specific patterns that improve over time
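The multi-topic join patterns include interval joins, as in this sketch, which correlates retail orders with fraud alerts inside a one-hour window (table and column names are assumptions for illustration):

```sql
-- Example interval join across two topic-backed tables: pair each
-- retail order with fraud alerts raised for the same customer
-- within one hour of the order. Names are illustrative.
SELECT
  o.order_id,
  o.customer_id,
  a.alert_type,
  a.alert_time
FROM retail_orders o
JOIN fraud_alerts a
  ON o.customer_id = a.customer_id
 AND a.alert_time BETWEEN o.order_time - INTERVAL '1' HOUR
                      AND o.order_time + INTERVAL '1' HOUR;
```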
Flinkenstein gets smarter with every successful Flink job!
The system continuously learns from your environment through success reinforcement:
- Deploy Flink SQL through the natural language interface
- Automatic Validation monitors job status and data flow
- Success Learning captures successful SQL when:
  - ✅ the Flink job is running successfully
  - ✅ the sink topic contains actual data
  - ✅ no errors or failures are detected
- Knowledge Accumulation saves successful patterns to cluster-specific collections
- Persistent Learning preserves all learning across restarts
Each environment you point Flinkenstein at becomes increasingly intelligent:
- First Run: generic Flink SQL patterns (23) + topic metadata (22) + dynamic topic discovery
- After Successes: Generic patterns + environment-specific successful SQL
- Over Time: System learns your exact topics, schemas, and data patterns
- Portability: Take your learned intelligence to new clusters
- Higher Success Rate: Each successful job improves future SQL generation
- Environment Awareness: System learns your specific Kafka topics and schemas
- Pattern Recognition: Identifies what works in your data environment
- Continuous Improvement: No manual intervention needed - learns automatically
```
Startup      → Load Generic Patterns (23) + Topic Metadata (22) + Restore Cluster Learning
                 ↓
Runtime      → Monitor Flink Jobs + Validate Success + Learn from Success
                 ↓
Teardown     → Backup All Collections (Generic + Topic + Cluster Learning)
                 ↓
Next Startup → Restore All Learning + Continue Improving
```
All learning data is automatically preserved across demo sessions:
- Automatic Backup: the `teardown-with-backup` command backs up all Qdrant collections
- Learning Restoration: the `restore-learning` command restores previous learning
- Continuous Improvement: each session builds upon previous learning
- No Data Loss: learning accumulates over time, improving success rates
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Frontend     │    │     Backend     │    │  Flink Cluster  │
│    (React)      │◄──►│    (Node.js)    │◄──►│    + Kafka      │
│   Port: 5173    │    │   Port: 3001    │    │  + Schema Reg   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Monaco Editor  │    │   LLM Service   │    │     Qdrant      │
│   SQL Editor    │    │    (Gemini/     │    │    Knowledge    │
│  + Validation   │    │   Claude/GPT)   │    │      Base       │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```
- Flox installed and configured (for package management)
- Docker Desktop running
- Git
```bash
git clone https://github.com/8BitTacoSupreme/flinkenstein
cd flinkenstein

# Activate Flox environment and start everything
flox activate
demo
```

That's it! The `demo` command will automatically:
- Set up the environment (directories, env files, scripts)
- Start all infrastructure (Kafka, Flink, Qdrant, Schema Registry)
- Populate comprehensive Flink SQL knowledge base (23 patterns)
- Populate topic metadata (22 items)
- Start the backend service
- Start the frontend service
- Generate demo data for all domains
- Verify everything is working
🧠 Knowledge Base: The system now loads both generic Flink SQL patterns AND topic-specific metadata, giving your LLM comprehensive context for better SQL generation.
💡 Pro tip: Run `help` to see all available commands!
```bash
# Set up the environment first
setup

# Start infrastructure manually
start

# Start the backend manually
dev-backend

# Start the frontend manually
dev-frontend

# Run the demo manually
demo
```

- Frontend: http://localhost:5173
- Flink UI: http://localhost:8082
- Kafka Control Center: http://localhost:9021
- Schema Registry: http://localhost:8081
Retail:
- Customer lifecycle analysis
- Order status tracking
- Inventory management
- Return pattern analysis
- Customer feedback insights

Financial Services:
- Transaction risk scoring
- Behavioral pattern analysis
- KYC compliance monitoring
- Device fingerprinting
- Risk alert generation

Fraud Detection (Crossover):
- Payment fraud detection (see the velocity-check sketch after this list)
- Account takeover detection
- Suspicious order patterns
- Return fraud analysis
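As one concrete example, a payment fraud velocity check could be expressed with a hopping window, as in this sketch (the `fsi_transactions` table, its columns, and the threshold are assumptions for illustration):

```sql
-- Example velocity check: flag cards with more than 3 transactions
-- in any 5-minute window, evaluated every minute. Assumes txn_time
-- is an event-time attribute; all names are illustrative.
SELECT
  card_id,
  HOP_START(txn_time, INTERVAL '1' MINUTE, INTERVAL '5' MINUTE) AS window_start,
  COUNT(*) AS txn_count
FROM fsi_transactions
GROUP BY
  card_id,
  HOP(txn_time, INTERVAL '1' MINUTE, INTERVAL '5' MINUTE)
HAVING COUNT(*) > 3;
```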
As you use Flinkenstein, the system becomes increasingly intelligent:
- First Query: System uses generic patterns + discovers your topics
- Successful Jobs: System learns what works in your environment
- Repeated Patterns: System suggests optimizations based on learned success
- Environment Mastery: System becomes an expert on your specific data landscape
Create `.env` files in each service directory:

Backend (`backend/.env`):

```
PORT=3001
NODE_ENV=development
FLINK_SQL_GATEWAY_URL=http://localhost:8083
KAFKA_INTERNAL_BOOTSTRAP_SERVERS=broker:9092
KAFKA_BOOTSTRAP_SERVERS=localhost:19092
SCHEMA_REGISTRY_URL=http://localhost:8081
RUNNING_IN_DOCKER=false

# LLM API keys (optional - can be set per request)
OPENAI_API_KEY=
GOOGLE_API_KEY=
ANTHROPIC_API_KEY=
```

The system supports multiple AI providers:
- Google Gemini: Fast, reliable SQL generation
- Anthropic Claude: High-quality, context-aware responses
- OpenAI GPT: Versatile, well-tested performance
API keys can be configured per request in the frontend.
The system includes three data generators:
- Retail Data (`scripts/generate-retail-data.js`)
  - Customers, products, orders, returns, feedback, inventory
  - Referential integrity and business logic
- Fraud Data (`scripts/generate-fraud-data.js`)
  - Transactions, risk alerts, compliance events, KYC
  - FSI-specific fraud patterns
- Crossover Data (`scripts/generate-crossover-data.js`)
  - Retail-specific fraud detection
  - Combines retail and fraud domains
```
flinkenstein/
├── backend/          # Node.js API server
├── frontend/         # React application
├── infrastructure/   # Docker Compose setup
├── scripts/          # Data generation & utilities
└── flox.nix          # Flox environment definition
```
```bash
# Infrastructure
./scripts/flox-scripts.sh start         # Start Docker services
./scripts/flox-scripts.sh stop          # Stop Docker services
./scripts/flox-scripts.sh restart       # Restart Docker services

# Development
./scripts/flox-scripts.sh dev-backend   # Start backend in dev mode
./scripts/flox-scripts.sh dev-frontend  # Start frontend in dev mode

# Demo
./scripts/flox-scripts.sh demo          # Run full demo setup
./scripts/flox-scripts.sh teardown      # Clean up demo environment

# Learning system
./scripts/startup-with-learning.js          # Start with learning system
./scripts/teardown-with-learning-backup.sh  # Teardown with learning backup
./scripts/restore-learning-data.sh          # Restore learning data from backups

# Health checks
./scripts/flox-scripts.sh health        # Check service health
```

To extend Flinkenstein:
- New Data Domains: add generators in `scripts/`
- SQL Patterns: extend the knowledge base in `scripts/populate-knowledge-base.js`
- UI Components: add React components in `frontend/src/components/`
- API Endpoints: extend Express routes in `backend/src/routes/`
Services not starting:

```bash
./scripts/flox-scripts.sh health
docker-compose logs <service-name>
```

Data generation errors:

```bash
cd scripts
./teardown-domain-separated.sh
./demo-domain-separated.sh
```

SQL validation failures:
- Check that topic names match domain prefixes
- Verify the Schema Registry is healthy
- Review LLM API key configuration

Flink job failures:
- Check Kafka topic existence
- Verify schema compatibility
- Review connector properties (see the sketch below)
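When reviewing connector properties, a Kafka-backed table definition looks roughly like this (a minimal sketch; the topic, fields, and format are assumptions and must match your actual setup):

```sql
-- Minimal Kafka source table sketch. When a job fails, check that
-- the topic exists, the format matches the topic's serialization,
-- and the bootstrap servers are reachable. Names are illustrative.
CREATE TABLE orders_src (
  order_id STRING,
  amount DOUBLE,
  order_time TIMESTAMP(3),
  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'retail-orders',                      -- topic must already exist
  'properties.bootstrap.servers' = 'broker:9092', -- see KAFKA_INTERNAL_BOOTSTRAP_SERVERS
  'format' = 'json',                              -- must match the topic's serialization
  'scan.startup.mode' = 'earliest-offset'
);
```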
Learning system issues:
- Verify Qdrant is running: `curl http://localhost:6333/collections`
- Check knowledge base collections: look for `flinkenstein-knowledge`, `flinkenstein-topic-metadata`, and `flinkenstein-cluster-*` collections
- Verify knowledge base population: `curl http://localhost:6333/collections/flinkenstein-knowledge | jq '.result.points_count'`
- Check topic metadata: `curl http://localhost:6333/collections/flinkenstein-topic-metadata | jq '.result.points_count'`
- Review learning logs in the backend console for job-monitoring messages
- Ensure the Flink UI is accessible for job status checks
- Use the `kb-restore` command to repopulate the knowledge bases if needed

Log locations:
- Backend: `scripts/logs/log-backend.txt`
- Frontend: `scripts/logs/log-frontend.txt`
- Data Generators: `scripts/logs/log-*.txt`
- Docker: `docker-compose logs <service>`
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
[Your License Here]
For issues and questions:
- Check the troubleshooting section
- Review logs and error messages
- Open an issue with detailed information
- Include environment details and reproduction steps
Happy Streaming! 🚀

