Mine the public web for instances of art-technology interplay, summarize & index them via a RAG pipeline, and let users explore cultural-technical intersections via a FastAPI backend and React frontend.
The Art & Technology Knowledge Miner is a platform for discovering, analyzing, and exploring the intersections between art and technology. It mines the public web for relevant content, turns that content into vector embeddings, and provides search and trend-analysis tools, making knowledge about how technology is transforming creative expression easier to find and explore.
- Intelligent Search: Hybrid search combining vector similarity with keyword matching
- Trend Analysis: Statistical analysis of temporal patterns and co-occurrence trends
- AI-Powered Insights: RAG (Retrieval-Augmented Generation) for AI-generated summaries
- Rich Visualizations: Interactive charts and graphs for trend exploration
- Comprehensive Sources: Curated collection from museums, galleries, academic papers, and creative platforms
- Real-time Processing: Background ingestion with progress tracking
- Modern UI: Responsive interface with dark/light mode
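Hybrid search combines vector similarity with keyword matching. The project's actual scoring formula is not documented here, so the sketch below assumes a simple linear fusion, `alpha * cosine + (1 - alpha) * keyword_overlap`, with a hypothetical weight `alpha=0.7` and toy two-dimensional vectors standing in for sentence-transformer embeddings.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of distinct query terms that also occur in the text."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    return len(query_terms & set(tokenize(text))) / len(query_terms)


def hybrid_search(query, query_vec, docs, alpha=0.7, k=5):
    """Rank (doc_id, text, vector) triples by a weighted sum of both scores.

    alpha is a hypothetical weight: 1.0 is pure vector search, 0.0 pure keyword.
    """
    scored = [
        (alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text), doc_id)
        for doc_id, text, vec in docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(doc_id, round(score, 4)) for score, doc_id in scored[:k]]


docs = [
    ("a", "Generative art with neural networks", [1.0, 0.0]),
    ("b", "Museum opening hours", [0.0, 1.0]),
]
print(hybrid_search("neural art", [1.0, 0.0], docs))  # [('a', 1.0), ('b', 0.0)]
```

A linear blend keeps both signals interpretable; the keyword term rescues exact-name queries (an artist or tool name) that embeddings can blur.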
- Docker and Docker Compose
- Python 3.11+ (for local development)
- Node.js 18+ (for local development)
```bash
# Clone the repository
git clone https://github.com/suhasramanand/art-tech-knowledge-miner.git
cd art-tech-knowledge-miner

# Start all services
make up

# Visit the application
open http://localhost:5173
```

```bash
# Install dependencies
make install

# Start backend (Terminal 1)
make dev-backend

# Start frontend (Terminal 2)
make dev-frontend

# Run ingestion pipeline (Terminal 3)
make ingest
```

```mermaid
graph TB
    A[Web Crawler] --> B[Content Filter]
    B --> C[Text Processor]
    C --> D[Embedding Generator]
    D --> E[Vector Store]
    E --> F[Search API]
    E --> G[Trends API]
    F --> H[React Frontend]
    G --> H
    I[DuckDuckGo Search] --> A
    J[ChromaDB] --> E
    K[FastAPI Backend] --> F
    K --> G
```
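The Content Filter stage keeps only pages that are actually about art and technology together. The real filtering criteria are not described in this README; the sketch below assumes a simple co-mention rule over two small, hypothetical vocabularies (`ART_TERMS`, `TECH_TERMS`), which a production filter would likely replace with a classifier or embedding similarity.

```python
import re

# Hypothetical vocabularies for illustration only.
ART_TERMS = {"art", "artist", "artwork", "museum", "gallery", "painting", "sculpture", "music"}
TECH_TERMS = {"ai", "algorithm", "neural", "software", "robot", "digital", "computer", "vr"}


def is_relevant(text: str, min_art: int = 1, min_tech: int = 1) -> bool:
    """Keep text only if it mentions both art and technology terms often enough."""
    tokens = re.findall(r"[a-z]+", text.lower())
    art_hits = sum(token in ART_TERMS for token in tokens)
    tech_hits = sum(token in TECH_TERMS for token in tokens)
    return art_hits >= min_art and tech_hits >= min_tech


pages = [
    "A museum uses AI to restore damaged frescoes",
    "Quarterly earnings beat expectations",
    "New computer chips ship next month",
]
print([page for page in pages if is_relevant(page)])
# ['A museum uses AI to restore damaged frescoes']
```

Requiring a hit from each vocabulary is what distinguishes this from plain keyword search: a purely technical page (the third example) is dropped even though it matches a technology term.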
- Discovery: DuckDuckGo search discovers relevant web pages
- Extraction: Trafilatura extracts clean text content
- Filtering: Content is filtered for art-technology relevance
- Processing: Text is chunked and embedded using sentence-transformers
- Storage: Embeddings are stored in ChromaDB
- Search: Hybrid search combines vector similarity with keyword matching
- Analysis: Trend analysis reveals temporal patterns and co-occurrences
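The Processing step splits extracted text into chunks before embedding. The chunk size and overlap used by the pipeline are not stated; the sketch below assumes word-based windows with hypothetical defaults of 200 words and 50 words of overlap, so that context near a chunk boundary appears in both adjacent chunks.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows of whitespace-separated words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window reaches the end, so the tail is not repeated.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(str(i) for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=2))
# ['0 1 2 3', '2 3 4 5', '4 5 6 7', '6 7 8 9']
```

The resulting strings are what would then be passed to a sentence-transformers model's `encode` method and stored, with their embeddings, in a ChromaDB collection.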
- Backend: Python 3.11, FastAPI, Uvicorn, Pydantic v2
- Frontend: React 18, TypeScript, Vite, TailwindCSS
- AI/ML: LangChain, SentenceTransformers, ChromaDB, Hugging Face
- Data: Pandas, NumPy, Scikit-learn, DuckDuckGo Search
- Infrastructure: Docker, Docker Compose, GitHub Actions
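LangChain is listed for the RAG layer, but the core retrieve-then-generate step does not depend on a framework: take the top-scoring chunks returned by search, number them, and place them in a prompt that asks the model to answer only from that context and cite passages. The sketch below is plain Python with a hypothetical `RetrievedChunk` type and prompt wording; it does not show LangChain's API or which LLM the project calls.

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    """Hypothetical shape of a search hit."""
    text: str
    source_url: str
    score: float


def build_rag_prompt(question: str, chunks: list[RetrievedChunk], max_chunks: int = 4) -> str:
    """Stuff the highest-scoring chunks into a numbered, citable context block."""
    top = sorted(chunks, key=lambda c: c.score, reverse=True)[:max_chunks]
    context = "\n\n".join(
        f"[{i}] {c.text}\n(source: {c.source_url})" for i, c in enumerate(top, start=1)
    )
    return (
        "Answer the question using only the numbered context passages. "
        "Cite passages as [n].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


hits = [
    RetrievedChunk("Museums use computer vision to tag collections.", "https://example.org/a", 0.2),
    RetrievedChunk("Generative models are reshaping digital painting.", "https://example.org/b", 0.9),
]
print(build_rag_prompt("How is AI changing painting?", hits))
```

Numbering the passages lets the generated summary point back to specific sources, which is what makes a summarize-then-navigate flow possible.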
- Discover: Use the search bar to find art-technology intersections
- Explore: View trend charts and co-occurrence analysis
- Sources: Browse the knowledge base statistics and sources
- About: Learn more about the project and technology stack
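The trend charts rest on two counts: how often a facet appears per time period (presumably what `--granularity year` controls) and how often two facets appear in the same document. A minimal sketch, assuming each document has already been tagged with a year and a list of facet labels; the dict shape and tag names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def counts_by_period(docs):
    """Map each tag to a Counter of {year: number of documents mentioning it}."""
    by_tag: dict[str, Counter] = {}
    for doc in docs:
        for tag in set(doc["tags"]):
            by_tag.setdefault(tag, Counter())[doc["year"]] += 1
    return by_tag


def cooccurrence(docs):
    """Count unordered tag pairs that appear together in the same document."""
    pairs: Counter = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc["tags"])), 2):
            pairs[(a, b)] += 1
    return pairs


docs = [
    {"year": 2021, "tags": ["ai", "painting"]},
    {"year": 2022, "tags": ["ai", "music"]},
    {"year": 2022, "tags": ["ai", "painting"]},
]
print(dict(counts_by_period(docs)["ai"]))  # {2021: 1, 2022: 2}
print(cooccurrence(docs).most_common(1))   # [(('ai', 'painting'), 2)]
```

Sorting each document's tags before pairing means `("ai", "painting")` and `("painting", "ai")` are counted as the same pair.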
```bash
# Crawl new content
python -m pipeline.cli crawl --queries "artificial intelligence art" "computer vision museums" --max-pages 20

# Search the knowledge base
python -m pipeline.cli search "artificial intelligence in art"

# Analyze trends
python -m pipeline.cli trends --facet all --granularity year

# Run demo
python -m pipeline.cli demo
```

```bash
# Search
curl "http://localhost:8000/search?q=artificial%20intelligence%20art&n_results=10"

# Trends
curl "http://localhost:8000/trends?facet=all&granularity=year"

# Health check
curl "http://localhost:8000/healthz"

# Statistics
curl "http://localhost:8000/stats"
```

In our benchmark, the RAG pipeline improved research efficiency by roughly 30% over traditional keyword search:
| Metric | Traditional Search | RAG Pipeline | Improvement |
|---|---|---|---|
| Time to Insight | 15.2 min | 10.6 min | 30% faster |
| Steps Required | 8.3 steps | 5.8 steps | 30% fewer |
| Query Refinement | 3.2 iterations | 2.1 iterations | 34% fewer |
| Source Relevance | 67% | 89% | 33% better |
Results are based on 20 test queries comparing baseline keyword search with our RAG summarize-then-navigate flow.
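The Improvement column is relative change against the keyword-search baseline. For the three cost metrics it is `(baseline - rag) / baseline`; for Source Relevance it is `(rag - baseline) / baseline`, which is why 67% to 89% appears as 33% even though the absolute difference is 22 percentage points. Recomputing the column from the table values:

```python
def relative_reduction(baseline: float, rag: float) -> float:
    """Fractional decrease versus the baseline (lower is better)."""
    return (baseline - rag) / baseline


def relative_gain(baseline: float, rag: float) -> float:
    """Fractional increase versus the baseline (higher is better)."""
    return (rag - baseline) / baseline


improvements = {
    "Time to Insight": relative_reduction(15.2, 10.6),
    "Steps Required": relative_reduction(8.3, 5.8),
    "Query Refinement": relative_reduction(3.2, 2.1),
    "Source Relevance": relative_gain(67, 89),
}
for metric, value in improvements.items():
    print(f"{metric}: {value:.0%}")
# Time to Insight: 30%
# Steps Required: 30%
# Query Refinement: 34%
# Source Relevance: 33%
```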
```bash
# Run the benchmark script
python scripts/benchmark_efficiency.py

# View detailed results
python scripts/analyze_benchmark_results.py
```

```
art-tech-knowledge-miner/
├── backend/                 # FastAPI backend
│   ├── app/
│   │   ├── main.py          # FastAPI application
│   │   ├── models.py        # Pydantic models
│   │   ├── services.py      # Business logic
│   │   └── config.py        # Configuration
│   └── tests/               # Backend tests
├── frontend/                # React frontend
│   ├── src/
│   │   ├── components/      # React components
│   │   ├── pages/           # Page components
│   │   ├── services/        # API services
│   │   └── contexts/        # React contexts
│   └── public/              # Static assets
├── pipeline/                # Core processing pipeline
│   ├── ingest.py            # Web crawling
│   ├── preprocess.py        # Text processing
│   ├── embed_store.py       # Vector storage
│   ├── summarize.py         # Content summarization
│   ├── rag.py               # RAG implementation
│   ├── trends.py            # Trend analysis
│   └── cli.py               # Command-line interface
├── docs/                    # Documentation
├── infra/                   # Infrastructure configs
├── docker-compose.yml       # Docker services
└── Makefile                 # Development commands
```
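The `pipeline.cli` commands shown earlier map naturally onto subcommands. How `cli.py` actually parses its arguments is not shown in this README; the sketch below uses the standard-library `argparse` and mirrors only the documented flags. The defaults and the `month` choice for `--granularity` are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented pipeline.cli commands."""
    parser = argparse.ArgumentParser(prog="python -m pipeline.cli")
    sub = parser.add_subparsers(dest="command", required=True)

    crawl = sub.add_parser("crawl", help="discover and ingest new pages")
    crawl.add_argument("--queries", nargs="+", required=True, help="search queries to crawl")
    crawl.add_argument("--max-pages", type=int, default=20, help="page budget (assumed default)")

    search = sub.add_parser("search", help="query the knowledge base")
    search.add_argument("query")

    trends = sub.add_parser("trends", help="run trend analysis")
    trends.add_argument("--facet", default="all")
    trends.add_argument("--granularity", choices=["year", "month"], default="year")

    sub.add_parser("demo", help="run an end-to-end demo")
    return parser


args = build_parser().parse_args(
    ["crawl", "--queries", "artificial intelligence art", "computer vision museums", "--max-pages", "20"]
)
print(args.command, args.queries, args.max_pages)
```

`nargs="+"` is what lets `--queries` accept several quoted queries in one invocation, as in the crawl example.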
```bash
# Development
make dev      # Start development environment
make install  # Install all dependencies
make test     # Run all tests
make lint     # Run linting
make format   # Format code

# Docker
make up       # Start all services
make down     # Stop all services
make build    # Build Docker images
make clean    # Clean up resources

# Pipeline
make ingest   # Run ingestion
make search   # Run search CLI
make trends   # Run trends analysis
```

```bash
# Run all tests
make test

# Run specific test suites (each in a subshell so the cd does not carry over)
(cd pipeline && python -m pytest tests/ -v)
(cd backend && python -m pytest tests/ -v)
(cd frontend && npm test)
```

```bash
# Linting
make lint

# Formatting
make format

# Type checking
(cd backend && mypy app/)
(cd frontend && npm run type-check)
```

- Web crawling and content extraction
- Vector embeddings and storage
- Search and trend analysis APIs
- React frontend with visualizations
- Docker deployment
- Real-time content ingestion
- Advanced RAG with multiple LLMs
- User accounts and personalization
- Export functionality (PDF, CSV)
- Mobile-responsive improvements
- Predictive trend modeling
- Sentiment analysis integration
- Multi-language support
- API rate limiting and authentication
- Advanced visualization components
- Community contributions
- Plugin architecture
- Cloud deployment options
- Enterprise features
- Academic research partnerships
We welcome contributions! Please see our Contributing Guide for details.
```bash
# Fork and clone the repository
git clone https://github.com/your-username/art-tech-knowledge-miner.git
cd art-tech-knowledge-miner

# Install dependencies
make install

# Create a feature branch
git checkout -b feature/amazing-feature

# Make your changes and test
make test

# Commit and push your branch
git add .
git commit -m "Add amazing feature"
git push origin feature/amazing-feature

# Submit a pull request
```

This project is licensed under the MIT License; see the LICENSE file for details.
- Hugging Face for transformer models and embeddings
- LangChain for RAG framework and tools
- ChromaDB for vector storage and retrieval
- FastAPI for the high-performance web framework
- React and TailwindCSS for the beautiful frontend
- DuckDuckGo for respectful web search capabilities
- Documentation
- Issue Tracker
- Discussions
- Email Support
Built with ❤️ by the Art-Tech Knowledge Miner Team
Discovering the future of art-technology intersections, one search at a time.