Event-aware lecture summarizer using V-JEPA visual encoder for real-time, context-aware summaries and retrieval.
| Option | Description | Link |
|---|---|---|
| Cloud Demo | Try instantly in your browser (no install) | [lecture-mind.onrender.com](https://lecture-mind.onrender.com) |
| Local Setup | Full features on your machine | Local Setup Guide |
Note: The cloud demo runs in demo mode with placeholder processing. For full functionality with real AI models, use the local installation.
- Visual Encoding: DINOv2 ViT-L/16 for 768-dim frame embeddings
- Text Encoding: sentence-transformers (all-MiniLM-L6-v2) for query embeddings
- Audio Transcription: Whisper integration for lecture transcription
- Multimodal Search: Combined visual + transcript ranking with configurable weights
- Event Detection: Automatic slide transition and scene change detection
- FAISS Index: Fast similarity search with IVF optimization for large collections
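The "configurable weights" idea behind multimodal search can be sketched in a few lines of NumPy. This is an illustrative fusion scheme, not the library's actual implementation: the function name `fuse_scores`, the `alpha` parameter, and the linear blend are all assumptions.

```python
import numpy as np

def fuse_scores(visual_scores, transcript_scores, alpha=0.6):
    """Blend per-timestamp visual and transcript similarities.

    alpha weights the visual modality; (1 - alpha) weights the
    transcript. Both inputs are assumed normalized to [0, 1].
    (Illustrative sketch -- not vl_jepa's real fusion code.)
    """
    v = np.asarray(visual_scores, dtype=np.float64)
    t = np.asarray(transcript_scores, dtype=np.float64)
    return alpha * v + (1.0 - alpha) * t

# Toy example: three candidate timestamps scored by each modality
visual = [0.9, 0.2, 0.5]
transcript = [0.1, 0.8, 0.5]
fused = fuse_scores(visual, transcript, alpha=0.6)
best = int(np.argmax(fused))  # timestamp favored after fusion
```

With `alpha=0.6` the first timestamp wins (0.58 vs 0.44 and 0.50); lowering `alpha` would let the transcript evidence dominate instead.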
Install from PyPI (bracketed extras pull in optional dependencies; quote them so your shell does not expand the brackets):

```bash
pip install lecture-mind            # core package
pip install "lecture-mind[ml]"      # + real ML models
pip install "lecture-mind[audio]"   # + audio transcription
pip install "lecture-mind[all]"     # everything
```

Or install from source:

```bash
git clone https://github.com/matte1782/lecture-mind.git
cd lecture-mind
pip install -e ".[dev,ml,audio]"
```

Command-line usage:

```bash
# Process a lecture video
lecture-mind process lecture.mp4 --output data/

# Query the processed lecture
lecture-mind query data/ "What is gradient descent?"

# List detected events
lecture-mind events data/

# Get help
lecture-mind --help
```

Python API:

```python
from vl_jepa import (
    VideoInput,
    FrameSampler,
    VisualEncoder,
    TextEncoder,
    MultimodalIndex,
    EventDetector,
)

# Load and sample video frames
with VideoInput.from_file("lecture.mp4") as video:
    sampler = FrameSampler(fps=1.0)
    frames = sampler.sample(video)

# Encode frames (uses placeholder encoder by default)
encoder = VisualEncoder.load()
embeddings = encoder.encode_batch(frames)

# Build searchable index
index = MultimodalIndex()
index.add_visual(embeddings, timestamps=[f.timestamp for f in frames])

# Query the lecture
text_encoder = TextEncoder.load()
query_emb = text_encoder.encode("machine learning basics")
results = index.search(query_emb, k=5)

for result in results:
    print(f"Timestamp: {result.timestamp:.1f}s, Score: {result.score:.3f}")
```

```text
  lecture.mp4
       │
       ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ VideoInput  │────▶│FrameSampler │────▶│   Frames    │
└─────────────┘     └─────────────┘     └─────────────┘
                           │
       ┌───────────────────┼───────────────────┐
       ▼                   ▼                   ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│VisualEncoder│     │EventDetector│     │AudioExtract │
│  (DINOv2)   │     │             │     │  (FFmpeg)   │
└─────────────┘     └─────────────┘     └─────────────┘
       │                   │                   │
       ▼                   ▼                   ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Embeddings  │     │   Events    │     │ Transcriber │
│  (768-dim)  │     │             │     │  (Whisper)  │
└─────────────┘     └─────────────┘     └─────────────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                           ▼
                  ┌─────────────────┐
                  │ MultimodalIndex │
                  │     (FAISS)     │
                  └─────────────────┘
                           │
                           ▼
                  ┌─────────────────┐
                  │  Search/Query   │
                  └─────────────────┘
```
| Operation | Target | Actual |
|---|---|---|
| Query latency (1k vectors) | <100ms | 30.6µs |
| Search latency (100k vectors) | <100ms | 106.4µs |
| Frame embedding (placeholder) | <50ms | 0.36ms |
| Event detection | <10ms | 0.24ms |
See BENCHMARKS.md for detailed performance analysis.
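The microsecond-scale latencies above come from the project's own benchmarks (see BENCHMARKS.md). To get a feel for how such numbers are measured, here is a generic brute-force baseline timed with `time.perf_counter`; it is not the project's benchmark harness, and the corpus size, dimensionality, and `k` are arbitrary choices for illustration.

```python
import time
import numpy as np

rng = np.random.default_rng(0)

# Synthetic corpus: 1k unit-normalized 768-dim vectors (matching the table row)
db = rng.standard_normal((1000, 768)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)

query = rng.standard_normal(768).astype(np.float32)
query /= np.linalg.norm(query)

start = time.perf_counter()
scores = db @ query                    # cosine similarity (unit vectors)
top5 = np.argsort(-scores)[:5]         # indices of the 5 best matches
elapsed_us = (time.perf_counter() - start) * 1e6
```

Even this exhaustive scan is fast at 1k vectors; an IVF index like the one FAISS provides pays off at much larger corpus sizes by probing only a subset of clusters per query.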
- Python 3.10+
- NumPy >= 1.24.0
- OpenCV >= 4.8.0
- ML: PyTorch >= 2.0, transformers, sentence-transformers, FAISS
- Audio: faster-whisper >= 1.0.0
- UI (v0.3.0): Gradio >= 4.0.0
```bash
# Run tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=src/vl_jepa --cov-report=term

# Lint and format
ruff check src/ && ruff format src/

# Type check
mypy src/ --strict

# Run benchmarks
pytest tests/benchmarks/ -v --benchmark-only
```

- v0.1.0: Foundation (placeholder encoders, basic pipeline)
- v0.2.0: Real Models + Audio (DINOv2, Whisper, multimodal search)
- v0.3.0: Web UI + Cloud Demo (FastAPI, Docker, security hardening)
- v0.4.0: Student Playground (flashcards, multi-lecture library, offline mode)
- v1.0.0: Production (optimization, real decoder, deployment)
MIT License - see LICENSE for details.
If you use Lecture Mind in your research, please cite:
```bibtex
@software{lecture_mind,
  title  = {Lecture Mind: Event-aware Lecture Summarizer},
  author = {Matteo Panzeri},
  year   = {2026},
  url    = {https://github.com/matte1782/lecture-mind}
}
```