## Overview 🧠
This project is an AI-powered web application that automates the creation of study materials. Built with Streamlit, it processes content from multiple sources, including YouTube video transcripts, PDF documents, and text in images (via OCR).
The application splits the content into chunks and summarizes each chunk with a pretrained transformer model. It then generates question-and-answer pairs and exports them as a ready-to-use Anki flashcard deck (.apkg file), complete with spaced repetition metadata for efficient learning.
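To illustrate the chunking step, here is a minimal sketch, not the project's actual implementation. It assumes a simple regex-based sentence split and a word budget per chunk. The function name `chunk_text` and the `max_words` parameter are hypothetical, and the real app may rely on spaCy or NLTK for sentence segmentation instead.

```python
import re


def chunk_text(text, max_words=120):
    """Group whole sentences into chunks of roughly `max_words` words.

    Hypothetical helper for illustration only. A single sentence longer
    than `max_words` is kept intact and becomes its own oversized chunk.
    """
    # Collapse whitespace runs, which are common in transcripts and PDF text.
    cleaned = re.sub(r"\s+", " ", text).strip()
    if not cleaned:
        return []

    # Naive sentence split: break after '.', '!' or '?' followed by a space.
    sentences = re.split(r"(?<=[.!?])\s+", cleaned)

    chunks, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk when this sentence would exceed the word budget.
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words

    # Flush whatever is left over as the final chunk.
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting on sentence boundaries rather than at a fixed character offset keeps each chunk readable on its own, which tends to matter when the chunks are later fed to a summarization model.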
- 📄 Multi-Source Input: Ingests text directly from YouTube video links, uploaded PDFs, or images.
- 🤖 OCR Capabilities: Extracts text from images using EasyOCR and Tesseract for preprocessing.
- ✂️ Intelligent Text Chunking: Automatically preprocesses and intelligently splits long, noisy transcripts or documents into manageable segments.
- 🧠 AI-Powered Summarization: Utilizes the `facebook/bart-large-cnn` model to create concise summaries of the text chunks.
- ❓ Automatic Q&A Generation: Employs the `google/flan-t5-base` model to generate relevant question-and-answer pairs from the content, perfect for flashcards.
- 🗂️ Direct Anki Export: Seamlessly packages the generated Q&A pairs into a standard Anki deck file (`.apkg`) using `genanki`.
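The summarization and Q&A features could be wired together roughly as follows. This is a sketch under stated assumptions: it loads both models through the generic Hugging Face `AutoTokenizer` / `AutoModelForSeq2SeqLM` classes, and the prompt wording in `build_qa_prompt` is invented for illustration. The helper names (`build_qa_prompt`, `run_seq2seq`, `summarize_then_ask`) are hypothetical, and the project may well load or prompt the models differently.

```python
def build_qa_prompt(chunk):
    """Build an instruction prompt for a Q&A model.

    The wording is an assumption; the project's real prompt is not shown
    in this README.
    """
    return (
        "Generate a question and its answer from the following text:\n"
        + chunk.strip()
    )


def run_seq2seq(model_name, text, max_new_tokens=128):
    """Run one text through a seq2seq model and return the decoded output."""
    # Imported lazily so the prompt helper works without the heavy dependencies.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # Note: loading on every call is slow; a real app would cache these objects.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Truncate to 512 tokens, a length that is safe for both models used here.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def summarize_then_ask(chunk):
    """Summarize a chunk, then ask the Q&A model about the summary."""
    summary = run_seq2seq("facebook/bart-large-cnn", chunk)
    qa_text = run_seq2seq("google/flan-t5-base", build_qa_prompt(summary))
    return summary, qa_text
```

Calling `summarize_then_ask(chunk)` downloads both models on first use, so expect a long initial run. The raw Q&A text still has to be parsed into separate question and answer strings before it can become a flashcard.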
- App Framework: Streamlit
- AI & Machine Learning: Transformers (Hugging Face), PyTorch, spaCy, NLTK
- Data Extraction & OCR: PyMuPDF (for PDFs), EasyOCR, Pytesseract, youtube-transcript-api
- Data Handling: Pandas, NumPy
- Flashcard Generation: genanki
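As an example of how the flashcard layer of this stack might be used, the sketch below packages question-and-answer pairs into an `.apkg` file with `genanki`. It is an illustration rather than the project's code: the helper names `stable_id` and `export_deck`, the note type name, and the card template are all assumptions.

```python
import hashlib
import html


def stable_id(name):
    """Derive a repeatable ID in [2**30, 2**31) from a name.

    Anki identifies decks and note types by integer ID, so hashing the name
    keeps re-exports of the same deck pointing at the same deck.
    """
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return (1 << 30) + int(digest[:8], 16) % (1 << 30)


def export_deck(qa_pairs, deck_name, out_path):
    """Write (question, answer) pairs to an Anki package at `out_path`."""
    # Imported lazily so stable_id() can be used without genanki installed.
    import genanki

    # A minimal two-field note type with a single card template (assumed layout).
    model = genanki.Model(
        stable_id(deck_name + ":model"),
        "Basic Q&A (sketch)",
        fields=[{"name": "Question"}, {"name": "Answer"}],
        templates=[
            {
                "name": "Card 1",
                "qfmt": "{{Question}}",
                "afmt": '{{FrontSide}}<hr id="answer">{{Answer}}',
            }
        ],
    )

    deck = genanki.Deck(stable_id(deck_name), deck_name)
    for question, answer in qa_pairs:
        # Note fields are rendered as HTML, so escape model-generated text.
        fields = [html.escape(question), html.escape(answer)]
        deck.add_note(genanki.Note(model=model, fields=fields))

    genanki.Package(deck).write_to_file(out_path)
```

For example, `export_deck([("What is mitosis?", "Cell division.")], "Biology", "biology.apkg")` would produce a file that can be imported into Anki.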
To get this project running on your local machine, follow these steps:
- Clone the repository:
  `git clone https://github.com/VertexCodeStudio/summarise-flashcards.git`
- Navigate to the project directory:
  `cd summarise-flashcards`
- Install the required Python packages:
  `pip install -r requirements.txt`
- Download the necessary spaCy language model:
  `python -m spacy download en_core_web_sm`
- Run the Streamlit application:
  `streamlit run src/app.py`
The application will then be running in your web browser.
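If the app fails to start, a quick check of which dependencies are importable can narrow things down. The snippet below is a hedged sketch: the module list is inferred from the libraries named in this README (for instance, PyMuPDF is assumed to be importable as `fitz`), not read from `requirements.txt`, and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names assumed from the libraries listed in this README.
REQUIRED_MODULES = [
    "streamlit",
    "transformers",
    "torch",
    "spacy",
    "nltk",
    "fitz",  # PyMuPDF
    "easyocr",
    "pytesseract",
    "youtube_transcript_api",
    "pandas",
    "numpy",
    "genanki",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    # find_spec() returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]
```

Running `print(missing_modules(REQUIRED_MODULES))` in the project's environment should print an empty list once the installation has succeeded. Any names it does print are packages to reinstall.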