This project extracts text from a PDF using Tesseract OCR and generates Multiple Choice Questions (MCQs) using the LLaMA3 model via the Groq API. It allows users to convert educational content from PDFs (like NCERT textbooks) into quiz questions automatically.
- Extracts text from images using pytesseract
- Uses LLaMA3 (llama3-8b-8192) to generate MCQs
- Allows user to specify question count and difficulty
- Command-line interface for quick interaction
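To illustrate how a question count and difficulty could feed into MCQ generation, here is a minimal sketch. It is not taken from the project's source: the names `build_mcq_prompt`, `parse_args`, the `--count` and `--difficulty` flags, and the three difficulty levels are all hypothetical, and the prompt wording is only one plausible choice.

```python
import argparse

# Hypothetical difficulty levels; the project may accept different values.
DIFFICULTIES = ("easy", "medium", "hard")


def build_mcq_prompt(text: str, count: int, difficulty: str) -> str:
    """Build an instruction prompt asking a model for `count` MCQs."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if difficulty not in DIFFICULTIES:
        raise ValueError(f"difficulty must be one of {DIFFICULTIES}")
    return (
        f"Generate {count} {difficulty} multiple choice questions "
        "from the text below. Give four options (A-D) per question "
        "and mark the correct answer.\n\n"
        f"Text:\n{text.strip()}"
    )


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the (assumed) command-line options for count and difficulty."""
    parser = argparse.ArgumentParser(description="Generate MCQs from extracted text")
    parser.add_argument("--count", type=int, default=5)
    parser.add_argument("--difficulty", choices=DIFFICULTIES, default="medium")
    return parser.parse_args(argv)
```

The resulting string would be sent to the model as the user message, with the OCR output passed in as `text`. Validating the inputs before the API call avoids spending a request on a malformed prompt.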
- Python
- Tesseract OCR
- Groq API (LLaMA3)
- dotenv (.env support for API key)
First install Tesseract OCR Engine and make sure to include it in your PATH.
git clone https://github.com/manmeetsantre/IEEE_TechRush.git
cd IEEE_TechRush
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
- Download the latest Windows installer from: https://github.com/UB-Mannheim/tesseract/wiki (File name: tesseract-ocr-w64-setup-5.5.0.20241111.exe)
- Run the installer:
  - Note the installation path (e.g., C:\Program Files\Tesseract-OCR)
  - Check the option to add Tesseract to system PATH
- If you didn't add it to PATH during install:
  - Go to System Properties > Environment Variables
  - Edit the Path variable and add: C:\Program Files\Tesseract-OCR
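After installing, it can help to confirm that Python can actually locate the Tesseract executable. The sketch below is an assumption-laden helper, not part of the project: `find_tesseract` is a hypothetical name, and the fallback directory is simply the example install path mentioned above.

```python
import shutil
from pathlib import Path
from typing import Optional

# Example install location quoted in the steps above; adjust if yours differs.
DEFAULT_WINDOWS_DIR = r"C:\Program Files\Tesseract-OCR"


def find_tesseract(fallback_dir: str = DEFAULT_WINDOWS_DIR) -> Optional[str]:
    """Return the path to the tesseract executable, or None if not found."""
    # First check the directories listed in PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise look for tesseract.exe in the fallback install directory.
    candidate = Path(fallback_dir) / "tesseract.exe"
    return str(candidate) if candidate.is_file() else None
```

If this returns a path that is not on PATH, pytesseract can be pointed at it by setting `pytesseract.pytesseract.tesseract_cmd` to that path before running OCR. If it returns `None`, revisit the PATH steps above.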
:: Clone the repository
git clone https://github.com/manmeetsantre/IEEE_TechRush.git
cd IEEE_TechRush
:: (Optional) Initialize a virtual environment
python -m venv venv
:: Activate the virtual environment
venv\Scripts\activate
:: Install required packages
pip install -r requirements.txt
This project uses a .env file to load your Groq API key securely without exposing it in the code.
- In the project root directory, create a file named .env
- Add the following line inside it:
GROQ_API_KEY=your_actual_groq_api_key_here
Replace your_actual_groq_api_key_here with your API key from https://console.groq.com/keys. Do not wrap the key in braces or quotes, and leave no space after the equals sign.
The project uses the python-dotenv library to load environment variables from the .env file.
Inside the code, it accesses the key like this:
import os
api_key = os.getenv("GROQ_API_KEY")
This keeps your API credentials secure and separated from the source code.
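Because `os.getenv` returns `None` when a variable is missing, a forgotten .env file may only surface later as an authentication error. A minimal sketch of an early check follows; `require_api_key` is a hypothetical helper, not something the project is known to define, and it assumes the .env file has already been loaded (for example with python-dotenv's `load_dotenv()`).

```python
import os
from typing import Mapping, Optional


def require_api_key(
    env: Optional[Mapping[str, str]] = None,
    name: str = "GROQ_API_KEY",
) -> str:
    """Return the API key from `env`, raising a clear error if it is unset."""
    # Default to the real process environment; a dict can be passed for testing.
    source = os.environ if env is None else env
    key = (source.get(name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or environment"
        )
    return key
```

Calling `require_api_key()` once at startup makes a missing or empty key fail immediately with a readable message, before any request is sent.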