LearnLinkAI is an AI-powered search engine that recommends educational content from platforms such as YouTube and the web. It uses natural language processing, embeddings, and external APIs to fetch, rank, and store relevant content. The backend is built with FastAPI, integrates Google APIs for search, and uses AssemblyAI for audio transcription. Content is ranked by cosine similarity over embeddings generated by a HuggingFace model, and results are stored in a SQLite database for efficient retrieval.
- Multi-Platform Search: Retrieves educational content from YouTube and web sources using Programmable Search and YouTube APIs.
- AI-Powered Ranking: Uses sentence-transformers (all-MiniLM-L6-v2) to generate embeddings and rank results by cosine similarity.
- Audio Transcription: Supports audio file transcription via AssemblyAI for voice-based queries.
- Database Storage: Stores content metadata and embeddings in a SQLite database for persistence and efficient querying.
- FastAPI Backend: Provides a high-performance API with CORS support for integration with frontends (e.g., Next.js).
- Gemini API Integration: Fetches concise educational answers for queries using Google's Gemini model.
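The ranking step above can be sketched as follows. This is a minimal illustration in plain Python rather than a call to the actual sentence-transformers model, and the toy vectors are made up; all-MiniLM-L6-v2 actually produces 384-dimensional embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, items):
    """Sort (title, embedding) pairs by similarity to the query embedding."""
    scored = [(title, cosine_similarity(query_vec, vec)) for title, vec in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy 3-dimensional embeddings stand in for real model output.
query = [0.9, 0.1, 0.0]
items = [
    ("Intro to ML", [0.8, 0.2, 0.1]),
    ("Cooking basics", [0.0, 0.1, 0.9]),
]
ranked = rank_by_similarity(query, items)
```

In the application itself, the query and each candidate result are embedded with the same model, so documents about similar topics end up close in the vector space and score near 1.0.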
To run LearnLinkAI, ensure you have Python 3.8+ and the dependencies listed in requirements.txt. Key dependencies include:
- FastAPI: For the API server.
- HuggingFace Embeddings: For text embeddings (sentence-transformers/all-MiniLM-L6-v2).
- SQLAlchemy: For SQLite database interactions.
- Google APIs: For YouTube and web search.
- AssemblyAI: For audio transcription.
- PyTorch: For handling embeddings.
Install dependencies with:

```bash
pip install -r requirements.txt
```
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd LearnLinkAIfile
  ```
- Environment variables: create a .env file in the project root with the following keys:

  ```plaintext
  YOUTUBE_API_KEY=
  GOOGLE_API_KEY=
  GOOGLE_CSE_ID=
  GEMINI_API_KEY=
  ASSEMBLYAI_API_KEY=
  ```
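Since all five keys must be set before the external APIs will work, a startup check can be sketched with only the standard library. The missing_keys helper below is hypothetical (the project may load .env differently, e.g. with python-dotenv):

```python
import os

REQUIRED_KEYS = [
    "YOUTUBE_API_KEY",
    "GOOGLE_API_KEY",
    "GOOGLE_CSE_ID",
    "GEMINI_API_KEY",
    "ASSEMBLYAI_API_KEY",
]

def missing_keys(environ=os.environ):
    """Return the names of any required API keys not set (or empty)."""
    return [key for key in REQUIRED_KEYS if not environ.get(key)]
```

Calling missing_keys() at startup and failing fast with the list of absent keys is easier to debug than a late HTTP error from one of the upstream APIs.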
- Database initialization: the SQLite database (content.db) is created automatically the first time the application runs, via the Base.metadata.create_all call in database.py.
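For illustration, the same storage idea can be sketched with the standard-library sqlite3 module. The actual project uses SQLAlchemy in database.py, and the table and column names below are assumptions:

```python
import json
import sqlite3

# Illustrative schema only; the real ORM model lives in database.py and
# its column names may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS content (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    url TEXT UNIQUE NOT NULL,
    platform TEXT NOT NULL,
    embedding TEXT NOT NULL  -- embedding vector serialized as JSON
)
"""

def store_content(conn, title, url, platform, embedding):
    """Insert a result; the UNIQUE url plus OR IGNORE skips duplicates."""
    conn.execute(
        "INSERT OR IGNORE INTO content (title, url, platform, embedding) "
        "VALUES (?, ?, ?, ?)",
        (title, url, platform, json.dumps(embedding)),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
store_content(conn, "Intro to ML", "https://example.com/ml", "web", [0.1, 0.2])
rows = conn.execute("SELECT title, platform FROM content").fetchall()
```

Persisting the serialized embedding alongside the metadata is what lets previously stored results be re-ranked against new queries without re-embedding them.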
- Run the application: start the FastAPI server with Uvicorn:

  ```bash
  uvicorn main:app --reload
  ```

  The API will be available at http://localhost:8000.
- GET /: Returns a welcome message for the API.
- POST /search: Searches for educational content based on a query. Expects a JSON payload with query (string), max_results (int, default 5), and platforms (list of strings, e.g., ["youtube", "web"]).
  Example request:

  ```json
  {
    "query": "machine learning basics",
    "max_results": 5,
    "platforms": ["youtube", "web"]
  }
  ```
  The response includes the query, result counts (YouTube, web, ranked, stored_new), and the ranked results.
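  As a rough illustration, a /search response might look like the following. The field names follow the description above, but the exact shape and the values are assumptions:

  ```json
  {
    "query": "machine learning basics",
    "counts": { "youtube": 3, "web": 2, "ranked": 5, "stored_new": 4 },
    "results": [
      {
        "title": "Machine Learning Crash Course",
        "url": "https://www.youtube.com/watch?v=...",
        "platform": "youtube",
        "score": 0.91
      }
    ]
  }
  ```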
- POST /aiinfo: Fetches concise AI-generated answers for a query using the Gemini API. Expects a JSON payload with query.
  Example request:

  ```json
  { "query": "machine learning basics" }
  ```
  The response includes the query and AI-generated answers with references.
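  An /aiinfo response might look roughly like this; the field names and values below are assumptions for illustration:

  ```json
  {
    "query": "machine learning basics",
    "answer": "Machine learning is a branch of AI in which ...",
    "references": ["https://example.com/intro-to-ml"]
  }
  ```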
- POST /transcribe: Transcribes an uploaded audio file using AssemblyAI. Expects a file upload.
  Example (using curl):

  ```bash
  curl -X POST -F "file=@audio.wav" http://localhost:8000/transcribe
  ```
  The response includes the transcribed text.
- main.py: FastAPI application with search, AI info, and transcription endpoints.
- database.py: SQLAlchemy setup for SQLite database and ORM model for content storage.
- requirements.txt: List of Python dependencies.
- .env: Environment variables for API keys (not included in version control).
- Search for content: use the /search endpoint to fetch and rank educational content. Results are ranked by relevance using cosine similarity on embeddings and stored in the database for future queries.
- AI-generated answers: use the /aiinfo endpoint to get concise answers with references for educational queries.
- Transcribe audio: upload audio files to the /transcribe endpoint to convert voice queries into text.
- Performance: The application uses batched embeddings for efficient ranking and minimizes database commits for speed.
- Scalability: The SQLite database is suitable for small to medium-scale applications. For larger datasets, consider switching to a more robust database like PostgreSQL.
- Error Handling: The API includes comprehensive error handling for API failures, invalid inputs, and transcription errors.
- Hardware: For faster embeddings, use a GPU by setting model_kwargs={"device": "cuda"} in main.py. CPU is used by default for compatibility.
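The batching mentioned in the performance note can be illustrated with a small helper. Here embed_batch is a hypothetical stand-in for a real model call such as HuggingFaceEmbeddings.embed_documents; the toy "model" just embeds each text as its length, purely to show the control flow:

```python
def batched(items, batch_size):
    """Yield successive fixed-size batches from a list."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def embed_all(texts, embed_batch, batch_size=32):
    """Embed texts in batches so the model sees many inputs per call."""
    vectors = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors

# Toy "model": embeds each text as [length], standing in for a real encoder.
fake_embed = lambda batch: [[float(len(t))] for t in batch]
vectors = embed_all(["a", "bb", "ccc"], fake_embed, batch_size=2)
```

Passing a whole batch to the model per call amortizes per-call overhead (and, on GPU, fills the device better) compared with embedding one text at a time.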