This is the working repository for the course project of ELEC-E5550 Statistical Natural Language Processing (SNLP), taught at Aalto University.
We build a course recommendation system based on the course descriptions using retrieval-based methods.
- Course description scraping using Selenium
- Course description translation using Helsinki-NLP's Opus-MT
- Evaluating different embedding models
- Retrieval evaluation using normalized discounted cumulative gain (NDCG)
- LLM component for user interaction
- Application using Streamlit, Supabase, and pgvector
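The retrieval and evaluation steps listed above can be sketched in a few lines of Python. This is a minimal sketch, not the code in `embedding/` or `evaluation/`. It assumes each course description has already been turned into an embedding vector, ranks courses by cosine similarity to a query embedding, and scores the resulting ranking with NDCG@k using linear gains. The function names (`cosine_similarity`, `rank_courses`, `dcg`, `ndcg_at_k`), the course IDs, and the relevance labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical helpers for illustration; not taken from this repository.


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_courses(
    query_vec: Sequence[float], course_vecs: Dict[str, Sequence[float]], k: int
) -> List[str]:
    """Return the IDs of the k courses whose embeddings are most similar to the query."""
    scored = sorted(
        course_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [course_id for course_id, _ in scored[:k]]


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) over ranks i = 1..n."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_ids: Sequence[str], relevance: Dict[str, float], k: int) -> float:
    """NDCG@k: DCG of the retrieved ranking divided by DCG of the ideal ranking.

    Courses missing from `relevance` are treated as non-relevant (gain 0).
    """
    gains = [relevance.get(course_id, 0.0) for course_id in ranked_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # Toy 2-D "embeddings" and graded relevance labels (hypothetical data).
    course_vecs = {
        "course_a": [1.0, 0.0],
        "course_b": [0.7, 0.7],
        "course_c": [0.0, 1.0],
    }
    ranking = rank_courses([1.0, 0.1], course_vecs, k=3)
    print(ranking)  # ['course_a', 'course_b', 'course_c']
    print(ndcg_at_k(ranking, {"course_a": 1.0, "course_b": 3.0}, k=3))
```

Cosine similarity is used here because pgvector supports it directly through its cosine distance operator (`<=>`). The project may instead use another metric, or the exponential-gain form of NDCG (gain 2^rel - 1), which rewards highly relevant courses more strongly.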
```
.
├── README.md
├── app             # Web app and DB connection
├── bert            # BERT model
├── embedding       # Embedding models
├── evaluation      # IR evaluation
├── preprocessing   # Preprocessing the course descriptions
├── scraper         # Scraping the course descriptions
└── reader          # Template for the LLM component
```