
Course project for ELEC-E5550 Statistical Natural Language Processing


jiemingyou/NLP-project


NLP-project


This is the working repository for the course project of ELEC-E5550 Statistical Natural Language Processing (SNLP), taught at Aalto University.

We build a course recommendation system based on the course descriptions using retrieval-based methods.

  • Course description scraping using Selenium
  • Course description translation using Helsinki-NLP's Opus-MT
  • Evaluating different embedding models
  • Retrieval evaluation using NDCG
  • LLM component for user interaction
  • Application using Streamlit, Supabase, and pgvector
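
To illustrate how embedding-based retrieval and NDCG evaluation fit together, here is a minimal sketch in plain Python. It is not the project's actual code: the function names, the toy 2-dimensional vectors, the course ids, and the graded relevance labels are all hypothetical, and it assumes cosine similarity for ranking and linear gain in the DCG formula (other variants use `2**rel - 1`).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_courses(query_vec, course_vecs):
    """Return course ids sorted by descending similarity to the query."""
    scores = {cid: cosine_similarity(query_vec, vec)
              for cid, vec in course_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def dcg_at_k(relevances, k):
    """Discounted cumulative gain with linear gain and a log2 discount."""
    return sum(rel / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_ids, relevance, k):
    """NDCG@k: DCG of the ranking divided by the DCG of the ideal ranking."""
    gains = [relevance.get(cid, 0) for cid in ranked_ids]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant items, so the score is defined as 0
    return dcg_at_k(gains, k) / ideal_dcg


# Hypothetical toy data: course id -> embedding, and graded relevance labels.
course_vecs = {"A": [1.0, 0.0], "B": [0.7, 0.7], "C": [0.0, 1.0]}
query_vec = [1.0, 0.2]
relevance = {"A": 2, "B": 0, "C": 1}

ranking = rank_courses(query_vec, course_vecs)
score = ndcg_at_k(ranking, relevance, k=3)
print(ranking, round(score, 4))
```

In a real setup the vectors would come from an embedding model, and the nearest-neighbour search could be delegated to the database rather than done in Python; the NDCG computation itself stays the same.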

Project structure

.
├── app                    # Web app and DB connection
├── bert                   # BERT model
├── embedding              # Embedding models
├── evaluation             # IR evaluation
├── preprocessing          # Preprocessing the course descriptions
├── scraper                # Scraping the course description
├── reader                 # Template for the LLM component
└── README.md