This project is part of the Natural Language Processing course at the University of Information Technology.
| No | Student ID | Full name | Email |
|---|---|---|---|
| 1 | 23520179 | Phùng Minh Chí | 23520179@gm.uit.edu.vn |
| 2 | 23520183 | Nguyễn Hữu Minh Chiến | 23520183@gm.uit.edu.vn |
| 3 | 23521467 | Lê Ngọc Phương Thảo | 23521467@gm.uit.edu.vn |
- Course: Natural Language Processing
- Course code: CS221
- Class code: CS221.P22
- Semester: Semester 2 (2024 - 2025)
- Instructor: Dr. Nguyễn Trọng Chỉnh
- Clone the repository:
git clone https://github.com/chisphung/CS221-GenresPrediction-from-Overview
- Install dependencies:
pip install -r requirements.txt
To preprocess the dataset, run the following command:
python tools/preprocess.py
You can also download the preprocessed dataset with the following command:
python tools/download.py
To train the BERT models, run the following command:
python -m tools.train <pretrained_model_name> <dataset_path>
Replace <pretrained_model_name> with the name of the pretrained model you want to use (e.g., bert-base-uncased) and <dataset_path> with the path to your dataset.
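The training command takes two positional arguments. As a rough illustration of how such an entry point could read them, here is a minimal sketch using `argparse`. The function name `parse_train_args`, the help texts, and the example path `data/train.csv` are hypothetical; the actual `tools/train.py` may parse its arguments differently.

```python
import argparse


def parse_train_args(argv=None):
    """Parse the two positional arguments shown in the training command.

    Hypothetical helper: the real tools/train.py may be structured differently.
    """
    parser = argparse.ArgumentParser(prog="tools.train")
    parser.add_argument(
        "pretrained_model_name",
        help="name of the pretrained model, e.g. bert-base-uncased",
    )
    parser.add_argument("dataset_path", help="path to the dataset")
    return parser.parse_args(argv)


# Passing an explicit list mimics:
#   python -m tools.train bert-base-uncased data/train.csv
# (data/train.csv is a made-up path used only for illustration)
args = parse_train_args(["bert-base-uncased", "data/train.csv"])
print(args.pretrained_model_name, args.dataset_path)
```

Positional arguments keep the command short, at the cost of making the argument order significant.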
To evaluate the model, run the following command:
python -m src.evaluate
Modify the target list path and weights path to match your setup.
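The target list presumably maps each model output to a genre name. The sketch below shows one way raw scores could be turned into genre names, assuming the task is treated as multi-label classification with one sigmoid output per genre. That setup, the function names, the 0.5 threshold, and the example values are all assumptions for illustration, not taken from `src/evaluate.py`.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode_genres(logits, target_list, threshold=0.5):
    """Return the genres whose sigmoid probability reaches the threshold.

    Hypothetical helper: assumes logits[i] belongs to target_list[i].
    """
    if len(logits) != len(target_list):
        raise ValueError("logits and target_list must have the same length")
    return [
        genre
        for genre, z in zip(target_list, logits)
        if sigmoid(z) >= threshold
    ]


# Made-up values for illustration only.
example_targets = ["Action", "Comedy", "Drama"]
example_logits = [2.0, -1.5, 0.3]
predicted = decode_genres(example_logits, example_targets)
print(predicted)
```

If the target list and the model's output layer disagree in length or order, the decoded genres will be wrong, which is why the paths need to match the weights being evaluated.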
To save you time, we currently provide three pretrained models:
- bert-base-uncased: trained on the preprocessed and undersampled dataset
- distilled-bert-base-uncased: trained on the preprocessed dataset
- bert-base-cased: trained on the raw and undersampled dataset
You can download them from the following links:
After downloading, you can place them in the weights folder.
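Two of the models listed above were trained on an undersampled dataset. For readers unfamiliar with the term, the sketch below shows random undersampling in its simplest form: every class is reduced to the size of the smallest one. It assumes each sample carries a single label, and the function `undersample` and the toy data are hypothetical; the project's own preprocessing may balance genres differently, especially if an overview can have several genres.

```python
import random
from collections import defaultdict


def undersample(samples, label_of, seed=0):
    """Randomly downsample every class to the size of the smallest class.

    Hypothetical helper; label_of maps a sample to its (single) label.
    """
    by_label = defaultdict(list)
    for sample in samples:
        by_label[label_of(sample)].append(sample)
    if not by_label:
        return []
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Toy data: six "Drama" overviews and two "Comedy" overviews.
data = [(f"overview {i}", "Drama") for i in range(6)]
data += [(f"overview {i}", "Comedy") for i in range(6, 8)]
balanced = undersample(data, label_of=lambda s: s[1])
print(len(balanced))
```

Undersampling discards data from the larger classes, so it trades some training signal for a more even class distribution.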
To make a single prediction using the trained model, run the following command:
python -m src.main
To deploy the model using Streamlit, run the following command:
streamlit run src/app.py