Welcome to the NLP coursework repository for the Honours Bachelor of Artificial Intelligence program at Durham College. This repository contains weekly labs, assignments, and the final project completed during the Winter 2024 term.
NLP/
├── Assignment 1/
├── Final Project/
├── Labs/
│ ├── Week 2/
│ ├── Week 3/
│ ├── Week 4/
│ ├── Week 5/
│ ├── Week 6/
│ ├── Week 9/
│ ├── Week 10/
│ └── Week 11/
├── NLP_textbook.pdf
└── .gitattributes
- Assignment 1/: First major assignment focusing on foundational NLP concepts.
- Final Project/: Capstone project applying advanced NLP techniques.
- Labs/: Weekly lab exercises covering topics such as tokenization, POS tagging, and sentiment analysis.
- NLP_textbook.pdf: Primary course textbook for reference.
- Text preprocessing and normalization
- Tokenization and part-of-speech tagging
- Named Entity Recognition (NER)
- Sentiment analysis
- Word embeddings (Word2Vec, GloVe)
- Transformer architectures (BERT, GPT)
- Sequence labeling and classification
- Final project: End-to-end NLP pipeline development
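As a rough illustration of the first two topics above, the sketch below normalizes a string and splits it into tokens using only the Python standard library. It is not taken from the labs: the function names `normalize` and `tokenize` and the regular expression are assumptions chosen for this example, and the lab notebooks may well use NLTK or spaCy instead.

```python
import re
import unicodedata
from typing import List


def normalize(text: str) -> str:
    """Apply Unicode NFKC normalization, lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> List[str]:
    """Split text into word tokens (keeping simple contractions) and punctuation."""
    # A word, optionally followed by an apostrophe suffix, or one punctuation mark.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)


if __name__ == "__main__":
    print(tokenize(normalize("  The labs DON'T   bite! ")))
```

A regex tokenizer like this is only a baseline; library tokenizers handle many more edge cases (URLs, hyphenation, language-specific rules).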
- Python 3.8+
- Jupyter Notebooks
- NLTK, spaCy, scikit-learn
- PyTorch, Hugging Face Transformers
- Pandas, NumPy, Matplotlib
To set up the environment:
- Clone the repository:
    git clone https://github.com/ROCCYK/NLP.git
- Navigate to the project directory:
    cd NLP
- Create and activate a virtual environment (optional but recommended):
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install the required packages:
    pip install -r requirements.txt
- Launch Jupyter Notebook:
    jupyter notebook
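Once the packages are installed, a quick check can confirm that the libraries listed in this README are importable before any notebook is opened. The following is a minimal sketch, not part of the repository: the helper `missing_packages` and the list `ASSUMED_IMPORT_NAMES` are hypothetical, and the import names are assumed from the library list above (scikit-learn, for instance, is imported as `sklearn`, and PyTorch as `torch`).

```python
from importlib.util import find_spec
from typing import Iterable, List

# Import names assumed for the libraries listed in this README;
# adjust the list if requirements.txt differs.
ASSUMED_IMPORT_NAMES = [
    "nltk", "spacy", "sklearn", "torch", "transformers",
    "pandas", "numpy", "matplotlib",
]


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec locates a module without importing it, so the check stays fast.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(ASSUMED_IMPORT_NAMES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

Run it inside the activated virtual environment so that it inspects the same interpreter Jupyter will use.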
- NLP_textbook.pdf: Core textbook covering theoretical and practical aspects of NLP.
- Hugging Face Transformers Documentation: Comprehensive guide to transformer models and their applications.
- spaCy Documentation: Detailed documentation on spaCy's NLP capabilities.
- Ensure that all dependencies are installed as per requirements.txt.
- Some notebooks may require additional datasets; instructions are provided within the respective notebooks.
- For any issues or questions, please refer to the course's communication channels.