This project aims to fine-tune existing models from the Hugging Face Transformers library. As a source of data, I used public articles (interview questions) from GitHub. The workflow has these steps:
- Collect data (as much as possible)
- Preprocess data (clean, turn it into question-answer pairs or dialogue)
- Augment data (add noise, add duplicates, add outliers)
- Split data (train, validation, test)
- Configure model (choose model architecture, hyperparameters)
- Train model (fit model to data)
- Evaluate model (check model performance on validation data)
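The split step above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `split_dataset`, the 80/10/10 proportions, and the record layout (`question`/`answer` keys) are all assumptions.

```python
import random


def split_dataset(records, val_fraction=0.1, test_fraction=0.1, seed=42):
    """Shuffle records deterministically and split them into three subsets.

    The fractions and the seed are illustrative defaults (assumptions),
    not values taken from the project.
    """
    if not 0 <= val_fraction + test_fraction < 1:
        raise ValueError("val_fraction + test_fraction must be in [0, 1)")

    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a reproducible order

    n_total = len(shuffled)
    n_val = int(n_total * val_fraction)
    n_test = int(n_total * test_fraction)
    n_train = n_total - n_val - n_test

    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


if __name__ == "__main__":
    # Hypothetical question-answer records, only for demonstration.
    pairs = [{"question": f"Q{i}", "answer": f"A{i}"} for i in range(20)]
    splits = split_dataset(pairs)
    print({name: len(items) for name, items in splits.items()})
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without touching the global random state.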
Follow these steps to set up and run the project on your local machine.
- Clone the repository: `git clone git@github.com:iashchak/ai-tools.git`
- Change to the project directory: `cd ai-tools`
- Install the required packages in one of two ways:
  - Initialize a new conda environment from the `environment.yml` file (preferred)
  - Update the current environment from the `environment.yml` file: `conda env update --file environment.yml`
Requirements:
- Python 3.8 or higher
- PyTorch 1.9 or higher
- Hugging Face Transformers library
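A quick way to confirm that an environment satisfies the requirements listed above is to compare the installed versions with the stated minimums. The sketch below is only an illustration: the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, and it assumes the packages are installed under the distribution names `torch` and `transformers`.

```python
import sys
from importlib import metadata  # in the standard library since Python 3.8


def parse_version(text):
    """Turn a version string such as '1.13.1+cu117' into (1, 13, 1).

    Local build tags and pre-release suffixes are ignored.
    """
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare versions numerically, so that 1.10 counts as newer than 1.9."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that '1.9' equals '1.9.0'
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Return a dict mapping each requirement to True (satisfied) or False."""
    report = {"python": sys.version_info >= (3, 8)}
    # Minimums come from the requirements list; None means "any version".
    for package, minimum in (("torch", "1.9"), ("transformers", None)):
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = False
            continue
        report[package] = minimum is None or meets_minimum(installed, minimum)
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

Comparing versions as integer tuples rather than strings matters here, because a plain string comparison would rank `1.10` below `1.9`.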
To run the project, execute the Jupyter notebook `notebooks/process_interview_questions`. This will download the dataset, create question-answer pairs, train the model, and test it with some example questions.
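To give an idea of what creating question-answer pairs can look like, here is a minimal sketch. It is not the notebook's code. It assumes, hypothetically, that a source article is Markdown in which each question is a heading ending with `?` and the answer is the text up to the next heading. The function name `extract_qa_pairs` and the sample text are made up for illustration.

```python
def _add_pair(pairs, question, answer_lines):
    """Append a pair if both a question and a non-empty answer were collected."""
    answer = " ".join(line for line in answer_lines if line)
    if question and answer:
        pairs.append({"question": question, "answer": answer})


def extract_qa_pairs(markdown_text):
    """Collect {'question', 'answer'} dicts from heading-structured Markdown.

    Assumption: a heading that ends with '?' is a question, and the lines
    that follow it (up to the next heading) form its answer.
    """
    pairs = []
    question = None
    answer_lines = []

    for line in markdown_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # A new heading closes whatever question was being collected.
            _add_pair(pairs, question, answer_lines)
            heading = stripped.lstrip("#").strip()
            question = heading if heading.endswith("?") else None
            answer_lines = []
        elif question is not None:
            answer_lines.append(stripped)

    _add_pair(pairs, question, answer_lines)  # close the last question
    return pairs


# Hypothetical sample article, only for demonstration.
SAMPLE = """# Interview questions
Intro text that is not an answer.
## What is a closure?
A function that captures variables
from its enclosing scope.
## Notes
Not a question.
### What is a generator?
A function that yields values lazily.
"""

if __name__ == "__main__":
    for pair in extract_qa_pairs(SAMPLE):
        print(pair)
```

This simple line-based approach would misread `#` comments inside code blocks as headings, so real articles would likely need a proper Markdown parser.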
The project covers:
- Data collection
- Dataset creation (question-answer pairs)
- Model training using Hugging Face Transformers
- Model evaluation and testing
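For the evaluation stage, one common way to score generated answers against reference answers is exact match plus token-level F1. The project does not state which metric it uses, so the sketch below is an assumption for illustration, and the function names are hypothetical.

```python
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """True when both strings are identical after normalization."""
    return normalize(prediction) == normalize(reference)


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall between two answers."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers agree; an empty answer never matches a non-empty one.
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical model output and reference answer.
    predicted = "A function that yields values."
    expected = "A function that yields values lazily."
    print(exact_match(predicted, expected), round(token_f1(predicted, expected), 3))
```

Exact match is strict and easy to interpret, while token F1 gives partial credit when a generated answer overlaps with the reference but is worded differently.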
Planned improvements:
- Improve dataset quality with better question generation
- Increase the size and diversity of the dataset
- Improve model performance with hyperparameter tuning
- Implement a user-friendly interface for interacting with the model
Please read CONTRIBUTING.md for details on how to contribute to the project.
This project is licensed under the MIT License. See the LICENSE file for details.