Finetune-GPT2-Model-on-downstream-tasks

This repository contains code and resources for fine-tuning the GPT-2 model on downstream tasks using Hugging Face's transformers library. The project uses text from "The Buddha's Path of Virtue: A Translation of the Dhammapada by F. L. Woodward", available through Project Gutenberg.

Learning Objectives

By the end of this experiment, you will be able to:

Load and pre-process data from a text file.
Load and use a pre-trained tokenizer.
Fine-tune a GPT-2 language model on a specific text dataset.
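
As a rough illustration of the first two objectives, the sketch below encodes text with the pre-trained GPT-2 tokenizer and splits the resulting token ids into fixed-size blocks. The function names `tokenize_text` and `group_into_blocks` and the block size are illustrative assumptions, not code taken from this repository.

```python
from typing import List


def group_into_blocks(token_ids: List[int], block_size: int) -> List[List[int]]:
    """Split a flat list of token ids into equal-sized blocks.

    Any leftover tokens that do not fill a whole block are dropped.
    """
    usable = (len(token_ids) // block_size) * block_size
    return [token_ids[i:i + block_size] for i in range(0, usable, block_size)]


def tokenize_text(text: str, model_name: str = "gpt2") -> List[int]:
    """Encode text with a pre-trained tokenizer.

    The tokenizer files are downloaded on first use, so this needs
    network access the first time it runs.
    """
    # Imported lazily so group_into_blocks works without transformers installed.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return tokenizer(text)["input_ids"]
```

Calling `group_into_blocks(tokenize_text(text), 128)` would give a list of 128-token training examples. Tokenizing a whole book in one call may print a warning about the sequence being longer than the model's maximum length; that is harmless here because the ids are split into blocks before they reach the model.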

Dataset Description

The dataset is taken from the Project Gutenberg eBook "The Buddha's Path of Virtue: A Translation of the Dhammapada by F. L. Woodward". It contains teachings and philosophies attributed to the Buddha, providing a rich linguistic resource for language modeling.
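
Project Gutenberg text files usually wrap the book between a "*** START OF ..." line and a "*** END OF ..." line, with licence text outside them. The sketch below shows one way to keep only the book body before training. It assumes each marker sits on a single line; the helper name `strip_gutenberg_boilerplate` is hypothetical and not part of this repository.

```python
import re

# Marker lines as they commonly appear in Project Gutenberg plain-text files.
# Assumption: each marker fits on one line.
_START = re.compile(r"^\*\*\*\s*START OF.*$", re.MULTILINE)
_END = re.compile(r"^\*\*\*\s*END OF.*$", re.MULTILINE)


def strip_gutenberg_boilerplate(raw: str) -> str:
    """Return the text between the START and END markers.

    If a marker is missing, that side of the text is left untouched.
    """
    start = _START.search(raw)
    end = _END.search(raw)
    begin = start.end() if start else 0
    stop = end.start() if end else len(raw)
    return raw[begin:stop].strip()
```

If the file has no markers at all, the function simply returns the input with surrounding whitespace removed, so it is safe to apply to an already cleaned file.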

How to Use

Requirements

Before starting, ensure that Python is installed on your system, along with the following packages:

transformers
torch
datasets
tokenizers

You can install these packages using pip:

pip install transformers torch datasets tokenizers
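
To confirm the installation worked, a small check like the one below can list any of the four packages that Python cannot find. It is a minimal sketch using only the standard library; the function name `missing_packages` is an illustrative choice.

```python
from importlib.util import find_spec
from typing import Iterable, List

# The packages listed in the Requirements section.
REQUIRED = ("transformers", "torch", "datasets", "tokenizers")


def missing_packages(names: Iterable[str] = REQUIRED) -> List[str]:
    """Return the names that cannot be located in the current environment."""
    # find_spec locates a top-level package without importing it.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```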

Steps to Run

Clone the Repository:

git clone https://github.com/Praveen76/Finetune-GPT2-Model-on-downstream-tasks.git
cd Finetune-GPT2-Model-on-downstream-tasks

Prepare the Data:

- Ensure that the dataset file dhammapada.txt is located in the data/ directory.

Train the Model:

- Run the training script:
```
python finetune_gpt2.py
```
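
The contents of finetune_gpt2.py are not reproduced in this README, so the following is only a sketch of what a script of this kind might do, not the repository's actual implementation. It assumes the Hugging Face `Trainer` API, a hypothetical output directory `gpt2-dhammapada`, and arbitrary values for the block size, batch size, and epoch count. Recent versions of transformers may also require the accelerate package for `Trainer`.

```python
from pathlib import Path


def read_corpus(path: str = "data/dhammapada.txt") -> str:
    """Read the dataset file, failing early with a clear message if it is missing."""
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"Expected dataset at {file}")
    return file.read_text(encoding="utf-8")


def fine_tune(path: str = "data/dhammapada.txt",
              output_dir: str = "gpt2-dhammapada",  # hypothetical directory name
              block_size: int = 128) -> None:
    """Fine-tune GPT-2 on the text file (illustrative hyperparameters)."""
    # Heavy imports are kept local so read_corpus works without them.
    from datasets import Dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Tokenize the whole corpus, then cut it into equal-length blocks.
    ids = tokenizer(read_corpus(path))["input_ids"]
    usable = (len(ids) // block_size) * block_size
    blocks = [ids[i:i + block_size] for i in range(0, usable, block_size)]

    # For causal language modeling the labels are the inputs themselves;
    # the model shifts them internally when computing the loss.
    dataset = Dataset.from_dict({"input_ids": blocks, "labels": blocks})

    args = TrainingArguments(output_dir=output_dir,
                             num_train_epochs=1,
                             per_device_train_batch_size=4)
    trainer = Trainer(model=model, args=args, train_dataset=dataset)
    trainer.train()

    # Save the fine-tuned weights and the tokenizer side by side.
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
```

Calling `fine_tune()` from the repository root would download the pre-trained weights, train for one epoch, and write the result to the output directory. Because all blocks have the same length, no padding token is needed, which avoids the fact that the GPT-2 tokenizer does not define one by default.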

Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue if you have suggestions or improvements.

License

This project is licensed under the MIT License.

Issues:

If you encounter any issues or have suggestions for improvement, please open an issue in the Issues section of this repository.

Contact:

The code has been tested on a Windows system. It should work on other operating systems but has not yet been tested there. If you run into problems with installation or anything else, please contact me on LinkedIn.

Happy coding!!

About Me:

I’m a seasoned Data Scientist and founder of TowardsMachineLearning.Org. I've worked with a range of machine learning, NLP, and deep learning frameworks to solve business problems.
