Finetune-GPT2-Model-on-downstream-tasks

This repository contains code and resources for fine-tuning the GPT-2 model on downstream tasks using Hugging Face's transformers library. The project uses text from "The Buddha's Path of Virtue: A Translation of the Dhammapada by F. L. Woodward", available through Project Gutenberg.

Learning Objectives

By the end of this experiment, you will be able to:

- Load and pre-process data from a text file.
- Load and use a pre-trained tokenizer.
- Fine-tune a GPT-2 language model on a specific text dataset.

Dataset Description

The dataset is the Project Gutenberg eBook "The Buddha's Path of Virtue: A Translation of the Dhammapada by F. L. Woodward". It contains teachings and philosophies attributed to the Buddha, which makes it a rich linguistic resource for language modeling.

Read more about the text and its source on Project Gutenberg.
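
Project Gutenberg plain-text files usually wrap the book in `*** START OF ...` and `*** END OF ...` marker lines, with licence text outside them. The following is a minimal pre-processing sketch under that assumption. The helper names (`strip_gutenberg_boilerplate`, `normalize_whitespace`, `load_clean_text`) are hypothetical and are not taken from this repository's code.

```python
import re
from pathlib import Path

# Assumption: the file uses the usual Project Gutenberg marker lines,
# e.g. "*** START OF THE PROJECT GUTENBERG EBOOK ... ***".
START_RE = re.compile(r"^\*\*\*\s*START OF.*$", re.MULTILINE)
END_RE = re.compile(r"^\*\*\*\s*END OF.*$", re.MULTILINE)


def strip_gutenberg_boilerplate(raw: str) -> str:
    """Return the text between the START and END markers, if present."""
    start = START_RE.search(raw)
    end = END_RE.search(raw)
    begin = start.end() if start else 0
    stop = end.start() if end else len(raw)
    return raw[begin:stop].strip()


def normalize_whitespace(text: str) -> str:
    """Trim trailing spaces on each line and collapse runs of blank lines."""
    lines = [line.rstrip() for line in text.splitlines()]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


def load_clean_text(path: str) -> str:
    """Read a text file (with or without a BOM) and return the cleaned body."""
    raw = Path(path).read_text(encoding="utf-8-sig")
    return normalize_whitespace(strip_gutenberg_boilerplate(raw))
```

If the markers are missing, `strip_gutenberg_boilerplate` returns the whole text unchanged, so the helper is safe to run on a file that has already been trimmed by hand.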

How to Use

Requirements

Before starting, ensure that you have Python installed on your system and the following Python packages:

- transformers
- torch
- datasets
- tokenizers

You can install these packages using pip:

pip install transformers torch datasets tokenizers
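
After installing, it can help to confirm that all four packages are visible to the interpreter you plan to use. The snippet below is a small sketch using only the standard library (`importlib.metadata`, Python 3.8+). The function names `installed_versions` and `report` are illustrative, not part of this repository.

```python
from importlib import metadata

# The four packages listed in the Requirements section.
REQUIRED = ["transformers", "torch", "datasets", "tokenizers"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def report(packages=REQUIRED):
    """Print one line per package with its version or a missing notice."""
    for name, version in installed_versions(packages).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Calling `report()` prints the installed version of each requirement, or `NOT INSTALLED` for any that pip did not pick up.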

Steps to Run

Clone the Repository:

git clone https://github.com/Praveen76/Finetune-GPT2-Model-on-downstream-tasks.git
cd Finetune-GPT2-Model-on-downstream-tasks

Prepare the Data:
- Ensure that the dataset file dhammapada.txt is located in the data/ directory.
Training the Model:
- Run the training script:
```
python finetune_gpt2.py
```
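
For orientation, here is a rough sketch of what a causal language-model fine-tuning run on this text could look like with the transformers `Trainer`. It is not the contents of the training script, which are not shown in this README. The function names, the output directory `gpt2-dhammapada`, the block size, the batch size, and the epoch count are all assumptions chosen for illustration.

```python
from typing import List


def chunk_ids(ids: List[int], block_size: int) -> List[List[int]]:
    """Split token ids into equal blocks; the trailing remainder is dropped."""
    usable = (len(ids) // block_size) * block_size
    return [ids[i : i + block_size] for i in range(0, usable, block_size)]


def finetune(text_path="data/dhammapada.txt", output_dir="gpt2-dhammapada",
             block_size=128, epochs=3):
    """Fine-tune the base "gpt2" checkpoint on one text file (sketch only)."""
    # Imported lazily so chunk_ids can be used without the heavy dependencies.
    from datasets import Dataset
    from transformers import (
        DataCollatorForLanguageModeling,
        GPT2LMHeadModel,
        GPT2TokenizerFast,
        Trainer,
        TrainingArguments,
    )

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    # GPT-2 ships without a pad token; reuse EOS so the collator can pad.
    tokenizer.pad_token = tokenizer.eos_token

    with open(text_path, encoding="utf-8") as f:
        text = f.read()

    # Tokenize the whole book once, then cut it into fixed-length blocks.
    ids = tokenizer(text)["input_ids"]
    train_ds = Dataset.from_dict({"input_ids": chunk_ids(ids, block_size)})

    # mlm=False yields causal-LM batches: labels are a copy of input_ids.
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=epochs,
        per_device_train_batch_size=4,  # assumed value; lower it if memory is tight
    )
    trainer = Trainer(model=model, args=args, train_dataset=train_ds,
                      data_collator=collator)
    trainer.train()

    # Save both pieces so the result can be reloaded with from_pretrained.
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
```

Calling `finetune()` with the defaults reads `data/dhammapada.txt` and writes the fine-tuned model and tokenizer to the assumed output directory. Cutting the token stream into fixed-length blocks keeps every training example the same size, at the cost of dropping the final partial block.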

Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue if you have suggestions or improvements.

License

This project is licensed under the MIT License.

Issues:

If you encounter any issues or have suggestions for improvement, please open an issue in the Issues section of this repository.

Contact:

The code has been tested on Windows. It should work on other operating systems, but this has not yet been tested. If you run into any problem with installation or otherwise, please contact me on LinkedIn.

Happy coding!!

About Me:

I’m a seasoned Data Scientist and founder of TowardsMachineLearning.Org. I've worked on various Machine Learning, NLP, and cutting-edge deep learning frameworks to solve numerous business problems.
