This is an implementation of the research paper "Attention Is All You Need". Only the decoder part of the model is used here.
- It is trained on the Tiny Shakespeare dataset.
- It uses a bigram model.
- It generates new text based on what it learned from the dataset.
- It is inspired by Andrej Karpathy's YouTube video.
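
To illustrate the bigram idea mentioned above, here is a minimal sketch in plain Python. It is not the code from this repository: it assumes a character-level model and estimates next-character probabilities by counting pairs, whereas a neural version would learn them. The names `train_bigram`, `generate`, and `sample_text` are hypothetical, and the short string stands in for the real dataset file.

```python
import random
from collections import Counter, defaultdict


def train_bigram(text):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for current, nxt in zip(text, text[1:]):
        counts[current][nxt] += 1
    return counts


def generate(counts, start, length, seed=0):
    """Sample up to `length` more characters, each conditioned only on the previous one."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:  # this character was never seen with a successor
            break
        chars = list(followers.keys())
        weights = list(followers.values())
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)


# Placeholder text (assumption): the real project would read the dataset file instead.
sample_text = "to be, or not to be, that is the question"
model = train_bigram(sample_text)
print(generate(model, start="t", length=40))
```

Because each character depends only on the one before it, the output tends to look locally plausible but has no longer-range structure. That limitation is what the attention-based decoder is meant to address.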