Authors:
Isaiah Jones (isaiah.jones@u.northwestern.edu), Northwestern University
Ruth Bagley (ruthbagley16@gmail.com), Northwestern University
Shubhanshi Gaudani (shubhanshigaudani2024@u.northwestern.edu), Northwestern University
COMP SCI 496: Generative Deep Models
Professor Bryan Pardo
The goal of this project is to generate Shakespearean sonnets from five keywords given to the model. Shakespearean sonnets follow a specific meter and rhyme scheme, which we hope our model will learn to reproduce, and we evaluate the generated sonnets on their adherence to this structure and their relevance to the input keywords.
In class, we have seen many models that generate text 'organically'; however, we were curious how to make generated text take on a new form (that of a sonnet) while still retaining the original content and subject matter. Upon further research, we found many attempts at algorithmic sonnet generation. One approach, Hafez, generates sonnets from an arbitrary input subject using an RNN and a finite-state acceptor (FSA). A separate paper by Benhardt et al. builds on the same approach, using both the phoneme structure and the stress pattern of each word to determine the degree of rhyme. While these papers generate sonnets from topics, we are interested in transforming a text of a given length into a sonnet.
One of the main inspirations for our work was GPoet-2, a GPT-2-based limerick generator; limericks likewise have a fixed rhyme scheme and a highly constrained structure. GPoet-2 enforces the rhyme scheme by fine-tuning the model on a reversed corpus, though the authors note that limerick quality suffers when trained this way [1].
Another inspiration for our work is GPT-2 Neural Network Poetry by Gwern Branwen, who trained GPT-2 on the Gutenberg Poetry Corpus of roughly 3 million lines of freely available poetry before generation. While the resulting model was good at generating poetry, it would still require additional fine-tuning for the specific task of sonnet generation. In addition, we found a fair number of RNN-based and transformer-based models for smaller poetry tasks such as haikus and other short-form poems [2].
We used a Shakespearean sonnet dataset from Kaggle [3] to fine-tune the pre-trained model. The dataset is a text file containing Shakespeare's 154 sonnets and can be found in Sonnets.txt. We refined the dataset by prepending five keywords that capture the main essence of each sonnet. The keywords were selected manually from the sonnet; we later also tried a TF-IDF-based approach to extract them (see the sketch after the example below) but found that the manual approach gave better results. The refined dataset can be found in Sonnets with keywords.txt.
An example from the dataset:
Keywords:
weary, travel, zealous, imaginary, mind
Sonnet:
Weary with toil, I haste me to my bed,
The dear repose for limbs with travel tired;
But then begins a journey in my head
To work my mind, when body's work's expired:
For then my thoughts--from far where I abide--
Intend a zealous pilgrimage to thee,
And keep my drooping eyelids open wide,
Looking on darkness which the blind do see:
Save that my soul's imaginary sight
Presents thy shadow to my sightless view,
Which, like a jewel hung in ghastly night,
Makes black night beauteous, and her old face new.
Lo! thus, by day my limbs, by night my mind,
For thee, and for myself, no quiet find.
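As mentioned above, we also experimented with TF-IDF keyword extraction. Here is a minimal sketch of that idea using scikit-learn; the sonnet splitting and preprocessing below are assumptions, and our notebook may differ in detail:

# A minimal sketch of TF-IDF keyword extraction per sonnet;
# the blank-line splitting and preprocessing are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer

with open("Sonnets.txt") as f:
    sonnets = [s.strip() for s in f.read().split("\n\n") if s.strip()]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(sonnets)  # one row per sonnet
vocab = vectorizer.get_feature_names_out()

def top_keywords(row_index, k=5):
    """Return the k words with the highest TF-IDF weight in one sonnet."""
    row = tfidf[row_index].toarray().ravel()
    top = row.argsort()[::-1][:k]
    return [vocab[i] for i in top]

print(top_keywords(26))  # e.g. candidate keywords for the sonnet shown above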
OpenAI's GPT-2 has exhibited the ability to write very coherent essays with a rich vocabulary, going beyond what we anticipated large language models could produce. GPT-2 has an architecture similar to a decoder-only transformer [4]. Because GPT-2 is a very large transformer-based language model trained on a massive dataset, we leverage the pre-trained model and fine-tune it on our much smaller dataset.
In order to fine-tune the model, we:
- Cloned the GPT-2 Medium (345M parameters) model
- Unfroze the last 6 layers of the model
- Retrained using the Transformers library
For tokenization, we used GPT-2's existing tokenizer. The overall code for fine-tuning GPT-2 can be found in the file GPT_2_finetuning_for_sonnets.ipynb. A minimal sketch of the setup is shown below.
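The following is an illustrative sketch of this setup using the Transformers library; the actual notebook may organize training differently, and the single optimization step and example string here are for illustration only:

# Sketch of the fine-tuning setup: load GPT-2 Medium, freeze everything,
# then unfreeze the last 6 transformer blocks.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

for param in model.parameters():
    param.requires_grad = False
# "Last 6 layers" interpreted as the last 6 transformer blocks (an assumption).
for block in model.transformer.h[-6:]:
    for param in block.parameters():
        param.requires_grad = True

# Learning rate matches the value reported in the hyperparameter section below.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=5e-3
)

# One training step on a keyword-prefixed sonnet from the dataset.
text = ("Keywords: weary, travel, zealous, imaginary, mind\n"
        "Sonnet:\nWeary with toil, I haste me to my bed, ...")
batch = tokenizer(text, return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss  # causal LM loss
loss.backward()
optimizer.step()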
We tuned the following hyperparameters:
1. Varying the number of unfrozen layers:
We varied the number of unfrozen layers from 4 to 8. With 4 and 8 unfrozen layers the model produced gibberish, whereas with 5, 6, and 7 it produced standard English. The 7-layer variant produced large chunks of text lifted directly from the original sonnets, while the 5-layer variant often produced YouTube links and sports commentary. Our final model therefore has 6 unfrozen layers. The loss for each variation can be seen in the graph below:
2. Varying the number of epochs (based on the resulting training and validation loss):
We varied the number of epochs from 7 to 40. We found that validation loss was consistently lowest between 12 and 14 epochs, after which it increased, indicating overfitting. Our final model was therefore trained for 13 epochs.
3. Varying the learning rate:
We discovered that our model was very sensitive to the learning rate. The final learning rate is 5e-3; when we decreased or increased either the exponent or the scalar coefficient, the model either produced no output or produced gibberish (random numbers and symbols). We therefore settled on 5e-3.
4. Varying the model size:
The GPT-2 model is available in four sizes: GPT2-small, GPT2-medium, GPT2-large, and GPT2-extralarge. We found that even Colab Pro runs out of memory for the GPT2-large and GPT2-extralarge models, but GPT2-medium was supported, so we used it. Overall, there was not a great difference in results between GPT2-small and GPT2-medium. Our initial experiments were with GPT2-medium.
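Continuing the sketch above, sampling a sonnet from five keywords might look like the following; the prompt format mirrors our dataset, while the decoding parameters are assumptions for illustration:

# Illustrative generation from the fine-tuned model.
prompt = "Keywords: weary, travel, zealous, imaginary, mind\nSonnet:\n"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_length=220,        # enough tokens for roughly 14 lines
    do_sample=True,        # sample rather than greedy decode
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))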
We created a 'sonnet score' evaluation metric for sonnet structure, which uses Poesy [5] to judge the length, meter, and rhyme of a poem and compare them to the Shakespearean sonnet standard (14 lines of 10 syllables in iambic pentameter, with an abab cdcd efef gg rhyme scheme). The score ranges from 0 to 1, with 1 being perfect. Poesy's rhyme detection is not always perfectly accurate, but it is usually quite close, making our sonnet score a reasonably reliable metric. The poem evaluation code is in sonnetScore.ipynb.
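To illustrate the idea of the score without reproducing Poesy's analysis, here is a simplified, self-contained stand-in; the syllable and rhyme heuristics and the equal weighting of the three components are assumptions, and the real metric relies on Poesy's far more accurate parsing:

# A simplified stand-in for the sonnet score; the real metric uses Poesy.
import re

def rough_syllables(line):
    """Approximate syllable count as the number of vowel groups."""
    return len(re.findall(r"[aeiouy]+", line.lower()))

def rough_rhyme(a, b):
    """Crude rhyme test: do the last words share a 2-letter suffix?"""
    last = lambda s: re.findall(r"[a-z']+", s.lower())[-1]
    return last(a)[-2:] == last(b)[-2:]

def sonnet_score(poem):
    lines = [l for l in poem.strip().split("\n") if l.strip()]
    length = max(1 - abs(len(lines) - 14) / 14, 0)            # 14 lines
    meter = sum(abs(rough_syllables(l) - 10) <= 1 for l in lines) / max(len(lines), 1)
    # Shakespearean scheme abab cdcd efef gg -> pairs (0,2),(1,3),...,(12,13)
    pairs = [(4 * q + i, 4 * q + i + 2) for q in range(3) for i in (0, 1)] + [(12, 13)]
    hits = [rough_rhyme(lines[i], lines[j]) for i, j in pairs if j < len(lines)]
    rhyme = sum(hits) / len(pairs)
    # Equal weighting is an assumption; our actual weighting may differ.
    return (length + meter + rhyme) / 3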
Overall, we are quite satisfied with the quality of the work generated by our model. We demonstrate that GPT-2 can be fine-tuned for our specific use case and made to generate outputs close to what we desire. Although the model was unable to exactly capture the complex structure and rhyme scheme of a Shakespearean sonnet, it clearly demonstrates a level of structural learning. Through our hyperparameter search, we were also able to limit verbatim recall by the model, so that it generates genuinely new bodies of work.
Here are two sonnets generated by our model:
Sonnet 1.
'Tis better policy (of perfect truth being made)
never kept old excuse newsworn stands least blushing stand!
Therefore desire is daily renewed
beseem thine evermore enlarged loving quest
increase her storey doomless phoenix
ere long date near which she would convert
else mistakingly appear'd deadowed or diecast away
twain by fortune do sing hymns at heaven gateways
where late sweetest creatures live
sang song players possessed did common forsake
their faces become sourest spies deservalled
informer scouchered simplicity strangle',
maketh bad what they bring forth now
must bear all wrong directed thereby reign.
Sonnet Score: 0.731
Recitation of this sonnet by James Wilkinson can be found in james_recitation.m4a
Sonnet 2.
When thou shalt be disposed to set me light
And place my merit in the eye of scorn.
I will not praise that purpose
not once vassalized mine own desert;
but when thy love hath cast his utmost sum
past paying rent: dost pay it as if
Not paid before was no need then took from thee
this kindness gift confoundingly
make defence for myself against confounding age's
cruel knife and crooked sword-barren siege bars
so shall he fight eternal war with Time decays?
Sonnet Score: 0.454
Additional examples are in generated_examples.txt
In a one-shot setting, provided with the keywords "love, fame, excitement, angst, cheese", two GPT-3 models produced the following sonnets.
text-davinci-003
To seek sweet pleasure and exciting show,
Where anguish and despair their darts bestow,
The cheese of life is all to him a woe;
E'en though he strive with all his power and might
To win the laurel and his soul's delight,
Yet still the same old story is in sight,
That love and fame bring with them endless plight.
And so, in vain, his efforts will not cease,
For love and fame have brought him naught but grief;
The cheese of life is still an utter tease,
And all his striving brings him naught but grief.
Sonnet Score: 0.878
text-davinci-002
Love is like a rose,
Fame is like a bee,
Excitement is like a cat,
Angst is like a cheese,
But only you can make them what they are.
Sonnet Score: 0.332
In a zero-shot setting with no keywords, ChatGPT produced the following two sonnets. The first is original, but the second is an exact copy of Shakespeare (the prologue to Romeo and Juliet), with one repeated line, which reduced its sonnet score considerably.
Sonnet 1
With beauteous eyes and hair of shining gold,
Thou art the fairest maid that e'er I've seen.
In thy sweet voice, a thousand tales are told,
And every glance doth speak of beauty keen.
Thy cheeks, like roses, bloom with youthful glow,
And in thy lips, a subtle, red delight.
Thy graceful form doth move with gentle flow,
A vision fair to see, both day and night.
But beauty fades, and youth doth pass away,
And though thou art a goddess in mine eyes,
Thou must, like all before thee, fade to gray,
And with thy passing, all our joys will die.
So let us cherish every moment spent,
For time doth steal away what it hath lent.
Sonnet Score: 0.998
Sonnet 2
In fair Verona, where we lay our scene,
Two households, both alike in dignity,
In fair Verona, where we lay our scene,
From ancient grudge break to new mutiny,
Where civil blood makes civil hands unclean.
From forth the fatal loins of these two foes
A pair of star-cross'd lovers take their life;
Whose misadventured piteous overthrows
Do with their death bury their parents' strife.
The fearful passage of their death-mark'd love,
And the continuance of their parents' rage,
Which, but their children's end, naught could remove,
Is now the two hours' traffic of our stage;
The which if you with patient ears attend,
What here shall miss, our toil shall strive to mend.
Sonnet Score: 0.817
[1] https://arxiv.org/pdf/2205.08847.pdf
[2] https://www.gwern.net/GPT-2
[3] https://www.kaggle.com/datasets/blacksheep2105/shakespearean-sonnets
[4] https://jalammar.github.io/illustrated-gpt2/
[5] https://github.com/quadrismegistus/poesy