- Analyze lyric data with Natural Language Processing techniques
- Tokenization
- Sentiment analysis
- N-grams, frequency analysis
- Named entity recognition
- Become familiar with text prediction algorithms using machine learning
- Explore text prediction methodologies
Open the notebook for your genre of choice to view its analysis:
To view the summary for all genres, check out:
- We carried out next-word prediction on the music data from a specific genre, using the following:
- Markov Chains
- Maximum Likelihood Estimator Algorithm
A Markov chain is a stochastic process, but it differs from a general stochastic process in that it must be "memory-less": the probability of the next state depends only on the present state, not on the steps that led up to it. This is called the Markov property. [1]
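To make the memory-less idea concrete, here is a minimal sketch of a word-level Markov chain text generator. It is an illustration under stated assumptions, not the code from our notebooks: the helper names `build_chain` and `generate` and the toy corpus are invented for this example.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.lower().split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length=8, seed=None):
    """Walk the chain: each next word depends only on the current word."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word never had a successor
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Toy corpus, invented for illustration (not taken from the lyric dataset).
toy_lyrics = "i feel like a river i feel like a stone i feel the rain"
chain = build_chain(toy_lyrics)
print(generate(chain, "i", length=8, seed=0))
```

Because repeated followers are stored as duplicates in each list, `rng.choice` picks the next word in proportion to how often it followed the current word in the corpus.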
For more details, we recommend the following video.
For further reading, consider this Medium Article.
Also refer to the MLE documentation.
This article served as a starting point for our work on text prediction.
Check out the Language Model Module from NLTK for more information on the different models to choose from.
We obtained our lyric data from the Shazam Core API on RapidAPI.com.
The specific API endpoints used were:
- World Chart by Genre endpoint:
  - feed a genre and the maximum number of songs to retrieve
  - obtain the top chart for the genre with trackID, artist, and song name
- Track Details endpoint:
  - feed a trackID
  - obtain the lyrics for the song
We then generated a dataframe with the lyrics and dropped any chart songs for which lyrics could not be obtained through the API.
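The filtering step can be sketched with pandas as follows. The rows and column names (`track_id`, `artist`, `song`, `lyrics`) are hypothetical placeholders rather than the API's actual field names, and the sketch assumes a missing lyric arrives as either `None` or an empty string.

```python
import pandas as pd

# Hypothetical rows standing in for the chart and lyrics responses;
# the column names are assumptions, not the API's field names.
songs = [
    {"track_id": "1", "artist": "Artist A", "song": "Song A", "lyrics": "first line\nsecond line"},
    {"track_id": "2", "artist": "Artist B", "song": "Song B", "lyrics": None},
    {"track_id": "3", "artist": "Artist C", "song": "Song C", "lyrics": ""},
]

df = pd.DataFrame(songs)

# Keep only songs whose lyrics are present and not blank.
has_lyrics = df["lyrics"].fillna("").str.strip().ne("")
df = df[has_lyrics].reset_index(drop=True)

print(df[["artist", "song"]])
```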
We used Google's Text-To-Speech library to generate mp4 files of our Markov chain and AI-generated lyrics.
Here are lyric snippets for each genre:
country_mle_text.mp4
country_mle_text_OG.mp4
mle_lyrics_EDM.mp4
hiphop_markovchains_snippet.mp4
hiphop_mle_snippet.mp4
rnb_marchov_snippet.mp4
rnb_mle_snippet.mp4
pop_marchov_snippet.mp4
mle_lyrics_Rock.mp4
pop_mle_text.mp4
pop_mle_snippet.mp4
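MLE (Maximum Likelihood Estimation) is an n-gram language model algorithm: it estimates the probability of a word from how often that word follows a given context in the training text.
Some examples from the Country Lyrics Model:
- The probability of 'woman' appearing in the text is: 0.00139
- The probability of 'feel like' being followed by 'a' is: 0.5
- The probability of 'feel like a' being followed by 'woman' is: 1.0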
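Probabilities like these come from simple counting: the MLE estimate of P(word | context) is the number of times the context is followed by the word, divided by the number of times the context is followed by anything. The sketch below is a hand-rolled illustration of that formula on an invented toy corpus, not the NLTK implementation; the names `ngrams` and `mle_score` are made up for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all consecutive n-token tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def mle_score(tokens, word, context=()):
    """MLE estimate of P(word | context) = count(context + word) / count(context)."""
    context = tuple(context)
    if not context:
        # Unigram case: relative frequency of the word in the corpus.
        return tokens.count(word) / len(tokens)
    n = len(context) + 1
    grams = ngrams(tokens, n)
    full = Counter(grams)
    # Count only context occurrences that are followed by some word.
    prefix = Counter(g[:-1] for g in grams)
    if prefix[context] == 0:
        return 0.0  # context never seen in the corpus
    return full[context + (word,)] / prefix[context]


# Toy corpus, invented for illustration (not the country lyrics).
tokens = "i feel like a woman but i feel like dancing".split()

print(mle_score(tokens, "woman"))                         # 0.1
print(mle_score(tokens, "a", ["feel", "like"]))           # 0.5
print(mle_score(tokens, "woman", ["feel", "like", "a"]))  # 1.0
```

In the toy corpus 'feel like' occurs twice, once followed by 'a' and once by 'dancing', which gives 0.5. NLTK's `nltk.lm.MLE` offers a comparable `score(word, context)` method once a model has been fitted.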
From the Country Lyrics Model:
- The perplexity of 'aliens are' is: inf
- The perplexity of 'old man' is: 7.667
- The perplexity of 'bell rock' is: 3.4
- The perplexity of 'jingle bell' is: 1.333
- The perplexity of 'country boy' is: 1.273
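Lower perplexity means the model finds a phrase less surprising, and an infinite value means the phrase contains an n-gram the model never saw in training. The sketch below shows one common way to compute it, as 2 raised to the average negative log2 probability, for an unsmoothed bigram model. It uses an invented toy corpus and a hypothetical helper `bigram_perplexity`, so its numbers will not match the ones above, and the exact settings behind our reported values may differ.

```python
import math
from collections import Counter


def bigram_perplexity(train_tokens, test_tokens):
    """Perplexity of test_tokens under an unsmoothed bigram MLE model."""
    if len(test_tokens) < 2:
        raise ValueError("need at least two tokens to form a bigram")

    pairs = list(zip(train_tokens, train_tokens[1:]))
    pair_counts = Counter(pairs)
    context_counts = Counter(first for first, _ in pairs)

    log_probs = []
    for first, second in zip(test_tokens, test_tokens[1:]):
        count = pair_counts[(first, second)]
        if count == 0:
            return math.inf  # an unseen bigram has probability 0
        log_probs.append(math.log2(count / context_counts[first]))

    # Perplexity = 2 ** (average negative log2 probability).
    return 2 ** (-sum(log_probs) / len(log_probs))


# Toy corpus, invented for illustration (not the country lyrics).
train = "jingle bell jingle bell jingle all the way".split()

print(bigram_perplexity(train, ["jingle", "bell"]))  # about 1.5
print(bigram_perplexity(train, ["aliens", "are"]))   # inf
```

Here 'jingle' is followed by 'bell' in two of its three occurrences, so the bigram probability is 2/3 and the perplexity is its reciprocal, roughly 1.5.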
Feel free to read up more on Perplexity and Language Models or watch this video.
Here are the overall results for the sentiment analysis:
These are the most used words in the Top Chart Songs for the analyzed genres:
This is the word cloud for the most used words across genres:
These are the frequencies of each Named Entity found in the Top Chart Songs for all genres:
- We were not able to compare MLE to other language model algorithms due to time constraints.
- Detokenizer: The returned text is readable but lacks punctuation and paragraph structure.
- NER: The PERSON and GPE named entities were frequently misidentified by spaCy's NER. The image below is from the country dataset.