I have always found it interesting how Taylor Swift started her career as a seemingly innocent, reserved country singer and has evolved into a very confident, empowered, and somewhat sassy pop star. In this project, I wanted to investigate the main topics that Taylor Swift sings about, how those topics have changed over time, and whether they have an effect on album sales.
After applying some NLP techniques to the lyrics, my NMF model with a TF-IDF vectorizer identified 9 topics that Taylor Swift songs fall under (numbered 0 to 8, in the order listed):
- Permanency & Belonging
- Positivity through the bad
- Bright and Happy
- Moving on from the past, thinking about the future
- Don't want to move on, don't want the time to pass
- Looking back (sad)
- Looking back (pondering, with somewhat regret)
- New Relationship
- Reflection of the Past
I then used K-Means clustering to fit these topics into 5 clusters (numbered 0 to 4, in the order listed), which were then visualized with TSNE (2 components):
- Mostly topic 3 (Moving on from the past, thinking about the future)
- Mostly topic 5 (looking back, sad)
- Mixture of topics 2 and 4 (bright and happy; don't want to move on, don't want time to pass)
- Mixture of topics 6, 7, 8, and 0 (looking back, pondering, some regret; new relationship; reflection of the past; permanency and belonging)
- Mostly topic 1 (positivity through the bad)
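The clustering and visualization step can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: `doc_topic` is a randomly generated stand-in for the NMF song-by-topic matrix (60 hypothetical songs, 9 topics), and the `perplexity`, `n_init`, and `random_state` values are arbitrary choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Stand-in for the NMF output: 60 hypothetical songs x 9 topic weights.
# A sparse Dirichlet draw makes most songs lean on one or two topics.
doc_topic = rng.dirichlet(alpha=np.full(9, 0.3), size=60)

# Group songs with similar topic mixtures into 5 clusters.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(doc_topic)

# The largest weight in each cluster center hints at its dominant topic.
dominant_topic = kmeans.cluster_centers_.argmax(axis=1)
for cluster, topic in enumerate(dominant_topic):
    print(f"cluster {cluster}: strongest topic is {topic}")

# Project the 9-dimensional topic weights to 2 dimensions for plotting.
tsne = TSNE(n_components=2, perplexity=15, init="pca", random_state=0)
embedding = tsne.fit_transform(doc_topic)  # shape: (60, 2)
```

The `embedding` columns can be used as x and y coordinates in a scatter plot colored by `labels`. Reading the cluster centers, as done with `dominant_topic`, is one way to arrive at descriptions such as "mostly topic 3" or "mixture of topics 2 and 4".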
I found that cluster 3 has been a steady theme throughout Taylor Swift's career, while clusters 0 and 1 have fluctuated; clusters 2 and 4 have stayed relatively the same.
My analysis also found that cluster 1 (looking back on the past, sad) was by far the most successful in terms of album sales, while cluster 0 (moving on from the past, thinking about the future) was the least successful, which I found kind of sad and ironic.
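One way to run this kind of comparison is sketched below with pandas. Everything in it is an assumption for illustration: the column names are hypothetical, the rows are invented placeholder values (not real albums or sales figures), and it assumes each song simply carries the sales figure of its album.

```python
import pandas as pd

# Toy table with invented values; one row per song.
songs = pd.DataFrame({
    "album": ["A", "A", "B", "B", "C", "C"],
    "year": [2006, 2006, 2010, 2010, 2014, 2014],
    "cluster": [0, 1, 1, 3, 3, 0],
    "album_sales": [1.0, 1.0, 3.0, 3.0, 2.0, 2.0],  # arbitrary units
})

# Share of each year's songs that fall in each cluster (rows sum to 1),
# which shows how the mix of themes shifts over time.
cluster_share = pd.crosstab(songs["year"], songs["cluster"], normalize="index")
print(cluster_share)

# Average album sales of the songs in each cluster, highest first.
mean_sales = (
    songs.groupby("cluster")["album_sales"]
    .mean()
    .sort_values(ascending=False)
)
print(mean_sales)
```

Averaging album-level sales over songs is a coarse measure, since every song on an album inherits the same figure, so results from this approach are best read as a rough association rather than an effect of the theme itself.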
In this project, you will find a PDF of my presentation, a code file for my EDA and modeling, my data source, and the other sources I used. The skills and analysis techniques I applied were:
- Natural Language Processing (NLP) techniques
- Unsupervised Learning Techniques
- Tokenization
- LSA
- NMF
- TF-IDF
- CountVectorizer
- K-Means Clustering
- DBSCAN
- PCA
- TSNE
- Cosine Similarity
- Tableau