Movie Plots Analysis

Introduction

Movies have long been a captivating medium, weaving stories that resonate with audiences across generations. This project delves into the essence of cinematic narratives, aiming to identify the fundamental plots that have shaped our understanding of storytelling. By analyzing the scripts of the top 3 IMDb hits from the 1970s to the 2020s, we embark on a journey to uncover the underlying patterns that define these cinematic masterpieces.

Data Cleaning

The data is sourced from a JSON file containing movie scripts. The following cleaning steps were performed:

Lowercasing all text
Removing symbols and line breaks
Eliminating punctuation
Removing stop words
Lemmatizing the data for better contextual analysis

Classification

A TF-IDF matrix of the movie scripts is created, and similarity distances are computed using cosine similarity. Hierarchical clustering is applied to identify basic plots, and the results are visualized using dendrograms.

Similar Movie Finder

Cosine similarity is used to find similar movies based on the TF-IDF vectors of their scripts. A function, find_similar, is provided to identify the most similar movie for a given title.

Clustering

Clustering is the building block of this project helping us see if there exists a natural sectioning of movies from last 50 years to fall into constant plots/themes purely based on their script proving our initial hypothesis true. For this purpose, scripts are clustered based on their similarity using the KMeans algorithm. The optimal number of clusters is determined using the Elbow Method.

Keyword Extraction

Keywords act as the building blocks of narratives, encapsulating the essence of each story. Through the extraction of keywords using TF-IDF vectors, we gain a nuanced understanding of the pivotal elements that define these cinematic experiences. The top 10 keywords for each movie serve as beacons, illuminating the thematic landscape.

Analyzing the Scripts Classified Plot-wise

Plot-wise classification opens a portal to the diverse worlds crafted by filmmakers. The subsequent generation of word clouds for each cluster transforms textual data into visually engaging insights. By visually representing common themes, these word clouds offer a unique perspective on the shared narratives within each cluster.

Analyzing the Scripts Decade-wise

Decades are chapters in the cinematic timeline, each marked by unique storytelling trends. By categorizing scripts based on their respective decades, we embark on a journey through the evolving landscape of movie genres. The resulting word clouds serve as time capsules, preserving the essence of storytelling trends across different eras.

Exploratory Data Analysis

Beyond the narrative nuances, exploratory data analysis brings quantitative insights to the forefront. The conversion of running time to minutes and the subsequent visualization of movie durations over the years offer a comprehensive view of temporal trends. This analytical journey bridges the gap between storytelling and data, providing a holistic understanding of cinematic evolution.

References

My work was underlined by studies done in the research paper "The emotional arcs of stories are dominated by six basic shapes" by Andrew J. Reagan, Lewis Mitchell, Dilan Kiley, Christopher M. Danforth, and Peter Sheridan Dodds, published on July 8, 2016. The paper investigates emotional arcs in stories, which provides additional context and inspiration for the clustering approach used in this project.

Note: This readme provides an overview and explanations of the code. For specific code details, refer to the source code file.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
README.md		README.md
data+scriptssss.json		data+scriptssss.json
dva_project.ipynb		dva_project.ipynb
movie_clusters.png		movie_clusters.png
research_on_storytelling_plots.pdf		research_on_storytelling_plots.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Movie Plots Analysis

Table of Contents

Introduction

Data Cleaning

Classification

Similar Movie Finder

Clustering

Keyword Extraction

Analyzing the Scripts Classified Plot-wise

Analyzing the Scripts Decade-wise

Exploratory Data Analysis

References

About

Releases

Packages

Languages

fatimaazfar/Movie-Script-Analysis

Folders and files

Latest commit

History

Repository files navigation

Movie Plots Analysis

Table of Contents

Introduction

Data Cleaning

Classification

Similar Movie Finder

Clustering

Keyword Extraction

Analyzing the Scripts Classified Plot-wise

Analyzing the Scripts Decade-wise

Exploratory Data Analysis

References

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages