subtitle-based film similarities
Do similar films use similar language? SUB ROSA addresses this question by letting users examine films for speech-related features. These features are extracted from subtitle data using methods from Natural Language Processing, Stylometry and Information Retrieval.
For detailed information about these methods, please read this paper.
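SUB ROSA's actual features are described in the paper. As a rough illustration only, here is a minimal sketch of one standard Information Retrieval technique, TF-IDF weighting combined with cosine similarity, applied to subtitle text. It assumes each film's subtitles are already available as a single plain-text string. The function names (tokenize, tfidf_vectors, cosine, most_similar) and the toy snippets are hypothetical and are not taken from SUB ROSA's code. The sketch uses only the standard library, even though the app itself depends on NumPy and scikit-learn.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Map each film title to a sparse TF-IDF vector (term -> weight)."""
    counts = {title: Counter(tokenize(text)) for title, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: the number of films whose subtitles contain each term.
    df = Counter(term for c in counts.values() for term in c)
    vectors = {}
    for title, c in counts.items():
        total = sum(c.values()) or 1
        vectors[title] = {
            # Relative term frequency times a smoothed inverse document frequency.
            term: (freq / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, freq in c.items()
        }
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(title, vectors, k=5):
    """Rank the other films by cosine similarity to `title`, highest first."""
    scores = [
        (other, cosine(vectors[title], vec))
        for other, vec in vectors.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    # Toy subtitle snippets, invented for illustration only.
    subtitles = {
        "western_a": "Saddle up, partner. The sheriff rides at dawn.",
        "western_b": "The sheriff says saddle up, we ride to the ranch at dawn.",
        "space_opera": "Engage the warp drive and scan the alien ship.",
    }
    vectors = tfidf_vectors(subtitles)
    print([film for film, _ in most_similar("western_a", vectors)])
    # prints ['western_b', 'space_opera']
```

At scale, scikit-learn's `TfidfVectorizer` together with `cosine_similarity` would be the more practical choice; the hand-written version only makes the arithmetic visible.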
This work was realized by Jan Luhmann as part of the course "Drama Mining und Film-Analyse" (summer semester 2019) under the supervision of Manuel Burghardt and Jochen Tiepmar at the University of Leipzig.
Subtitle data was kindly provided by the team of OpenSubtitles.
- Make sure you have Python 3 installed. Also install the dependencies using pip:
pip install Flask numpy scikit-learn
- Clone this repository.
git clone https://github.com/bbrause/subrosa.git
- Change into the repository folder and start the app.
cd subrosa
python3 app/app.py
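Before starting the app, one quick way to confirm that the pip step succeeded is to check that the packages can be found by the Python 3 interpreter. The script below is a hypothetical helper and is not part of the repository; note that scikit-learn is imported under the module name sklearn and Flask under flask.

```python
import importlib.util

# Import names of the packages installed by the pip command above (assumed mapping:
# Flask -> flask, numpy -> numpy, scikit-learn -> sklearn).
REQUIRED = ["flask", "numpy", "sklearn"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages: " + ", ".join(missing) + " (repeat the pip install step)")
    else:
        print("All dependencies found; start the app with: python3 app/app.py")
```

Run it with the same `python3` you will use for `app/app.py`, since pip may install into a different interpreter otherwise.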