A dataset for textual analysis of arguably the best-written comedy television show ever.
A dataset for people who love data science and Seinfeld.
- Details about all the episodes.
- Includes attributes like Director, Episode Name, Air Date, etc.
- Complete Scripts of all the episodes.
An upcoming update will include:
- Stage locations and cast
The data is scraped from the fan website http://www.seinology.com/.
- Train language models on the corpus.
- Compare the vocabulary with other works on television, film or literature.
- Find correlations between language complexity and popularity.
- Train models to generate scripts based on the data.
- Analyze obscure words used in the vocabulary of the series.
These are just basic examples; the sky is the limit.
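
As one possible starting point for the vocabulary ideas above, here is a minimal Python sketch. It counts word frequencies, computes a type-token ratio (distinct words divided by total words) as a rough measure of vocabulary richness, and lists words that appear only once. The helper names `tokenize` and `vocabulary_stats` and the two sample lines are made up for illustration. The dataset's actual file and column names are not assumed here, so pass in the script lines however you load them.

```python
import re
from collections import Counter

# Simple word pattern: letters, optionally followed by an apostrophe part ("don't").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def vocabulary_stats(lines):
    """Return (word counts, type-token ratio, words that occur exactly once)."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    total = sum(counts.values())
    ttr = len(counts) / total if total else 0.0
    rare_words = sorted(word for word, n in counts.items() if n == 1)
    return counts, ttr, rare_words


# Made-up placeholder lines, NOT taken from the dataset.
sample_lines = [
    "The soup is cold.",
    "The soup is hot today!",
]

counts, ttr, rare_words = vocabulary_stats(sample_lines)
print(counts.most_common(3))
print(round(ttr, 2))
print(rare_words)
```

On the full corpus, the same counts could be grouped by season or by character to compare vocabularies, and the list of rare words is a reasonable first pass at finding obscure ones.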
The data was crawled from the website http://www.seinology.com/.
Changes and improvement suggestions are welcome. Feel free to comment with new additions that you think are useful, or open a PR on the GitHub project.
Want to buy me a coffee? paypal.me/AShrivastava961