Use IMDb ratings datasets instead of scraping #2

@thynan

Hey, I just found out that IMDb publishes regularly updated (I believe daily) dataset files that contain almost(?) all of their ratings (the file has over 1.6 million items). They are available for download for non-commercial use.

https://datasets.imdbws.com -> check the title.ratings.tsv.gz file.

I already implemented a feature to test using this with your plugin:

  • I downloaded the file manually to a location where the plugin can read it
  • I loaded the file into a dictionary and wrote a lookup function keyed on the IMDb ID
  • I used this lookup as the first option for getting an episode rating. OMDb is the first fallback, and then IMDb scraping is the second fallback.
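
As a rough illustration of the steps above, a minimal Python sketch could look like the following. It is not the actual change made to the plugin. It assumes the file has already been downloaded locally and uses the dataset's tab-separated `tconst`, `averageRating`, and `numVotes` columns. The names `load_ratings` and `lookup_rating`, and the fallback callables standing in for OMDb and IMDb scraping, are hypothetical.

```python
import csv
import gzip
from typing import Callable, Dict, Iterable, Optional

# Hypothetical type for a fallback source (e.g. an OMDb call or an IMDb scrape):
# it takes an IMDb ID and returns a rating, or None if it has no answer.
Fallback = Callable[[str], Optional[float]]


def load_ratings(path: str) -> Dict[str, float]:
    """Read title.ratings.tsv.gz into a dict of IMDb ID -> average rating."""
    ratings: Dict[str, float] = {}
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        # The file is tab-separated with a header row; quoting is disabled
        # so stray quote characters are treated as plain text.
        reader = csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            ratings[row["tconst"]] = float(row["averageRating"])
    return ratings


def lookup_rating(
    imdb_id: str,
    ratings: Dict[str, float],
    fallbacks: Iterable[Fallback] = (),
) -> Optional[float]:
    """Try the local dataset first, then each fallback in order."""
    rating = ratings.get(imdb_id)
    if rating is not None:
        return rating
    for fallback in fallbacks:
        rating = fallback(imdb_id)
        if rating is not None:
            return rating
    return None
```

With this shape, the dictionary is built once at startup and each episode lookup is a single in-memory access. The fallbacks would be passed in priority order, for example `lookup_rating(imdb_id, ratings, [omdb_lookup, imdb_scrape])`, where both callables are placeholders for whatever the plugin already uses.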

I'm currently running your plugin with those additions on my whole library, with TV episodes and IMDb scraping fallback enabled.

So far I have processed over 5200 items with only 77 OMDb API calls, 31 MDBList API calls, and 42 IMDb scrapes. Everything else came from the file.

Let me know if you are interested in my code, then I can create a PR or upload my changes in a fork. Right now it's just a first basic implementation as a POC - I'm sure you can do a better job than me at integrating this. I just wanted to let you know those dataset files exist, as they are super useful to get the ratings for a big library without running into any API restrictions, especially with episode IMDb scraping.
