Skip to content
This repository has been archived by the owner on Aug 16, 2024. It is now read-only.

Scrape data about an entire Channel or just a Playlist, or get stats about your Own Watch History.

License

Notifications You must be signed in to change notification settings

CriticalHunter/Youtube_Scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

There was an inactivity of more than 2 years on this project. I am achieving this due to multiple reasons. Yet I will create a separate private repo for this and will work on adding new metrics and visualizations although at a slow pace. If it ends nice will merge that with this repo.

Youtube_scrape

MIT license Twitter

Scrape data about an entire Channel or just a Playlist, using Youtube API. No OAuth is required.

✔️ Features

Following features are available :

CLI

  1. create_new :
    1. It creates a sqlite database to store all data.
    2. Database will be placed in the same folder as the project file, named 'youtube.db'
    3. It will have 4 tables - tb_channel, tb_playlist, tb_videos, video_history
    4. You can use programs like DB Browser , which is lightweight, to view the database.
  2. Oldest Video on A Topic :
    1. It is an isolate program, that can be run independently.
    2. It doesn't depend on main code or any database.
  3. Scrape A Channel:
    1. Allows to scrape Channel Details and it's playlists.
    2. It can also scrape details for each video of that channel.
      1. If this option is not chosen, the playlist table won't have Playlist Duration.
  4. Scrape A Single Playlist:
    1. Allows to scrape info about a single Playlist and details about all it's videos.
  5. Load Your History:
    1. Make sure you have downloaded google Takeouts for your account to the PWD.
    2. Make sure you have follwing path './takeout/history/watch-history.html'
    3. Option to keep videos of your history on a separate table or integrate them with main table tb_videos
      1. In order to use next features, you have to integrate them.
  6. Most Watched Video:
    1. You can list your most watched 'n' videos
  7. Early Viewed:
    1. You can list 'n' videos, which you saw earliest after they were uploaded.
    2. There are some discrepencies, as many videos are reuploaded after you have seen it.
      1. Program ignores those
    3. It now only works when you watched it in IST.
  8. Generate Download List:
    1. This will create a text file, that will list Youtube URLs that can be downloaded by Youtube-DL or IDM etc.
    2. It will select videos which are marked 'Worth = 1' i the database.
      1. This operation is to be done by the user directly on the database (using DB Browser or such)
    3. There is option to list videos of a single Channel or from entire DAtabase.
    4. Caution : Once a video is processed by this function, it will be marked 'Is_Downloaded = 1'. Next time this function is run, new video IDs will be considered.
      1. Hence User must make sure, all videos in download_list.txt are downloaded before rewriting the file.

💻 Setup Guide

Below is a detailed guide on setting up the environment.

Youtube API

First you need to have you Youtube API key. Below is a link of a video, that will guide you. Watch from 0:00 - 5:30 Getting Youtube API Key

  1. Note - Youtube API is rate limited to 10000 hits/day.
  2. You can view your quotas at here - console
  3. Cost of operations is decribed here -Youtube API docs
  4. Code has been optimized to decrease quota usage. You can easily work with 50000 videos/day. For more please check your quota limit.

Installation

You need to install google-api-python-client to run this project. github API link Install this library in a virtualenv using pip.

Mac/Linux

pip3 install virtualenv
virtualenv venv
. venv/bin/activate
pip3 install -r requirements.txt

Windows

pip3 install virtualenv
virtualenv venv
venv\Scripts\activate
pip3 install -r requirements.txt

Working Guide

  1. Get Your Youtube API key as shown in above video.
  2. Pip install the requirements.txt
  3. Run the program YT_Scrape.py

The script will ask for required data in the command line and is pretty self-explanatory (Once it runs)

View Samples

♥️ Contributing

There are several ways to help.

  1. Spread the word: More users means more possible people testing and contributing to the app which in turn means better stability and possibly more and better features. You can Twitter or share it on LinkedIn. Every little bit helps !

  2. Make a feature or improvement request: Something can be be done better? Something essential missing? Let us know!

  3. Report bugs

  4. Contribute: You don't have to be programmer to help.

    1. Treat Me A Coffee Instead Paypal

Pull Requests

Pull requests are of course very welcome! Please make sure to also include the issue number in your commit message, if you're fixing a particular issue (e.g.: feat: add nice feature with the number #31).

About

Scrape data about an entire Channel or just a Playlist, or get stats about your Own Watch History.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages