Merge pull request prathimacode-hub#985 from BMaster123/main

prathimacode-hub · web-flow · commit 16aaa9c37ad5 · 2021-08-23T21:06:17.000+05:30
Added YouTube Trending Videos Scraper
diff --git a/WebScrapingScripts/YouTube Trending Videos Scraper/Images/Output.PNG b/WebScrapingScripts/YouTube Trending Videos Scraper/Images/Output.PNG
diff --git a/WebScrapingScripts/YouTube Trending Videos Scraper/README.md b/WebScrapingScripts/YouTube Trending Videos Scraper/README.md
@@ -0,0 +1,43 @@
+# YouTube Trending Videos Scraper
+
+## Aim
+
+The aim of this project is to build a web scraper that collects the titles of YouTube's trending videos.
+
+## Purpose
+
+The purpose of this project is to get the top 10 trending videos in each of YouTube's 4 trending categories (Now, Music, Gaming, and Movies).
+
+## Short description of package/script
+
+- The program opens Chrome, navigates to each of YouTube's 4 trending pages, collects the titles of the top 10 trending videos in each category, and stores them in an Excel file.
+- Imported libraries:
+    - BeautifulSoup
+    - Selenium
+    - Pandas
+    - Requests
+
+## Workflow of the Project
+
+1. Import libraries
+2. Make the list of URLs to be scraped from
+3. Scrape the URLs
+4. Store the results in a pandas DataFrame
+5. Create an Excel file
+
+
+## Setup instructions
+
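+- Make sure BeautifulSoup, Selenium, Pandas, and Requests are installed, along with lxml (the parser the script passes to BeautifulSoup) and openpyxl (used by pandas to write .xlsx files)
+- Make sure Chrome and a matching ChromeDriver are installed
+- Run the program; it creates an Excel file named YouTube-Trending.xlsx containing the scraped titles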
+
+
+## Output
+
+![Output](Images/Output.PNG)
+
+
+## Author(s)
+
+Bhavesh Mandalapu
diff --git a/WebScrapingScripts/YouTube Trending Videos Scraper/requirements.txt b/WebScrapingScripts/YouTube Trending Videos Scraper/requirements.txt
@@ -0,0 +1,6 @@
+beautifulsoup4
+lxml
+openpyxl
+pandas
+requests
+selenium
diff --git a/WebScrapingScripts/YouTube Trending Videos Scraper/yt-trending-scraper.py b/WebScrapingScripts/YouTube Trending Videos Scraper/yt-trending-scraper.py
@@ -0,0 +1,47 @@
+from bs4 import BeautifulSoup
+import requests
+from selenium import webdriver
+import pandas as pd
+
+# Query parameters that select each trending page (Now, Music, Gaming, Movies)
+urls = [
+    "bp=6gQJRkVleHBsb3Jl",
+    "bp=4gINGgt5dG1hX2NoYXJ0cw%3D%3D",
+    "bp=4gIcGhpnYW1pbmdfY29ycHVzX21vc3RfcG9wdWxhcg%3D%3D",
+    "bp=4gIKGgh0cmFpbGVycw%3D%3D",
+]
+
+# Opens Chrome
+driver = webdriver.Chrome()
+
+# List of scraped video titles (named to avoid shadowing the built-in list)
+video_titles = []
+
+# Goes to each of the URLs in the urls list and gets the video titles of the top 10 trending videos
+for url in urls:
+    driver.get(f"https://www.youtube.com/feed/trending?{url}")
+
+    content = driver.page_source.encode("utf8").strip()
+    soup = BeautifulSoup(content, "lxml")
+    titles = soup.find_all("a", id="video-title")
+
+    for i in range(0, 10):
+        print(titles[i].text)
+        video_titles.append(titles[i].text)
+
+# Makes a dictionary containing the 4 categories as the keys and the values as the top 10
+# trending videos in the section
+trending_dict = {
+    "Now": video_titles[:10],
+    "Music": video_titles[10:20],
+    "Gaming": video_titles[20:30],
+    "Movies": video_titles[30:40],
+}
+
+# Makes a dataframe of trending_dict so that it can be made into an excel file
+df = pd.DataFrame(trending_dict)
+
+# Makes the Excel file; the context manager saves and closes it on exit
+with pd.ExcelWriter("YouTube-Trending.xlsx") as writer:
+    df.to_excel(writer, sheet_name="Sheet1", index=False)
+driver.quit()
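
Review note: slicing one flat list by hand-computed offsets is easy to get wrong, and the script fails with an IndexError if a page yields fewer than 10 titles. A minimal sketch of an alternative, assuming titles are grouped per category before the DataFrame is built, could look like the following. The names `CATEGORY_PARAMS`, `top_titles`, and `build_trending_dict` are hypothetical and not part of this PR; the category-to-parameter mapping is inferred from the order of `urls` and the keys of `trending_dict`.

```python
# Hypothetical mapping of category name to trending-page query parameter,
# inferred from the order of `urls` and the keys of `trending_dict`.
CATEGORY_PARAMS = {
    "Now": "bp=6gQJRkVleHBsb3Jl",
    "Music": "bp=4gINGgt5dG1hX2NoYXJ0cw%3D%3D",
    "Gaming": "bp=4gIcGhpnYW1pbmdfY29ycHVzX21vc3RfcG9wdWxhcg%3D%3D",
    "Movies": "bp=4gIKGgh0cmFpbGVycw%3D%3D",
}


def top_titles(raw_titles, limit=10):
    """Return exactly `limit` titles: stripped, blanks dropped, padded with ""."""
    cleaned = [title.strip() for title in raw_titles if title.strip()]
    top = cleaned[:limit]
    return top + [""] * (limit - len(top))


def build_trending_dict(titles_by_category, limit=10):
    """Map each category to an equal-length column of titles."""
    return {
        name: top_titles(titles, limit)
        for name, titles in titles_by_category.items()
    }
```

In the script, each category's raw titles would come from the scraping loop, for example `[a.text for a in soup.find_all("a", id="video-title")]`. Padding keeps every column the same length, which `pd.DataFrame` requires when it is built from a dict of lists.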
