web-scraping-with-node

A collection of web scraping projects using Node.js & their corresponding technologies.

Table of Contents

Formula One Scraper (Node.js, Cheerio, Node-Fetch, PDFKit)

  • Scrapes the Formula One website for the latest news, results, and standings, converts the scraped data into a PDF file, and saves it to a local folder.

Book Scraper (Node.js, Cheerio, Axios, Json2Csv, CsvToJson)

  • Scrapes the website for the latest books, converts the scraped data into a CSV file, and saves it to a local folder.
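
The CSV conversion step can be illustrated without any dependencies. The project itself lists Json2Csv for this, so the following is only a minimal sketch of the idea, not the repository's code; the record fields (`title`, `price`, `inStock`) and the function names are assumptions made for illustration.

```javascript
// Hypothetical book records; the field names are assumptions, not the repo's schema.
const books = [
  { title: "A Light in the Attic", price: 51.77, inStock: true },
  { title: 'The "Quoted" Title, Vol. 2', price: 12.5, inStock: false },
];

// Quote a value when it contains a comma, double quote, or line break,
// doubling any embedded double quotes (RFC 4180 style).
function escapeCsv(value) {
  const text = String(value ?? "");
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build a CSV string: a header row from the first record's keys,
// then one row per record in the same column order.
function toCsv(records) {
  if (records.length === 0) return "";
  const headers = Object.keys(records[0]);
  const rows = records.map((record) =>
    headers.map((key) => escapeCsv(record[key])).join(",")
  );
  return [headers.join(","), ...rows].join("\n");
}

console.log(toCsv(books));
```

The resulting string can be written to a local folder with `fs.writeFileSync`. A library such as Json2Csv additionally handles nested objects and custom delimiters, which this sketch does not.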

Hacker News Scraper (Node.js, Cheerio, Got-Scraping, Crawlee, Docker)

  • Version 1: Scrapes the website for the latest news.
  • Version 2: The same scraper rebuilt on Crawlee's CheerioCrawler, which fetches pages over plain HTTP and parses them with Cheerio rather than opening a browser. All Datasets are stored in a storage folder in the root directory, and the project is containerized using Docker.

Product Scraper (Node.js, Cheerio, Playwright, Crawlee, Docker)

  • Version 1: Scrapes a website for a specific product & takes a screenshot of the webpage. Code is currently set for mintmobile.com.
  • Version 2: The PlaywrightCrawler version using Crawlee is similar, but it drives a real browser to simulate the actions of a user. The browser is set to "headless: false", so the designated browser opens and the whole run is automated. All Datasets are stored in a storage folder in the root directory, and the project is containerized using Docker.

Amazon Scraper (Node.js, Cheerio, Puppeteer, Playwright)

  • Version 1: Scrapes Amazon for a specific product & takes a screenshot of the webpage.
  • Version 2: The Playwright version is similar, but Playwright drives a real browser to simulate the actions of a user. The browser is set to "headless: false", so the designated browser opens and the whole run is automated.

Yelp Scraper (Node.js, Cheerio, Unirest)

  • Scrapes Yelp for the latest restaurants and their corresponding information, and saves it to a local folder.

Google Search Scraper (Node.js, Cheerio, Unirest)

  • Scrapes Google for the latest search results.
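
A search scraper of this kind starts by building the request URL. Here is a minimal sketch using only Node's built-in `URL` class. The function name `buildSearchUrl`, its option names, and the assumption of 10 results per page are illustrative and not taken from the repository.

```javascript
// Build a Google search URL from a query string.
// `q`, `hl`, and `start` are query parameters Google search accepts;
// the function name, option names, and defaults are assumptions.
function buildSearchUrl(query, { language = "en", page = 0 } = {}) {
  const url = new URL("https://www.google.com/search");
  url.searchParams.set("q", query);
  url.searchParams.set("hl", language);
  // Assumes 10 results per page, so page 1 starts at result 10.
  url.searchParams.set("start", String(page * 10));
  return url.toString();
}

console.log(buildSearchUrl("web scraping with node", { page: 1 }));
```

The returned URL would then be fetched with an HTTP client (the project lists Unirest) and the response HTML parsed with Cheerio.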

Google Jobs Scraper (Node.js, Cheerio, Unirest, PDFKit)

  • Runs as a background app via PM2 (process manager). Scrapes Google for the latest jobs in a specific area, converts the scraped data into a PDF file, saves it to a local folder, and sends it as an email via a custom-made Email Sender App.
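
One common way to run a scraper in the background with PM2 is an ecosystem file. The configuration below is a sketch only: the app name, the script path `./index.js`, and the daily schedule are hypothetical and are not taken from this repository.

```javascript
// ecosystem.config.js (hypothetical file; names and schedule are assumptions)
module.exports = {
  apps: [
    {
      name: "google-jobs-scraper", // label shown in `pm2 list`
      script: "./index.js",        // assumed entry point of the scraper
      cron_restart: "0 8 * * *",   // restart (re-run) the script every day at 08:00
      autorestart: false,          // do not relaunch right after the script exits
      watch: false,                // do not restart on file changes
    },
  ],
};
```

With such a file in place, `pm2 start ecosystem.config.js` registers the app and `pm2 logs google-jobs-scraper` shows its output.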

Google Images Scraper (Node.js, Cheerio, Unirest)

  • Scrapes Google for the latest images in an area, and downloads them to a local folder.

Website Image Scraper (Node.js, Puppeteer)

  • Scrapes a website for all of its images, and downloads them to a local folder.
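
Before images can be downloaded, their URLs must be collected and resolved against the page address. The project does this with Puppeteer; the dependency-free sketch below only illustrates the URL-resolution step on a raw HTML string. The function name `extractImageUrls` and the sample markup are assumptions.

```javascript
// Collect unique, absolute image URLs from an HTML string.
// A regex is enough for a sketch; a real scraper would read `img.src`
// from the rendered DOM (for example through Puppeteer) instead.
function extractImageUrls(html, baseUrl) {
  const urls = new Set();
  const pattern = /<img\b[^>]*?\ssrc\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(pattern)) {
    const src = match[1];
    if (src.startsWith("data:")) continue; // skip inline (base64) images
    urls.add(new URL(src, baseUrl).href);  // resolve relative paths
  }
  return [...urls];
}

// Sample markup, made up for illustration.
const html = `
  <img src="/logo.png" alt="Logo">
  <img class="hero" src="images/hero.jpg">
  <img src="https://cdn.example.com/a.webp">
  <img src="/logo.png">
`;
console.log(extractImageUrls(html, "https://example.com/blog/"));
```

Using a `Set` drops duplicate references to the same file, so each image is downloaded only once.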

YouTube Trending Scraper (Node.js, Express, yt-trending-scraper, EJS)

  • Scrapes YouTube for the latest trending videos by country & category.

Multiple Website Scraper (Node.js, Puppeteer, Node-Cron)

  • Scrapes multiple websites for images and text, can perform operations such as button clicks and form submission, and saves the scraped data to a local folder. Can also be automated using Node-Cron.

License

MIT

Author

@keithhetrick
