Node Puppeteer Crawler

This is an example NodeJS and Puppeteer crawler that stores its data in MongoDB. It crawls YouTube Music playlists and collects information on (almost) all songs in the declared playlists.

Features (among others):

- NodeJS + Puppeteer crawler
  - Simple queue
  - Handles SPA (single-page application) crawling
  - Can crawl multiple sites (currently focused on one site)
- Data management API
- MongoDB storage
  - Simple DB API
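The crawler's own queue code is not shown in this README. As a rough illustration of what a "simple queue" can look like, here is a minimal sketch assuming each job is a URL handled by an async `worker` function (for example, one that opens the URL in a Puppeteer page). `createQueue`, `push`, and `onIdle` are hypothetical names, not this project's API.

```javascript
// Hypothetical sketch of a simple crawl queue with a concurrency limit.
// `worker` is any async function that processes one URL.
function createQueue(worker, concurrency = 2) {
  const pending = [];
  let active = 0;
  let idleResolvers = [];

  function next() {
    // Start jobs until the concurrency limit is reached or nothing is queued.
    while (active < concurrency && pending.length > 0) {
      const url = pending.shift();
      active += 1;
      Promise.resolve()
        .then(() => worker(url))
        .catch((err) => console.error(`Failed: ${url}`, err))
        .then(() => {
          // Free the slot and pull the next queued URL.
          active -= 1;
          next();
        });
    }
    // Resolve anyone waiting once nothing is queued or running.
    if (active === 0 && pending.length === 0) {
      idleResolvers.forEach((resolve) => resolve());
      idleResolvers = [];
    }
  }

  return {
    push(url) {
      pending.push(url);
      next();
    },
    onIdle() {
      if (active === 0 && pending.length === 0) return Promise.resolve();
      return new Promise((resolve) => idleResolvers.push(resolve));
    },
  };
}
```

The concurrency limit keeps the number of open browser pages bounded, and a failing job is logged and skipped instead of stopping the rest of the queue.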

Requirements

MongoDB (3.2+)
NodeJS (8.0.0+) with npm

Usage

1. Start the MongoDB service

./src/sh/mongodb_serv.sh # not tested recently

2. Start crawling (the server runs on port 5556)

npm start

3. Available APIs

/api/music # (GET) get all records; pagination: ?skip=10&limit=20&sort={"price":-1}, etc., where skip = page * limit
/api/music?title=If I were a boy&artist=adele # (GET) query by fields
/api/music # (POST, Content-Type: application/json, body is an array []) save multiple results
/api/music/:id # (GET) get one item by its MongoDB _id
/api/music/:id # (DELETE) delete one item by its MongoDB _id
/api/music # (PUT) set status: 1 on all records; other columns can be set with $set, e.g. $set={"status":2}
/api/music/:id # (PUT) set status: 1 on one record; other columns can be set with $set, e.g. $set={"status":2}
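For the pagination rule above (skip = page * limit), a client might build list URLs like this. This is a sketch assuming a zero-based `page`; `buildMusicListUrl` is a hypothetical helper, and `sort` is sent as JSON as in the examples above.

```javascript
const { URLSearchParams } = require('url');

// Hypothetical helper: builds a GET /api/music URL.
// skip = page * limit, with page assumed to be zero-based.
function buildMusicListUrl({ page = 0, limit = 20, sort, filters = {} } = {}) {
  const params = new URLSearchParams();
  params.set('skip', String(page * limit));
  params.set('limit', String(limit));
  if (sort) {
    // Sort is passed as JSON, e.g. sort={"price":-1}.
    params.set('sort', JSON.stringify(sort));
  }
  // Plain field filters such as title or artist.
  Object.keys(filters).forEach((key) => params.set(key, filters[key]));
  return `/api/music?${params.toString()}`;
}

console.log(buildMusicListUrl({ page: 2, limit: 20, sort: { price: -1 } }));
// → /api/music?skip=40&limit=20&sort=%7B%22price%22%3A-1%7D
```

`URLSearchParams` percent-encodes the JSON and the spaces in values such as `If I were a boy`, so the query string stays valid.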

4. For daily jobs

./src/sh/crawl.sh # tested many times recently
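The README runs `./src/sh/crawl.sh` for daily jobs but does not say how it is scheduled (cron is a common choice). If you prefer scheduling from Node, a minimal sketch could look like this. `msUntilNextRun` and `scheduleDailyCrawl` are hypothetical helpers, and the script is assumed to be executable and run from the repository root.

```javascript
const { spawn } = require('child_process');

// Milliseconds from `now` until the next `hour`:00 in local time.
function msUntilNextRun(now, hour) {
  const next = new Date(now.getTime());
  next.setHours(hour, 0, 0, 0);
  // If that time has already passed today, use the same hour tomorrow.
  if (next <= now) next.setDate(next.getDate() + 1);
  return next - now;
}

// Hypothetical scheduler: runs ./src/sh/crawl.sh once a day at `hour`.
function scheduleDailyCrawl(hour) {
  setTimeout(() => {
    const child = spawn('./src/sh/crawl.sh', [], { stdio: 'inherit' });
    child.on('exit', (code) => console.log(`crawl.sh exited with code ${code}`));
    // Schedule the next day's run.
    scheduleDailyCrawl(hour);
  }, msUntilNextRun(new Date(), hour));
}
```

Calling `scheduleDailyCrawl(3)` would start a crawl every day at 03:00 local time for as long as the Node process stays alive.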

Development

Target config folder: src/targets-config/*

(The crawl, api, and custom commands have been tested.)

npm run api # Run only the API
npm run crawl # Run targets
npm run crawl:dev # Run dev targets (src/targets-config/*.dev.js)
npm run custom # Run custom.js
npm run example # Run example.back.js; does not save data
npm run start # Run API & Crawl
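`npm run crawl` and `npm run crawl:dev` differ in which files under `src/targets-config/` they run. Here is a minimal sketch of that selection, assuming normal runs skip the `*.dev.js` files (the README only states that dev runs use them). `selectTargetFiles` and `listTargets` are hypothetical helpers.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical: dev mode keeps only *.dev.js; normal mode keeps other *.js files.
function selectTargetFiles(fileNames, devMode) {
  return fileNames.filter((name) => {
    if (!name.endsWith('.js')) return false;
    const isDev = name.endsWith('.dev.js');
    return devMode ? isDev : !isDev;
  });
}

// Reads a config directory (e.g. src/targets-config) and returns full paths.
function listTargets(dir, devMode) {
  return selectTargetFiles(fs.readdirSync(dir), devMode).map((name) => path.join(dir, name));
}
```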

Parameters

./node_modules/babel-cli/bin/babel-node.js ./src/** --**

--dont-save-data # don't save data
--dev-mode # Run only *.dev.js targets and limit the number of crawls
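A sketch of how a script might read the two parameters above from `process.argv`. `parseCrawlerFlags` and `DEV_CRAWL_LIMIT` are hypothetical, and the limit value is an arbitrary placeholder, since the README does not say exactly how dev mode restricts crawling.

```javascript
// Hypothetical dev-mode cap; the project's actual limit is not documented.
const DEV_CRAWL_LIMIT = 5;

// Reads the documented flags from an argv array (node, script, ...args).
function parseCrawlerFlags(argv) {
  const args = argv.slice(2);
  const devMode = args.includes('--dev-mode');
  return {
    saveData: !args.includes('--dont-save-data'),
    devMode,
    // In dev mode, stop after a small number of crawls.
    crawlLimit: devMode ? DEV_CRAWL_LIMIT : Infinity,
  };
}

console.log(parseCrawlerFlags(process.argv));
```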

hiyali/node-crawler-on-mongodb

Folders and files

Latest commit

History

Repository files navigation

Node Puppeteer Crawler

Has features below (and more actually):

Requirement

Usage

1. Start Mongodb service

2. Start crawling, server running on port 5556

3. Exist APIs

4. For daily jobs

dev

Paramsters on using

About

Topics

Resources

Stars

Watchers

Forks

Releases

Sponsor this project

Packages 0

Contributors 2

Languages

Packages