ts-scraper

Web scraper written in TypeScript

It is a web scraper that can be extended to perform multiple tasks on scraped content. It propagates through the links it finds on each page. It uses the ts-jobrunner library to run everything as jobs.

Installation

npm install --save ts-scraper

Core API

CoreScraper(abstract)
    - protected init(): void
    - protected canFetchUrl(url): boolean
    - protected createJob(link): CoreJob
    - protected onFetchComplete(link, response): void
    - public start(): void
PageScraper(abstract)
    - public async start()
    - public abstract parse(jquery: JQuery): any
ScrapeJob(abstract)
    - public run()
    - abstract createPageScraper(url: string): PageScraper

This library has three components: CoreScraper, PageScraper and ScrapeJob.
ScrapeJob extends CoreJob from the ts-jobrunner library. It exposes createPageScraper(url), which creates the PageScraper that actually mines/scrapes the page.
PageScraper exposes parse($), which takes a jQuery object. You can mine the page however you wish and return the parsed response.
CoreScraper is the main object that runs the scraping process. It has the functions listed above:
- init(): put all initialization here
- canFetchUrl(url): should tell whether to fetch the discovered link URL
- createJob(link): should return a job of type CoreJob, which is then queued
- onFetchComplete(link, response): triggered when a ScrapeJob completes, i.e. when a PageScraper is done. Put the code that handles the response returned by PageScraper here
- start(): actually starts the scraping process (calls start() on JobRunner)
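
The flow described above can be sketched as follows. This is a minimal, self-contained illustration and does not import ts-scraper or ts-jobrunner: SketchJob and SketchScraper are hypothetical stand-ins that only mirror the method names from the Core API list, the seed URL passed to the constructor is an assumption, pages come from a made-up in-memory fakeWeb instead of real HTTP requests, and the queue runs synchronously, whereas the real library delegates to JobRunner.

```typescript
// Hypothetical stand-ins: NOT the real ts-scraper / ts-jobrunner classes.
interface ScrapeResult { title: string; links: string[] }

// Made-up in-memory "web" so the sketch runs without network access.
const fakeWeb: Record<string, ScrapeResult> = {
  "https://example.com/": { title: "Home", links: ["https://example.com/a", "https://other.org/"] },
  "https://example.com/a": { title: "A", links: ["https://example.com/", "https://example.com/b"] },
  "https://example.com/b": { title: "B", links: [] },
};

// Stands in for a ScrapeJob: one job scrapes one URL.
abstract class SketchJob {
  constructor(public readonly url: string) {}
  abstract run(): ScrapeResult;
}

// Stands in for CoreScraper: same hook names as the Core API list.
abstract class SketchScraper {
  private queue: SketchJob[] = [];
  constructor(private seed: string) {}

  protected abstract init(): void;
  protected abstract canFetchUrl(url: string): boolean;
  protected abstract createJob(link: string): SketchJob;
  protected abstract onFetchComplete(link: string, response: ScrapeResult): void;

  public start(): void {
    this.init();
    this.enqueue(this.seed);
    // Run jobs in FIFO order; each finished job may queue newly found links.
    while (this.queue.length > 0) {
      const job = this.queue.shift()!;
      const response = job.run();
      this.onFetchComplete(job.url, response);
      for (const link of response.links) this.enqueue(link);
    }
  }

  private enqueue(link: string): void {
    if (this.canFetchUrl(link)) this.queue.push(this.createJob(link));
  }
}

class FakePageJob extends SketchJob {
  run(): ScrapeResult {
    // A real job would fetch the page and hand it to PageScraper.parse($).
    return fakeWeb[this.url] ?? { title: "", links: [] };
  }
}

class SameSiteScraper extends SketchScraper {
  private seen = new Set<string>();
  public readonly titles: string[] = [];

  constructor(seed: string, private prefix: string) { super(seed); }

  protected init(): void { this.seen.clear(); }

  // Only follow unvisited links that stay under the given URL prefix.
  protected canFetchUrl(url: string): boolean {
    return !this.seen.has(url) && url.startsWith(this.prefix);
  }

  protected createJob(link: string): SketchJob {
    this.seen.add(link); // mark as queued so it is not scheduled twice
    return new FakePageJob(link);
  }

  protected onFetchComplete(link: string, response: ScrapeResult): void {
    this.titles.push(response.title);
  }
}

const scraper = new SameSiteScraper("https://example.com/", "https://example.com/");
scraper.start();
console.log(scraper.titles); // the titles Home, A, B, in that order
```

Keeping the visited-set check in canFetchUrl and the bookkeeping in createJob is one possible split; it stops the crawl from looping when pages link back to each other, as the first two fake pages do.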

Example

Please find example usage in the src/test/test-scraper folder.

Suggestions and contributions are open. Happy coding :)
