Web scraper written in TypeScript
An extensible web scraper that can run multiple tasks on the content it scrapes. It propagates through the links it finds on each page. It uses the ts-jobrunner library to run everything as jobs.
npm install --save ts-scraper
CoreScraper (abstract)
- protected init(): void
- protected canFetchUrl(url): boolean
- protected createJob(link): CoreJob
- protected onFetchComplete(link, response): void
- public start(): void
PageScraper (abstract)
- public async start()
- public abstract parse(jquery: JQuery): any
ScrapeJob (abstract)
- public run()
- abstract createPageScraper(url: string): PageScraper
There are three components in this library: CoreScraper, PageScraper and ScrapeJob.

ScrapeJob extends CoreJob from the ts-jobrunner library. It exposes the function createPageScraper(url), which creates the PageScraper that actually mines/scrapes the page.

PageScraper exposes the function parse($), which takes a jQuery object. You can mine the page as you wish and return the parsed response.

CoreScraper is the main object which runs the scraping process. Its object has to have the functions listed above:
- init(): all initialization can be put here
- canFetchUrl(url): should tell whether to fetch the found link url
- createJob(link): should return a job of type CoreJob, which is then queued
- onFetchComplete(link, response): triggered when a ScrapeJob completes, i.e. when a PageScraper is done. Code that handles the response returned by PageScraper goes here
- start(): actually starts the scraping process (start() on JobRunner)
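How the three components fit together can be illustrated with a small, self-contained sketch. It does not import ts-scraper or ts-jobrunner: the Sketch* classes are hypothetical stand-ins that only mirror the method names in the outline above, the fakeWeb map replaces real HTTP fetching, a plain Page object replaces the jQuery object passed to parse(), and jobs run synchronously in a simple queue instead of through a JobRunner. Treat it as a reading aid under those assumptions, not as the library's actual implementation.

```typescript
// Hypothetical page shape; the real parse() receives a jQuery object instead.
type Page = { title: string; links: string[] };

// Hypothetical in-memory "web" so the sketch runs without network access.
const fakeWeb: Record<string, Page> = {
  "http://example.com/": { title: "Home", links: ["http://example.com/a", "http://other.org/"] },
  "http://example.com/a": { title: "A", links: ["http://example.com/", "http://example.com/b"] },
  "http://example.com/b": { title: "B", links: [] },
  "http://other.org/": { title: "Other", links: [] },
};

// Stand-in for PageScraper: loads one page and hands it to parse().
abstract class SketchPageScraper {
  constructor(protected url: string) {}
  abstract parse(page: Page): unknown;
  start(): { result: unknown; links: string[] } {
    const page: Page = this.url in fakeWeb ? fakeWeb[this.url] : { title: "", links: [] };
    return { result: this.parse(page), links: page.links };
  }
}

// Stand-in for ScrapeJob: one job scrapes one URL via a page scraper.
abstract class SketchScrapeJob {
  constructor(protected url: string) {}
  abstract createPageScraper(url: string): SketchPageScraper;
  run(): { result: unknown; links: string[] } {
    return this.createPageScraper(this.url).start();
  }
}

// Stand-in for CoreScraper: queues jobs and follows the links they report.
abstract class SketchCoreScraper {
  private queue: string[] = [];
  private seen = new Set<string>();
  constructor(private seedUrl: string) {}
  protected abstract init(): void;
  protected abstract canFetchUrl(url: string): boolean;
  protected abstract createJob(link: string): SketchScrapeJob;
  protected abstract onFetchComplete(link: string, response: unknown): void;

  start(): void {
    this.init();
    this.enqueue(this.seedUrl);
    while (this.queue.length > 0) {
      const link = this.queue.shift() as string;
      const { result, links } = this.createJob(link).run();
      this.onFetchComplete(link, result);
      links.forEach((found) => this.enqueue(found));
    }
  }

  // Queue a link once, and only if the subclass allows it.
  private enqueue(url: string): void {
    if (this.seen.has(url) || !this.canFetchUrl(url)) return;
    this.seen.add(url);
    this.queue.push(url);
  }
}

// Concrete example: collect page titles, staying on one host.
class TitleScraper extends SketchPageScraper {
  parse(page: Page): string {
    return page.title;
  }
}

class TitleJob extends SketchScrapeJob {
  createPageScraper(url: string): SketchPageScraper {
    return new TitleScraper(url);
  }
}

class SameHostScraper extends SketchCoreScraper {
  titles: string[] = [];
  protected init(): void {
    this.titles = [];
  }
  protected canFetchUrl(url: string): boolean {
    return url.startsWith("http://example.com/");
  }
  protected createJob(link: string): SketchScrapeJob {
    return new TitleJob(link);
  }
  protected onFetchComplete(_link: string, response: unknown): void {
    this.titles.push(String(response));
  }
}

const scraper = new SameHostScraper("http://example.com/");
scraper.start();
console.log(scraper.titles); // prints [ 'Home', 'A', 'B' ]
```

In this sketch canFetchUrl() keeps the crawl on example.com, so the link to other.org is never queued, and the seen set prevents the back-link from page A to the home page from being fetched twice.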
An example of usage can be found in the src/test/test-scraper folder.
Suggestions and contributions are welcome. Happy coding :)