A template for scraping data from a single web page in TypeScript (Node.js). The page URL is passed in via the Actor input, which is defined by [input_schema.json].
This template scrapes page headings, but you can easily edit the code to scrape whatever you want from the page.
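Because the URL arrives as user input, it can help to validate it before navigating. The following is a minimal sketch, not the template's actual code: the `ActorInput` shape, the `url` field name, and the `parseInput` helper are assumptions made for illustration, so check input_schema.json for the real field names.

```typescript
// Hypothetical input shape: the `url` field name is an assumption.
interface ActorInput {
  url: string;
}

// Validate a raw input object and return it with a normalized URL.
function parseInput(raw: unknown): ActorInput {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("Input must be an object");
  }
  const candidate = (raw as Record<string, unknown>).url;
  if (typeof candidate !== "string") {
    throw new Error("Input must contain a string `url` field");
  }
  // The URL constructor throws a TypeError on a malformed URL.
  const parsed = new URL(candidate);
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error("Only http(s) URLs are supported");
  }
  return { url: parsed.href };
}
```

Rejecting non-HTTP schemes up front keeps values such as `file:` paths from ever reaching the browser.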
- Scrapeless SDK - toolkit for building Actors
- Puppeteer - a Node.js library that controls Chrome or Chromium browsers programmatically
- actor.input() - gets the input where the page URL is defined
- client.browser.create() - gets the browser WebSocket endpoint
- page.goto(url) - navigates to the target website
- actor.addItems() - saves the crawled data to the dataset
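The calls listed above fit together roughly as follows. This is a sketch under stated assumptions rather than the template's real source: `ActorLike`, `PageLike`, `Heading`, `cleanHeadings`, `scrapeHeadings`, and `extractHeadings` are hypothetical names that stand in for the SDK and Puppeteer objects, and the `openPage` callback represents the step of creating a browser session and connecting to its WebSocket endpoint.

```typescript
// Hypothetical stand-ins for the objects named in the list above.
// These interfaces are assumptions for illustration, not the real SDK or Puppeteer types.
interface Heading {
  level: string; // e.g. "h1"
  text: string;
}

interface ActorLike {
  input(): Promise<{ url: string }>; // assumed input shape
  addItems(items: Heading[]): Promise<unknown>;
}

interface PageLike {
  goto(url: string): Promise<unknown>;
  // Hypothetical helper; with Puppeteer this would typically wrap a DOM query on the page.
  extractHeadings(): Promise<Heading[]>;
}

// Normalize raw headings: lowercase the tag name, trim the text, drop empty entries.
function cleanHeadings(raw: Heading[]): Heading[] {
  return raw
    .map((h) => ({ level: h.level.toLowerCase(), text: h.text.trim() }))
    .filter((h) => h.text.length > 0);
}

// Read the input, open a page, navigate, extract headings, and save them.
async function scrapeHeadings(
  actor: ActorLike,
  openPage: () => Promise<PageLike>
): Promise<Heading[]> {
  const { url } = await actor.input();
  const page = await openPage();
  await page.goto(url);
  const headings = cleanHeadings(await page.extractHeadings());
  await actor.addItems(headings);
  return headings;
}
```

Passing the actor and the page factory in as parameters keeps the flow testable with in-memory fakes, without a live browser or credentials. To scrape something other than headings, only the extraction step and the item type would need to change.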
Fork or clone the repository to your GitHub account, then link that GitHub repository to Scrapeless Actor. Then:
- Build the Actor
- Run the Actor