This repository contains examples of web scrapers used in the Internet Source
node, demonstrating their use cases and capabilities.
- Install the newest version of Python from https://python.org/downloads. Python 3.7+ is required.
- Download this repository (here we placed it on the D drive, so the full path is D:\python-scraper-examples)
- Open Command Prompt and navigate to the repository root folder
- Create and activate a virtual environment:
  python -m venv env
  env\Scripts\activate.bat
- Install scraper dependencies:
  pip install -r requirements.txt
- Download the Chromium browser for webapp_scraper:
  python -m playwright install chromium
- Register web scrapers in PolyAnalyst:
  - Open PolyAnalyst Administrative Tool
  - Go to Server setting
  - Open the Web scrapers context menu and click Add item
  - Enter the scraper name in the Name field. This name will be displayed in the Scraper drop-down menu in the Internet Source node wizard
  - Enter a command in the Command field. For example,
    D:\python-scraper-examples\env\Scripts\python.exe D:\python-scraper-examples\megaputer_blog.py
  - Click Save changes to apply the new settings
- Open
- Add an Internet Source node to the workspace
- Choose one of the scrapers registered earlier in the Scraper drop-down menu
- Set parameters if the selected scraper supports them
- Execute the node
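
The value entered in the Command field during registration is the virtual environment's interpreter followed by the full path to a scraper script. The following is a minimal sketch, not part of this repository, that builds that string and checks the Python 3.7+ requirement. The helper names `build_scraper_command` and `python_is_supported` are hypothetical, and wrapping paths that contain spaces in double quotes is an assumption about how the command line is parsed, so verify it against your own setup.

```python
import sys
from pathlib import PureWindowsPath

MIN_PYTHON = (3, 7)  # minimum version stated in the installation steps


def python_is_supported(version=sys.version_info):
    """Return True if the given version meets the 3.7+ requirement."""
    return tuple(version[:2]) >= MIN_PYTHON


def build_scraper_command(repo_root, script_name):
    """Build the Command field text: <venv python.exe> <script path>.

    Hypothetical helper; assumes the virtual environment lives in
    <repo_root>\\env, as created by `python -m venv env`.
    """
    root = PureWindowsPath(repo_root)
    interpreter = root / "env" / "Scripts" / "python.exe"
    script = root / script_name

    def quote(path):
        # Assumption: paths with spaces need double quotes.
        text = str(path)
        return f'"{text}"' if " " in text else text

    return f"{quote(interpreter)} {quote(script)}"


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.7+ is required")
    print(build_scraper_command(r"D:\python-scraper-examples", "megaputer_blog.py"))
```

With the repository path used in the steps above, the sketch prints the same command shown in the registration example. `PureWindowsPath` is used so the string is built with Windows separators regardless of where the sketch is run.
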
This project is licensed under the MIT License - see the LICENSE file for details