An object-oriented Python web scraper capable of retrieving and storing product and price data from popular e-commerce platforms. Currently, the application works effectively for one of the biggest Polish online electronics stores. My initial intention for this project was to build a general-purpose object-oriented Python web scraper; over time, however, my work shifted towards scraping product price data and storing it. I then came up with a scalable project structure and implemented it.
The application is divided into two main directories: database/, holding the code responsible for database connections, ORM models and CRUD operations, and src/, holding the main application logic, including (most importantly):
- base/ - abstract base classes that determine the structure of the scraping modules responsible for storing scraped pages and extracting specific data from them
- me/ - the actual implementation of the modules defined in base/ for one electronics e-commerce store, i.e. site-specific code for data extraction
- browser.py - running and managing Playwright browsers; this is where pages are visited and initially scraped
- managers.py - higher-level, universal (in the sense of being site-independent) scraper logic that wraps and makes use of the site-specific code
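
The split between base/, me/ and managers.py described above can be illustrated with a minimal sketch. All class names here (`Product`, `BaseExtractor`, `ExampleStoreExtractor`, `ScraperManager`) and the HTML format are hypothetical and chosen only for illustration; they are not taken from the project's code. Pages are passed in as plain HTML strings, whereas in the application they would come from the Playwright browser.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Type


@dataclass
class Product:
    """Hypothetical container for one scraped product."""
    name: str
    price: float


class BaseExtractor(ABC):
    """Hypothetical base/ class: stores a scraped page and fixes the interface."""

    def __init__(self, html: str) -> None:
        self.html = html  # the already-scraped page content

    @abstractmethod
    def extract_products(self) -> List[Product]:
        """Return the products found on the stored page."""


class ExampleStoreExtractor(BaseExtractor):
    """Hypothetical site-specific class, playing the role of code in me/."""

    # Assumed markup for the example store; a real site needs its own parsing.
    _PATTERN = re.compile(r'<li data-name="([^"]+)" data-price="([\d.]+)">')

    def extract_products(self) -> List[Product]:
        return [
            Product(name, float(price))
            for name, price in self._PATTERN.findall(self.html)
        ]


class ScraperManager:
    """Site-independent logic in the spirit of managers.py.

    It depends only on the BaseExtractor interface, so supporting another
    store means adding another subclass, not changing the manager.
    """

    def __init__(self, extractor_cls: Type[BaseExtractor]) -> None:
        self.extractor_cls = extractor_cls

    def run(self, pages: List[str]) -> List[Product]:
        products: List[Product] = []
        for html in pages:
            products.extend(self.extractor_cls(html).extract_products())
        return products


# Usage with two made-up pages standing in for scraped HTML.
pages = [
    '<ul><li data-name="Laptop" data-price="3999.00"></li></ul>',
    '<ul><li data-name="Mouse" data-price="89.99"></li></ul>',
]
manager = ScraperManager(ExampleStoreExtractor)
products = manager.run(pages)
```

Because the manager only ever calls `extract_products()`, the site-specific parsing stays isolated in one subclass, which is the property that makes this kind of structure scale to additional stores.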

Technologies used:

- Python
- Playwright
- SQLAlchemy ORM
- Postgres Database