godfriedmeesters/controller

DiffScraper Controller

As part of DiffScraper, the controller sends jobs to scraping bots, and collects the offers extracted by bots.

System Requirements

DiffScraper Controller needs a connection to a central Redis database, which it uses to send jobs to bots. It also needs a PostgreSQL database to store the offers returned by bots. Redis and PostgreSQL connection details can be set through the environment variables defined in .env.
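As an illustration of how such connection details might be read and validated at startup, here is a minimal TypeScript sketch. The variable names (REDIS_HOST, REDIS_PORT, POSTGRES_HOST, and so on) are assumptions made for this example; the names the controller actually uses are the ones listed in .env. The fallback ports are the standard defaults for Redis (6379) and PostgreSQL (5432).

```typescript
// Hypothetical variable names: check .env for the names the controller really uses.
interface ConnectionConfig {
  redis: { host: string; port: number };
  postgres: { host: string; port: number; database: string; user: string; password: string };
}

type Env = { [key: string]: string | undefined };

// Return a required variable, or fail early with a clear message.
function requireVar(env: Env, key: string): string {
  const value = env[key];
  if (value === undefined || value === "") {
    throw new Error("Missing environment variable: " + key);
  }
  return value;
}

// Parse an optional port variable, falling back to the given default.
function parsePort(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw === "") {
    return fallback;
  }
  if (!/^\d+$/.test(raw)) {
    throw new Error("Invalid port in " + key + ": " + raw);
  }
  const port = Number(raw);
  if (port < 1 || port > 65535) {
    throw new Error("Port out of range in " + key + ": " + raw);
  }
  return port;
}

function loadConnectionConfig(env: Env): ConnectionConfig {
  return {
    redis: {
      host: requireVar(env, "REDIS_HOST"),
      port: parsePort(env, "REDIS_PORT", 6379),
    },
    postgres: {
      host: requireVar(env, "POSTGRES_HOST"),
      port: parsePort(env, "POSTGRES_PORT", 5432),
      database: requireVar(env, "POSTGRES_DB"),
      user: requireVar(env, "POSTGRES_USER"),
      password: requireVar(env, "POSTGRES_PASSWORD"),
    },
  };
}

// Example with literal values; in a Node.js process one would pass process.env instead.
const exampleConfig = loadConnectionConfig({
  REDIS_HOST: "redis",
  POSTGRES_HOST: "postgres",
  POSTGRES_DB: "diffscraper",
  POSTGRES_USER: "controller",
  POSTGRES_PASSWORD: "secret",
});
console.log(exampleConfig.redis.port, exampleConfig.postgres.port);
```

Failing at startup on a missing variable tends to be easier to diagnose than a connection error that surfaces later, when the first job is sent.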

The DDL for the PostgreSQL database, including predefined scraping bots and comparisons, can be found here.

Installation Guide

A controller can be started as follows: docker-compose -f docker-compose.controller.yml up -d

Several environment variables can be changed; for example, the variable CRON defines the interval at which comparisons are executed.
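Assuming CRON holds a standard five-field cron expression (this README does not state its exact format, and some Node.js schedulers also accept a sixth, leading seconds field), the sketch below splits such a value into named fields so that a malformed value can be rejected early. The function name splitCronExpression is hypothetical and not part of the controller.

```typescript
interface CronFields {
  minute: string;
  hour: string;
  dayOfMonth: string;
  month: string;
  dayOfWeek: string;
}

// Split a five-field cron expression into its named fields.
// Only the field count is checked here, not the contents of each field.
function splitCronExpression(expr: string): CronFields {
  const parts = expr.trim().split(/\s+/);
  if (parts.length !== 5) {
    throw new Error("Expected 5 cron fields, got " + parts.length + ': "' + expr + '"');
  }
  const [minute, hour, dayOfMonth, month, dayOfWeek] =
    parts as [string, string, string, string, string];
  return { minute, hour, dayOfMonth, month, dayOfWeek };
}

// "0 */6 * * *" means minute 0 of every sixth hour, i.e. every six hours.
const everySixHours = splitCronExpression("0 */6 * * *");
console.log(everySixHours.hour);
```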

DiffScraper Controller CLI

In a production system, comparisons are launched through a CRON scheduler. For testing purposes, it is also possible to launch comparisons directly through a CLI.

For example, to launch a comparison defined in the PostgreSQL comparisons table:

Enter the controller container: docker exec -it controller bash

This command will launch comparison 13 from the table comparisons: ts-node cli.ts launchComparison 13
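The contents of cli.ts are not shown in this README. As a rough sketch of how a command such as launchComparison 13 could be parsed, the hypothetical parseCliArgs function below validates the command name and the comparison id; it is an illustration only and not the controller's actual implementation.

```typescript
interface LaunchComparisonCommand {
  name: "launchComparison";
  comparisonId: number;
}

// Parse the arguments that follow the script name, e.g. ["launchComparison", "13"].
function parseCliArgs(args: string[]): LaunchComparisonCommand {
  const commandName = args[0];
  const rawId = args[1];
  if (commandName !== "launchComparison") {
    throw new Error("Unknown command: " + String(commandName));
  }
  if (rawId === undefined || !/^\d+$/.test(rawId)) {
    throw new Error("launchComparison expects a numeric comparison id");
  }
  return { name: "launchComparison", comparisonId: Number(rawId) };
}

const command = parseCliArgs(["launchComparison", "13"]);
console.log(command.comparisonId); // 13
```

In a Node.js script the argument list would typically come from process.argv.slice(2).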

As with a scheduled comparison, a scraping job is created on one of the Redis queues for every outlet defined in the comparison. Bots then pull these jobs from the queues.
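The fan-out from one comparison to one job per outlet can be sketched as follows. All types and names here (Outlet, Comparison, ScrapingJob, JobQueue, enqueueComparison) are hypothetical, and an in-memory queue stands in for Redis so that the example runs on its own. How the real controller picks a queue for each outlet is not described in this README; the sketch simply assumes each outlet names its queue.

```typescript
// Hypothetical data shapes; the real schema is defined by the PostgreSQL DDL.
interface Outlet { id: number; queue: string }
interface Comparison { id: number; outlets: Outlet[] }
interface ScrapingJob { comparisonId: number; outletId: number }

interface JobQueue {
  push(queueName: string, job: ScrapingJob): void;
}

// In-memory stand-in for the Redis queues, for illustration only.
class InMemoryQueue implements JobQueue {
  private queues: { [queueName: string]: ScrapingJob[] } = {};

  push(queueName: string, job: ScrapingJob): void {
    if (this.queues[queueName] === undefined) {
      this.queues[queueName] = [];
    }
    this.queues[queueName].push(job);
  }

  // A bot would pull the oldest pending job from its queue.
  pull(queueName: string): ScrapingJob | undefined {
    const pending = this.queues[queueName];
    return pending === undefined ? undefined : pending.shift();
  }
}

// Create one scraping job per outlet and return the number of jobs created.
function enqueueComparison(comparison: Comparison, queue: JobQueue): number {
  for (const outlet of comparison.outlets) {
    queue.push(outlet.queue, { comparisonId: comparison.id, outletId: outlet.id });
  }
  return comparison.outlets.length;
}

const demoQueue = new InMemoryQueue();
const created = enqueueComparison(
  { id: 13, outlets: [{ id: 1, queue: "web" }, { id: 2, queue: "mobile" }] },
  demoQueue
);
console.log(created, demoQueue.pull("web"));
```

Keeping the queue behind a small interface means the same fan-out logic could be exercised in tests without a running Redis instance.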
