As part of DiffScraper, the controller sends jobs to scraping bots and collects the offers the bots extract.
The DiffScraper controller needs a connection to a central Redis database, which it uses to send jobs to bots. It also needs a PostgreSQL database to store the offers returned by bots. Redis and PostgreSQL connection details can be set through the environment variables defined in .env.
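The exact variable names live in the project's .env file and are not listed here. As a minimal sketch, assuming hypothetical names such as REDIS_HOST, REDIS_PORT, PG_HOST, PG_PORT, PG_USER and PG_DATABASE, a controller could turn those variables into a typed configuration and fail fast when a required one is missing:

```typescript
// Hypothetical variable names; check .env for the names the controller actually uses.
type Env = Record<string, string | undefined>;

interface ControllerConfig {
  redis: { host: string; port: number };
  postgres: { host: string; port: number; user: string; database: string };
}

// Read a required variable, throwing a clear error when it is unset or empty.
function required(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing environment variable: ${name}`);
  }
  return value;
}

// Parse a port number, falling back to a default when the variable is unset.
function port(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0 || n > 65535) {
    throw new Error(`Invalid port in ${name}: ${raw}`);
  }
  return n;
}

// Build the full configuration; 6379 and 5432 are the Redis and PostgreSQL default ports.
function loadConfig(env: Env): ControllerConfig {
  return {
    redis: { host: required(env, "REDIS_HOST"), port: port(env, "REDIS_PORT", 6379) },
    postgres: {
      host: required(env, "PG_HOST"),
      port: port(env, "PG_PORT", 5432),
      user: required(env, "PG_USER"),
      database: required(env, "PG_DATABASE"),
    },
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)`; taking the environment as a parameter keeps the function easy to test with a plain object.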
The DDL for the PostgreSQL database, including predefined scraping bots and comparisons, can be found here.
A controller can be started as follows:
docker-compose -f docker-compose.controller.yml up -d
Several environment variables can be changed; for example, the variable CRON defines the schedule on which comparisons are executed.
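The format of CRON is not documented here. Assuming it holds a standard five-field cron expression (minute, hour, day of month, month, day of week), the following sketch shows how such an expression decides whether a comparison run is due at a given local time. It supports only `*`, `*/n`, single values, ranges and comma lists, treats day of week as 0 to 6 with 0 = Sunday, and is not the controller's actual scheduler.

```typescript
// Check one comma-separated cron field against a value; min/max give the field's valid range.
function fieldMatches(field: string, value: number, min: number, max: number): boolean {
  return field.split(",").some((part) => {
    const [range, stepText] = part.split("/");
    const hasStep = part.includes("/");
    const step = hasStep ? Number(stepText) : 1;
    let lo: number;
    let hi: number;
    if (range === "*") {
      lo = min;
      hi = max;
    } else if (range.includes("-")) {
      [lo, hi] = range.split("-").map(Number);
    } else {
      lo = Number(range);
      hi = hasStep ? max : lo; // "5/10" means 5, 15, 25, ... up to max
    }
    return value >= lo && value <= hi && (value - lo) % step === 0;
  });
}

// Return true when `date` (read in local time) matches a five-field cron expression.
function cronMatches(expression: string, date: Date): boolean {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error(`Expected 5 cron fields, got ${fields.length}`);
  }
  const [minute, hour, dom, month, dow] = fields;
  const timeOk =
    fieldMatches(minute, date.getMinutes(), 0, 59) &&
    fieldMatches(hour, date.getHours(), 0, 23) &&
    fieldMatches(month, date.getMonth() + 1, 1, 12);
  const domOk = fieldMatches(dom, date.getDate(), 1, 31);
  const dowOk = fieldMatches(dow, date.getDay(), 0, 6);
  // Classic cron rule: when both day fields are restricted, matching either one is enough.
  const bothRestricted = !dom.startsWith("*") && !dow.startsWith("*");
  return timeOk && (bothRestricted ? domOk || dowOk : domOk && dowOk);
}
```

For example, `"*/15 * * * *"` matches every quarter hour and `"0 9 * * 1-5"` matches 09:00 on weekdays. A scheduler would evaluate the expression once per minute and launch the configured comparisons when it matches.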
In a production system, comparisons are launched by a cron scheduler. For testing purposes, comparisons can also be launched directly through a CLI.
For example, to launch a comparison defined in the PostgreSQL comparisons table:
Enter the controller container:
docker exec -it controller bash
Then run the following command to launch comparison 13 from the comparisons table:
ts-node cli.ts launchComparison 13
As with a scheduled comparison, the controller creates one scraping job for every outlet defined in the comparison and places it on one of the Redis queues; bots then pull these jobs from the queues.
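The flow described above (parse the CLI arguments, then create one job per outlet on a queue that bots pull from) can be sketched in memory as follows. This is not the controller's code: the `Comparison` and `ScrapeJob` shapes, the queue names, and the round-robin choice of queue are all assumptions made for illustration, and plain arrays stand in for the Redis queues.

```typescript
// Hypothetical types: the real comparison, job and queue structures are defined by the controller.
interface Comparison {
  id: number;
  outlets: string[];
}

interface ScrapeJob {
  comparisonId: number;
  outlet: string;
}

// In-memory stand-in for the Redis queues, keyed by a hypothetical queue name.
type Queues = Record<string, ScrapeJob[]>;

// Parse the arguments that follow "ts-node cli.ts", e.g. ["launchComparison", "13"].
function parseLaunchCommand(args: string[]): number {
  if (args[0] !== "launchComparison") {
    throw new Error(`Unknown command: ${args[0]}`);
  }
  const comparisonId = Number(args[1]);
  if (!Number.isInteger(comparisonId) || comparisonId <= 0) {
    throw new Error(`Expected a positive comparison ID, got: ${args[1]}`);
  }
  return comparisonId;
}

// Create one job per outlet; round-robin placement across queues is illustrative only.
function launchComparison(comparison: Comparison, queues: Queues): void {
  const names = Object.keys(queues);
  if (names.length === 0) {
    throw new Error("No queues configured");
  }
  comparison.outlets.forEach((outlet, i) => {
    queues[names[i % names.length]].push({ comparisonId: comparison.id, outlet });
  });
}

// A bot takes the oldest job from a queue (FIFO), or gets undefined when the queue is empty.
function pullJob(queues: Queues, name: string): ScrapeJob | undefined {
  return queues[name]?.shift();
}
```

With a comparison 13 that has three outlets and two queues, `launchComparison` places two jobs on the first queue and one on the second. In the real system the arguments would come from `process.argv.slice(2)` and the comparison would be loaded from the PostgreSQL comparisons table.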