Description
Hello @vsoch!
Over the past few days, I have used watchme quite a lot for web scraping and found it very handy. For context, I needed a way to export data to external data sources, in this case Prometheus (via Pushgateway).
I decided to develop a new layer on top of watchme to implement exporters. It could also be used, for example, to export data to message queues or databases. With these changes, you can now run:
watchme create weather-watcher --exporter pushgateway
This will create an [exporter-pushgateway] section in the watchme configuration, following templates that are specifically designed for exporters.
[watcher]
active = true
type = urls
[exporter-pushgateway]
url = localhost:9091
type = pushgateway
active = true
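For anyone unfamiliar with Pushgateway, here is a minimal sketch of what pushing a single value to the `url` above involves, using only the standard library. The function name `build_push_request`, the job name, and the metric name are hypothetical and are not taken from watchme-prometheus; adding an `http://` scheme to the configured url is also an assumption.

```python
import urllib.request


def build_push_request(url, job, metric, value):
    """Build (but do not send) a Pushgateway request for one gauge value."""
    # The config stores "localhost:9091" without a scheme, so add one (assumption).
    base = url if url.startswith(("http://", "https://")) else "http://" + url
    # Pushgateway groups pushed metrics under /metrics/job/<job_name>.
    endpoint = f"{base.rstrip('/')}/metrics/job/{job}"
    # Text exposition format: a TYPE line, then "<name> <value>", newline-terminated.
    body = f"# TYPE {metric} gauge\n{metric} {float(value)}\n".encode("utf-8")
    # PUT replaces every metric previously pushed for this job grouping.
    return urllib.request.Request(endpoint, data=body, method="PUT")


request = build_push_request("localhost:9091", "weather_watcher", "local_temp", 21)
# urllib.request.urlopen(request)  # uncomment to push; needs a running Pushgateway
```

Building the request separately from sending it keeps the sketch usable without a running Pushgateway.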
Note: I am aware that there is already an export function, but I could not build on it, as I found that it exports all the content available in the repository.
I decided to export data in the run function of the task lifecycle.
# Finally, finish and export the runs.
if test is False:
    self.finish_runs(results)
    self.export_runs(results, exporters)
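To illustrate the idea, here is a rough sketch of how such an `export_runs` step could dispatch results to exporters. It is not the code from my branch: the `ListExporter` class, its `export` method, and the assumption that `results` maps task names to scraped values are all hypothetical and only serve the example.

```python
class ListExporter:
    """Hypothetical exporter that just records what it is asked to export."""

    def __init__(self):
        self.seen = []

    def export(self, name, result):
        self.seen.append((name, result))


def export_runs(results, exporters):
    """Hand each non-empty task result to every configured exporter."""
    for name, result in results.items():
        if result is None:
            continue  # the task produced nothing, so there is nothing to export
        for exporter in exporters:
            exporter.export(name, result)


exporter = ListExporter()
export_runs({"task-temperature": "21", "task-broken": None}, [exporter])
```

A real exporter such as the Pushgateway one would implement the same `export` method and send the value out instead of storing it.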
I also added an option to specify a regex that is applied to the result of a url selection. It looks like this:
[task-temperature]
url = https://www.accuweather.com/en/lu/luxembourg/228714/weather-forecast/228714
selection = .local-temp
get_text = true
func = get_url_selection
active = true
type = urls
regex = [0-9]+
header_user-agent = Mozilla/5.0
This option comes in very handy when you only want to extract the numbers from scraped text.
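As a small illustration of what the `regex` option does to a scraped value, here is a sketch using Python's `re` module. The helper name `apply_regex` is hypothetical, and returning only the first match is an assumption made for this example.

```python
import re


def apply_regex(text, regex):
    """Return the first substring of text matching regex, or None if nothing matches."""
    match = re.search(regex, text)
    return match.group(0) if match else None


temperature = apply_regex("21°C", "[0-9]+")
```

With the pattern `[0-9]+` from the config above, a scraped string such as `21°C` is reduced to `21`, which can then be exported as a numeric metric.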
I added quite a few functions to support exporters and regexes. All of the changes are available on my GitHub, in the repo named watchme-prometheus.
In the end, I was able to run scheduled tasks that scrape a weather website every two seconds and export the data to Pushgateway: https://imgur.com/a/MJDuIUA
I am curious to know whether you would be interested in these changes.
In any case, I had a ton of fun developing this, and the way the app is built made it very easy to iterate.
Thank you!