Import logfiles without duplicates
Matomo is an open-source web analytics tool, a self-hosted alternative to Google Analytics that allows anonymization of user data.
There are multiple ways to get data into Matomo: JavaScript in the browser, a tracking pixel, and the direct import of log files from the server (e.g. Apache, nginx).
To import log files into Matomo, there is an import script, import_logs.py.
Unfortunately there is one problem with this method: the import script just imports a logfile, but does not check whether the data is already in the Matomo database.
This tool is the link between your log files and import_logs.py.
MatomoPythonLogImporter checks whether there is new data in the logfile and only submits the new data to Matomo.
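The general idea can be sketched in a few lines of shell. The following is only an illustration of the incremental-import approach, assuming the last imported position is remembered in a state file; the paths, file names and the import_logs.py location are placeholders, and this is not the actual contents of log_importer.sh:

```sh
#!/bin/sh
# Minimal sketch of the incremental-import idea (not the real log_importer.sh).
# All paths and file names below are placeholders.
LOGFILE="/var/log/nginx/access.log"                       # logfile to import
STATE="/matomo/MatomoPythonLogImporter/access.log.count"  # remembers the last position
MATOMO_URL="https://matomo.example.org"                   # your Matomo instance
SITE_ID=1                                                 # Matomo site ID for this logfile

LAST=$(cat "$STATE" 2>/dev/null)   # number of lines imported on the previous run
LAST=${LAST:-0}                    # 0 on the first run
TOTAL=$(wc -l < "$LOGFILE")        # number of lines currently in the logfile

if [ "$TOTAL" -gt "$LAST" ]; then
    # Extract only the lines that were added since the last run ...
    TMPFILE=$(mktemp)
    tail -n +"$((LAST + 1))" "$LOGFILE" > "$TMPFILE"
    # ... and hand them to Matomo's own import script.
    python /matomo/misc/log-analytics/import_logs.py \
        --url="$MATOMO_URL" --idsite="$SITE_ID" "$TMPFILE"
    rm -f "$TMPFILE"
    echo "$TOTAL" > "$STATE"       # remember how far we got
fi
```

A real setup also has to cope with log rotation (the logfile shrinking below the remembered position), which the sketch above ignores.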
- Copy this repository into the Matomo folder, e.g. /matomo/MatomoPythonLogImporter
- Edit the paths in config.sh (see the example below this list)
- Edit config_matomo.config. The format should be {logfilename}={MatomoSiteID}. It is possible to connect multiple log files to one Matomo Site ID (see the example below this list).
- Set up a cronjob. The following example runs the importer every 5 minutes:
  */5 * * * * /usr/bin/sh […]/matomo/MatomoPythonLogImporter/log_importer.sh
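For the config.sh step, an edited file might look like the following. The variable names here are hypothetical and not taken from the repository, so use whatever names the shipped config.sh actually defines:

```sh
# Hypothetical example -- variable names are illustrative, not the real ones.
MATOMO_DIR="/matomo"                                          # Matomo installation directory
IMPORT_SCRIPT="$MATOMO_DIR/misc/log-analytics/import_logs.py" # Matomo's import script
MATOMO_URL="https://matomo.example.org"                       # URL of your Matomo instance
LOG_DIR="/var/log/nginx"                                      # directory containing the webserver logs
```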
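For the config_matomo.config step, a mapping in the {logfilename}={MatomoSiteID} format could look like this (the filenames and site IDs are only illustrative); the last two lines show two logfiles feeding the same Matomo site:

```
access.log=1
example.org.access.log=2
example.org.ssl.access.log=2
```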
This bash script is pretty straightforward and it should be possible to adapt it fairly quickly to your webserver setup. It is highly recommended to create a test website in Matomo and run the script with one logfile before adding all domains of your server.
Do whatever you desire with this code. If you find a bug or have an improvement, let me know in an issue or pull request.