A repository that parses ESGF Apache Logs and generates E3SM file request metrics for Native and CMIP6 formats.
Metrics include:
- Cumulative number of requests
- Cumulative GB of data downloaded
- Install Docker with docker-compose
- Clone this repository
    git clone https://github.com/tomvothecoder/esgf_metrics.git
- Copy .env.template as .env and configure the environment variables
- Build the Docker containers using docker-compose. Containers include postgres and esgf_metrics.
    sudo docker-compose up --build
- The esgf_metrics container will now automatically run the esgf_metrics package using crontab at 8:00 AM every Tuesday. It will identify new logs, parse them, and generate updated metrics and plots.
  - A separate cronjob on the LLNL climate servers collects access logs from ESGF nodes every day at 10 PM. Logs are stored in /p/cscratch/esgf-http-logs.
  - All logs parsed by esgf_metrics, and the resulting metrics, are stored in the postgres service's Postgres database.
- supervisorctl
    sudo supervisorctl stop all
    sudo supervisorctl start all
    sudo supervisorctl restart all
    sudo supervisorctl status
    supervisorctl tail -f esgf_metrics stdout
- systemctl
    sudo systemctl start docker
    sudo systemctl stop docker
    sudo systemctl restart docker
- Check service logs
    sudo docker-compose logs esgf_metrics
    sudo docker-compose logs postgres
- Check crontab configuration
    sudo docker exec -ti esgf_metrics bash -c "crontab -l"
- Install Miniconda
- Create and activate the Conda environment
    cd esgf_metrics
    conda env create -f conda-env/dev.yml
    conda activate esgf_metrics_dev
- Create a development branch
    git checkout -b dev-branch
- Update source code and commit changes
- Push development branch and open a PR
1) Read in the logs. Here is an example line:
"128.211.148.13 - - [22/Sep/2019:12:01:01 -0700] "GET /thredds/fileServer/user_pub_work/E3SM/1_0/historical/1deg_atm_60-30km_ocean/land/native/model-output/mon/ens1/v1/20180215.DECKv1b_H1.ne30_oEC.edison.clm2.h0.1850-01.nc HTTP/1.1" 200 91564624 "-" "Wget/1.14 (linux-gnu)"\n"
2) Split each log line into a list:
['128.211.148.13',
'-',
'-',
'[22/Sep/2019:12:01:01',
'-0700]',
'"GET',
'/thredds/fileServer/user_pub_work/E3SM/1_0/historical/1deg_atm_60-30km_ocean/land/native/model-output/mon/ens1/v1/20180215.DECKv1b_H1.ne30_oEC.edison.clm2.h0.1850-01.nc',
'HTTP/1.1"',
'200',
'91564624',
'"-"',
'"Wget/1.14',
'(linux-gnu)"']
3) Parse each log line for the directory:
"/thredds/fileServer/user_pub_work/E3SM/1_0/historical/1deg_atm_60-30km_ocean/land/native/model-output/mon/ens1/v1/20180215.DECKv1b_H1.ne30_oEC.edison.clm2.h0.1850-01.nc"
4) Parse directory for the dataset id:
Before:
"/E3SM/1_0/historical/1deg_atm_60-30km_ocean/land/native/model-output/mon/ens1/v1/"
After:
# NOTE: Refer to the templates below for how to translate this
"E3SM.1_0.historical.1deg_atm_60-30km_ocean.land.native.model-output.mon.ens1.v1"
5) Parse directory for file id:
"20180215.DECKv1b_H1.ne30_oEC.edison.clm2.h0.1850-01.nc"
6) Parse for additional info (e.g., timestamp, facets)
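The steps above can be sketched in Python as follows. This is a minimal illustration and not the package's actual implementation: the function name `parse_log_line`, the returned keys, and the assumption that the dataset directory begins right after `user_pub_work/` are all hypothetical choices made for this example.

```python
from datetime import datetime

# Example line from step 1, without the trailing newline.
LOG_LINE = (
    '128.211.148.13 - - [22/Sep/2019:12:01:01 -0700] '
    '"GET /thredds/fileServer/user_pub_work/E3SM/1_0/historical/'
    '1deg_atm_60-30km_ocean/land/native/model-output/mon/ens1/v1/'
    '20180215.DECKv1b_H1.ne30_oEC.edison.clm2.h0.1850-01.nc HTTP/1.1" '
    '200 91564624 "-" "Wget/1.14 (linux-gnu)"'
)


def parse_log_line(line: str) -> dict:
    """Hypothetical helper that follows steps 2 to 6 for one log line."""
    # Step 2: split the line on whitespace.
    parts = line.split()

    # Step 3: the requested path is the seventh field.
    path = parts[6]

    # Steps 4 and 5: separate the directory from the file name, then turn
    # the directory into a dot-separated dataset id.
    # Assumption: the dataset directory starts after "user_pub_work/".
    directory, _, file_id = path.rpartition("/")
    _, _, dataset_dir = directory.partition("user_pub_work/")
    dataset_id = dataset_dir.replace("/", ".")

    # Step 6: additional info, here the timestamp, status, and byte count.
    timestamp = datetime.strptime(
        " ".join(parts[3:5]), "[%d/%b/%Y:%H:%M:%S %z]"
    )

    return {
        "ip": parts[0],
        "timestamp": timestamp,
        "path": path,
        "dataset_id": dataset_id,
        "file_id": file_id,
        "status": parts[8],
        "bytes": parts[9],
    }


record = parse_log_line(LOG_LINE)
print(record["dataset_id"])
print(record["file_id"])
```

Splitting on whitespace is enough here because the request path contains no spaces. Note that `%b` in `strptime` is locale-dependent, so month names such as `Sep` are parsed as expected only under an English or C locale.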
The list below includes an example log line from an Apache log and the project-specific templates that can be used to parse log lines.
- Example Log Line
    123.123.123.123 - - [22/Sep/2019:12:01:01 -0700] "GET /thredds/fileServer/user_pub_work/E3SM/1_0/historical/1deg_atm_60-30km_ocean/land/native/model-output/mon/ens1/v1/20180215.DECKv1b_H1.ne30_oEC.edison.clm2.h0.1850-01.nc HTTP/1.1" 200 91564624 "-" "Wget/1.14 (linux-gnu)"
- Directory Format Template
    %(root)s/%(source)s/%(model_version)s/%(experiment)s/%(grid_resolution)s/%(realm)s/%(regridding)s/%(data_type)s/%(time_frequency)s/%
- Dataset ID Template
    %(source)s.%(model_version)s.%(experiment)s.%(grid_resolution)s.%(realm)s.%(regridding)s.%(data_type)s.%(time_frequency)s.%(ensemble_member)s
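The dot-separated template above uses Python's `%(name)s` mapping syntax, so it can be filled in from a dictionary of facets and also used to recover the facet names. The sketch below assumes the facet values from the example log line; the helper `split_by_template` and the variable names are hypothetical and not taken from the package.

```python
import re

# Dot-separated E3SM native template, copied from the list above.
NATIVE_TEMPLATE = (
    "%(source)s.%(model_version)s.%(experiment)s.%(grid_resolution)s."
    "%(realm)s.%(regridding)s.%(data_type)s.%(time_frequency)s."
    "%(ensemble_member)s"
)

# Facet values read off the example log line.
facets = {
    "source": "E3SM",
    "model_version": "1_0",
    "experiment": "historical",
    "grid_resolution": "1deg_atm_60-30km_ocean",
    "realm": "land",
    "regridding": "native",
    "data_type": "model-output",
    "time_frequency": "mon",
    "ensemble_member": "ens1",
}

# Render the template with printf-style mapping substitution.
rendered = NATIVE_TEMPLATE % facets


def split_by_template(template: str, value: str, sep: str = ".") -> dict:
    """Hypothetical helper: map each separated field to its facet name."""
    names = re.findall(r"%\((\w+)\)s", template)
    fields = value.split(sep)
    if len(names) != len(fields):
        raise ValueError("value does not match the template")
    return dict(zip(names, fields))


print(rendered)
print(split_by_template(NATIVE_TEMPLATE, rendered))
```

Splitting on the separator only works when no facet value contains a dot, which holds for the example values used here.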
- Example Log Line
    123.123.123.123 - - [14/Jul/2019:06:58:07 -0700] "GET /thredds/fileServer/user_pub_work/CMIP6/CMIP/E3SM-Project/E3SM-1-0/piControl/r1i1p1f1/Lmon/tran/gr/v20180608/tran_Lmon_E3SM-1-0_piControl_r1i1p1f1_gr_000101-050012.nc HTTP/1.1" 206 1573717 "-" "Wget/1.20.1 (linux-gnu)"
- Directory Format Template
    %(root)s/%(mip_era)s/%(activity_drs)s/%(institution_id)s/%(source_id)s/%(experiment_id)s/%(member_id)s/%(table_id)s/%(variable_id)s/%(grid_label)s/%(version)s
- Dataset ID Template
    %(mip_era)s.%(activity_drs)s.%(institution_id)s.%(source_id)s.%(experiment_id)s.%(member_id)s.%(table_id)s.%(variable_id)s.%(grid_label)s
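As an illustration of how the two CMIP6 templates relate, the sketch below maps the directory from the example log line onto the facet names of the directory format template and then renders the dataset ID template. It assumes that `%(root)s` absorbs every leading path component not claimed by another facet; the function name `facets_from_directory` is hypothetical.

```python
import re

DIRECTORY_TEMPLATE = (
    "%(root)s/%(mip_era)s/%(activity_drs)s/%(institution_id)s/"
    "%(source_id)s/%(experiment_id)s/%(member_id)s/%(table_id)s/"
    "%(variable_id)s/%(grid_label)s/%(version)s"
)
DATASET_ID_TEMPLATE = (
    "%(mip_era)s.%(activity_drs)s.%(institution_id)s.%(source_id)s."
    "%(experiment_id)s.%(member_id)s.%(table_id)s.%(variable_id)s."
    "%(grid_label)s"
)

# Directory portion of the example log line (file name removed).
DIRECTORY = (
    "/thredds/fileServer/user_pub_work/CMIP6/CMIP/E3SM-Project/E3SM-1-0/"
    "piControl/r1i1p1f1/Lmon/tran/gr/v20180608"
)


def facets_from_directory(template: str, directory: str) -> dict:
    """Hypothetical helper: assign path components to template facets."""
    names = re.findall(r"%\((\w+)\)s", template)
    components = directory.split("/")
    # Every facet except the first ("root") takes exactly one component,
    # counted from the end of the path.
    tail_count = len(names) - 1
    if len(components) <= tail_count:
        raise ValueError("directory is too short for the template")
    facets = dict(zip(names[1:], components[-tail_count:]))
    # Assumption: whatever precedes those components is the root.
    facets[names[0]] = "/".join(components[:-tail_count])
    return facets


facets = facets_from_directory(DIRECTORY_TEMPLATE, DIRECTORY)
# Extra keys such as "root" and "version" are ignored by % formatting.
dataset_id = DATASET_ID_TEMPLATE % facets
print(dataset_id)
```

Counting components from the end of the path avoids having to know the exact root prefix in advance.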
E3SM CMIP6 Variables Guideline
- Example Log Line
    123.123.123.123 - - [18/Jul/2019:00:52:54 -0700] "GET /thredds/fileServer/user_pub_work/E3SM/1_0/cmip6_variables/piControl/CMIP6/CMIP/E3SM-Project/E3SM-1-0/piControl/r1i1p1f1/Amon/prc/gr/v20190206/prc_Amon_E3SM-1-0_piControl_r1i1p1f1_gr_000101-050012.nc HTTP/1.0" 404 - "-" "Wget/1.12 (linux-gnu)"
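The example above has a 404 status and `-` in place of a byte count, so any code that sums downloaded data has to tolerate non-numeric sizes. The sketch below shows one way the cumulative request count and cumulative GB could be computed. It is not the package's method: the request paths are shortened placeholders, every line is counted as a request regardless of status, and 1 GB is taken to be 10^9 bytes, all of which are assumptions for illustration.

```python
from itertools import accumulate

# Shortened stand-ins for the three example log lines; only the status
# and byte fields match the originals, the paths are placeholders.
LINES = [
    '123.123.123.123 - - [22/Sep/2019:12:01:01 -0700] '
    '"GET /path/a.nc HTTP/1.1" 200 91564624 "-" "Wget/1.14 (linux-gnu)"',
    '123.123.123.123 - - [14/Jul/2019:06:58:07 -0700] '
    '"GET /path/b.nc HTTP/1.1" 206 1573717 "-" "Wget/1.20.1 (linux-gnu)"',
    '123.123.123.123 - - [18/Jul/2019:00:52:54 -0700] '
    '"GET /path/c.nc HTTP/1.0" 404 - "-" "Wget/1.12 (linux-gnu)"',
]


def response_bytes(line: str) -> int:
    """Return the byte count of a log line, or 0 when it is '-'."""
    size = line.split()[9]
    return int(size) if size.isdigit() else 0


sizes = [response_bytes(line) for line in LINES]

# Running totals in log order.
cumulative_requests = list(range(1, len(LINES) + 1))
cumulative_bytes = list(accumulate(sizes))
cumulative_gb = [total / 1e9 for total in cumulative_bytes]

for count, gb in zip(cumulative_requests, cumulative_gb):
    print(f"{count} requests, {gb:.6f} GB")
```

Whether failed requests such as the 404 should count toward the request total is a policy decision that the text above does not specify; filtering on the status field before accumulating would exclude them.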