HelpLogrep
Timid Robot Zehta edited this page Aug 29, 2016
A handy tool for sophisticated, ad-hoc analysis of webserver logs.
logrep [--mode MODE] [--include | --exclude CLASSES] [-H | -R]
[--output FIELDS] [--filter FILTERS] [--last LAST_N]
[--sort LIM:FIELDS:DIRECTION] [--config CFG_FILE] [--quiet]
[LOG_FILE]
-m MODE There are three modes:
--mode - "grep" parses an entire log file (default).
- "tail" reads from the end of the file.
- "top" shows running performance stats.
-i, -e CLASSES Include or exclude the given URL "classes". You can
--include configure logrep to classify URLs by a set of
--exclude regular expressions. See the installation docs and
/etc/wtop.cfg for how to configure your own classes.
--include and --exclude are mutually exclusive.
Examples:
--include "home,search,wiki"
--exclude "img,xml,js"
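As an illustration of how regex-based URL classes could drive --include and --exclude, here is a minimal sketch. The class table, the "other" fallback name and the first-match-wins rule are all hypothetical. The real classes and their syntax live in wtop.cfg and may behave differently.

```python
import re

# Hypothetical class table; logrep reads its real classes from wtop.cfg.
URL_CLASSES = [
    ("home", re.compile(r"^/$|^/home")),
    ("search", re.compile(r"^/search")),
    ("img", re.compile(r"\.(?:jpe?g|png|gif)$")),
    ("js", re.compile(r"\.js$")),
]


def classify(url, classes=URL_CLASSES, default="other"):
    """Return the name of the first class whose regex matches the URL."""
    for name, pattern in classes:
        if pattern.search(url):
            return name
    return default  # hypothetical fallback for unclassified URLs


def keep(url, include=None, exclude=None):
    """Mimic --include / --exclude, which are mutually exclusive."""
    if include and exclude:
        raise ValueError("--include and --exclude are mutually exclusive")
    cls = classify(url)
    if include:
        return cls in include.split(",")
    if exclude:
        return cls not in exclude.split(",")
    return True
```

For example, `keep("/app.js", exclude="img,xml,js")` drops the request because it falls in the "js" class.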
-f FILTERS Filters act on named fields. They support
--filter strings and numbers, with greater than (>), less
than (<), equals (=), not-equals (!=), and
regular expression match (~) and non-match (!~).
For example, to keep successful requests that were
over 10kB in size and whose Referer domain does
not match "example.com":
-f "status=200,bytes>10000,refdom!~example.com"
AVAILABLE FIELDS:
msec millisecond response time
fbmsec millisecond response time (first byte)
ip The IP address of the client
lip The IP address of the server
url The path of the request, ex. "/home"
ref "Referer" header
refdom domain part of the "Referer" header
bytes Bytes sent
ua User-agent header
uas First 30 characters of ua
class URL class, configurable in wtop.cfg
status HTTP status code, eg 200, 301, 404
proto Protocol version, eg "HTTP/1.1"
method HTTP method, eg "GET", "POST"
bot Is a robot? 1 or 0. Only a guess.
botname eg "Googlebot", "Nutch", "Slurp", etc
ts Unix timestamp of the request
year
month
day
hour
minute
country country name (see Geocoding, below)
cc ISO 3166-1 country code (see Geocoding, below)
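The following sketch shows one way to evaluate a filter string such as "status=200,bytes>10000,refdom!~example.com" against a parsed log record. It is not logrep's own parser. It assumes that records are plain dicts keyed by the field names above, that clauses are split on commas, and that comparisons are numeric when both sides parse as numbers and fall back to string comparison otherwise.

```python
import operator
import re

# field, operator, target; two-character operators are tried first
FILTER_RE = re.compile(r"^(\w+)(!=|!~|=|~|>|<)(.*)$")
COMPARE = {"=": operator.eq, "!=": operator.ne, ">": operator.gt, "<": operator.lt}


def _num(value):
    """Return value as a float, or None if it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def _make_pred(field, op, target):
    if op in ("~", "!~"):
        rx = re.compile(target)
        want = op == "~"
        return lambda rec: (rx.search(str(rec.get(field, ""))) is not None) == want
    cmp = COMPARE[op]

    def pred(rec):
        value = rec.get(field)
        a, b = _num(value), _num(target)
        if a is not None and b is not None:
            return cmp(a, b)  # numeric comparison
        return cmp("" if value is None else str(value), target)  # string comparison

    return pred


def parse_filters(spec):
    """Turn 'f1=v1,f2>v2,...' into a list of predicates."""
    preds = []
    for clause in spec.split(","):
        m = FILTER_RE.match(clause.strip())
        if not m:
            raise ValueError(f"bad filter clause: {clause!r}")
        preds.append(_make_pred(*m.groups()))
    return preds


def matches(record, preds):
    """A record passes only if every clause matches."""
    return all(p(record) for p in preds)
```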
-H, -R Shorthand for a useful but incomplete filter of
robot user-agents. -H is equivalent to --filter 'bot=0'
(humans only) and -R to --filter 'bot=1' (robots only).
-o FIELDS Output only the given fields, tab-delimited. All
--output of the fields listed for --filter are available.
Example:
$ logrep -o 'cc,msec,url'
UK 34 /Madonna.jpg
CA 34 /Padma-Lakshmi.jpg
UK 34 /Shaun-Woo.jpg
US 184 /Ben-Stiller.jpg
...
AGGREGATE FUNCTIONS:
In -m grep mode you can use aggregate functions
on numeric fields such as bytes and msec. Any
non-aggregate fields in the list will be used to
group records together.
avg(FIELD) arithmetic mean
count(*) record count
dev(FIELD) standard deviation (square root of variance)
iqm(FIELD) Interquartile Mean (see IQM below)
max(FIELD) highest seen value
min(FIELD) lowest seen value
miqm(FIELD) moving interquartile mean (see IQM below)
sum(FIELD) summation of all values
var(FIELD) population variance
Example (grouped by status):
$ logrep -o 'status,count(*),avg(msec)'
200 4196 242.58
302 5 79.75
404 1 9.00
304 798 15.76
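To make the grouping rule concrete, here is a minimal sketch of "non-aggregate fields become the group key" for an output spec such as 'status,count(*),avg(msec)'. It is not logrep's implementation. It assumes records are dicts, that numeric fields can be converted with float(), and it covers only the built-in functions (not iqm/miqm). As documented above, var is the population variance and dev is its square root.

```python
import math
import re
from collections import defaultdict

AGG_RE = re.compile(r"^(\w+)\((\*|\w+)\)$")


def _mean(xs):
    return sum(xs) / len(xs)


def _var(xs):
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)  # population variance


AGGREGATES = {
    "avg": _mean,
    "count": len,
    "dev": lambda xs: math.sqrt(_var(xs)),
    "max": max,
    "min": min,
    "sum": sum,
    "var": _var,
}


def aggregate(records, output):
    """Group records by the non-aggregate columns and apply each function."""
    columns = [c.strip() for c in output.split(",")]
    parsed = [AGG_RE.match(c) for c in columns]
    keys = [c for c, m in zip(columns, parsed) if m is None]
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec)
    rows = []
    for key, recs in groups.items():
        key_values = iter(key)
        row = []
        for m in parsed:
            if m is None:
                row.append(next(key_values))  # group-by column
            else:
                func, field = m.groups()
                values = recs if field == "*" else [float(r[field]) for r in recs]
                row.append(AGGREGATES[func](values))
        rows.append(row)
    return rows
```

Each resulting row could then be printed tab-delimited, for example with `print("\t".join(map(str, row)))`.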
-s LIM:FIELDS:DIRECTION
--sort Use this option to sort & limit aggregate records.
LIM is the number of records to return, FIELDS
is a comma-delimited list of column positions
starting with 1, and DIRECTION is either
'descending' (default) or 'ascending'.
Example (total bytes sent, by hour & minute):
$ logrep -o 'hour,minute,sum(bytes)' -s'3600:1,2:a'
12 0 1895927
12 1 7418972
12 2 2103828
12 3 7419371
12 4 1680468
...
Example (the 10 most popular URLs):
$ logrep -o 'url,count(*)' -s '10:2'
/home 23718
/wiki 8211
/about 2703
...
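Here is a minimal sketch of how a LIM:FIELDS:DIRECTION spec could be applied to aggregate rows. It is not logrep's code. The example above abbreviates 'ascending' as 'a', so this sketch assumes that any prefix of 'ascending' or 'descending' selects a direction. It also assumes that rows are lists whose columns sort naturally.

```python
def parse_sort(spec):
    """Split 'LIM:FIELDS:DIRECTION' into (limit, column indexes, descending)."""
    parts = spec.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected LIM:FIELDS[:DIRECTION], got {spec!r}")
    limit = int(parts[0])
    # FIELDS are 1-based column positions; convert them to 0-based indexes
    columns = [int(p) - 1 for p in parts[1].split(",")]
    direction = (parts[2] if len(parts) == 3 and parts[2] else "descending").lower()
    # Assumption: a prefix such as "a" or "desc" is enough to pick a direction
    if "ascending".startswith(direction):
        return limit, columns, False
    if "descending".startswith(direction):
        return limit, columns, True
    raise ValueError(f"unknown sort direction: {direction!r}")


def sort_rows(rows, spec):
    """Sort rows by the given columns, then keep only the first LIM rows."""
    limit, columns, descending = parse_sort(spec)
    ordered = sorted(rows, key=lambda row: [row[i] for i in columns], reverse=descending)
    return ordered[:limit]
```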
-l LAST_N (grep mode) Only read the last N log lines.
--last
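Here is a minimal sketch of "only the last N lines" using a bounded deque, which keeps at most N lines in memory while streaming through the input. This is an illustration, not how logrep necessarily does it. A real implementation might instead seek backwards from the end of the file, which this sketch does not attempt.

```python
import io
from collections import deque


def last_lines(stream, n):
    """Return the final n lines of a line iterator, holding at most n at a time."""
    return list(deque(stream, maxlen=n))


# Usage: the deque silently discards older lines as new ones arrive.
sample = io.StringIO("a\nb\nc\n")
print(last_lines(sample, 2))
```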
-c CFG_FILE Feed logrep a custom config file. By default it
--config will search for a file to use in the following
order:
VirtualEnv + /etc/wtop.cfg
PYTHONUSERBASE + /etc/wtop.cfg
USER_BASE + /etc/wtop.cfg
Python Lib + /etc/wtop.cfg
/etc/wtop.cfg
Platform-appropriate path separators are used.
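The search order above might be sketched as follows. The mapping of the documented names is an assumption, not taken from logrep's source: "VirtualEnv" and "PYTHONUSERBASE" are read as the VIRTUAL_ENV and PYTHONUSERBASE environment variables, "USER_BASE" as site.USER_BASE, and "Python Lib" as sys.prefix. os.path.join supplies the platform-appropriate separators.

```python
import os
import site
import sys


def candidate_configs():
    """Candidate wtop.cfg paths in the documented order (mapping is assumed)."""
    rel = os.path.join("etc", "wtop.cfg")
    bases = [
        os.environ.get("VIRTUAL_ENV"),     # assumed meaning of "VirtualEnv"
        os.environ.get("PYTHONUSERBASE"),  # assumed meaning of "PYTHONUSERBASE"
        site.USER_BASE,                    # assumed meaning of "USER_BASE"
        sys.prefix,                        # assumed meaning of "Python Lib"
    ]
    paths = [os.path.join(base, rel) for base in bases if base]
    paths.append(os.path.join(os.sep, "etc", "wtop.cfg"))  # system-wide fallback
    return paths


def find_config(explicit=None):
    """An explicit -c path wins; otherwise return the first existing candidate."""
    if explicit:
        return explicit
    for path in candidate_configs():
        if os.path.isfile(path):
            return path
    return None
```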
-q, --quiet Quiet mode. Does not print warnings to stderr.
-d, --debug Print debug messages to stderr.
--line-buffered Force output to be line buffered. By default, output is
buffered when standard output is not a tty.
LOG_FILE The path to a log file. By default logrep will
read from the file path specified in wtop.cfg.
If you specify "-", logrep will read from STDIN.
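The "-" convention for LOG_FILE can be sketched like this. The function name is hypothetical, and the lenient decoding (errors="replace") is an assumption suited to logs that may contain stray bytes. It is not necessarily what logrep does.

```python
import sys


def open_log(path):
    """Open LOG_FILE, treating "-" as standard input."""
    if path == "-":
        return sys.stdin
    # Assumption: replace undecodable bytes rather than abort mid-log
    return open(path, encoding="utf-8", errors="replace")
```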
GEOCODING:
logrep will use the MaxMind GeoIP library if it is installed. This
will enable two extra fields for filtering and output: country
(eg "United Kingdom"), and cc (ISO 3166-1 country code, eg "UK"). These
are a *guess* at the country the HTTP client is from.
IQM:
logrep will use the python-iqm module if it is installed. This will
enable two extra aggregate functions: iqm() and miqm().
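The interquartile mean is the mean of the middle half of the sorted values, which makes it less sensitive than avg() to a few very slow or very large responses. The sketch below uses the common fractional-weight definition, which works for any number of values. python-iqm may handle the quartile boundaries differently, and the moving variant (miqm) is not covered here.

```python
def iqm(values):
    """Interquartile mean: mean of the values between the 25th and 75th percentiles.

    Values straddling a quartile boundary get fractional weight, so n need
    not be a multiple of 4.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("iqm() of empty data")
    lo, hi = n / 4, 3 * n / 4
    total = 0.0
    for i, x in enumerate(xs):
        # sorted value i occupies the interval [i, i + 1); weight is its overlap with [lo, hi]
        weight = max(0.0, min(i + 1, hi) - max(i, lo))
        total += weight * x
    return total / (n / 2)
```

Note how a single outlier barely moves the result. In the first test case below, one value of 38 pushes the plain mean to about 8.5, while the IQM stays at 6.5.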
KNOWN BUG:
Some installations of Apache have HostnameLookups defaulted to On.
This means that the %h field will contain the fully-qualified domain
name of the client (xdsl456.foo.example.com) instead of the IP
address (123.1.2.3). Geocoding will work but will require a DNS
lookup to resolve the IP address. Using the "cc" or "country"
field in this case will generate a *LOT* of DNS traffic and can
hang the program. It is recommended to explicitly set
HostnameLookups Off in your Apache configuration.
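Before geocoding a log, it may help to check whether its client field holds hostnames rather than IP addresses. The sketch below assumes %h is the first whitespace-separated field, as in Apache's common and combined formats. A non-zero ratio suggests HostnameLookups was on when the log was written.

```python
import ipaddress


def is_ip(host):
    """True if host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def hostname_ratio(lines):
    """Fraction of non-empty log lines whose first field (assumed %h) is not an IP."""
    total = named = 0
    for line in lines:
        fields = line.split(None, 1)
        if not fields:
            continue  # skip blank lines
        total += 1
        if not is_ip(fields[0]):
            named += 1
    return named / total if total else 0.0
```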
EXAMPLES:
"wtop" for all human traffic:
$ logrep -m top -f 'bot=0' access.log
Status code & response times for all Googlebot homepage hits:
$ logrep -f 'botname=Googlebot' -i home -o status,msec
Tail requests for pages about Angelina Jolie or Brad Pitt with a Referer matching example.com:
$ logrep -m tail -f 'url~jolie|pitt,ref~example.com' access.log
Get maximum response size and average response time for requests
grouped by URL class:
$ logrep -o 'class,max(bytes),avg(msec)' access.log
0.7.9 2014 Oct 03 https://github.com/ClockworkNet/wtop