Web server log files are the primary source of information for reconstructing the course of events when vulnerable web applications are exploited. However, extracting the relevant information from huge files can be a difficult task. LORG aims to implement various state-of-the-art approaches to detect attacks against web applications within HTTP traffic logs (e.g. Apache's access_log), including signature-based, statistical and machine learning techniques. Detected incidents are subsequently classified as hand-crafted or automated to distinguish whether the attacker was a human or a machine. GeoIP and DNSBL lookups can be performed to see if the attacks originate from a certain geolocation or botnet. Furthermore, attacks can be quantified in terms of success or failure, based on anomalies within the size of HTTP responses, HTTP response codes or active replay of attacks.
Pre-alpha. Previously developed as http://sourceforge.net/projects/webforensik/
Usage: lorg [-i input_type] [-o output_type] [-d detect_mode]
[-a add_vector] [-c client_ident] [-b dnsbl_type]
[-q quantification] [-t threshold] [-v verbosity]
[-n] [-u] [-h] [-g] [-p] input_file [output_file]
-i allowed input formats: common combined vhost logio cookie
-o allowed output formats: html json xml csv
-d allowed detect modes: chars phpids mcshmm dnsbl geoip all
-a additional attack vectors: path argnames cookie agent all
-c allowed client identifiers: host session user logname all
-b allowed dnsbl types: tor proxy zombie spam dialup all
-q allowed quantification types: status bytes replay all
-t threshold level as value from 0 to n (default: 10)
-v verbosity level as value from 0 to 3 (default: 1)
-n do not summarize results, output single incidents
-u urldecode encoded requests (affects reports only)
-h try to convert numerical addresses into hostnames
-g enable geotargeting (separate files are needed!)
-p perform a naive tamper detection test on logfile
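For example, a typical run over a combined-format Apache log might look like this (file names are placeholders and the exact way to invoke the script may differ, e.g. ./lorg or php lorg.php, depending on how LORG is installed):

    lorg -i combined -o html -d all -c host -q status access.log report.html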
- Presentation [2013]: Web Application Forensics - Slides for a Talk at Hack in Paris 2013
- Thesis [2012]: Implementation of a Framework for Advanced HTTPD Logfile Security Analysis
- Project Description [2012, German]: WebForensik - Forensic Analysis of Apache HTTPD Logfiles
Q: What does LORG stand for?
A: Emm... 'Logfile Outlier Recognition and Gathering'. It is also an old Irish word ([ˈl̪ˠɔɾˠə]) for 'trace, track, trail'.
Q: Why would anyone call a program LORG?
A: Unfortunately all the other cool names had already been taken.
Q: What is LORG all about?
A: It's a PHP-CLI program that implements various detection techniques to automatically scan your HTTPD logfiles for attacks against web applications.
Q: A CLI program? Why in hell use PHP?
A: At the beginning it seemed like a good idea, because PHPIDS could be integrated easily. Then things got bigger than expected...
Q: What logfile formats are supported?
A: Out of the box, common and combined (Apache, nginx) and some other formats are supported. Any mod_log_config-compatible format (http://httpd.apache.org/docs/current/mod/mod_log_config.html) like 'custom' => '%h %l %u %t \"%r\" %>s %b %{X-Forwarded-For}' can be parsed, if it is correctly defined in $allowed_input_types in the code.
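For reference, these are the stock mod_log_config definitions of the 'common' and 'combined' formats as found in a default Apache configuration; a custom LogFormat used on the server has to be mirrored in $allowed_input_types so LORG can parse it:

    LogFormat "%h %l %u %t \"%r\" %>s %b" common
    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined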
Q: What about W3C-extended (IIS) log file formats?
A: Convert, using e.g. http://rebex.net/rconvlog/
Q: How do I scan several separate logfiles at once?
A: Merge, using e.g. http://code.google.com/p/logmerge. If you have several access.log.*.gz files, try something like gunzip access.log.*.gz && cat access.log.* > merged.log.
Q: How to whitelist/exclude certain noisy clients (e.g. legitimate pentesting security scanners) from detection?
A: grep -v is your friend.
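For example, to drop all requests from a known scanner before analysis (203.0.113.5 is just a placeholder address; in common/combined logs each line starts with the client address):

    grep -v '^203.0.113.5 ' access.log > access_clean.log

Then point LORG at access_clean.log instead of the original file.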
Q: Why does the urldecode switch (-u) have no effect on detection results?
A: The -u switch only affects visualization (= output file). For detection, all HTTP requests are automatically URL-decoded before processing.
Q: Is it possible to output all incidents, including harmless ones?
A: Yes. Use -t 0.
Q: How fast is LORG?
A: LORG's performance primarily depends on the selected detect mode (-d). While the 'chars' mode performs acceptably (about 50,000 loglines per minute), advanced learning algorithms like 'mcshmm' will take much longer (while being more accurate). To speed up processing, try to set $only_check_webapps = true in the code and do not use any additional attack vectors (-a) or DNS/DNSBL lookups (-h, -b), as they can be performance killers.
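A speed-oriented run might therefore stick to the 'chars' detector and skip all optional lookups, for example (file names are placeholders):

    lorg -i combined -d chars access.log report.html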
Q: How much memory does LORG require?
A: LORG was written with low memory usage in mind. Depending on the selected detection modes, it might still require up to the size of the processed logfile in the worst case. If summarization is disabled (-n), LORG should not require more than 4MB of memory, as all loglines are parsed, analyzed and written directly to the output file without caching.
Q: Will it work on Windows?
A: No.
- To make the PHPIDS detection mode work, a symlink IDS -> . needs to be created in the phpids/ folder (see the example after this list)
- Geotargeting is incompatible with the php5-geoip Debian/Ubuntu package
- When showing the results within a SIMILE Exhibit map, your web browser might hang if several thousand or more results are to be displayed
- And many, many more...
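For the PHPIDS symlink mentioned above, something along these lines should do (assuming the phpids/ directory sits inside LORG's installation directory):

    cd phpids && ln -s . IDS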