Logparser common log format error (nginx/apache) #1810
Closed
Bug report
When the logparser plugin parses nginx access log files, it fails on requests made with HTTP basic auth if the username contains digits or spaces.
This applies to both the COMMON_LOG_FORMAT and COMBINED_LOG_FORMAT grok patterns. The issue may affect Apache logs as well.
Relevant telegraf.conf:
# Stream and parse log file(s).
[[inputs.logparser]]
  files = ["/var/log/nginx/access.log"]
  from_beginning = false
  [inputs.logparser.grok]
    patterns = ["%{COMMON_LOG_FORMAT}"]
    measurement = "nginx_access_log"
System info:
Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u2 (2016-01-02) x86_64 GNU/Linux
Telegraf - version 1.0.0
nginx version: nginx/1.6.2
Steps to reproduce:
- Set up the telegraf.conf file as above
- Append the example lines to the log file (see Additional info)
- Telegraf does not match the grok pattern against the new log lines
Expected behavior:
Telegraf matches each log line using either the COMMON_LOG_FORMAT or the COMBINED_LOG_FORMAT pattern and passes the parsed entry on to the outputs.
Actual behavior:
When the username contains digits, the log line is ignored. When the username contains spaces, its words are parsed as other attributes (e.g. client_ip is set to one of the words).
Additional info:
Here are some example log lines that cause the error:
Using digits in the HTTP basic auth username:
127.0.0.1 - username123 [25/Sep/2016:00:19:43 +0200] "GET / HTTP/1.1" 401 590 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36"
Using spaces in the HTTP basic auth username:
127.0.0.1 - my username here [25/Sep/2016:00:17:36 +0200] "GET / HTTP/1.1" 401 590 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36"
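A rough way to reproduce both symptoms outside Telegraf is sketched below in Python. It is an illustration only: the regexes are simplified, hypothetical stand-ins for the grok patterns and are not copied from Telegraf. The sketch assumes that the username sub-pattern is a character class without digits and that the pattern is matched unanchored, which would explain both the ignored line and the shifted fields. The sample lines are the common-log portion of the examples above.

```python
import re

# Hypothetical username character classes (assumptions, not Telegraf's patterns).
USER_LETTERS_ONLY = r"[a-zA-Z.@\-+_%]+"    # no digits allowed
USER_WITH_DIGITS = r"[a-zA-Z0-9.@\-+_%]+"  # same class plus digits


def common_log_regex(user_pattern):
    """Build a simplified, unanchored common-log-format regex."""
    return re.compile(
        r'(?P<client_ip>\S+) (?P<ident>\S+) (?P<auth>' + user_pattern + r') '
        r'\[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" '
        r'(?P<resp_code>\d{3}) (?P<resp_bytes>\d+|-)'
    )


def parse(user_pattern, line):
    """Return the captured fields as a dict, or None if the line does not match."""
    match = common_log_regex(user_pattern).search(line)
    return match.groupdict() if match else None


line_digits = '127.0.0.1 - username123 [25/Sep/2016:00:19:43 +0200] "GET / HTTP/1.1" 401 590'
line_spaces = '127.0.0.1 - my username here [25/Sep/2016:00:17:36 +0200] "GET / HTTP/1.1" 401 590'

# Digits in the username: the letters-only class rejects the whole line.
print(parse(USER_LETTERS_ONLY, line_digits))

# Allowing digits in the class lets the same line match.
print(parse(USER_WITH_DIGITS, line_digits))

# Spaces in the username: the unanchored search slides right and
# assigns the words of the username to client_ip, ident and auth.
print(parse(USER_WITH_DIGITS, line_spaces))
```

In this sketch the first call returns None, the second captures auth as "username123", and the third captures client_ip as "my", ident as "username" and auth as "here", which mirrors the behavior described under "Actual behavior".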