Description
Describe the bug
Promtail consistently seems to fail when sent syslog messages that exceed a certain length:
level=warn ts=2021-02-27T06:30:00.666651553Z caller=syslogtarget.go:183 msg="error parsing syslog stream" err="found EOF after \"<142>1 2021-02-27T00:29:59.581-06:00 testhost testapp - - - <INFO> <main> <1614407399507> SyslogTest::<none>(): vwxyz{|ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcde
...
MNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdef\", expecting a SYSLOGMSG containing 8198 octets"
The syslog messages are being sent in RFC 5424 octet-framing from a client app running on one machine directly to Promtail running on another machine; there is no rsyslog or syslog-ng in between.
Using a syslog message generation program allows me to vary the message size for testing. Messages with a length less than 8198 are properly consumed by Promtail. (They will show up just fine when seen in Grafana). Messages with a length >= 8198 cause the above failure.
Replacing Promtail with rsyslog (configured with a $MaxMessageSize of 64k) shows that rsyslog correctly receives the same messages that Promtail rejects.
Issue #2232 mentions that Promtail does not enforce any max message length, but the above possibly suggests otherwise.
To Reproduce
Steps to reproduce the behavior:
- Start Promtail with syslog scraper.
- Connect from non-local client to Promtail.
- Send octet-framed syslog message with MSGLEN >= 8198.
Expected behavior
The message should be consumed and handled just like any other message.
Environment:
- Two Linux VMs, one running Promtail, one running a syslog client.
- Loki 2.1.0 and associated Promtail
Screenshots, Promtail config, or terminal output
Promtail configuration:
...
scrape_configs:
- job_name: syslog
syslog:
listen_address: 0.0.0.0:1514
idle_timeout: 120s
label_structured_data: yes
use_incoming_timestamp: yes
relabel_configs:
- source_labels: ['__syslog_message_hostname']
target_label: 'host'
Below is the output of tcpdump taken on the Promtail VM:
12:13:21.845847 IP (tos 0x0, ttl 64, id 3584, offset 0, flags [DF], proto TCP (6), length 8255)
dev-2.56630 > dev-3.fujitsu-dtcns: Flags [P.], cksum 0x4134 (incorrect -> 0xf98f), seq 1:8204, ack 1, win 229, options [nop,nop,TS val 3532191262 ecr 3492927400], length 8203
0x0000: 4500 203f 0e00 4000 4006 ebb6 ac10 6470 E..?..@.@.....dp
0x0010: ac10 6471 dd36 05ea f243 fc79 f91d 2772 ..dq.6...C.y..'r
0x0020: 8018 00e5 4134 0000 0101 080a d288 f61e ....A4..........
0x0030: d031 d7a8 3831 3938 203c 3134 323e 3120 .1..8198.<142>1.
0x0040: 3230 3231 2d30 322d 3237 5431 323a 3133 2021-02-27T12:13
0x0050: 3a32 302e 3731 382d 3036 3a30 3020 7465 :20.718-06:00.te
0x0060: 7374 686f 7374 2074 6573 7461 7070 202d sthost.testapp.-
0x0070: 202d 202d 203c 494e 464f 3e20 3c6d 6169 .-.-.<INFO>.<mai
0x0080: 6e3e 203c 3136 3134 3434 3936 3030 3639 n>.<161444960069
0x0090: 333e 2053 7973 6c6f 6754 6573 743a 3a3c 3>.SyslogTest::<
0x00a0: 6e6f 6e65 3e28 293a 2076 7778 797a 7b7c none>():.vwxyz{|
0x00b0: 4142 4344 4546 4748 494a 4b4c 4d4e 4f50 ABCDEFGHIJKLMNOP
0x00c0: 5152 5354 5556 5758 595a 5b5c 5d5e 5f60 QRSTUVWXYZ[\]^_`
0x00d0: 6162 6364 6566 6768 696a 6b6c 6d6e 6f70 abcdefghijklmnop
0x00e0: 7172 7374 7576 7778 797a 7b7c 4142 4344 qrstuvwxyz{|ABCD
0x00f0: 4546 4748 494a 4b4c 4d4e 4f50 5152 5354 EFGHIJKLMNOPQRST
0x0100: 5556 5758 595a 5b5c 5d5e 5f60 6162 6364 UVWXYZ[\]^_`abcd
0x0110: 6566 6768 696a 6b6c 6d6e 6f70 7172 7374 efghijklmnopqrst
0x0120: 7576 7778 797a 7b7c 4142 4344 4546 4748 uvwxyz{|ABCDEFGH
0x0130: 494a 4b4c 4d4e 4f50 5152 5354 5556 5758 IJKLMNOPQRSTUVWX
0x0140: 595a 5b5c 5d5e 5f60 6162 6364 6566 6768 YZ[\]^_`abcdefgh
0x0150: 696a 6b6c 6d6e 6f70 7172 7374 7576 7778 ijklmnopqrstuvwx
0x0160: 797a 7b7c 4142 4344 4546 4748 494a 4b4c yz{|ABCDEFGHIJKL
0x0170: 4d4e 4f50 5152 5354 5556 5758 595a 5b5c MNOPQRSTUVWXYZ[\
0x0180: 5d5e 5f60 6162 6364 6566 6768 696a 6b6c ]^_`abcdefghijkl
0x0190: 6d6e 6f70 7172 7374 7576 7778 797a 7b7c mnopqrstuvwxyz{|
0x01a0: 4142 4344 4546 4748 494a 4b4c 4d4e 4f50 ABCDEFGHIJKLMNOP
0x01b0: 5152 5354 5556 5758 595a 5b5c 5d5e 5f60 QRSTUVWXYZ[\]^_`
0x01c0: 6162 6364 6566 6768 696a 6b6c 6d6e 6f70 abcdefghijklmnop
0x01d0: 7172 7374 7576 7778 797a 7b7c 4142 4344 qrstuvwxyz{|ABCD
0x01e0: 4546 4748 494a 4b4c 4d4e 4f50 5152 5354 EFGHIJKLMNOPQRST
0x01f0: 5556 5758 595a 5b5c 5d5e 5f60 6162 6364 UVWXYZ[\]^_`abcd
...
0x1f00: 5556 5758 595a 5b5c 5d5e 5f60 6162 6364 UVWXYZ[\]^_`abcd
0x1f10: 6566 6768 696a 6b6c 6d6e 6f70 7172 7374 efghijklmnopqrst
0x1f20: 7576 7778 797a 7b7c 4142 4344 4546 4748 uvwxyz{|ABCDEFGH
0x1f30: 494a 4b4c 4d4e 4f50 5152 5354 5556 5758 IJKLMNOPQRSTUVWX
0x1f40: 595a 5b5c 5d5e 5f60 6162 6364 6566 6768 YZ[\]^_`abcdefgh
0x1f50: 696a 6b6c 6d6e 6f70 7172 7374 7576 7778 ijklmnopqrstuvwx
0x1f60: 797a 7b7c 4142 4344 4546 4748 494a 4b4c yz{|ABCDEFGHIJKL
0x1f70: 4d4e 4f50 5152 5354 5556 5758 595a 5b5c MNOPQRSTUVWXYZ[\
0x1f80: 5d5e 5f60 6162 6364 6566 6768 696a 6b6c ]^_`abcdefghijkl
0x1f90: 6d6e 6f70 7172 7374 7576 7778 797a 7b7c mnopqrstuvwxyz{|
0x1fa0: 4142 4344 4546 4748 494a 4b4c 4d4e 4f50 ABCDEFGHIJKLMNOP
0x1fb0: 5152 5354 5556 5758 595a 5b5c 5d5e 5f60 QRSTUVWXYZ[\]^_`
0x1fc0: 6162 6364 6566 6768 696a 6b6c 6d6e 6f70 abcdefghijklmnop
0x1fd0: 7172 7374 7576 7778 797a 7b7c 4142 4344 qrstuvwxyz{|ABCD
0x1fe0: 4546 4748 494a 4b4c 4d4e 4f50 5152 5354 EFGHIJKLMNOPQRST
0x1ff0: 5556 5758 595a 5b5c 5d5e 5f60 6162 6364 UVWXYZ[\]^_`abcd
0x2000: 6566 6768 696a 6b6c 6d6e 6f70 7172 7374 efghijklmnopqrst
0x2010: 7576 7778 797a 7b7c 4142 4344 4546 4748 uvwxyz{|ABCDEFGH
0x2020: 494a 4b4c 4d4e 4f50 5152 5354 5556 5758 IJKLMNOPQRSTUVWX
0x2030: 595a 5b5c 5d5e 5f60 6162 6364 6566 0a YZ[\]^_`abcdef.
It has been remarked on #2232 that "the error from octet-counted is because it expects the message to be of the exact size mentioned in it's prefix." As best as I can tell, the message is of the correct size. And it doesn't explain why smaller messages are parsed correctly (i.e., if it were an issue with the RFC 5424 implementation of the sender, it should fail for every message).
For additional testing, I allowed the octet count in the message to remain the same while the message itself was either truncated or expanded. In other words, the message prefix would always be "8198 <..." but I might send 8100 bytes of data, or 16k of data. No matter the amount of data that was sent, Promtail failed as above.
Go is not something I'm too familiar with, but I took a look at the master branch of Promtail and the current branch of influxdata/go-syslog. From parser.go:104, this seems to be where the error is generated:
// Next we MUST see a SYSLOGMSG with length equal to MSGLEN
if tok = p.scan(); tok.typ != SYSLOGMSG {
e := fmt.Errorf(`found %s after "%s", expecting a %s containing %d octets`, tok, tok.lit, SYSLOGMSG, p.s.msglen)
From scanner.go:122:
func (s *Scanner) scanSyslogMsg() Token {
// Check the reader contains almost MSGLEN characters
n := int(s.msglen)
b, err := s.r.Peek(n)
if err != nil {
return Token{
typ: EOF,
lit: b,
}
}
My guess is that the call to s.r.Peek(n)
might be failing with an overflow of an 8192-sized buffer (8198 == 8192 + len(prefix+1)).
This has me stumped, so I've resorted to opening this ticket. Perhaps I am missing something obvious somewhere, but I simply can't seem to get this to work. Any help would be appreciated.
Activity