**Description**
**Logstash information**:

Please include the following information:

- Logstash version (e.g. `bin/logstash --version`): any
- Logstash installation source (e.g. built from source, with a package manager: DEB/RPM, expanded from tar or zip archive, docker)
- How is Logstash being run (e.g. as a service/service manager: systemd, upstart, etc. Via command line, docker/kubernetes)

**Plugins installed**: (`bin/logstash-plugin list --verbose`)

**JVM** (e.g. `java -version`):

If the affected version of Logstash is 7.9 (or earlier), or if it is NOT using the bundled JDK or is using the 'no-jdk' version in 7.10 (or higher), please provide the following information:

- JVM version (`java -version`)
- JVM installation source (e.g. from the Operating System's package manager, from source, etc.)
- Value of the `LS_JAVA_HOME` environment variable if set.

**OS version** (`uname -a` if on a Unix-like system):
**Description of the problem including expected versus actual behavior**:

When the BufferedTokenizer is used to dice the input, then after a buffer-full error the input should be consumed up to the next separator, and tokenization should restart cleanly with the data that follows that separator.
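To make the expectation concrete, here is a minimal sketch written against `FileWatch::BufferedTokenizer` as exposed inside Logstash; the constructor signature and the exception raised on overflow are my assumptions about the current implementation, and the snippet is meant for a JRuby console inside Logstash:

```ruby
# Illustrative sketch only, not a committed test.
tokenizer = FileWatch::BufferedTokenizer.new("\n", 100) # separator, size limit (assumed signature)

begin
  tokenizer.extract("a" * 150) # oversized fragment, no separator seen yet
rescue java.lang.IllegalStateException # assumed buffer-full error
  # After this failure the tokenizer should discard input up to the next
  # separator instead of keeping the oversized fragment in its buffer.
end

# The tail of the big payload ends at the first "\n"; only "small" should
# come out as a token afterwards.
tokens = tokenizer.extract("aaaa\nsmall\n").to_a
# expected: ["small"]
# buggy behaviour: the leftover oversized data leaks into the first token
```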
**Steps to reproduce**:

Mostly inspired by logstash-plugins/logstash-codec-json_lines#45 (comment).

- Configure Logstash to use the `json_lines` codec from PR logstash-plugins/logstash-codec-json_lines#45 ("Re-established previous behaviour without a default limit for 'decode_size_limit_bytes'"). In the Gemfile add:

```ruby
gem "logstash-codec-json_lines", :path => "/path/to/logstash-codec-json_lines"
```

- From a shell, run `bin/logstash-plugin install --no-verify`
- Start Logstash with the following pipeline:
```
input {
  tcp {
    port => 1234
    codec => json_lines {
      decode_size_limit_bytes => 100000
    }
  }
}

output {
  stdout {
    codec => rubydebug
  }
}
```
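As a quick sanity check that the pipeline is listening (not part of the reproduction itself), a small well-formed document should come through as a single event on stdout:

```ruby
require 'socket'
require 'json'

# One small document, one event expected in the rubydebug output.
socket = TCPSocket.open('localhost', 1234)
socket.write({"a" => "ok"}.to_json + "\n")
socket.close
```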
- Use the following script to generate some load:

```ruby
require 'socket'
require 'json'

hostname = 'localhost'
port = 1234
socket = TCPSocket.open(hostname, port)

# First write: only the first 90,000 bytes of an oversized (~105 KB) document,
# which keeps the tokenizer's buffer below decode_size_limit_bytes for now.
data = {"a" => "a"*105_000}.to_json + "\n"; socket.write(data[0...90_000])

# Second write: the rest of the oversized document (pushing the buffer past the
# 100,000-byte limit) followed by a small, valid document on its own line.
data = {"a" => "a"*105_000}.to_json + "\n"; socket.write(data[90_000..] + "{\"b\": \"bbbbbbbbbbbbbbbbbbb\"}\n")

socket.close
```
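For reference, the size arithmetic behind the split (my own back-of-the-envelope numbers, not taken from the linked issue):

```ruby
require 'json'

doc = {"a" => "a"*105_000}.to_json + "\n"
doc.bytesize             # => 105_009, above decode_size_limit_bytes (100_000)
doc[0...90_000].bytesize # => 90_000, so the first write alone stays under the limit
doc[90_000..].bytesize   # => 15_009, the second write pushes the buffer past it
```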
**Provide logs (if relevant)**:

Logstash generates 3 events:
```
{
       "message" => "Payload bigger than 100000 bytes",
      "@version" => "1",
    "@timestamp" => 2024-10-01T10:49:55.755601Z,
          "tags" => [
        [0] "_jsonparsetoobigfailure"
    ]
}
{
             "b" => "bbbbbbbbbbbbbbbbbbb",
      "@version" => "1",
    "@timestamp" => 2024-10-01T10:49:55.774574Z
}
{
             "a" => "aaaaa......a",
      "@version" => "1",
    "@timestamp" => 2024-10-01T10:49:55.774376Z
}
```
Instead, it should generate 2 events: one tagged with the `_jsonparsetoobigfailure` error for the message made of `a`s, and then one valid event with the `b`s.
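For comparison, the expected output would look something like this (timestamps elided):

```
{
       "message" => "Payload bigger than 100000 bytes",
      "@version" => "1",
    "@timestamp" => ...,
          "tags" => [
        [0] "_jsonparsetoobigfailure"
    ]
}
{
             "b" => "bbbbbbbbbbbbbbbbbbb",
      "@version" => "1",
    "@timestamp" => ...
}
```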
The extended motivation is explained in logstash-plugins/logstash-codec-json_lines#45 (comment)