Description
Events without `@timestamp` can be difficult to process in certain situations. According to code that dates back to 2013 (and probably before), its presence should be guaranteed. However, Logstash has never enforced this by preventing the removal of `@timestamp`.
So whenever a problem arises from a missing `@timestamp`, people may simply see an innocuous-looking error like the following and conclude that there's a bug in Logstash:
```
[2018-02-22T13:49:42,425][FATAL][logstash.runner ] An unexpected error occurred! {:error=>#<LogStash::Error: timestamp field is missing>, :backtrace=>["org/logstash/ext/JrubyEventExtLibrary.java:205:in `sprintf'", "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.4.0-java/lib/logstash/outputs/elasticsearch/common.rb:169:in `event_action_params'", "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.4.0-java/lib/logstash/outputs/elasticsearch/common.rb:44:in `event_action_tuple'", "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.4.0-java/lib/logstash/outputs/elasticsearch/common.rb:38:in `multi_receive'", "org/jruby/RubyArray.java:2414:in `map'", "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.4.0-java/lib/logstash/outputs/elasticsearch/common.rb:38:in `multi_receive'", "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/logstash-core/lib/logstash/output_delegator_strategies/shared.rb:13:in `multi_receive'", "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/logstash-core/lib/logstash/output_delegator.rb:49:in `multi_receive'", "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/logstash-core/lib/logstash/pipeline.rb:436:in `output_batch'", "org/jruby/RubyHash.java:1342:in `each'", "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/logstash-core/lib/logstash/pipeline.rb:435:in `output_batch'", "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/logstash-core/lib/logstash/pipeline.rb:381:in `worker_loop'", "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/logstash-core/lib/logstash/pipeline.rb:342:in `start_workers'"]}
```
One point of view is that such an event should actually be considered invalid, and that whatever led to this situation is likely to cause data loss.
There are two schools of thought, or approaches, we can take here:
- We can try to support these events as best as possible
- We can take a hard line and not try to support these events, assuming instead that the customer's configuration has a problem that will lead to data loss. We can adjust the level of severity we apply with this approach.
In either case, to prevent needless confusion and support requests, we should at the very least output a log message that is more verbose than the one above, explaining what the problem actually is: the event is missing `@timestamp`, and the customer is trying to index it into a timestamped index (like `logstash-%{+YYYY.MM.dd}`).
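For concreteness, here is a minimal pipeline sketch that can trigger the crash above. This is an illustrative assumption of how users end up in this state, not a configuration taken from an actual report:

```
input {
  stdin { }
}
filter {
  mutate {
    # Removing @timestamp is allowed today...
    remove_field => ["@timestamp"]
  }
}
output {
  elasticsearch {
    # ...but %{+YYYY.MM.dd} needs @timestamp for date interpolation,
    # so sprintf raises "timestamp field is missing" and Logstash crashes.
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
```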
Are non-timestamped events legitimate?
The current behaviour (unless the above situation happens) is to successfully index the event. I would argue there's a legitimate use case for removing `@timestamp` in Logstash pipelines. It's counterintuitive, because Logstash was born to process time-based data. But once someone has a Logstash installation, Logstash is a great way to periodically ingest non-time-based data into their Elasticsearch cluster.
Examples of non-time-based data to reimport regularly:
- zip codes (or other regional databases)
- IP geolocation databases
- other kinds of CSV data dumps
Another argument supporting the legitimacy of non-time-based data in Logstash is the fact that our plugins are gradually getting better at supporting non-time-sensitive scenarios:
- More explicit support in input-file for reading full files rather than tailing them (CSV imports)
- We've had requests (#9) and attempts (filter-lrucache) at making filter-elasticsearch able to cache static datasets in memory, to achieve behaviour similar to filter-jdbc_static. This is a sign that people have static datasets in ES.
The last point demonstrating the legitimacy of processing non-timestamped data with Logstash is the very first Logstash configuration one sees during the Elasticsearch training. There, we're ingesting a bunch of data, but we're not interested in keeping `@timestamp` (among other fields).
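Roughly, that kind of configuration has the following shape (a paraphrased sketch, not the actual course material; paths and field names are illustrative):

```
input {
  file {
    path => "/path/to/dataset.csv"   # illustrative path
    mode => "read"                   # newer file-input read mode: whole files, not tailing
  }
}
filter {
  csv { }
  mutate {
    # Static dataset: drop the fields we don't care about,
    # @timestamp among them.
    remove_field => ["@timestamp", "@version", "host", "path"]
  }
}
```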
Approach 1: Support these events as best as possible
The current behaviour is that events with no `@timestamp` work in most cases. People can do this, and are doing it. There are still some improvements we can make:
- Output a more explicit message when someone tries to index an event without a `@timestamp` to a timestamped index.
  - Part of #777 was about putting this clearer messaging in place.
- Instead of a full Logstash crash (the current behaviour), output an `ERROR` for every event in this situation, and keep Logstash running.
  - #777 implements this behaviour.
- If the customer has DLQ enabled, we could even send the event there. Perhaps their timestamp is just named incorrectly and they may be able to reprocess these events without any other disruption.
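To make the first two bullets concrete, here is a Ruby sketch of what a clearer per-event check could look like. This is illustrative only, not the actual Logstash source; `action_params_for` and its error message are invented for this sketch:

```ruby
# Illustrative sketch -- not actual Logstash code.
# Instead of letting sprintf raise a terse "timestamp field is missing"
# FATAL that stops the pipeline, validate up front and report per event.
def action_params_for(event, index_pattern)
  if index_pattern.include?("%{+") && event["@timestamp"].nil?
    # A clearer message names both the missing field and the index
    # pattern that requires it, so users can find the offending filter.
    raise ArgumentError,
          "Cannot interpolate index pattern #{index_pattern.inspect}: " \
          "the event has no @timestamp (was it removed by a filter?)"
  end
  { index: index_pattern } # stand-in for the real parameter construction
end
```

An output could then rescue this error per event, log it at `ERROR` level, and keep the pipeline running rather than crashing the whole process.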
Approach 2: Interpret this as a problem with the user's configuration and crash
A lot of the feedback around #777 was not about the PR itself, but went in the direction that "not having a `@timestamp` on the event" was not officially supported. So this definitely struck a chord with a few people :-)
In any case, nobody denies that we can improve the error message when the timestamp error happens.
Now, do we want to take a harder line, to the effect that we do not want to support a missing `@timestamp` at all? If so, the current behaviour is not strict enough. We may want to take the following additional steps:
- Pre-emptively refuse any event without `@timestamp` as soon as we see it (whether or not the timestamp is needed).
- Eventually add a breaking change to Logstash, preventing the removal of `@timestamp`.
- Until this is prevented, should we always crash on any such event? (regardless of whether the event is problematic)
1+2=3: Let users decide
Any given user may really want Logstash to behave like 1, or to behave like 2. Presumably, the default could be the friendlier approach 1, and people who take great pains to construct a pipeline that has the least data loss possible (e.g. use only non-lossy transports & plugins, careful use of PQ & DLQ) could optionally decide that such events should indeed cause a hard crash.
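One way this choice could be surfaced is a setting in `logstash.yml`. The name and values below are purely hypothetical, shown only to illustrate the idea:

```
# Hypothetical setting -- not an existing Logstash option.
#   warn  : log an ERROR per event and keep running (approach 1)
#   dlq   : like warn, but also route the event to the dead letter queue
#   crash : treat any missing @timestamp as fatal (approach 2)
pipeline.on_missing_timestamp: warn
```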
Why this discussion?
This issue came out of the discussion in #777, which in turn came from reports such as #739. Requesting feedback on #777 in Slack immediately led to discussions of going towards approach 2.