Description
Events without `@timestamp` can be difficult to process in certain situations. According to code that dates back to 2013 (and probably before), its presence should be guaranteed. However, Logstash has never enforced this by preventing the removal of `@timestamp`.
So whenever a problem arises from a missing `@timestamp`, people may simply see an innocuous-looking error like the following and conclude that there's a bug in Logstash:
```
[2018-02-22T13:49:42,425][FATAL][logstash.runner ] An unexpected error occurred! {:error=>#<LogStash::Error: timestamp field is missing>, :backtrace=>["org/logstash/ext/JrubyEventExtLibrary.java:205:in `sprintf'", "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.4.0-java/lib/logstash/outputs/elasticsearch/common.rb:169:in `event_action_params'", "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.4.0-java/lib/logstash/outputs/elasticsearch/common.rb:44:in `event_action_tuple'", "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.4.0-java/lib/logstash/outputs/elasticsearch/common.rb:38:in `multi_receive'", "org/jruby/RubyArray.java:2414:in `map'", "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.4.0-java/lib/logstash/outputs/elasticsearch/common.rb:38:in `multi_receive'", "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/logstash-core/lib/logstash/output_delegator_strategies/shared.rb:13:in `multi_receive'", "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/logstash-core/lib/logstash/output_delegator.rb:49:in `multi_receive'", "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/logstash-core/lib/logstash/pipeline.rb:436:in `output_batch'", "org/jruby/RubyHash.java:1342:in `each'", "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/logstash-core/lib/logstash/pipeline.rb:435:in `output_batch'", "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/logstash-core/lib/logstash/pipeline.rb:381:in `worker_loop'", "/Users/Gabriel/Documents/ElasticSearch/logstash-5.6.2/logstash-core/lib/logstash/pipeline.rb:342:in `start_workers'"]}
```
One point of view is that such an event should actually be considered invalid, and that whatever led to this situation is likely to cause data loss.
There are two schools of thought, or approaches, we can take here:
- We can try to support these events as best as possible
- We can take a hard line and not try to support these events, assuming instead that the customer's configuration has a problem that will lead to data loss. We can adjust the level of severity we apply with this approach.
In either case, to prevent needless confusion and support requests, we should at the very least output a log message that is more verbose than the one above, explaining what the problem actually is: the event is missing `@timestamp`, and the customer is trying to index it into a timestamped index (like `logstash-%{+YYYY.MM.dd}`).
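For concreteness, here is a minimal pipeline sketch that can trigger the crash above. This is an illustrative assumption of how users end up in this state, not a configuration taken from an actual report:

```
input {
  stdin { }
}
filter {
  mutate {
    # Removing @timestamp is allowed today...
    remove_field => ["@timestamp"]
  }
}
output {
  elasticsearch {
    # ...but %{+YYYY.MM.dd} needs @timestamp for date interpolation,
    # so sprintf raises "timestamp field is missing" and Logstash crashes.
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
```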
Are non-timestamped events legitimate?
The current behaviour (unless the above situation happens) is to successfully index the event. I would argue there's a legitimate use case for removing `@timestamp` in Logstash pipelines. It's counterintuitive, because Logstash was born to process time-based data. But once someone has a Logstash installation, Logstash is a great way to periodically ingest non-time-based data into their Elasticsearch cluster.
Examples of non-time-based data to reimport regularly:
- zip codes (or other regional databases)
- IP geolocation databases
- other kinds of CSV data dumps
Another argument supporting the legitimacy of non-time-based data in Logstash is the fact that our plugins are gradually getting better at supporting non-time-sensitive scenarios:
- More explicit support in input-file for reading full files rather than tailing them (CSV imports)
- We've had requests (#9) and attempts (filter-lrucache) at making filter-elasticsearch able to cache static datasets in memory, to achieve behaviour similar to filter-jdbc_static. This is a sign that people have static datasets in ES.
The last point demonstrating the legitimacy of processing non-timestamped data with Logstash is the very first Logstash configuration one sees during the Elasticsearch training. There, we're ingesting a bunch of data, but we're not interested in keeping `@timestamp` (among other fields).
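Roughly, that kind of configuration has the following shape (a paraphrased sketch, not the actual course material; paths and field names are illustrative):

```
input {
  file {
    path => "/path/to/dataset.csv"   # illustrative path
    mode => "read"                   # newer file-input read mode: whole files, not tailing
  }
}
filter {
  csv { }
  mutate {
    # Static dataset: drop the fields we don't care about,
    # @timestamp among them.
    remove_field => ["@timestamp", "@version", "host", "path"]
  }
}
```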
Approach 1: Support these events as best as possible
The current behaviour is that events with no `@timestamp` work in most cases. People can do this, and are doing it. There are still some improvements we can make:
- Output a more explicit message when someone tries to index an event without a `@timestamp` to a timestamped index.
  - Part of #777 was about putting this clearer messaging in place.
- Instead of a full Logstash crash (the current behaviour), output an `ERROR` for every event in this situation, and keep Logstash running.
  - #777 implements this behaviour.
- If the customer has DLQ enabled, we could even send the event there. Perhaps their timestamp is just named incorrectly and they may be able to reprocess these events without any other disruption.
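To make the first two bullets concrete, here is a Ruby sketch of what a clearer per-event check could look like. This is illustrative only, not the actual Logstash source; `action_params_for` and its error message are invented for this sketch:

```ruby
# Illustrative sketch -- not actual Logstash code.
# Instead of letting sprintf raise a terse "timestamp field is missing"
# FATAL that stops the pipeline, validate up front and report per event.
def action_params_for(event, index_pattern)
  if index_pattern.include?("%{+") && event["@timestamp"].nil?
    # A clearer message names both the missing field and the index
    # pattern that requires it, so users can find the offending filter.
    raise ArgumentError,
          "Cannot interpolate index pattern #{index_pattern.inspect}: " \
          "the event has no @timestamp (was it removed by a filter?)"
  end
  { index: index_pattern } # stand-in for the real parameter construction
end
```

An output could then rescue this error per event, log it at `ERROR` level, and keep the pipeline running rather than crashing the whole process.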
Approach 2: Interpret this as a problem with the user's configuration and crash
A lot of the feedback around #777 was not about the PR itself, but went in the direction that "not having a `@timestamp` on the event" was not officially supported. So this definitely struck a chord with a few people :-)
In any case, nobody denies that we can improve the error message when the timestamp error happens.
Now, do we want to take a harder line, to the effect that we do not want to support a missing `@timestamp` at all? If so, the current behaviour is not strict enough. We may want to take the following additional steps:
- Pre-emptively refuse any event without `@timestamp` as soon as we see it (whether or not the timestamp is needed).
- Eventually add a breaking change to Logstash, preventing the removal of `@timestamp`.
- Until this is prevented, should we always crash on any such event? (regardless of whether the event is problematic)
1+2=3: Let users decide
Any given user may really want Logstash to behave like 1, or to behave like 2. Presumably, the default could be the friendlier approach 1, and people who take great pains to construct a pipeline that has the least data loss possible (e.g. use only non-lossy transports & plugins, careful use of PQ & DLQ) could optionally decide that such events should indeed cause a hard crash.
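One way this choice could be surfaced is a setting in `logstash.yml`. The name and values below are purely hypothetical, shown only to illustrate the idea:

```
# Hypothetical setting -- not an existing Logstash option.
#   warn  : log an ERROR per event and keep running (approach 1)
#   dlq   : like warn, but also route the event to the dead letter queue
#   crash : treat any missing @timestamp as fatal (approach 2)
pipeline.on_missing_timestamp: warn
```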
Why this discussion?
This issue came out of the discussion in #777, which in turn came from reports such as #739. Requesting feedback on #777 in Slack immediately led to discussions of going towards approach 2.