Description
ignore_malformed is sometimes used to deal with messy data, so that indexing an entire document does not fail when only one or two fields are malformed. However, once you start using it, this option gives you no feedback about which fields were indexed and which were silently dropped, which is trappy. It makes it possible to think you are querying all your data when the queried field actually has the correct format in only a minority of documents. It can also make it hard to answer questions like "why does this document not match this query?", e.g. if the date field has a hard-to-spot typo. I'm especially worried about this because we are considering widening the scope of the ignore_malformed option (#12366).
We had a discussion about it with @clintongormley and thought that maybe we should add feedback about parsing failures back to the _source document, similar to how Logstash's grok plugin can add tags to failed documents.
Exact details are up for discussion, but for instance we could add an _ignored field containing the list of fields that failed parsing. This could never collide with one of the document's own fields, since we reject field names that start with an underscore.
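
To make the idea concrete, here is a minimal Python sketch of the behaviour being proposed. It is not Elasticsearch code: the `index_document` function, the `PARSERS` table, and the toy mapping are all hypothetical, and the real implementation would live in the field mappers. It only illustrates recording the names of malformed fields in `_ignored` instead of dropping them silently.

```python
from datetime import datetime

# Hypothetical per-type parsers, standing in for real field mappers.
PARSERS = {
    "date": lambda v: datetime.strptime(v, "%Y-%m-%d"),
    "integer": int,
}


def index_document(doc, mapping):
    """Parse mapped fields; list the names of fields that fail in `_ignored`."""
    indexed = {}
    ignored = []
    for field, value in doc.items():
        parser = PARSERS.get(mapping.get(field))
        if parser is None:
            # Unmapped field: keep the value as-is.
            indexed[field] = value
            continue
        try:
            indexed[field] = parser(value)
        except (ValueError, TypeError):
            # Equivalent of ignore_malformed: skip the field,
            # but remember that it was skipped.
            ignored.append(field)
    if ignored:
        indexed["_ignored"] = sorted(ignored)
    return indexed


mapping = {"created": "date", "count": "integer"}
doc = {"title": "hello", "created": "2015-13-01", "count": "7"}  # month 13 is invalid
result = index_document(doc, mapping)
```

With something like this in place, a user could look for documents whose `_ignored` list contains a given field to find out how much of their data was actually indexed for that field.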