handling of invalid bulk requests with illegal_argument_exception  #815

Open
@colinsurprenant


ES returns a status 400 illegal_argument_exception error at the bulk request level for any malformed bulk request. Some examples:

The problem is that all 400 illegal_argument_exception errors are retried indefinitely, but these are not transient errors: retrying the same request will always produce the same error.
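To make the distinction concrete, here is a minimal Python sketch of how a client could recognize this kind of bulk-level failure. It is not the plugin's code. It assumes the response body follows the usual ES error shape, with a top-level `error` object carrying a `type` field; the function name `is_permanent_bulk_error` and the sample body are hypothetical.

```python
import json


def is_permanent_bulk_error(status: int, body: str) -> bool:
    """Return True when a bulk-level response looks like a non-transient 400.

    Assumption: the body is JSON with a top-level "error" object that has a
    "type" field, e.g. {"error": {"type": "..."}, "status": 400}.
    """
    if status != 400:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Unparseable body: we cannot tell, so do not classify it as permanent.
        return False
    error = payload.get("error") if isinstance(payload, dict) else None
    return isinstance(error, dict) and error.get("type") == "illegal_argument_exception"


# Hypothetical response body, for illustration only.
sample = '{"error": {"type": "illegal_argument_exception", "reason": "illustrative"}, "status": 400}'
print(is_permanent_bulk_error(400, sample))
print(is_permanent_bulk_error(429, sample))
```

A check like this only identifies the error; what to do with the rejected events is the open question.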

I am not sure what the best action is here. If we decide to DLQ at the bulk level for these errors, the output will become a pass-through into the DLQ, which I do not think is necessarily a good idea.

We basically have 3 choices: a) Retry indefinitely, b) DLQ and c) Stop.

a) Retry indefinitely: If using PQ, retrying indefinitely will result in a growing backlog in PQ. Once the problem is fixed and LS is restarted, events will flow correctly into ES again and no data should be lost. Without PQ, backpressure will be applied to whichever input is being used and, depending on that input, events might get lost upon restarting LS.

b) DLQ: If we DLQ these, all bulk requests will end up in the DLQ. After applying a fix and restarting LS, all DLQed events will have to be reprocessed, which involves a separate, manual process. Note that using PQ has no impact here.

c) Stop: Stopping implies completely stopping the pipeline, which might impact other inputs & filters and might result in losing events if PQ is not enabled.
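The three options can be sketched as a small policy switch. This is a rough Python illustration under stated assumptions, not a proposal for the actual implementation: `Policy`, `PipelineStopped`, and `handle_rejected_bulk` are hypothetical names, and plain lists stand in for the retry backlog and the DLQ.

```python
from enum import Enum


class Policy(Enum):
    RETRY = "retry"  # a) keep the events and retry indefinitely
    DLQ = "dlq"      # b) route the whole bulk request to the dead letter queue
    STOP = "stop"    # c) stop the pipeline


class PipelineStopped(Exception):
    """Hypothetical signal that the pipeline must halt."""


def handle_rejected_bulk(policy: Policy, events: list, backlog: list, dlq: list) -> None:
    """Apply one of the three options to a bulk request rejected with a 400."""
    if policy is Policy.RETRY:
        # Events stay queued; the backlog keeps growing until the cause is fixed.
        backlog.extend(events)
    elif policy is Policy.DLQ:
        # Every event of the rejected bulk lands in the DLQ and needs
        # separate reprocessing later.
        dlq.extend(events)
    else:
        # Nothing is queued here; in-flight events may be lost without PQ.
        raise PipelineStopped("bulk request rejected with a non-transient 400")


backlog, dlq = [], []
handle_rejected_bulk(Policy.RETRY, ["e1", "e2"], backlog, dlq)
handle_rejected_bulk(Policy.DLQ, ["e3"], backlog, dlq)
print(backlog, dlq)
```

Under option b), a persistent cause of malformed requests sends every subsequent bulk through the DLQ branch, which is the pass-through concern raised above.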
