Skip to content

Remove MasterNodeRequest#TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUT #107984

Open

Description

Today we quietly set a default of 30s for MasterNodeRequest#masterNodeTimeout, relying on callers to override it where appropriate. However it is essentially always appropriate to override this timeout in production code. Forgetting to do so leads to subtle but critical bugs that only arise when a cluster is struggling to process cluster state updates as fast as normal, in which there is no way to lengthen these timeouts at runtime to bring the cluster back to health. There are several such bugs in Elasticsearch today (e.g. #107857, but I've found others too).

Instead in production code we should move away from having any quiet default for this timeout and instead require callers to specify an appropriate value when creating the request instance in the first place. For example:

  • Instances that relate to a REST request should always set this timeout according to the ?master_timeout request parameter, whose default of RestUtils#REST_MASTER_TIMEOUT_DEFAULT (30s) is explicitly documented and conceptually separate from MasterNodeRequest#DEFAULT_MASTER_NODE_TIMEOUT (also coincidentally 30s).
  • Instances that relate to internal activities should probably set this timeout very long since it's normally better to wait patiently instead of failing sooner, especially if the failure simply triggers a retry. Alternatively, they could expose the relevant timeout using a setting.
  • In test code it doesn't matter so much, we can default to 30s unless the test specifically needs a different value, but there is no particular need to avoid passing this value on request construction either.

It's not a trivial change since there are currently over 200 implementations of MasterNodeRequest, all of which currently rely on the default behaviour, so it's going to take a few steps to complete this work. Therefore I'm opening this meta-issue to track the work needed in this area.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions