xds/balancer: Re-evaluate increasing loadStoreTimeout in clusterimpl balancer #8380

Open
@purnesh42H

Description

The new LRS client, introduced in #8250, attempts to send a final load report to the LRS server when its LoadStore.Stop() method is called.

The Stop(context) function uses a context with a deadline as a safeguard against blocking indefinitely. However, the function does not send the final report immediately. Instead, it signals the underlying stream to send the report during its next scheduled reporting cycle. This means that if the deadline is shorter than the time remaining until the next reporting interval, the Stop() operation times out before the client even has a chance to attempt the final report.

As part of #8310, the clusterimpl balancer now uses this new LRS client and calls LoadStore.Stop() with a hardcoded 1-second deadline.

This fixed timeout may not be sufficient. Load balancing control planes like Traffic Director can have a LoadReportingInterval that is significantly longer than one second. In such cases, the final load statistics would consistently be lost.

The previous internal gRPC xdsclient load reporting code did not attempt a final load report before closing, so switching to the new LRS client does not change any existing behavior. Even so, we should re-evaluate the 1-second deadline. Ideally, it should be increased to align with the maximum expected load reporting interval so that the final report is sent most of the time.

Metadata

Assignees: No one assigned

Labels: Area: xDS (includes everything xDS related, including LB policies used with xDS); P2; Type: Internal Cleanup (refactors, etc.)
