[API] Add Subnet-Specific Health Checks #1264

@patrick-ogrady

Description

Although one Subnet on an AvalancheGo node may be unhealthy, operators may still wish to interact with other Subnets running on it. AvalancheGo's existing health check, however, returns unhealthy if any Subnet is unhealthy. This behavior caused an outage in Subnet APIs during this incident, even though most Subnets were able to serve queries: API providers stopped a node from serving queries whenever this "global" check failed, because it was the only mechanism they had to gauge the health of the underlying node.

We should add a new health check, or add an argument to the existing check (https://docs.avax.network/apis/avalanchego/apis/health#healthhealth), that checks the health of only a specific Subnet. This will allow API providers to keep serving queries for whichever Subnets on a node are healthy.
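
To make the difference concrete, here is a rough sketch of the two aggregation behaviors. It is not AvalancheGo code: `CheckResult`, the `subnet` tag, the check names, and the Subnet identifiers are all hypothetical, and it only assumes that each individual check can be attributed to a single Subnet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    name: str      # hypothetical check name, e.g. "subnet-a/bootstrapped"
    subnet: str    # hypothetical identifier of the Subnet the check belongs to
    healthy: bool


def global_healthy(results):
    """Existing behavior as described above: unhealthy if any check fails."""
    return all(r.healthy for r in results)


def subnet_healthy(results, subnet):
    """Proposed behavior: consider only the checks tagged with `subnet`."""
    scoped = [r for r in results if r.subnet == subnet]
    # Treat an unknown Subnet as unhealthy rather than vacuously healthy.
    return bool(scoped) and all(r.healthy for r in scoped)


def routable_subnets(results):
    """Subnets an API provider could keep serving from this node."""
    subnets = sorted({r.subnet for r in results})
    return [s for s in subnets if subnet_healthy(results, s)]


# Example data (made up): one of three Subnets on the node is unhealthy.
results = [
    CheckResult("subnet-a/bootstrapped", "subnet-a", True),
    CheckResult("subnet-b/bootstrapped", "subnet-b", False),
    CheckResult("subnet-c/bootstrapped", "subnet-c", True),
]
```

With this data, `global_healthy(results)` fails for the whole node, while `routable_subnets(results)` still reports the two healthy Subnets. Whether this ends up as a separate endpoint or an argument on the existing check is left open here.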

I don't think we should remove the "global" health check in this change; it is still useful for getting a "full sense" of a node's status.

Metadata

Labels

incident response, monitoring
