Search before asking
- I had searched in the issues and found no similar feature requirement.
Description
1. Context & Motivation
Current State: BanyanDB currently relies on etcd as a hard dependency for cluster coordination, metadata storage, and node discovery (Meta Nodes). Operators must therefore maintain a separate etcd cluster, manage leases for health checks, and handle complex certificate management for secure communication.
Goal: Transform BanyanDB into a "Zero-Dependency" architecture by replacing the etcd-based registry with a decentralized DNS-based Node Discovery mechanism. This simplifies deployment on Kubernetes (StatefulSets) and static environments (VMs/Edge).
2. Technical Design Specification
2.1 Core Abstraction: NodeRegistry
We will introduce a modular NodeRegistry interface to decouple the discovery logic from the specific implementation (a sketch follows the flow comparison below).
- Old Flow: Liaison -> Watch etcd Key -> Update gRPC Connection.
- New Flow: Liaison -> Poll NodeRegistry -> Update gRPC Connection.
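A hypothetical shape for this abstraction in Go; the method set and the Node fields shown here are illustrative assumptions, not the final API:

```go
// A hypothetical shape for the NodeRegistry abstraction; method names
// and the Node fields are illustrative, not the final API.
package discovery

import "context"

// Node is a discovered peer; the real struct lives in BanyanDB's
// metadata packages.
type Node struct {
	Name    string
	Address string // host:port of the node's gRPC endpoint
}

// NodeRegistry decouples discovery from its backing mechanism
// (DNS, a static file, or the legacy etcd watcher).
type NodeRegistry interface {
	// Nodes returns the current snapshot of known peers.
	Nodes(ctx context.Context) ([]Node, error)
	// Close stops any background pollers or watchers.
	Close() error
}
```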
2.2 Discovery Mechanism (DNS)
The primary implementation will be the DNS Registry, operating in a "Pull-based" model.
- Query Strategy:
  - Primary: Query SRV Records (RFC 2782) to discover target hostnames and dynamic ports (critical for K8s Headless Services).
  - Fallback (Static Registry): To support environments without DNS or for emergency overrides, load a fixed list of peers from a local file (topology.yml). Hot reloading of this file should be supported.
- Polling & Caching:
  - Implement a Custom gRPC Resolver (Go) that polls DNS at a configurable interval (default: 30s). During startup, the interval should be 5 seconds so that topology changes are reflected quickly; two flags should be added to configure the two intervals (a combined sketch follows this list).
  - Two-Layer Caching: Respect the DNS TTL (infrastructure layer) and maintain an internal snapshot (application layer).
- Resilience (Serve Stale):
  - If the DNS server returns a failure (e.g., SERVFAIL, timeout), the resolver MUST NOT flush the current address list.
  - It must log a warning and return the stale (last known good) list of addresses to ensure partition tolerance.
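Below is a condensed sketch combining the three behaviors above (SRV lookup, dual-interval polling, serve stale) as a custom gRPC resolver. The type name, the interval-promotion logic, and the defaults are illustrative assumptions, and the static topology.yml fallback is omitted for brevity:

```go
// A condensed sketch of the resolver described above; type names and
// the interval-promotion logic are illustrative assumptions.
// The static topology.yml fallback is omitted for brevity.
package discovery

import (
	"fmt"
	"log"
	"net"
	"time"

	"google.golang.org/grpc/resolver"
)

type dnsResolver struct {
	cc        resolver.ClientConn
	service   string             // SRV service label, e.g. "grpc"
	domain    string             // e.g. a K8s headless service FQDN
	bootstrap time.Duration      // startup polling interval (default: 5s)
	steady    time.Duration      // steady-state polling interval (default: 30s)
	last      []resolver.Address // last known good list, for serve-stale
	done      chan struct{}
}

// lookup issues the primary SRV query (RFC 2782) and maps the answers
// to host:port addresses.
func (r *dnsResolver) lookup() ([]resolver.Address, error) {
	_, srvs, err := net.LookupSRV(r.service, "tcp", r.domain)
	if err != nil {
		return nil, err
	}
	addrs := make([]resolver.Address, 0, len(srvs))
	for _, s := range srvs {
		addrs = append(addrs, resolver.Address{Addr: fmt.Sprintf("%s:%d", s.Target, s.Port)})
	}
	return addrs, nil
}

// watch polls DNS, starting at the fast bootstrap interval and relaxing
// to the steady interval after the topology has been resolved once.
func (r *dnsResolver) watch() {
	interval := r.bootstrap
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		addrs, err := r.lookup()
		if err != nil {
			// Serve stale: never flush the current list on SERVFAIL/timeout.
			log.Printf("WARN: DNS lookup failed, keeping %d stale addresses: %v", len(r.last), err)
		} else {
			r.last = addrs
			r.cc.UpdateState(resolver.State{Addresses: addrs})
			if interval == r.bootstrap {
				interval = r.steady
				ticker.Reset(interval)
			}
		}
		select {
		case <-ticker.C:
		case <-r.done:
			return
		}
	}
}

// ResolveNow and Close satisfy the resolver.Resolver interface.
func (r *dnsResolver) ResolveNow(resolver.ResolveNowOptions) {}
func (r *dnsResolver) Close()                               { close(r.done) }
```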
2.3 Peer Discovery
- Liaison Node Discovery: Liaison nodes will discover the Data nodes.
- Data Node Mesh: Data nodes will discover their peers by resolving the same DNS name under which they themselves are published.
- Lifecycle: Hot nodes discover Warm/Cold nodes.
2.4 Two-Phase Discovery
Instead of reading the full Node struct from etcd before connecting, the Liaison/Data node will first connect via DNS and then query the node directly for its details.
A new gRPC service will be added to return the Node details (a sketch follows).
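As a rough illustration, the second phase could be served as follows; the request and response types stand in for protobuf messages that have not been specified yet:

```go
// Hypothetical server-side handler for the second discovery phase; the
// request and response types stand in for protobuf messages yet to be
// defined.
package discovery

import "context"

type GetNodeRequest struct{}
type GetNodeResponse struct{ Node Node }

// nodeService answers "who are you?" queries from peers that reached
// this node through a DNS-resolved address, replacing the full Node
// struct that was previously read from etcd.
type nodeService struct {
	self Node
}

func (s *nodeService) GetNode(ctx context.Context, _ *GetNodeRequest) (*GetNodeResponse, error) {
	return &GetNodeResponse{Node: s.self}, nil
}
```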
2.5 Troubleshooting DNS Discovery
In the absence of etcdctl, operators need new troubleshooting tools.
State gRPC service: bydbctl/UI -> calls (Liaison/Data).GetClusterState() -> returns the internal node list derived from DNS. In the future, the service will expose more internal state than DNS alone provides.
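A sketch of how GetClusterState() might be served from the NodeRegistry sketched earlier; the message types are again placeholders:

```go
// Sketch of the troubleshooting endpoint backed by the NodeRegistry
// sketched earlier; the message types are placeholders.
package discovery

import "context"

type GetClusterStateRequest struct{}
type GetClusterStateResponse struct{ Nodes []Node }

type stateService struct {
	registry NodeRegistry
}

// GetClusterState exposes the internal node list derived from DNS so
// that bydbctl or the UI can inspect it without etcdctl.
func (s *stateService) GetClusterState(ctx context.Context, _ *GetClusterStateRequest) (*GetClusterStateResponse, error) {
	nodes, err := s.registry.Nodes(ctx)
	if err != nil {
		return nil, err
	}
	return &GetClusterStateResponse{Nodes: nodes}, nil
}
```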
Metrics: New metrics are required:
- discovery_dns_lookup_duration_seconds
- discovery_dns_lookup_failures_total
- discovery_cluster_size (Gauge)
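For illustration, the three metrics could be registered as below with prometheus/client_golang; BanyanDB's own observability layer may wire them differently:

```go
// Illustrative registration of the three new metrics with
// prometheus/client_golang; BanyanDB's own observability layer may
// wire these differently.
package discovery

import "github.com/prometheus/client_golang/prometheus"

var (
	lookupDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name: "discovery_dns_lookup_duration_seconds",
		Help: "Latency of DNS lookups performed by the node registry.",
	})
	lookupFailures = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "discovery_dns_lookup_failures_total",
		Help: "Failed DNS lookups (the stale address list was served).",
	})
	clusterSize = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "discovery_cluster_size",
		Help: "Number of peers currently known to the registry.",
	})
)

func init() {
	prometheus.MustRegister(lookupDuration, lookupFailures, clusterSize)
}
```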
3. Task List
- Implement DNSNodeRegistry with net.LookupSRV and net.LookupHost.
- Implement StaticNodeRegistry for fallback/file-based discovery.
- Update Helm Charts.
- Create an E2E test suite for startup.
- Update documentation (concept and operational documents).
Use case
No response
Related issues
No response
Are you willing to submit a pull request to implement this on your own?
- Yes I am willing to submit a pull request on my own!
Code of Conduct
- I agree to follow this project's Code of Conduct