RFC: improve efficiency of tablet filtering in go/vt/discovery and topo #16761
RFC Description
Today discovery.Healthcheck (in go/vt/discovery - mainly used by vtgate) supports filtering the tablets it watches by:
- --keyspaces_to_watch (very common)
- --tablet-filter on hostname
- --tablet-filter-tags on tablet Tags map[string]string tags
Behind the scenes, this filtering happens in the tablet watcher's loadTablets(), but not very efficiently: first all tablets in the cell are fetched from the topo unconditionally, then the optional filtering is applied. At times this filtering excludes a significant share of the topo KVs we fetched. More on "why" later.
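To make the pattern concrete, here is a minimal sketch of that fetch-everything-then-filter flow. This is not the actual loadTablets() code: the topoClient interface, Tablet struct, and helper names below are simplified stand-ins rather than real Vitess APIs.

```go
package main

import (
	"context"
	"fmt"
)

// Tablet is a simplified stand-in for the real topo tablet record.
type Tablet struct {
	Alias    string
	Keyspace string
	Shard    string
	Hostname string
	Tags     map[string]string
}

// topoClient is a simplified stand-in for the topo API the tablet watcher has
// available today: list aliases per cell, then Get one record at a time.
type topoClient interface {
	GetTabletAliasesByCell(ctx context.Context, cell string) ([]string, error)
	GetTablet(ctx context.Context, alias string) (*Tablet, error)
}

// tabletFilter plays the role of --keyspaces_to_watch / --tablet-filter /
// --tablet-filter-tags: it decides whether an already-fetched tablet is kept.
type tabletFilter func(*Tablet) bool

// loadTabletsSketch models the current behaviour: every tablet record in the
// cell is read from the topo, and filtering only runs afterwards, so the topo
// serves Gets for records the caller immediately throws away.
func loadTabletsSketch(ctx context.Context, tc topoClient, cell string, keep tabletFilter) ([]*Tablet, error) {
	aliases, err := tc.GetTabletAliasesByCell(ctx, cell)
	if err != nil {
		return nil, err
	}
	var watched []*Tablet
	for _, alias := range aliases {
		// One topo Get per tablet in the cell, regardless of any filters.
		t, err := tc.GetTablet(ctx, alias)
		if err != nil {
			return nil, err
		}
		// Filtering only happens after the record has already been fetched.
		if keep == nil || keep(t) {
			watched = append(watched, t)
		}
	}
	return watched, nil
}

// fakeTopo is an in-memory topoClient, only here to make the sketch runnable.
type fakeTopo map[string]*Tablet

func (f fakeTopo) GetTabletAliasesByCell(ctx context.Context, cell string) ([]string, error) {
	var aliases []string
	for a := range f {
		aliases = append(aliases, a)
	}
	return aliases, nil
}

func (f fakeTopo) GetTablet(ctx context.Context, alias string) (*Tablet, error) {
	return f[alias], nil
}

func main() {
	cell := fakeTopo{
		"zone1-0000000101": {Alias: "zone1-0000000101", Keyspace: "commerce", Shard: "0"},
		"zone1-0000000102": {Alias: "zone1-0000000102", Keyspace: "customer", Shard: "0"},
	}
	// Equivalent in spirit to --keyspaces_to_watch=commerce: both records are
	// still fetched from the topo, one is discarded right after.
	keep := func(t *Tablet) bool { return t.Keyspace == "commerce" }
	watched, _ := loadTabletsSketch(context.Background(), cell, "zone1", keep)
	fmt.Println("tablets fetched:", len(cell), "tablets watched:", len(watched))
}
```

The point is only that the per-tablet topo Get happens before any filter has a chance to run.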
On clusters with 1000s of tablets this becomes a scalability problem for the topology store, which has to handle topo Get calls for every tablet fetched. In an extreme case, the txthrottler (which uses discovery.Healthcheck to stream tablet stats) opens one topology watcher per cell just to find the tablets in its local shard. Let's say we have a 3-cell deployment with 1000 tablets per cell and txthrottler running in a 3-tablet shard: txthrottler will be frequently reading 3000 topo KVs just to find its 2 other shard members, and this problem grows with the number of tablets. In our production the problem is significantly larger.
Now why does the tablet watcher fetch EVERY tablet in the cell? Today it kind of has to 🤷. Using a Consul topo as an example, tablets are stored by alias in paths named /tablets/<tablet alias>/ and there is no efficient way to grab just one keyspace or shard - you have to read everything.
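For illustration, a cell's tablet paths look roughly like this (aliases and keyspace/shard values are made up); the keyspace and shard are only known after each record's value has been read, which is why no prefix listing can narrow the fetch:

```
/tablets/zone1-0000000101/   -> keyspace=commerce, shard=-80 (known only after reading the record)
/tablets/zone1-0000000102/   -> keyspace=commerce, shard=80-
/tablets/zone1-0000000257/   -> keyspace=customer, shard=0
...
```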
There is a way to mitigate this inefficiency (but not resolve it): --tablet_refresh_known_tablets=false. This causes vtgate to keep the tablet records it reads forever, which has its own drawbacks and doesn't resolve the initial inefficient read of tablets for the entire cell.
This issue is an RFC/feature request for a more efficient way to fetch topo records for a single shard and/or keyspace. Unfortunately, improving the situation likely means a change to the layout of the topo.
Some early ideas:
- "Pointer"/alias KVs - add KVs like
/keyspaces/<keyspace>/<shard>/<tablet>
that simply "point" to the actual/tablet/<alias>
record, kind of like an index- This doesn't seem to be a built in feature of most topo stores so it would need to be done at a KV-level.
- Store tablet records in per-keyspace/shard paths - but this would come at the cost of more ListDir operations
- <your idea here> 🙇
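To make the first idea a bit more concrete, here is a rough sketch of what "pointer"/alias KVs could look like. The kvConn interface and helper functions below are hypothetical (not the existing topo.Conn API), and the /keyspaces/<keyspace>/<shard>/<tablet> layout is just the one suggested above:

```go
package main

import (
	"context"
	"fmt"
	"path"
)

// kvConn is a hypothetical, minimal view of a topo connection - just enough
// to show the idea. It is not the real topo.Conn interface.
type kvConn interface {
	ListDir(ctx context.Context, dirPath string) ([]string, error)
	Create(ctx context.Context, filePath string, contents []byte) error
	// Delete would be needed to keep the index in sync when a tablet record
	// is removed.
	Delete(ctx context.Context, filePath string) error
}

// indexPath builds the proposed "pointer" path for a tablet, e.g.
// /keyspaces/commerce/-80/zone1-0000000101.
func indexPath(keyspace, shard, alias string) string {
	return path.Join("/keyspaces", keyspace, shard, alias)
}

// writeTabletIndex would run wherever the tablet record itself is created or
// updated, writing a tiny entry alongside the authoritative /tablets/<alias>/
// record. The contents only need to name (point to) that record.
func writeTabletIndex(ctx context.Context, c kvConn, keyspace, shard, alias string) error {
	return c.Create(ctx, indexPath(keyspace, shard, alias), []byte(alias))
}

// aliasesForShard is the payoff: a watcher filtered to one keyspace/shard
// lists a single small directory instead of fetching every tablet in the
// cell, then Gets only the tablets it actually watches.
func aliasesForShard(ctx context.Context, c kvConn, keyspace, shard string) ([]string, error) {
	return c.ListDir(ctx, path.Join("/keyspaces", keyspace, shard))
}

func main() {
	fmt.Println("index entry for one tablet:", indexPath("commerce", "-80", "zone1-0000000101"))
}
```

The obvious cost is keeping these index entries in sync with the authoritative /tablets/<alias>/ records whenever tablets are created, deleted, or change keyspace/shard.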
Use Case(s)
Large Vitess deployments that use filtering (likely --keyspaces_to_watch) where the rate of topo Gets is a risk/concern