Skip to content

Automatic discovery of Primary and Replica nodes #228

Open
@Kobaruon

Description

@Kobaruon

Is your feature request related to a problem? Please describe.
When a primary node failover on PostgreSQL cluster specifically Patroni in my case, pgcat needs to be configured again to correctly route queries.

[pools.simple_db]
pool_mode = "transaction"
default_role = "any"
query_parser_enabled = true
primary_reads_enabled = false
sharding_function = "pg_bigint_hash"

[pools.simple_db.users.0]
username = "dba"
password = "password"
pool_size = 100
statement_timeout = 0

[pools.simple_db.shards.0]
servers = [
    [ "10.85.246.169", 5432, "replica"],
    [ "10.85.241.153", 5432, "primary"], # this node failover and another node becomes primary
    [ "10.85.245.26", 5432, "replica"]
]
database = "simple_db"

When this happens, it bans that node correctly, but it can not detect another node that becomes primary on its own. As a result of this, write queries get errors.

[2022-11-10T12:59:00.125226Z ERROR pgcat::pool] Banning instance Address { id: 1, host: "10.85.241.153", port: 5432, shard: 0, database: "simple_db", role: Primary, replica_number: 1, address_index: 1, username: "dba", pool_name: "simple_db" }, error: TimedOut
[2022-11-10T12:59:00.125272Z ERROR pgcat::pool] Banning Address { id: 1, host: "10.85.241.153", port: 5432, shard: 0, database: "simple_db", role: Primary, replica_number: 1, address_index: 1, username: "dba", pool_name: "simple_db" }
[2022-11-10T12:59:00.125340Z ERROR pgcat::client] Could not get connection from pool: { pool_name: "simple_db", username: "dba", shard: 0, role: "Some(Primary)", error: "AllServersDown" }

Describe the solution you'd like
I think this behaviour can be improved by checking nodes with a query like this to see if they are primary or replica.

postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

If this query returns true pgcat can mark that node replica, if it's false then it knows that's a primary node.

Describe alternatives you've considered
Another alternative can be creating a test table when pgcat starts, and it can run INSERT/DELETE/UPDATE on that table to see if it runs successfully.

What do you think? Is this a viable solution to this problem?

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions