Description
We just had an issue in our GitLab Postgres database where disk usage in the pg_xlog folder was getting extreme.
The following folder used 132G out of the 150G of the PVC:
/home/postgres/pgdata/pgroot/data/pg_xlog
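In hindsight, a query along these lines would have shown how much WAL each slot was holding back (just a sketch, assuming the 9.6 function names pg_current_xlog_location / pg_xlog_location_diff):
postgres=# SELECT slot_name, active, restart_lsn,
postgres-#        pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn)) AS retained_wal
postgres-#   FROM pg_replication_slots;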
We solved it by dropping the replication slots that had become inactive:
postgres=# SELECT * FROM pg_replication_slots ;
slot_name | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
---------------------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
gitlab_postgresql_2 | | physical | | | f | | | | 12/674C1C68 |
gitlab_postgresql_1 | | physical | | | f | | | | 12/674C1C68 |
(2 rows)
postgres=# select pg_drop_replication_slot('gitlab_postgresql_1');
pg_drop_replication_slot
--------------------------
(1 row)
postgres=# select pg_drop_replication_slot('gitlab_postgresql_2');
pg_drop_replication_slot
--------------------------
(1 row)
postgres=# SELECT * FROM pg_replication_slots ;
slot_name | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-----------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
(0 rows)
After a few minutes pg_xlog was cleared and everything looked good.
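If space had been more critical, I assume a manual checkpoint would have made Postgres recycle the old segments a bit sooner, though in our case waiting a few minutes was enough:
postgres=# CHECKPOINT;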
For comparison, our staging environment cleans up pg_xlog just fine, and there the replication slots are active:
postgres=# SELECT * FROM pg_replication_slots ;
slot_name | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
---------------------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
gitlab_postgresql_1 | | physical | | | t | 135 | | | 19/260004C0 |
gitlab_postgresql_2 | | physical | | | t | 30008 | | | 19/260004C0 |
(2 rows)
Running PostgreSQL version: "9.6"
Operator version: 1.3.0
3 replicas and default configuration regarding archiving and replication in both the operator and
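For completeness, the archiving/replication-related settings can be inspected with something like this (listing only the GUCs that seem relevant to me):
postgres=# SELECT name, setting FROM pg_settings
postgres-#  WHERE name IN ('archive_mode', 'archive_command', 'wal_keep_segments', 'max_replication_slots');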
After applying the fix I'm left with a few questions:
- Is there any issue with not having replication slots in place?
- Is there anything I could have configured or done differently to prevent this from happening in the first place?
- Is there a proper way of recovering when the replication slots have become inactive (see the sketch below)?
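For the last question, I assume a slot could be recreated manually with pg_create_physical_replication_slot (using one of our existing slot names as an example), but I'm not sure whether the operator/Patroni would then start using it again:
postgres=# SELECT pg_create_physical_replication_slot('gitlab_postgresql_1');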