This repository has some neat scripts for working with the PostgreSQL db. Note that most of the scripts are accessing internal DB tables like pg_catalog and pg_info. Unfortunately these internal tables are subject to change, and may differ between DB versions. So if things stop working for you, please check what happened to the tables. (this stuff is working for me, unfortunately I don't have the resources to update this for each and every postgres version).
# first time: create virtual env
python3 -m venv .venv
# venv activate (each usage)
source .venv/bin/activate
# first time: install requirements
pip3 install -r requirements.txtThe psycopg2 postgres client is used here.
The current directory or home directory must have the file: .psqldiff for configuring the connect string:
[PSQL]
conf=postgresql://<PG_USER>:<PG_PASSWD>>@<PG_HOSTNAME>:<PG_PORT>/<PG_DB>
One reason why a SELECT statement can have bad performance: the SELECT statement has JOIN clauses, where the columns in the ON clause are not backed by indexes. This utility checks, if this is the case.
This utility uses the very remarkable sqlglot library, for parsing of sql statements.
usage: psqljoincheck.py [-h] -i INPUT [-d] [-s]
Check if all join clauses are backed by indexes.
Requires configuration file for db connection string, by default it looks for
./.psqldiff and ~/.psqldiff
with the following configuration
[PSQL]
conn="postgresql://<DBUSER>:<DBPASSWRD>@<HOST>:<PORT>/<DBNAME>"
options:
-h, --help show this help message and exit
-i INPUT, --input INPUT
input sql file
-d, --debug debug on
-s, --show_listing show listing of sql
Shows the difference between the columns declared in two sql statements.
Example: for the following two sql tables
CREATE TABLE sh1.aaa (
only_a_1 character varying(10),
only_a_2 boolean,
common_1 character varying(10) NOT NULL,
common_2 character varying(20),
common_3 integer,
common_ch_type1 character varying(10)
);
ALTER TABLE ONLY sh1.aaa
ADD CONSTRAINT aaa_pkey PRIMARY KEY (common_1);
CREATE TABLE sh2.aaa (
common_1 character varying(10),
common_2 character varying(20),
common_3 integer,
only_b_1 character varying(40),
only_b_2 integer,
only_b_3 integer,
common_ch_type1 character varying(30)
);The following command will compare these two tables
python psqldiff.py -f sh1.aaa -t sh2.aaaAnd give the following result
compare tables: sh1.aaa, sh2.aaa
COLUMNS WITH IDENTICAL NAMES AND CHANGED TYPES, OR CONSTRAINTS
column: common_1
table: sh1.aaa and sh2.aaa of type: character varying(10)
table: sh1.aaa of constraint: PRIMARY KEY
column: common_ch_type1
table: sh1.aaa of type: character varying(10)
table: sh2.aaa of type: character varying(30)
COLUMN NAMES THAT APPEAR IN TABLE sh1.aaa ONLY
column: only_a_1 with type: character varying(10)
column: only_a_2 with type: boolean
COLUMN NAMES THAT APPEAR IN TABLE sh2.aaa ONLY
column: only_b_1 with type: character varying(40)
column: only_b_2 with type: integer
column: only_b_3 with type: integer
Addition: If you want to see the columns in both tables with identical name and type:
python psqldiff.py -f sh1.aaa -t sh2.aaa -sThis will give the following output
compare tables: sh1.aaa, sh2.aaa
COLUMNS WITH IDENTICAL NAMES AND CHANGED TYPES, OR CONSTRAINTS
column: common_1
table: sh1.aaa and sh2.aaa of type: character varying(10)
table: sh1.aaa of constraint: PRIMARY KEY
column: common_ch_type1
table: sh1.aaa of type: character varying(10)
table: sh2.aaa of type: character varying(30)
COLUMN NAMES THAT APPEAR IN TABLE sh1.aaa ONLY
column: only_a_1 with type: character varying(10)
column: only_a_2 with type: boolean
COLUMN NAMES THAT APPEAR IN TABLE sh2.aaa ONLY
column: only_b_1 with type: character varying(40)
column: only_b_2 with type: integer
column: only_b_3 with type: integer
COLUMN NAMES THAT APPEAR IN BOTH TABLES WITH IDENTICAL TYPE AND CONSTRAINT
column: common_2 with type: character varying(20)
column: common_3 with type: integer
TODO: compare indexes too.
CREATE TABLE sh1.students(name varchar(10), fav_teacher varchar(20), fav_topic varchar(20));
INSERT INTO sh1.students(name, fav_teacher, fav_topic)
VALUES
('Joe A', 'Mr. Larson', 'Physics'),
('Peggy L', 'Mrs. Brown', 'Mathematics'),
('Dan L', 'Mrs. Johnson', 'Computers');The following command
python psql2json.py -q 'SELECT * FROM sh1.students' -f out.json
will result in the following json data in file out.json
[
{
"name": "Joe A",
"fav_teacher": "Mr. Larson",
"fav_topic": "Physics"
},
{
"name": "Peggy L",
"fav_teacher": "Mrs. Brown",
"fav_topic": "Mathematics"
},
{
"name": "Dan L",
"fav_teacher": "Mrs. Johnson",
"fav_topic": "Computers"
}
]usage: psqlcount.py [-h] --schema SCHEMA [--fast]
Count number of rows in all tables of a given schema, report the result in
sorted form. Requires configuration file for db connection string, by default
it looks for ./.psqldiff and ~/.psqldiff with the following configuration
[PSQL] conn="postgresql://<DBUSER>:<DBPASSWRD>@<HOST>:<PORT>/<DBNAME>"
options:
-h, --help show this help message and exit
--schema SCHEMA, -s SCHEMA
schema name
--fast, -f show approximate count of rows (fast, but is not
exact, falls back to slow count - if no count is
known)