Skip to content

python scripts for working with postgress (check if join condition columns are indexed, compare column set of two tables, dump sql query result to json)

Notifications You must be signed in to change notification settings

MoserMichael/pghelp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

python scripts for working with Postgres

This repository has some neat scripts for working with the PostgreSQL db. Note that most of the scripts are accessing internal DB tables like pg_catalog and pg_info. Unfortunately these internal tables are subject to change, and may differ between DB versions. So if things stop working for you, please check what happened to the tables. (this stuff is working for me, unfortunately I don't have the resources to update this for each and every postgres version).

setup

# first time: create virtual env
python3 -m venv .venv

# venv activate (each usage)
source .venv/bin/activate

# first time: install requirements
pip3 install -r requirements.txt

The psycopg2 postgres client is used here.

configuration

The current directory or home directory must have the file: .psqldiff for configuring the connect string:

[PSQL]

conf=postgresql://<PG_USER>:<PG_PASSWD>>@<PG_HOSTNAME>:<PG_PORT>/<PG_DB>

Check if all join clauses are backed by indexes

One reason why a SELECT statement can have bad performance: the SELECT statement has JOIN clauses, where the columns in the ON clause are not backed by indexes. This utility checks, if this is the case.

This utility uses the very remarkable sqlglot library, for parsing of sql statements.

usage: psqljoincheck.py [-h] -i INPUT [-d] [-s]

Check if all join clauses are backed by indexes. 

Requires configuration file for db connection string, by default it looks for 
./.psqldiff and ~/.psqldiff 

with the following configuration

[PSQL]
conn="postgresql://<DBUSER>:<DBPASSWRD>@<HOST>:<PORT>/<DBNAME>"
    

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        input sql file
  -d, --debug           debug on
  -s, --show_listing    show listing of sql

Comparing the structure of two SQL tables

Shows the difference between the columns declared in two sql statements.

Example: for the following two sql tables

CREATE TABLE sh1.aaa (
    only_a_1 character varying(10),
    only_a_2 boolean,
    common_1 character varying(10) NOT NULL,
    common_2 character varying(20),
    common_3 integer,
    common_ch_type1 character varying(10)
);

ALTER TABLE ONLY sh1.aaa
    ADD CONSTRAINT aaa_pkey PRIMARY KEY (common_1);


CREATE TABLE sh2.aaa (
    common_1 character varying(10),
    common_2 character varying(20),
    common_3 integer,
    only_b_1 character varying(40),
    only_b_2 integer,
    only_b_3 integer,
    common_ch_type1 character varying(30)
);

The following command will compare these two tables

python psqldiff.py  -f sh1.aaa -t sh2.aaa

And give the following result

compare tables: sh1.aaa, sh2.aaa
COLUMNS WITH IDENTICAL NAMES AND CHANGED TYPES, OR CONSTRAINTS

column: common_1
        table: sh1.aaa and sh2.aaa of type: character varying(10)
        table: sh1.aaa of constraint: PRIMARY KEY


column: common_ch_type1
        table: sh1.aaa of type: character varying(10)
        table: sh2.aaa of type: character varying(30)


COLUMN NAMES THAT APPEAR IN TABLE sh1.aaa ONLY

        column: only_a_1 with type: character varying(10)
        column: only_a_2 with type: boolean

COLUMN NAMES THAT APPEAR IN TABLE sh2.aaa ONLY

        column: only_b_1 with type: character varying(40)
        column: only_b_2 with type: integer
        column: only_b_3 with type: integer

Addition: If you want to see the columns in both tables with identical name and type:

python psqldiff.py  -f sh1.aaa -t sh2.aaa -s

This will give the following output

compare tables: sh1.aaa, sh2.aaa
COLUMNS WITH IDENTICAL NAMES AND CHANGED TYPES, OR CONSTRAINTS

column: common_1
        table: sh1.aaa and sh2.aaa of type: character varying(10)
        table: sh1.aaa of constraint: PRIMARY KEY


column: common_ch_type1
        table: sh1.aaa of type: character varying(10)
        table: sh2.aaa of type: character varying(30)


COLUMN NAMES THAT APPEAR IN TABLE sh1.aaa ONLY

        column: only_a_1 with type: character varying(10)
        column: only_a_2 with type: boolean

COLUMN NAMES THAT APPEAR IN TABLE sh2.aaa ONLY

        column: only_b_1 with type: character varying(40)
        column: only_b_2 with type: integer
        column: only_b_3 with type: integer

COLUMN NAMES THAT APPEAR IN BOTH TABLES WITH IDENTICAL TYPE AND CONSTRAINT

        column: common_2 with type: character varying(20)
        column: common_3 with type: integer

TODO: compare indexes too.

dump the result of an sql query into a json file

CREATE TABLE sh1.students(name varchar(10), fav_teacher varchar(20), fav_topic varchar(20));

INSERT INTO sh1.students(name, fav_teacher, fav_topic) 
VALUES 
('Joe A', 'Mr. Larson', 'Physics'), 
('Peggy L', 'Mrs. Brown', 'Mathematics'), 
('Dan L', 'Mrs. Johnson', 'Computers');

The following command

python psql2json.py -q 'SELECT * FROM sh1.students' -f out.json

will result in the following json data in file out.json

[
    {
        "name": "Joe A",
        "fav_teacher": "Mr. Larson",
        "fav_topic": "Physics"
    },
    {
        "name": "Peggy L",
        "fav_teacher": "Mrs. Brown",
        "fav_topic": "Mathematics"
    },
    {
        "name": "Dan L",
        "fav_teacher": "Mrs. Johnson",
        "fav_topic": "Computers"
    }
]

count rows of all tables of a namespace and report tables, sorted by row count

usage: psqlcount.py [-h] --schema SCHEMA [--fast]

Count number of rows in all tables of a given schema, report the result in
sorted form. Requires configuration file for db connection string, by default
it looks for ./.psqldiff and ~/.psqldiff with the following configuration
[PSQL] conn="postgresql://<DBUSER>:<DBPASSWRD>@<HOST>:<PORT>/<DBNAME>"

options:
  -h, --help            show this help message and exit
  --schema SCHEMA, -s SCHEMA
                        schema name
  --fast, -f            show approximate count of rows (fast, but is not
                        exact, falls back to slow count - if no count is
                        known)

About

python scripts for working with postgress (check if join condition columns are indexed, compare column set of two tables, dump sql query result to json)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages