Simplify testing of PostgreSQL SQL queries
The idea is to inject mock tables as SQL code around the tested query.
The mock tables effectively shadow the actual tables,
so you don't need to modify the tables themselves.
Under the hood, it uses a Common Table Expression (the WITH keyword) to override
the table name with fake data (the actual table stays unchanged).
As input, it accepts either pandas or geopandas dataframes.
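To illustrate the mechanism (this is not gdf2sql's own code), here is a minimal sketch using the standard-library sqlite3 module: a CTE that reuses the name of a real table shadows it for the duration of one query, while the stored data is untouched.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cities(name TEXT)")
con.execute("INSERT INTO cities VALUES ('Paris')")

# The CTE named `cities` shadows the real table inside this query only.
mocked = con.execute(
    "WITH cities(name) AS (VALUES ('Brasilia')) SELECT name FROM cities"
).fetchall()
print(mocked)  # [('Brasilia',)]

# The real table is unchanged.
print(con.execute("SELECT name FROM cities").fetchall())  # [('Paris',)]
```

The same shadowing behaviour applies in PostgreSQL: within a statement, a CTE name takes precedence over a table of the same name.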
pip install git+https://github.com/lbruand/gdf2sql#egg=gdf2sql
- python 3.5+
- pandas
- geopandas
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Buenos Aires", "Brasilia", "Santiago", "Bogota", "Caracas"],
        "Country": ["Argentina", "Brazil", "Chile", "Colombia", "Venezuela"],
        "Population": [20.0, 25.0, 9.0, 8.0, 5.0],
        "Latitude": [-34.58, -15.78, -33.45, 4.60, 10.48],
        "Longitude": [-58.66, -47.91, -70.66, -74.08, -66.86],
    }
)
inner_query = "SELECT name, ST_AsText(geometry) " \
"FROM nyc_subway_stats " \
"WHERE nyc_subway_stats.name = 'Brasilia'"
tables: List[VTable] = [build_vtable("nyc_subway_stats", df)]
result_query = build_test_sql_query(tables, inner_query)
# Run the result query inside PostgreSQL.
# It will run as if there were a nyc_subway_stats table containing the `df` dataframe.

Let's take a query example:
SELECT name
FROM nyc_subway_stats
WHERE nyc_subway_stats.name = 'Brasilia'

This query (call it A) is transformed into:
WITH
nyc_subway_stats(name, Country, Population, Latitude, Longitude) AS (VALUES
('Buenos Aires', 'Argentina', 20.0, -34.58, -58.66),
('Brasilia', 'Brazil', 25.0, -15.78, -47.91),
('Santiago', 'Chile', 9.0, -33.45, -70.66),
('Bogota', 'Colombia', 8.0, 4.6, -74.08),
('Caracas', 'Venezuela', 5.0, 10.48, -66.86)),
INNERQUERY_6cee68 AS (SELECT name FROM nyc_subway_stats WHERE nyc_subway_stats.name = 'Brasilia')
SELECT * FROM INNERQUERY_6cee68

with two common table expressions:
- one for the mock table nyc_subway_stats, which overrides the actual content of the real nyc_subway_stats (which might not even exist)
- one for the inner query (A)
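The rewriting itself is mostly string assembly. As a hypothetical, simplified sketch (gdf2sql's real implementation also handles geometries and type conversion), building the WITH prelude from column names and rows might look like:

```python
from typing import Any, List, Sequence


def quote(value: Any) -> str:
    """Render a Python value as a SQL literal (strings quoted, numbers as-is)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def build_mock_cte(table: str, columns: Sequence[str], rows: List[tuple]) -> str:
    """Build `table(cols) AS (VALUES ...)` for use inside a WITH clause."""
    values = ",\n  ".join(
        "(" + ", ".join(quote(v) for v in row) + ")" for row in rows
    )
    return f"{table}({', '.join(columns)}) AS (VALUES\n  {values})"


def build_test_query(cte: str, inner_query: str) -> str:
    """Wrap the inner query so it runs against the mocked table."""
    return f"WITH\n{cte},\nINNERQUERY AS ({inner_query})\nSELECT * FROM INNERQUERY"


cte = build_mock_cte("nyc_subway_stats", ["name", "Country"],
                     [("Brasilia", "Brazil"), ("Santiago", "Chile")])
sql = build_test_query(cte, "SELECT name FROM nyc_subway_stats WHERE name = 'Brasilia'")
```

The resulting string is a single self-contained statement, which is what makes it runnable against any engine that supports CTEs.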
- No need to update the underlying table before the test (which might be slow).
- The data and the query are sent directly to the PostgreSQL engine to be tested.
- gdf2sql is independent of the database considered, so you might decide to unit-test in SQLite and then run integration tests in PostgreSQL. (Beware that SQLite and PostgreSQL are different SQL dialects, which might affect your queries. You may want to use sqlglot to convert from one to the other.)
- This lets you test the functional characteristics of your queries, in particular in the face of malformed data in your tables.
- Of course, there is no point in using this to test actual performance, indexes, and so on; those still need tests against the real database.
- You can still break down your queries into manageable subqueries and then compose them using WITH/CTE.
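For instance (again a sketch using the standard-library sqlite3 module rather than PostgreSQL), a mocked table and an intermediate subquery can live side by side in the same WITH clause:

```python
import sqlite3

query = """
WITH
  nyc_subway_stats(name, Population) AS (VALUES
    ('Brasilia', 25.0),
    ('Santiago', 9.0)),
  big_cities AS (SELECT name FROM nyc_subway_stats WHERE Population > 10)
SELECT name FROM big_cities
"""
rows = sqlite3.connect(":memory:").execute(query).fetchall()
print(rows)  # [('Brasilia',)]
```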
See also: pandas.DataFrame.to_sql — https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html

To Be Continued
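For comparison, pandas' documented `DataFrame.to_sql` offers a different trade-off: it writes the mock data into an actual (here in-memory) table instead of a CTE. A minimal sketch:

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"name": ["Brasilia", "Santiago"], "Population": [25.0, 9.0]})
con = sqlite3.connect(":memory:")
df.to_sql("nyc_subway_stats", con, index=False)  # creates and fills a real table
rows = con.execute(
    "SELECT name FROM nyc_subway_stats WHERE name = 'Brasilia'"
).fetchall()
print(rows)  # [('Brasilia',)]
```

Unlike the CTE approach, this mutates the target database, so it is better suited to throwaway test databases than to shadowing tables in a shared one.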