fix(sql): only return tables in `current_database` #9748

gforsyth · 2024-08-01T21:38:47Z

Description of changes

This started as a fix for #9686, but it turned up a few issues in other backends.

The short version: when we call get_schema without specifying a
catalog.database prefix, we get every table matching the name passed to
get_schema, across many (sometimes all) of the databases on a given backend.

Postgres has things like search_path to help with this, but the tables on search_path are slightly different from what's returned by \dt (just to make this extra fun).

What's implemented here (and what I'm proposing as a general standard) is:

When you ask for a table named foo, we will look through the
current_database and wherever temp tables are stored for a match and return
the schema of that table.

My idea here is that anything that shows up in the output of list_tables (or
con.tables) should be accessible via con.table without additional arguments.
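
A rough sketch of the lookup rule described above, in plain Python rather than against a real backend. The `CATALOG` mapping, the database names, and the `list_tables` and `resolve_table` helpers are all made up for illustration and are not ibis APIs. The search order (current database first, then temp) is an arbitrary choice here, since precedence is still undecided.

```python
from typing import Optional

# Hypothetical metadata: database name -> table name -> {column: type}.
CATALOG = {
    "main": {"foo": {"id": "int64"}},
    "other": {"foo": {"name": "string"}, "bar": {"x": "float64"}},
    "temp": {"scratch": {"y": "int32"}},
}


def list_tables(current_database: str, temp_database: Optional[str]) -> list:
    """Names visible without a prefix: current database plus temp tables."""
    names = set(CATALOG.get(current_database, {}))
    if temp_database is not None:
        names |= set(CATALOG.get(temp_database, {}))
    return sorted(names)


def resolve_table(name: str, current_database: str, temp_database: Optional[str]) -> dict:
    """Return the schema of `name`, searching only the current and temp databases."""
    # Assumption: current database is checked first; the PR leaves this open.
    for db in (current_database, temp_database):
        if db is not None and name in CATALOG.get(db, {}):
            return CATALOG[db][name]
    raise LookupError(f"table {name!r} not found in {current_database!r} or temp")


# Everything that is listed can be resolved without extra arguments.
for table_name in list_tables("main", "temp"):
    resolve_table(table_name, "main", "temp")
```

Note that `bar` and the `foo` in `other` are never considered, because `other` is neither the current database nor the temp location.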

There's at least one edge case, which I discovered while writing this
description: if a temp table has the same name as a table in the
current_database, the schema returned interleaves the columns of the two
tables, which is terrible.

I'll fix that, too.
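
To make the edge case concrete, here is a small sketch of one way a column lookup filtered only by table name can interleave two tables. The `COLUMNS` rows imitate an `information_schema.columns` result. The data and both helper functions are invented for illustration and are not the actual backend queries.

```python
# Rows imitating information_schema.columns:
# (table_schema, table_name, column_name, data_type, ordinal_position)
COLUMNS = [
    ("main", "t", "x", "int32", 1),
    ("main", "t", "label", "varchar", 2),
    ("temp", "t", "y", "int32", 1),
    ("temp", "t", "flag", "boolean", 2),
]


def schema_by_name_only(table):
    """Filter on the table name alone, then order by column position."""
    rows = [r for r in COLUMNS if r[1] == table]
    rows.sort(key=lambda r: r[4])  # columns from both tables get mixed here
    return [(r[2], r[3]) for r in rows]


def schema_scoped(table, database):
    """Filter on both the database and the table name."""
    rows = [r for r in COLUMNS if r[0] == database and r[1] == table]
    rows.sort(key=lambda r: r[4])
    return [(r[2], r[3]) for r in rows]


mixed = schema_by_name_only("t")
scoped = schema_scoped("t", "temp")
```

With this data, `mixed` contains four columns drawn alternately from `main.t` and `temp.t`, while `scoped` contains only the two columns of `temp.t`.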

Issues closed

Fixes bug(postgres): cross-schema column mixing #9686

Todo List of things to fix (not all in this PR):

- decide on whether temp tables take precedence over concrete tables or vice-versa
- add spark, datafusion and clickhouse exceptions to test
- add cross-database table creation to mysql
- fix mssql table collection
- fix trino table collection
- separate out risingwave get_schema because it doesn't have postgres-specific function
- confirm that exasol can't create tables in other databases

ibis/backends/tests/test_client.py

cpcloud · 2024-08-01T21:54:27Z

It looks like temp takes precedence if the names collide, at least in DuckDB:

D create table t (x int);
D create temp table t (y int);
D table t;
┌────────┐
│   y    │
│ int32  │
├────────┤
│ 0 rows │
└────────┘
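
The same kind of shadowing can be poked at from Python. The sketch below uses the standard-library `sqlite3` module purely as a stand-in, because it needs no extra install. SQLite is a different engine from DuckDB, so this only illustrates the behaviour in one more system and says nothing about the other backends. The `column_names` helper is made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.execute("CREATE TEMP TABLE t (y INTEGER)")


def column_names(query):
    """Return the column names a query would produce."""
    cur = con.execute(query)
    return [d[0] for d in cur.description]


# An unqualified name resolves to the temp table in SQLite.
unqualified = column_names("SELECT * FROM t")
# Qualifying with the schema name reaches the non-temp table.
qualified = column_names("SELECT * FROM main.t")
```

Here `unqualified` is `["y"]` and `qualified` is `["x"]`.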

ibis/backends/duckdb/__init__.py

cpcloud · 2024-08-01T21:58:10Z

Pushed up the additional test. It passed without changes 🥳

gforsyth · 2024-08-01T22:01:44Z

> It looks like temp takes precedence if the names collide, at least in DuckDB:

Ok, I guess we need to decide on what convention we want to enforce?

gforsyth · 2024-08-01T22:04:15Z

lol, I'm going to have to relax the cross-backend exception check a fair bit until we get Naty's TableNotFound PR in, but I think this should go in before 10.0

gforsyth · 2024-08-01T22:05:36Z

ok, datafusion and clickhouse are failing b/c the exception doesn't match.

MSSQL is failing the way postgres and duckdb did before the fix.

gforsyth · 2024-08-01T22:06:20Z

A hilarious error message from MySQL:

        if database is not None and database != self.current_database:
>           raise com.UnsupportedOperationError(
                "Creating tables in other databases is not supported by Postgres"
            )
E           ibis.common.exceptions.UnsupportedOperationError: Creating tables in other databases is not supported by Postgres

Need to fix that, too...

edit: actually, you CAN create tables in other databases using mysql, so that needs to be fixed.
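
For backends that really cannot create tables elsewhere, one way to avoid this sort of copy-pasted message is to build it from the backend's own name. The classes below are hypothetical stand-ins, not ibis code; only the `if` condition mirrors the traceback above.

```python
class UnsupportedOperationError(Exception):
    """Stand-in for ibis.common.exceptions.UnsupportedOperationError."""


class FakeBackend:
    """Hypothetical backend exposing only what the check needs."""

    def __init__(self, name, current_database):
        self.name = name
        self.current_database = current_database

    def check_database(self, database):
        # Same condition as in the traceback; the message uses self.name
        # instead of a hard-coded backend name.
        if database is not None and database != self.current_database:
            raise UnsupportedOperationError(
                f"Creating tables in other databases is not supported by {self.name}"
            )


backend = FakeBackend("examplebackend", "db1")
backend.check_database(None)   # no database given: allowed
backend.check_database("db1")  # current database: allowed
```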

gforsyth · 2024-08-01T22:13:43Z

Ok, this turned into a bit of a nightmare. I'll pick it back up tomorrow or Monday

cpcloud · 2024-08-04T10:30:54Z

docker/mysql/startup.sql

@@ -1,5 +1,4 @@
 CREATE USER 'ibis'@'localhost' IDENTIFIED BY 'ibis';
 CREATE SCHEMA IF NOT EXISTS test_schema;
-GRANT CREATE, DROP ON *.* TO 'ibis'@'%';
-GRANT CREATE,SELECT,DROP ON `test_schema`.* TO 'ibis'@'%';
+GRANT CREATE,SELECT,DROP ON *.* TO 'ibis'@'%';


Necessary to allow the ibis user to operate on newly created databases.

cpcloud · 2024-08-04T10:31:49Z

ibis/backends/duckdb/__init__.py

-                "column_name",
-                "data_type",
-                sg.column("is_nullable").eq(sge.convert("YES")).as_("nullable"),
+        query = sge.Describe(


DESCRIBE in DuckDB used to behave differently from querying information_schema, but only for CSV files, and that case was fixed in a version we no longer support.

cpcloud · 2024-08-04T10:33:34Z

ibis/backends/risingwave/__init__.py

@@ -586,3 +586,9 @@ def drop_sink(
        )
        with self._safe_raw_sql(src):
            pass
+
+    @property
+    def _session_temp_db(self) -> str | None:


This was actually enough to make tests pass, since otherwise get_schema works just fine.

cpcloud · 2024-08-04T10:34:45Z

ibis/backends/trino/__init__.py

-                sg.column("is_nullable").eq(sge.convert("YES")).as_("nullable"),
+                C.column_name,
+                C.data_type,
+                C.is_nullable.eq(sge.convert("YES")).as_("nullable"),
            )
            .from_(sg.table("columns", db="information_schema", catalog=catalog))


I wish we could use DESCRIBE here, but it doesn't contain a nullability column 😞

cpcloud · 2024-08-04T12:02:26Z

Exasol does indeed allow creating tables in other databases, but there was a bug in our code (we weren't quoting identifiers).
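
For context, a minimal sketch of standard SQL identifier quoting (double quotes, with embedded double quotes doubled). The `quote_identifier` and `qualified_name` helpers are illustrative only and are not the code used in the Exasol backend.

```python
from typing import Optional


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def qualified_name(table: str, database: Optional[str] = None) -> str:
    """Build a quoted, optionally database-qualified table reference."""
    parts = [table] if database is None else [database, table]
    return ".".join(quote_identifier(part) for part in parts)


reference = qualified_name("my table", database="Other DB")
```

Without quoting, a name containing spaces or mixed case would not survive being spliced into a CREATE TABLE statement unchanged.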

cpcloud · 2024-08-04T12:03:10Z

The only remaining task is to decide how we want to scope temp tables versus non-temp tables in backends that support them. We can defer that to a follow-up IMO.

gforsyth

Looks good to me!

cpcloud force-pushed the postgres_search_path branch 2 times, most recently from d8f8c5c to 456f95a Compare August 3, 2024 13:02

cpcloud added the pyspark, duckdb, mssql, trino, exasol, postgres, and mysql labels Aug 3, 2024

cpcloud force-pushed the postgres_search_path branch from ed837a7 to b6b0e1d Compare August 4, 2024 11:30

cpcloud force-pushed the postgres_search_path branch 2 times, most recently from 56e3ed2 to d6d2405 Compare August 4, 2024 12:53

cpcloud changed the title ~~fix(postgres, duckdb): only return tables in current_database~~ fix(sql): only return tables in current_database Aug 4, 2024

cpcloud force-pushed the postgres_search_path branch from d6d2405 to c64f759 Compare August 4, 2024 12:58

cpcloud added the ci-run-cloud label Aug 4, 2024

gforsyth and others added 20 commits August 5, 2024 10:24

fix(postgres): scope get_schema to current database + temp tables

19ac32a

fix(duckdb): scope get_schema to current database + temp tables

0ec2503

DuckDB puts temp tables into a catalog named `temp` (not a `database`).

test(duckdb): add cross schema/catalog test

b14ee2e

chore(duckdb): simplify get_schema by using DESCRIBE

af48c66

chore: clean up errors a bit in duplicate schema test

4cb3300

chore: remove bogus postgres marker

86fd26f

chore: clean up error message for mysql and exasol

5e84fcf

chore: xfail mysql

7dce011

fix(trino): handle tables with the same name in different schemas in the same catalog

83c422a

chore: remove unnecessary sg.and_ calls

0ebb7ae

fix(mssql): always search a single schema when getting table information and listing tables

f7b4bd4

feat(mysql): enable creating tables in other databases

6e1cf4a

test(flink): xfail

cef5969

chore(risingwave): allow reuse of postgres get_schema

a348460

fix(exasol): allow creating tables in other databases

6773416

chore: commentary about exceptions

d9627b4

test(exasol): remove testing hacks that seem to be implemented because of an actual bug

607682b

test: ensure that the type are different enough to test the right behavior

3c1296d

fix(snowflake): ensure that schema lookup works with identically named tables

7046a2e

test(bigquery): import googlenotfound error to handle overlapping table name test

56f6f46

cpcloud force-pushed the postgres_search_path branch from 907227e to 7929fe0 Compare August 5, 2024 14:28

chore: unnest

936873b

cpcloud force-pushed the postgres_search_path branch from 7929fe0 to 936873b Compare August 5, 2024 14:29

gforsyth commented Aug 5, 2024

cpcloud added the ci-run-cloud label Aug 5, 2024

cpcloud approved these changes Aug 5, 2024

cpcloud added this to the 9.3 milestone Aug 5, 2024

ibis-docs-bot bot removed the ci-run-cloud label Aug 5, 2024

cpcloud merged commit c7f5717 into ibis-project:main Aug 5, 2024
88 checks passed

gforsyth deleted the postgres_search_path branch August 5, 2024 17:31
