[EWT-1250] Sqlalchemy/Superset expectations from python-DBAPI #17

prantogg · 2024-08-01T18:43:15Z

This PR introduces the following requirements that have arisen from the Wherobots x Superset integration:

SQLAlchemy expects the result set to be a list of tuples or a list of lists. SQLAlchemy wraps this result set in its own Row object.
Superset requires rollback() and commit() to be implemented. Other OLAP database drivers such as pyhive simply "pass" in their unimplemented rollback() and commit() methods. For context: Superset's background processes often bypass the SQLAlchemy dialect and interact directly with the DBAPI. This is why overriding the rollback and commit methods in the dialect doesn't suffice.
Adds a WebSocket URL parameter, ws_url, to connect(). This helps maintain a static connection pool configuration in Superset.
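
For readers who want the first two requirements in concrete form, here is a minimal sketch. It is not the code in this PR: SketchConnection, SketchCursor, and load() are hypothetical names used only for illustration. It assumes a backend without transactions, so commit() and rollback() accept the call and do nothing, and fetchall() hands back plain tuples.

```python
from typing import Any, List, Sequence, Tuple


class SketchCursor:
    """Hypothetical cursor that always returns rows as a list of tuples."""

    def __init__(self) -> None:
        self._rows: List[Tuple[Any, ...]] = []

    def load(self, rows: Sequence[Sequence[Any]]) -> None:
        # Normalize whatever the result decoder produced (lists, tuples, ...)
        # into tuples, the row shape DB-API consumers commonly expect.
        self._rows = [tuple(row) for row in rows]

    def fetchall(self) -> List[Tuple[Any, ...]]:
        # Return all remaining rows and leave the cursor empty.
        rows, self._rows = self._rows, []
        return rows


class SketchConnection:
    """Hypothetical connection for a backend without transactions."""

    def commit(self) -> None:
        # Nothing to commit; accept the call instead of raising.
        pass

    def rollback(self) -> None:
        # Nothing to roll back; accept the call instead of raising.
        pass

    def cursor(self) -> SketchCursor:
        return SketchCursor()
```

A caller that talks to the DBAPI directly can then call commit() or rollback() after a query without hitting an error, regardless of what the SQLAlchemy dialect overrides.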

notion-workspace · 2024-08-01T18:48:01Z

SQLAlchemy expectations from Python-DBAPI

wherobots/db/connection.py

peterfoldes · 2024-08-01T18:51:27Z

wherobots/db/connection.py

+                    schema = reader.schema
+                    columns = schema.names
+                    column_types = [field.type for field in schema]
+                    rows = reader.read_all().to_pandas().values.tolist()


Do you need both read_all().to_pandas() or can you read_pandas() directly? (to be fair according to the docs it does the same thing)

Good catch! Using read_pandas() now for better readability.

wherobots/db/cursor.py

peterfoldes · 2024-08-01T18:58:28Z

wherobots/db/driver.py

-            },
-            headers=headers,
+    if ws_url:
+        session_uri = ws_url


Couple of things here:

This can be done earlier, so if we feel confident in using the ws_url we can do that right after checking the token/api-keys.

This needs a little bit different logging, since we're not requesting a new runtime (see line 70).

This should early return so that we don't have to indent everything afterwards. Makes it more readable.
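
A rough sketch of the shape these three points suggest, under stated assumptions: resolve_session_uri and request_runtime are hypothetical names, and request_runtime stands in for whatever call in driver.py starts a runtime and returns its session URI.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def resolve_session_uri(
    ws_url: Optional[str],
    request_runtime: Callable[[], str],
) -> str:
    """Return the WebSocket URI to connect to.

    `request_runtime` is a hypothetical callable standing in for the
    API call that starts a runtime and returns its session URI.
    """
    if ws_url:
        # Distinct log message: no runtime is being requested here.
        logger.info("Using provided session URL %s; no runtime requested.", ws_url)
        # Early return keeps the runtime-request path unindented below.
        return ws_url

    logger.info("Requesting a new runtime ...")
    return request_runtime()
```

With the early return, the rest of the function body stays at its original indentation level.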

mpetazzoni · 2024-08-06T20:49:26Z

wherobots/db/driver.py

@@ -51,6 +50,7 @@ def connect(
    results_format: Union[ResultsFormat, None] = None,
    data_compression: Union[DataCompression, None] = None,
    geometry_representation: Union[GeometryRepresentation, None] = None,
+    ws_url: str = None,


Why is this required, instead of calling connect_direct() directly?

The reason this parameter wasn't included in connect() is that it creates ambiguity between the rest of the parameters (like runtime/region) and the runtime you'd actually connect to when providing a ws_url, which may not match those choices.

mpetazzoni · 2024-08-06T20:52:41Z

wherobots/db/connection.py

-                query.handler(json.loads(result_bytes.decode("utf-8")))
+                data = json.loads(result_bytes.decode("utf-8"))
+                columns = data["columns"]
+                column_types = data.get("column_types")


Is column_types optional? If so, then it's good that you're using data.get() here, but then in Cursor.__get_results you expect column_types to be non-None. You either need to ensure column_types is always provided, or change __get_results to be more defensive.
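
One possible way to take the defensive route, as a sketch only: split_result_metadata is a hypothetical helper, and the choice to fall back to a list of None values (so that every column still has a type slot) is an assumption, not something this PR specifies.

```python
from typing import Any, Dict, List, Optional, Tuple


def split_result_metadata(
    data: Dict[str, Any],
) -> Tuple[List[str], List[Optional[str]]]:
    """Extract column names and types from a decoded JSON result payload."""
    columns = list(data["columns"])
    column_types = data.get("column_types")

    if column_types is None:
        # Assumption: a missing type list means "unknown type" per column.
        column_types = [None] * len(columns)
    elif len(column_types) != len(columns):
        # Fail loudly rather than silently misaligning names and types.
        raise ValueError(
            f"expected {len(columns)} column types, got {len(column_types)}"
        )

    return columns, list(column_types)
```

Downstream code can then iterate over both lists without a None check, and a lookup such as a type map with a default handles the unknown entries.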

mpetazzoni · 2024-08-06T20:56:19Z

wherobots/db/cursor.py

                    None,  # precision
                    None,  # scale
                    True,  # null_ok; Assuming all columns can accept NULL values
                )
-                for col_name in result.columns
+                for i, col_name in enumerate(columns)


Use https://docs.python.org/3/library/functions.html#zip to avoid jumping through hoops with an index (it's much nicer to read, and also more efficient):

self.__description = [ ( col_name, _TYPE_MAP.get(col_type, 'STRING'), ... ) for (col_name, col_type) in zip(columns, column_types) ]
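
Spelled out as a runnable sketch: the _TYPE_MAP entries below are made-up placeholders (the real map lives in cursor.py), and build_description is a hypothetical free function standing in for the assignment to self.__description. The seven fields follow the DB-API description layout: name, type_code, display_size, internal_size, precision, scale, null_ok.

```python
from typing import Any, List, Optional, Sequence, Tuple

# Placeholder entries for illustration only; not the PR's actual type map.
_TYPE_MAP = {"string": "STRING", "int64": "NUMBER"}


def build_description(
    columns: Sequence[str],
    column_types: Sequence[Optional[str]],
) -> List[Tuple[Any, ...]]:
    """Build DB-API style 7-item description tuples, one per column."""
    return [
        (
            col_name,
            _TYPE_MAP.get(col_type, "STRING"),  # type_code, default STRING
            None,  # display_size
            None,  # internal_size
            None,  # precision
            None,  # scale
            True,  # null_ok; assuming all columns can accept NULL values
        )
        for col_name, col_type in zip(columns, column_types)
    ]
```

Note that zip stops at the shorter input, so a column_types list that is shorter than columns would silently drop trailing columns; validating the lengths beforehand avoids that.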

prantogg added 6 commits August 1, 2024 13:41

feat: add ws_url parameter to connect

c43c8a4

feat: change return type to List of Tuples

c21261a

feat: allow passing commit() and rollback()

fa5d1b1

add temporary debug logging

df3fff5

remove temporary debug logging

b802b80

add types to type map

65120fb

prantogg marked this pull request as ready for review August 1, 2024 18:44

prantogg requested review from mpetazzoni and peterfoldes August 1, 2024 18:44

prantogg changed the title ~~Sqlalchemy/Superset expectations from python-DBAPI~~ [EWT-1250] Sqlalchemy/Superset expectations from python-DBAPI Aug 1, 2024

peterfoldes requested changes Aug 1, 2024

address comments

2c9b5f8

prantogg requested a review from peterfoldes August 1, 2024 23:09

fix typo

96e44dc

mpetazzoni requested changes Aug 6, 2024

prantogg marked this pull request as draft August 13, 2024 17:17
