Description
Summary
This is illustrated in test__struct.py
. Here's a sample failing test that captures how I would expect this to behave:
import sqlalchemy
def test_struct_parens_demo(faux_conn):
from sqlalchemy_bigquery import ARRAY, STRUCT
table = sqlalchemy.Table(
"t",
sqlalchemy.MetaData(),
sqlalchemy.Column("person", STRUCT(children=ARRAY(STRUCT(name=sqlalchemy.String)))),
)
expr = table.c.person.children[0].name
expected_sql = f"SELECT `t`.`person`.children[OFFSET(%(param_1:INT64)s)].name AS `anon_1` \nFROM `t`"
actual_sql = str(sqlalchemy.select(expr).compile(faux_conn.engine))
assert actual_sql == expected_sql
AssertionError: assert 'SELECT ((`t`...1` \nFROM `t`' == 'SELECT `t`.`...1` \nFROM `t`'
- SELECT `t`.`person`.children[OFFSET(%(param_1:INT64)s)].name AS `anon_1`
+ SELECT ((`t`.`person`.children)[OFFSET(%(param_1:INT64)s)]).name AS `anon_1`
? ++ + +
FROM `t`
I believe this is happening because the STRUCT type is taking on the lowest possible operator precedence in SQLAlchemy, when it actually has the highest operator precedence in BigQuery.
This doesn't create any functional issue, but it does make it hard to read the rendered SQL. For example, I work on a system where we run snapshot tests on the SQL generated by the bigquery dialect. We also often consult the generated SQL in the BigQuery console when debugging.
Investigation
In _struct.py
we make SQLAlchemy aware of struct_getitem_op
by patching the default comparator to alias json_getitem_op
:
sqlalchemy.sql.default_comparator.operator_lookup[
struct_getitem_op.__name__
] = sqlalchemy.sql.default_comparator.operator_lookup["json_getitem_op"]
But it seems this is not the only mapping for operators. We also have _PRECEDENCE, which determines when parens are necessary.
If the target operator does not exist in _PRECEDENCE
, it takes on the lowest possible precedence.
So this would explain why we get unnecessary parens. Note that I'm calling these parens "unnecessary" because I see that all indexed types in BigQuery have the same precedence (the highest).
Proposal
I guess we could also patch _PRECEDENCE
, but I don't think that its necessary to patch SQLAlchemy internals to accomplish this. Here's an alternative: route both ARRAY and STRUCT indexing operations to getitem
.
I think the intention for the builtin getitem
operator is that it can handle any indexing operation, not necessarily just for the builtin ARRAY type. It seems possible to tweak visit_getitem_binary
to render based on the type of the LHS of the indexing operation:
def visit_getitem_binary(self, binary, operator_, **kw):
left = self.process(binary.left, **kw)
if binary.left.type._type_affinity is _struct.STRUCT:
return f"{left}.{binary.right.value}"
right = self.process(binary.right, **kw)
return f"{left}[OFFSET({right})]"
Routing indexing operations to getitem
also ensures that they have the same precedence, so the parens go away.
I poked around and found a few examples of other dialects checking _type_affinity
in visit hooks (just to be sure we're not exchanging one internals reference for another) - here's a sample from mysql.