Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Execute prepared statements on client side by default #310

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

damian3031
Copy link
Member

Draft PR to discuss approach for #306

@mdesmet
Copy link
Contributor

mdesmet commented Jan 5, 2023

Just to complete the information for @hashhar.

We discussed the following approaches

  1. We could also avoid the prepare and deallocation query by simply adding the prepared statement directly on the ClientSession. The advantage is that we can still make use of the query parser on the server side. The disadvantage is that we still use the headers as transport.

  2. Do the parsing on the clientside. then we have to take care of values and identifiers within the query that contain ?. eg. cur.execute("SELECT '?''?', ?", 'test') or cur.execute('select 1 as "?""test", ?', 'test')

  3. Transport the prepared statement parameters in the JSON body instead of the headers as a backwards compatible API change, that can be used by all clients instead of just the trino python client

question_mark_positions.reverse()
for index, value in enumerate(reversed(params)):
operation = "".join([operation[:question_mark_positions[index]],
"'", value, "'",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value should not be enclosed by single quotes but use the appropriate type string.

)

# substitue parameters in query in reversed order
question_mark_positions = [index for index, character in enumerate(operation) if character == '?']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative to the reverse trick is to use string slicing. maybe that would be slightly more readable?

@damian3031 damian3031 force-pushed the prepared-statements-client-side branch from 1daa3d4 to dab8de2 Compare January 5, 2023 09:44
@hashhar
Copy link
Member

hashhar commented Jan 6, 2023

Doing on the client side makes the most sense to me depending on what problem we're trying to solve.

1 (directly manipulate ClientSession) seems like a hack.
2 (client side) solves the problem of multiple queries but introduces security concerns.
3 (new backward compatible API) solves the limitation that prepared statements cannot be very large but it doesn't help the fact that multiple queries need to be sent anyway.

I think we can do the following:

  • Make paramstyle settable and only allow it to be set to either format or qmark with qmark as default.
  • When we create Connection we "freeze" the paramstyle value i.e. a Connection's paramstyle is constant.
  • paramstyle = format would do client side bindings, paramstyle = qmark would do server side (what is already done today)
  • For paramstyle = format it's easier to do client side binding since you no longer need to worry about "finding" all markers and can just use str % values from Python to do the value setting.

For paramstyle = format we need to:

  • escape all params only - query text needs no escaping - if something needs escaping there (%) then the user should do that - this is what all Python drivers do (I checked psycopg2, pymysql, Oracle).
  • Do query % escaped_values and execute the query.

Where the handling of qmark and format style needs to be done? - to me it seems the logical place is to have two Cursor classes and override their execute and executemany methods. Alternatively we can keep existing cursor and introduce two methods process_qmark_params and process_format_params and call them from execute and executemany.

The question we need to answer

What's the biggest problem we want to solve?

  1. multiple queries sent when using params?
  2. the size limitation of prepared statements?

If we want to solve 1 then the only solution is what we discuss above.
If we want to solve 2 then the API change that @mdesmet mentioned in his comment is a good solution.

To be honest I don't know which of these problems people actually care about.

The benefit of customizable paramstyle is that we can solve both problems at once but at the cost of some additional code and complexity (both for users and for us).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants