Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coord: Optimize single statement read transaction #15255

Merged
merged 6 commits into from
Oct 11, 2022

Conversation

jkosh44
Copy link
Contributor

@jkosh44 jkosh44 commented Oct 7, 2022

coord: Optimize single statement read transaction

The PostgreSQL extended wire protocol
(https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY)
leaves it ambiguous as to whether or not the current statement will be
single statement implicit transaction or a multi statement implicit
transaction. Due to this ambiguity, we previously removed all of our
single statement read transactions in
a887755. This ended up negatively
impacting query freshness and latency in the Serializable isolation
level.

PostgreSQL will eagerly commit implicit transactions after executing
certain statements but before receiving a Sync message. See
https://www.postgresql.org/message-id/flat/17434-d9f7a064ce2a88a3%40postgresql.org
and
https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=f92944137.

This commit will eagerly commit ALL statements in an implicit
transaction when using the extended protocol. This vastly simplifies
things and allows us to re-apply the optimizations we previously made
to single statement transactions.

Multi-statement implicit transactions using the extended protocol are
extremely rare and an edge case. They are not possible unless a client
uses pipelining in psql or a custom driver. The PostgreSQL BEGIN
documentation even wrongly implies that they don't exist
(https://www.postgresql.org/docs/14/sql-begin.html). Therefore,
it's much better to apply the optimizations in the common case than to
allow this edge case feature.

Much of this commit is reversing certain parts of
a887755.

Fixes MaterializeInc/database-issues#4202
Fixes MaterializeInc/database-issues#4010

Motivation

This PR adds a known-desirable feature.

Tips for reviewer

  • Some of this commit is reversing certain parts of a887755.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.

  • This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.

  • This PR includes the following user-facing behavior changes:

    • Implicit transactions run using the extended query protocol will eagerly commit after one statement. If you want multi-statement transactions while using the extended query protocol, then use explicit transactions.

The PostgreSQL extended wire protocol
(https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY)
leaves it ambiguous as to whether or not the current statement will be
single statement implicit transaction or a multi statement implicit
transaction. Due to this ambiguity, we previously removed all of our
single statement read transactions in
a887755. This ended up negatively
impacting query freshness and latency in the Serializable isolation
level.

PostgreSQL will eagerly commit implicit transactions after executing
certain statements but before receiving a Sync message. See
https://www.postgresql.org/message-id/flat/17434-d9f7a064ce2a88a3%40postgresql.org
and
https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=f92944137.

This commit will eagerly commit ALL statements in an implicit
transaction when using the extended protocol. This vastly simplifies
things and allows us to re-apply the optimizations we previously made
to single statement transactions.

Multi-statement implicit transactions using the extended protocol are
extremely rare and an edge case. They are not possible unless a client
uses pipelining in psql or a custom driver. The PostgreSQL BEGIN
documentation even wrongly implies that they don't exist
(https://www.postgresql.org/docs/current/sql-begin.html). Therefore,
it's much better to apply the optimizations in the common case than to
allow this edge case feature.

Much of this commit is reversing certain parts of
a887755.

Fixes #14696
@jkosh44 jkosh44 changed the title [WIP]: fix timestamp serializable coord: Optimize single statement read transaction Oct 7, 2022
@jkosh44 jkosh44 marked this pull request as ready for review October 7, 2022 20:44
@jkosh44 jkosh44 requested a review from maddyblue October 7, 2022 21:16
src/adapter/src/session.rs Show resolved Hide resolved
src/adapter/src/session.rs Outdated Show resolved Hide resolved
@@ -0,0 +1,49 @@
# Test that multiple DDL statements in the same extended request are supported
Copy link
Contributor

@maddyblue maddyblue Oct 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this differ from postgres (why is this file in pgtest-mz instead of pgtest)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Postgres the Insert will be aborted while in Materialize the Insert will be committed in it's own transaction. I've added a SELECT at the end to clarify this.

test/pgtest/transactions.pt Show resolved Hide resolved
test/testdrive/numeric.td Show resolved Hide resolved
src/environmentd/tests/sql.rs Outdated Show resolved Hide resolved
src/pgwire/src/protocol.rs Show resolved Hide resolved
@jkosh44 jkosh44 enabled auto-merge (squash) October 11, 2022 16:21
@jkosh44 jkosh44 disabled auto-merge October 11, 2022 16:25
@jkosh44 jkosh44 enabled auto-merge (squash) October 11, 2022 16:25
@jkosh44 jkosh44 merged commit 8404cde into MaterializeInc:main Oct 11, 2022
@jkosh44 jkosh44 deleted the serializable-reads branch October 11, 2022 18:28
benesch added a commit to benesch/materialize that referenced this pull request Dec 22, 2023
The single-statement optimization introduced in MaterializeInc#15255 caused
Materialize to deviate in behavior from PostgreSQL when statements are
submitted in implicit transactions using the extended query protocol.
The deviation in behavior was intentional, as doing so greatly
simplifies our code and it's fairly hard to hit this case in the
protocol.

However, a user recently discovered that the Node.js pg-query-stream
library does not work correctly with Materialize due to this deviation
in behavior. pg-query-stream attempts to page through a portal that is
established in an implicit transaction via the extended query protocol,
but the portal is immediately expired after the first batch of results.

This commit adjusts the adapter/pgwire protocol to special-case paging
through a portal established in an implicit transaction via the extended
protocol, without losing the single-statement optimizations from MaterializeInc#15255.
It adds an end-to-end test to verify that the Node.js pg-query-stream
library is now working as expected.

Fix #24059.
benesch added a commit to benesch/materialize that referenced this pull request Dec 23, 2023
The single-statement optimization introduced in MaterializeInc#15255 caused
Materialize to deviate in behavior from PostgreSQL when statements are
submitted in implicit transactions using the extended query protocol.
The deviation in behavior was intentional, as doing so greatly
simplifies our code and it's fairly hard to hit this case in the
protocol.

However, a user recently discovered that the Node.js pg-query-stream
library does not work correctly with Materialize due to this deviation
in behavior. pg-query-stream attempts to page through a portal that is
established in an implicit transaction via the extended query protocol,
but the portal is immediately expired after the first batch of results.

This commit adjusts the adapter/pgwire protocol to special-case paging
through a portal established in an implicit transaction via the extended
protocol, without losing the single-statement optimizations from MaterializeInc#15255.
It adds an end-to-end test to verify that the Node.js pg-query-stream
library is now working as expected.

Fix #24059.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants