@jackc jackc commented Nov 2, 2025

This is a proof of concept for skipping the Describe portal message when executing a prepared statement.

Currently, pgx always sends a Describe portal message when executing a prepared statement. It receives a RowDescription message in response. This is convenient as result sets always include a RowDescription first regardless of whether the query was executed with the simple protocol, the extended protocol without a prepared statement, or the extended protocol with a prepared statement.

But pgx always describes prepared statements when it creates them, so it already has the RowDescription. The only thing it lacks is the format (text or binary) of the result fields, as that is specified per execution. If pgx remembered the formats it requested when it sent the query, it could synthesize the complete RowDescription without needing to ask PostgreSQL to resend it.
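A minimal sketch of that synthesis step, not the code in this branch. `FieldDescription` and `synthesizeFields` are hypothetical, simplified stand-ins rather than pgx types. The format rules follow the Bind message in the PostgreSQL protocol: zero result format codes means all text, one code applies to every column, otherwise there is one code per column.

```go
package main

import (
	"errors"
	"fmt"
)

// FieldDescription is a simplified, hypothetical stand-in for the per-column
// metadata carried by a RowDescription message.
type FieldDescription struct {
	Name        string
	DataTypeOID uint32
	Format      int16 // 0 = text, 1 = binary
}

// synthesizeFields returns a copy of the cached statement fields with Format
// filled in from the result format codes that were sent in Bind.
// The cached slice is not modified.
func synthesizeFields(cached []FieldDescription, resultFormats []int16) ([]FieldDescription, error) {
	out := make([]FieldDescription, len(cached))
	copy(out, cached)

	switch {
	case len(resultFormats) == 0:
		// No codes: every result column is returned in text format.
		for i := range out {
			out[i].Format = 0
		}
	case len(resultFormats) == 1:
		// One code: it applies to every result column.
		for i := range out {
			out[i].Format = resultFormats[0]
		}
	case len(resultFormats) == len(cached):
		// One code per result column.
		for i := range out {
			out[i].Format = resultFormats[i]
		}
	default:
		return nil, errors.New("result format count does not match column count")
	}
	return out, nil
}

func main() {
	// Hypothetical cached description from the statement-level Describe.
	cached := []FieldDescription{
		{Name: "id", DataTypeOID: 23},
		{Name: "name", DataTypeOID: 25},
	}
	fields, err := synthesizeFields(cached, []int16{1, 0})
	if err != nil {
		panic(err)
	}
	for _, f := range fields {
		fmt.Printf("%s format=%d\n", f.Name, f.Format)
	}
}
```

Copying the cached fields sidesteps mutation of the shared description at the cost of an allocation; updating the description in place avoids that allocation but raises the concurrency question listed in the considerations.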

This proof of concept adds a new method, *PgConn.ExecPreparedStatementDescription(), that tests this approach.

Here are results of one of the existing benchmarks adapted to use the new method along with the original method:

jack@glados ~/dev/pgx ±prepared-statements-skip-describe-portal » got -run=^$ -bench=PgConnExecPrepared -benchmem
goos: darwin
goarch: arm64
pkg: github.com/jackc/pgx/v5
cpu: Apple M3 Max
BenchmarkSelectRowsPgConnExecPrepared/1_rows/text-16            	   23263	     49916 ns/op	     104 B/op	      10 allocs/op
BenchmarkSelectRowsPgConnExecPrepared/1_rows/binary_-_mostly-16 	   24380	     48728 ns/op	     104 B/op	      10 allocs/op
BenchmarkSelectRowsPgConnExecPrepared/10_rows/text-16           	   21608	     55964 ns/op	     104 B/op	      10 allocs/op
BenchmarkSelectRowsPgConnExecPrepared/10_rows/binary_-_mostly-16         	   21546	     55750 ns/op	     104 B/op	      10 allocs/op
BenchmarkSelectRowsPgConnExecPrepared/100_rows/text-16                   	   10000	    112878 ns/op	     128 B/op	      12 allocs/op
BenchmarkSelectRowsPgConnExecPrepared/100_rows/binary_-_mostly-16        	   10000	    103770 ns/op	     129 B/op	      12 allocs/op
BenchmarkSelectRowsPgConnExecPrepared/1000_rows/text-16                  	    1762	    672274 ns/op	     441 B/op	      25 allocs/op
BenchmarkSelectRowsPgConnExecPrepared/1000_rows/binary_-_mostly-16       	    2073	    580714 ns/op	     474 B/op	      26 allocs/op
BenchmarkSelectRowsPgConnExecPreparedStatementDescription/1_rows/text-16 	   25375	     48632 ns/op	      32 B/op	       2 allocs/op
BenchmarkSelectRowsPgConnExecPreparedStatementDescription/1_rows/binary_-_mostly-16         	   24746	     48533 ns/op	      32 B/op	       2 allocs/op
BenchmarkSelectRowsPgConnExecPreparedStatementDescription/10_rows/text-16                   	   21912	     54659 ns/op	      40 B/op	       2 allocs/op
BenchmarkSelectRowsPgConnExecPreparedStatementDescription/10_rows/binary_-_mostly-16        	   22455	     53232 ns/op	      40 B/op	       2 allocs/op
BenchmarkSelectRowsPgConnExecPreparedStatementDescription/100_rows/text-16                  	   10000	    113159 ns/op	      64 B/op	       4 allocs/op
BenchmarkSelectRowsPgConnExecPreparedStatementDescription/100_rows/binary_-_mostly-16       	   10000	    104351 ns/op	      65 B/op	       4 allocs/op
BenchmarkSelectRowsPgConnExecPreparedStatementDescription/1000_rows/text-16                 	    1813	    677880 ns/op	     377 B/op	      17 allocs/op
BenchmarkSelectRowsPgConnExecPreparedStatementDescription/1000_rows/binary_-_mostly-16      	    2062	    575758 ns/op	     410 B/op	      18 allocs/op
PASS
ok  	github.com/jackc/pgx/v5	23.469s

There is a tiny improvement in runtime, on the order of a few hundred to a thousand nanoseconds. Per-query memory usage and allocations drop by roughly 64 to 72 B and 8 allocations per operation, which is significant for this benchmark. Whether it is significant in the context of an application is another question.

It also reduces the amount of network traffic. The test TestConnExecPreparedStatementDescriptionNetworkUsage measures the bytes written to and read from the PostgreSQL server, using the same query as the benchmark above when returning a single row.

The number of bytes written to the server varies by only 7 bytes: 54 without Describe and 61 with Describe. But the number of bytes received varies by 238 bytes: 153 without Describe and 391 with Describe. That is about 2.56x the received bytes.
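The 7-byte difference on the write side is consistent with a Describe message for the unnamed portal. As a rough illustration, the sketch below encodes that message following the layout in the PostgreSQL protocol documentation. `encodeDescribePortal` is a hypothetical helper, not a pgx function, and the use of the unnamed portal here is an assumption.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeDescribePortal builds a Describe message that targets a portal.
// Layout: 'D', Int32 length (counts itself but not the type byte),
// 'P' for portal, then the portal name as a NUL-terminated string.
func encodeDescribePortal(portal string) []byte {
	buf := []byte{'D', 0, 0, 0, 0, 'P'}
	buf = append(buf, portal...)
	buf = append(buf, 0)
	binary.BigEndian.PutUint32(buf[1:5], uint32(len(buf)-1))
	return buf
}

func main() {
	msg := encodeDescribePortal("") // assumption: the unnamed portal
	fmt.Println(len(msg), "bytes on the wire")
}
```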

The percentage change will vary significantly based on the number of columns in the result set, which determines the size of the RowDescription message, and the number of rows returned. If only one row is returned it is quite likely that RowDescription is bigger than the actual data. But if many rows are returned then the RowDescription cost is insignificant.
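To make the dependence on column count concrete, here is a small sketch that computes the wire size of a RowDescription message from its column names, using the field layout in the PostgreSQL protocol documentation. The column names and the 40-byte DataRow size are made up for illustration and are not the benchmark query.

```go
package main

import "fmt"

// rowDescriptionSize returns the wire size in bytes of a RowDescription
// message for the given column names.
func rowDescriptionSize(columnNames []string) int {
	size := 1 + 4 + 2 // type byte 'T', Int32 length, Int16 field count
	for _, name := range columnNames {
		// Name plus NUL terminator, then table OID (4), attribute number (2),
		// type OID (4), type size (2), type modifier (4), format code (2).
		size += len(name) + 1 + 18
	}
	return size
}

func main() {
	cols := []string{"id", "name", "created_at"} // hypothetical columns
	desc := rowDescriptionSize(cols)
	const dataRowBytes = 40 // hypothetical size of one DataRow message

	for _, rows := range []int{1, 10, 1000} {
		share := float64(desc) / float64(desc+rows*dataRowBytes)
		fmt.Printf("%d rows: RowDescription is %.1f%% of description plus row data\n", rows, share*100)
	}
}
```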


Considerations for whether to move forward with this approach:

  1. The biggest issue is the general increase in complexity. It is yet another code path that can be taken when executing queries.
  2. This proof of concept doesn't consider whether it is safe to directly update the prepared statement description. There might be concurrency issues if someone is doing something with the statement description in another goroutine. Now I can't think of any reason why someone would do that, so presumably documenting that you can't mess with a statement description while it is being executed would be sufficient.
  3. Batches would need to use this new approach as well.
  4. It is perfectly valid to do this according to the documented PostgreSQL protocol. However, the PostgreSQL C library libpq doesn't exercise this path. It always sends the Describe portal message. See https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/fe-exec.c#L1883-L1895. We may run into edge cases with PostgreSQL, as no one else may be doing this. In addition, it may cause compatibility issues with semi-compatible databases like CockroachDB (CRDB).
