## Description

**Is your feature request related to a problem? Please describe.**
The current API gives users either:
- stale reads, when operating at a transaction isolation level below serializable, or
- serialization errors, which put an extra handling burden on the application developer.
**Describe the solution you'd like**

The solution is a combination of the API interface itself and quickstart documentation for beginner database users.
### Transaction isolation level
Stale reads are not possible at the "strong serializable" isolation level for distributed databases and at ANSI "serializable" for single-node databases. Operating at lower isolation levels requires the application developer to deal with various anomalies and can result in security issues.
So the `serializable` isolation level is a must for the design goal of this feature, and the decreased transaction throughput is not a priority.
The quickstart documentation could show how to enable this level as the session default for each supported DBMS. For example, for PostgreSQL it could ask the user to connect to the desired database as the desired user and issue:

```sql
ALTER USER current_user IN DATABASE current_catalog SET default_transaction_isolation = 'serializable';
```
### No stale reads API
#### Transaction retries
At the serializable isolation level the API should handle transaction retries so that library users can focus on their current tasks. The retry logic could be configured via parameters of the "commit finalizer" logic, which will allow power users to tune the maximum number of retries and maybe the retry strategy. The quickstart documentation for beginners should use the finalizer with defaults: the chosen `max_retries` number and an exponential backoff with jitter strategy.

Example sleep duration between retries, in milliseconds:

```
sleep_ms = (2 ** retry) * 0.1 * (random.random() + 0.5)
```
A simpler "commit finalizer" with the default strategy could be named `commit_with_max_retries`.
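A minimal std-only sketch of such a finalizer, assuming the proposed names `commit_with_max_retries` and `max_retries` (the jitter source is passed as a parameter only because `std` has no RNG; a real implementation would draw it from a random source):

```rust
use std::time::Duration;

/// Sleep duration for a given retry attempt, mirroring the proposed formula:
/// sleep_ms = (2 ** retry) * 0.1 * (random.random() + 0.5).
/// `unit_random` is expected to come from an RNG in [0.0, 1.0).
fn backoff_ms(retry: u32, unit_random: f64) -> f64 {
    (2f64).powi(retry as i32) * 0.1 * (unit_random + 0.5)
}

/// Hypothetical "commit finalizer" with the default strategy: re-run the whole
/// transaction closure until it succeeds, a non-retryable error occurs, or the
/// retry budget is exhausted.
fn commit_with_max_retries<T, E>(
    max_retries: u32,
    mut run_transaction: impl FnMut() -> Result<T, E>,
    is_retryable: impl Fn(&E) -> bool,
) -> Result<T, E> {
    let mut retry = 0;
    loop {
        match run_transaction() {
            Ok(value) => return Ok(value),
            Err(e) if retry < max_retries && is_retryable(&e) => {
                // Fixed jitter here for brevity; a real implementation would
                // sample `unit_random` per attempt.
                let ms = backoff_ms(retry, 0.5);
                std::thread::sleep(Duration::from_micros((ms * 1000.0) as u64));
                retry += 1;
            }
            Err(e) => return Err(e),
        }
    }
}
```

A beginner would call this with the defaults and never see the retry machinery; a power user could swap in a different `is_retryable` predicate or retry strategy.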
For the Postgres case there are several error codes that could be retried:
- `40000` - `transaction_rollback` - may happen during the statement preparation phase or, for example, during a large batched INSERT; I found a description here: https://github.com/postgres/postgres/blob/f972ec5c285c3bc50d81f8044e08cd636017ab68/src/backend/replication/logical/reorderbuffer.c#L2540
- `40001` - `serialization_failure` - the target error of this feature
- `40P01` - `deadlock_detected` - may happen on commit or after `deadlock_timeout`. It can be eliminated by following a consistent lock order. It's possible to log it as a warning and hint the user to review the transaction's statement order.
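The retry predicate over these codes is then a one-liner; a sketch (the function name is illustrative, not an existing sqlx API):

```rust
/// Classify Postgres SQLSTATE codes the commit finalizer could retry.
/// 40P01 is retried too, but could additionally be logged as a warning that
/// hints the user to review the transaction's statement order.
fn is_retryable_sqlstate(code: &str) -> bool {
    matches!(code, "40000" | "40001" | "40P01")
}
```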
#### Limiting user-run side effects
Since a transaction can be retried, it's not safe to run side effects / non-idempotent logic between the transaction's queries. This means the API should provide an interface that runs the transaction as a whole, with potential retries, and either returns the desired consistent state of the applied transaction or an `SQLError` (for example, an IO/TLS error or a constraint violation error).
How is it possible to limit side effects?
I'm thinking about an interface like the `TryStream` combinators. Let's call them "query combinators". The user is expected to glue transactional code together using a chain of query combinators. The call chain starts with a "transaction input capture" method that returns a structure implementing a chainable interface trait. The "commit finalizer" is attached to the last query combinator; it executes the constructed nested structure and handles retries.
Combinators accept a `move` closure as their final argument. There are combinators for both non-async and async closures.
Closures can run init logic on each transaction retry if needed. This logic, too, can be either non-async or async.
The input state of the transaction is a single value or a tuple that is captured by value at the start of the call chain by the "transaction input capture" method. It can include initial dummy values for the transaction results; these dummies constrain the transaction output type.
The captured input is propagated through the call chain as the first parameter of each combinator. It is modified and returned by the transaction pipeline, in a similar fashion to how `TryStream::try_fold` works.
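The shape of that call chain can be sketched with a synchronous, std-only toy (all names are hypothetical; the real API would also issue queries, support async closures and result streams, and wrap the whole chain in the retrying commit finalizer):

```rust
/// Toy model of the chain: capture input by value, thread it through each
/// combinator closure like `TryStream::try_fold`, then finish with a commit.
struct TxChain<S, E> {
    state: S,
    steps: Vec<Box<dyn FnMut(S) -> Result<S, E>>>,
}

impl<S, E> TxChain<S, E> {
    /// The "transaction input capture" method: takes the initial state by value.
    fn capture(state: S) -> Self {
        TxChain { state, steps: Vec::new() }
    }

    /// Stand-in for a query combinator: transforms the accumulated state.
    fn step(mut self, f: impl FnMut(S) -> Result<S, E> + 'static) -> Self {
        self.steps.push(Box::new(f));
        self
    }

    /// Stand-in for the commit finalizer (no retries in this sketch).
    fn commit(mut self) -> Result<S, E> {
        let mut state = self.state;
        for step in &mut self.steps {
            state = step(state)?;
        }
        Ok(state)
    }
}
```

The accumulated state flows through the closures exactly once per (attempted) transaction, and any `Err` aborts the remainder of the chain.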
The set of combinators mirrors the existing sqlx interface. The most generic one is a `fetch`-like query combinator that allows processing query results in streaming fashion within a single query combinator closure. A `query_as::<T>`-like combinator is useful to collect the result into a specified type; an `execute`-like one just lets the user check the query result, which usually carries the count of affected rows.
So the second query combinator parameter depends on the type of combinator and represents the result, or result stream, of the previous query combinator in the chain.
Non-async closures return a `Result`. Async closures return a future that resolves to a `Result`.
The error type may be generic, allowing the user to interrupt the transaction by returning a custom error. The transaction error is an enum of either `SQLError` or the custom user error. A closure returns a different enum to indicate an error: either a custom error variant or an error requesting a manual transaction rollback that finishes the transaction. It's possible to use the custom error itself to indicate a rollback of the whole transaction, plus a special error variant carrying a savepoint name if savepoints are supported. Savepoints make the transaction execution logic nonlinear; they could be implemented as special combinators that accept a nested subtransaction.
The `Ok` variant of the closure return type is a pair of the accumulated transaction state and an `Option` holding the next query. The optionality allows adding conditional logic and makes the transaction pipeline structure dynamic.
If there is `Some(query)`, the accumulated state is propagated further through the call chain. If it is `None`, the given state is the desired transaction output and it's time to commit the transaction.
The transaction's `Ok` variant value has the same type as the initial state captured at the start of the transaction.
This value doesn't have stale data, and it can be processed by further user logic outside of the transaction scope.
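The error shapes described above could look roughly like this (all type and variant names are hypothetical sketches, not settled API):

```rust
/// Placeholder for the driver-level error.
#[derive(Debug, PartialEq)]
enum SqlError {
    Io(String),
    ConstraintViolation(String),
}

/// What the whole transaction pipeline returns on failure:
/// either a driver-level error or the user's custom error.
#[derive(Debug, PartialEq)]
enum TxError<E> {
    Sql(SqlError),
    User(E),
}

/// What an individual closure returns on failure: a custom error, a manual
/// rollback of the whole transaction, or (where supported) a rollback to a
/// named savepoint.
#[derive(Debug, PartialEq)]
enum StepError<E> {
    User(E),
    Rollback,
    RollbackToSavepoint(String),
}
```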
### API extension to support both explicit and implicit transaction pipelines
Postgres allows pipelining multiple independent queries in a single batched call and receiving results for all of them at once. These are called within an implicit transaction.
It's possible to unify the interface for both explicit and implicit (pipelined) transactions.
Pipelined query combinators differ in a few ways:
- A pipelined combinator doesn't issue the next query, because the next query doesn't depend on the result processing state; in the happy case it just returns the `accumulated_transaction_state`.
- It can't roll back explicitly, but it could do some result validation and return a custom error to abort the call chain early.
- The "commit finalizer" processes them differently:
  - It can collapse sequential implicit pipeline combinators into an internal implicit pipeline structure. This structure executes each relevant closure while processing the combined result set. In case of an error it throws away the partial result.
  - The "BEGIN" and "COMMIT" statements are pipelined together with the first or the last implicit pipeline. This allows executing a single implicit pipeline and these statements within a single query batch. CockroachDB will strip these statements and auto-retry the implicit transaction when the result set is under 16KiB. For a larger result set the pipeline will start client-side processing and may get a serialization error; then the client-side retry mechanism is activated and the pipeline is re-executed.
In the general case the transaction chain could be a single explicit transaction that has multiple pipelined sections. Partial collapse is beneficial for this scenario too, because fewer communication rounds are required to process the chain.
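The grouping part of that collapse is simple to sketch (names are illustrative; this models only which steps share a network round, not the I/O itself):

```rust
/// Sketch of partial collapse: group consecutive implicit-pipeline steps into
/// batches so each batch goes out as one query pipeline, and "BEGIN"/"COMMIT"
/// can ride along with the first/last batch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum StepKind {
    Pipelined,   // next query doesn't depend on this one's results
    Interactive, // must round-trip before the next query can be issued
}

/// Run-length encode the chain: each (kind, len) pair is one network round.
fn collapse(steps: &[StepKind]) -> Vec<(StepKind, usize)> {
    let mut batches: Vec<(StepKind, usize)> = Vec::new();
    for &kind in steps {
        match batches.last_mut() {
            // Only pipelined steps are mergeable into the previous batch.
            Some((last, len)) if *last == kind && kind == StepKind::Pipelined => *len += 1,
            _ => batches.push((kind, 1)),
        }
    }
    batches
}
```

Under this sketch a chain of five steps with two leading pipelined queries needs four rounds instead of five.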
### Query combinators documentation
It's important to help the user choose the correct combinators. The documentation may provide a flow chart with question paths: the user answers the questions and picks the best combinator for their particular case. The flow chart should prioritize implicit pipeline combinators when the next user query doesn't depend on the results of the previous query. Then it could ask about `execute`-like, `query_scalar`-like, `query_as`-like, and `fetch_all`-like combinators, and finally suggest the most generic `fetch`-like query combinator.
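The flow chart could even ship as a function; a sketch, assuming hypothetical answer types (the combinator names mirror the sqlx-like families above and are not settled):

```rust
/// How the user answered "what result shape do you need?".
#[derive(Debug, Clone, Copy)]
enum ResultShape {
    RowCount,  // only the number of affected rows
    Scalar,    // a single value
    TypedRows, // rows collected into a user type
    AllRows,   // all rows, loaded eagerly
    Stream,    // process rows in streaming fashion
}

/// Walk the flow chart: implicit pipelining first, then the narrowest
/// combinator family that fits, falling back to the generic fetch-like one.
fn suggest_combinator(depends_on_previous: bool, shape: ResultShape) -> &'static str {
    if !depends_on_previous {
        return "implicit pipeline combinator";
    }
    match shape {
        ResultShape::RowCount => "execute-like",
        ResultShape::Scalar => "query_scalar-like",
        ResultShape::TypedRows => "query_as-like",
        ResultShape::AllRows => "fetch_all-like",
        ResultShape::Stream => "fetch-like",
    }
}
```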