Skip to content

sqlite transaction creation is not error or drop-safe. #3932

@kevincox

Description

@kevincox

I have found these related issues/pull requests

None

Description

This is sort of two bugs in one. But the causes are very related (partially the same) and the simpler error-handling bug actually helps demonstrate the problem while being much simpler and more reliable to trigger. So I have decided to report them together. I am also reporting them from the perspective of sqlite because that is where we encountered them and the code that we have explored most thoroughly. However the root cause is in the database independent Transaction object and likely affects all databases, and if databases are not affected it is likely more by chance than by intention.

Drop Safety

With very unfortunate drop-timing or an error during transaction setup code. sqlx_sqlite::SqliteTransactionManager::begin can leave a connection with an open transaction. If this connection is not specially cleaned up it can result in future calls that try to open a transaction such as SqlitePool::begin to fail with:

attempted to call begin_with at non-zero transaction depth

For many people using sqlite this is a major issue. Because sqlite doesn't support concurrent writers it is common to use a single write connection. If this connection is "corrupted" by being left with an open transaction all attempts to use a write transaction in the application will fail. We were lucky in that we noticed fairly quickly, if this happened at night we likely would have had hours of downtime.


async fn begin(conn: &mut SqliteConnection, statement: Option<SqlStr>) -> Result<(), Error> {
let is_custom_statement = statement.is_some();
conn.worker.begin(statement).await?;
if is_custom_statement {
// Check that custom statement actually put the connection into a transaction.
let mut handle = conn.lock_handle().await?;
if !handle.in_transaction() {
return Err(Error::BeginFailed);
}
}
Ok(())
}

In SqliteTransactionManager::begin with a custom statement it is possible that the future is dropped or an error is raised between when the transaction is started (at some point during the execution of line 16) and when line 19 completes (at which point it will be wrapped in a Transaction object that cleans it up). In this case the Transaction object would never be constructed and the transaction would never be rolled back.

DB::TransactionManager::begin(&mut conn, statement).await?;
Ok(Self {
connection: conn,
open: true,
})

It is also quite possible that this same error could occur without a custom statement, but I have not yet looked into the implementation of ConnectionWorker::begin to see if it has any suspension points after the transaction is started. Either way the code structure in Transaction is very error prone. If the future is Droped between the transaction being started (somewhere inside DB::TransactionManager::begin) and the Transaction object being constructed the connection will unexpectedly be left in an "in-transaction" state. This could very likely affect all database implementations as the main "bug" is in the database independent Transaction object. Our use case is just triggering it via the SqliteTransactionManager implementation.

I can think of two simple ways to resolve this:

  1. Build the Transaction object immediately, then call DB::TransactionManager::begin on the connection inside of the transaction. The impl Drop for Transaction will then need to handle the case where the transaction was not actually started. (This could be possibly be done by changing done to be a try state where it starts "unknown" then switches to "open" after the transaction is started then moves to "closed" after rollback or commit.)
  2. Create a new StartingTransaction object that has the same "close transaction if open" logic described above, but once DB::TransactionManager::begin move the connection from that object into the Transaction object (in a Drop and panic-safe way). This avoids Transaction needing to handle the non-transaction case which should be very exceptional and would be good to avoid extra database calls for (especially for networked databases).

Error Safety

A bug with basically the same cause is error safety. Any error raised during transaction setup after the connection is put into in a transaction state (whether an actual underlying sqlite transaction or a ConnectionWorker transaction_depth > 0 state) will not result in the connection being closed.

Some errors are handled, for example if the default or custom statement fails the internal transaction_depth counter is not incremented. Since this error also means a transaction was not started this leaves sqlx in a good "no transaction" state.

let res =
conn.handle
.exec(statement.as_str())
.map(|_| {
shared.transaction_depth.fetch_add(1, Ordering::Release);
});

However other errors are not properly handled. For example there is an explicit check to ensure that the provided custom statement creates a transaction.

// Check that custom statement actually put the connection into a transaction.
let mut handle = conn.lock_handle().await?;
if !handle.in_transaction() {
return Err(Error::BeginFailed);
}

If this check fails it raises an error that avoids the Transaction object being created. This then means that the transaction would not be closed. In theory this isn't an error because there is no transaction. However for sqlite this is an issue because the ConnectionWorker still believes it is in one and will fail future attempts at opening transactions.

I can think of two improvements. Probably it is necessary to do both of them:

  1. Move the "is in connection" check to ConnectionWorker so that it can avoid incrementing transaction_depth if the command succeeded but did not start a transaction. However this approach is fragile as it does not other possible errors which could occur after a transaction was opened.
  2. Basically the same solution as the drop problem, build some object with a Drop that will clean up the transaction no matter why it is done, including any error states. In this case the ConnectionWorker may still need to be updated to track transaction_depth properly but it should also be robust to other errors.

Fixing

I can contribute a patch if we decide on the approach. DB::TransactionManager::get_transaction_depth should provided the required information for determining if the transaction needs to be rolled back without needing database-specific logic. Additionally the ConnectionWorker will need to be re-thought to avoid losing sync when the statement doesn't actually open a transaction.

Reproduction steps

Unfortunately I don't have a concrete way to reproduce the drop case. (It can probably be done by figuring out the exact number of times to poll the future, but I didn't feel like doing that as the reproducer would be very fragile) but I can show the error case which is a very similar error path.

#[sqlx::test]
async fn write_tx_recover(pool: SqlitePool) -> anyhow::Result<()> {
    let mut conn = pool.acquire().await?;

    dbg!(
        Transaction::begin(
            MaybePoolConnection::<Sqlite>::Connection(&mut conn),
            Some("SELECT 1".into()),
        )
        .await
    );

    dbg!(
        Transaction::begin(
            MaybePoolConnection::<Sqlite>::Connection(&mut conn),
            Some("THIS IS NEVER RUN".into()),
        )
        .await
    )?;

    Ok(())
}

Output:

---- db::sqlite::write_tx_recover stdout ----
[api-server/src/db/sqlite.rs:291:5] Transaction::begin(MaybePoolConnection::<Sqlite>::Connection(&mut conn),
Some("SELECT 1".into()),).await = Err(
    BeginFailed,
)
[api-server/src/db/sqlite.rs:299:5] Transaction::begin(MaybePoolConnection::<Sqlite>::Connection(&mut conn),
Some("adsjlfjadslfjaslfejl".into()),).await = Err(
    InvalidSavePointStatement,
)
Error: attempted to call begin_with at non-zero transaction depth

This is a bit of a strange reproducer because the SELECT 1 doesn't actually start an sqlite transaction. However it exercises the same code and bug as the drop case. In this case the ConnectionWorker increments transaction_depth, but then an error is returned before the Transaction object is created. The transaction is never cleaned up (because that was the responsibility of the Transaction which was never constructed). The second call then fails because ConnectionWorker still believes that it is in a transaction. The second command is garbage because it is never run, ConnectionWorker returns an error before even getting to sqlite.

This could be modified to trigger the Drop bug.

  1. Update the commands to be BEGIN IMMEDIATE.
  2. Instead of await-ing the first future poll it N times then drop it. Where N is enough times to start the transaction in sqlite (and/or ConnectionWorker) but not enough to complete the future (and build the Transaction object.

SQLx version

0.8.5 (and code checked against current main branch)

Enabled SQLx features

"sqlite", "chrono", "uuid", "runtime-tokio-rustls", "macros", "migrate"

Database server and version

3.48.0 2025-01-14 11:05:00 d2fe6b05f38d9d7cd78c5d252e99ac59f1aea071d669830c1ffe4e8966e84010 (64-bit)

Operating system

Linux

Rust version

1.87.0

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions