Skip to content

Wrong transaction state for broken database connection #40489

Open
@timonzi

Description

Describe the bug

If the database connection is interrupted while we are persisting an entity in a transaction, it may be that the entity has been persisted in the database, but we are not aware of this via our transaction observers, or that the wrong observer is being called.

Example code

We have a simple method which creates/persists the entity and fires an event:

@Inject
Event<TestEvent> event;

@Transactional
public void createEntity(final int i) {
    event.fire(new TestEvent("entity" + i));

    final var entity = new TestEntity();
    entity.setId("entity" + i);

    entityManager.persist(entity);
}

Within a bean, we have multiple observers:

public void afterSuccess(@Observes(during = TransactionPhase.AFTER_SUCCESS) final TestEvent event) {
    // do something
}


public void afterFailure(@Observes(during = TransactionPhase.AFTER_FAILURE) final TestEvent event) {
    // do something
}


public void afterCompletion(@Observes(during = TransactionPhase.AFTER_COMPLETION) final TestEvent event) {
    // do something
}

Now we are faced with the problem in our (real) applications that if we have a problem with the database connection during the persistence of the entity (e.g. because the database was restarted, a network error occurred, etc.), the AFTER_FAILURE observer is called, but the entity was successfully persisted.

This is an inconsistent state and we cannot know whether the entity has really been persisted or not - without performing another check on the database. This might not be a big problem for simple applications, but for generic implementations, or if you don't know what may or may not have been (partially) persisted in the transaction, then this is a very big problem for us.

Hint: We have the same problem when using an implementation of jakarta.transaction.Synchronization. Also here the status in the afterCompletion method is the wrong one (Status.STATUS_ROLLEDBACK instead of Status.STATUS_COMMITTED).

Please let me know if I need to create an issue at another location / project.

Expected behavior

The status of the transaction must reflect whether entities have been persisted or not. It must not be the case that entities have been persisted but we are not aware of this. The same applies in the opposite case, of course.

Or to put it another way: We must be sure that the status of the observers is consistent with regard to the database operations.

Actual behavior

Entities may have been successfully persisted to the database, but we get the information that the transaction failed.

How to Reproduce?

Reproducer: https://github.com/timonzi/transaction-phase-reproducer

Execute the PersistenceTest.

What it does:

  • It starts a Postgres:15.5 instance as Quarkus test resource
  • It creates a thread, which kills all connections every 100 ms
  • It tries to create and persist 100000 entities

Hint: We create so many entities and kill the connections so often because it is not clear to me at which point the problem exists. It can happen of course also when the database connection is lost only once (as said, because of Database restart, network problem, etc.)

What can we see:

  • There are more persisted entities on the database than expected

Example log output:

COUNT = 100000; // Number of entities which should be persisted in a loop
exceptionCounter = 141; // Number of exceptions which occurred in the loop
COUNT - exceptionCounter = 99859; // Number of entities which we expect to be persisted, because of the exceptions
entityCount = 99869; // Number of entities in the database
observerBean.getAfterCompletionCount() = 100000; // Number of calls of after completion observer
observerBean.getAfterSuccessCount() = 99859; // Number of calls of after success observer
observerBean.getAfterFailureCount() = 141; // // Number of calls of after failure observer

Output of uname -a or ver

Linux nb-timonz 6.5.0-28-generic #29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 4 14:39:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Output of java -version

OpenJDK Runtime Environment Temurin-17.0.6+10 (build 17.0.6+10)

Quarkus version or git rev

3.9.3 (but I think the problem occurs in all Quarkus versions)

Build tool (ie. output of mvnw --version or gradlew --version)

Apache Maven 3.9.3

Additional information

No response

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions