Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defer removing removed entities from to-many collections until after transaction commit #10763

Conversation

mpdude
Copy link
Contributor

@mpdude mpdude commented Jun 5, 2023

This PR fixes regressions introduced in 2.15.1 related to how the UnitOfWork and EntityPersisters implement removal of entities that are part of many-to-many collections.

Background

To better understand the problem and regressions caused, we need to take a look at how the ORM deals with many-to-many associations when an entity shall be removed that is part of such an association.

BasicEntityPersister::deleteJoinTableRecords() is the method responsible for efficient removal of join-table rows in the database when a single entity is removed. It will DELETE all join-table rows referring to the entity in question before the entity is removed itself. This only works when the many-to-many association is declared on the entity's side (ie. its a bi-directional many-to-many association, or the entity is the owning side). Otherwise, since the ORM cannot detect associations that are declared in other classes only, join tables are left alone and database-level constraints apply.

BasicEntityPersister::deleteJoinTableRecords() implements a rather efficient join-table row removal, since a single DELETE query per removed entity and association can be used to possibly delete multiple rows (for associations with different other entities).

Besides deleting join-table rows in the database, we also want to update collections to no longer contain the removed entity. Removing the entity from these collections is the in-memory equivalent to removing the join table rows in the database.

Early during the commit phase, the UnitOfWork::computeAssociationChanges() method is responsible for looking at all associations in all in-memory entities. Among a few other tasks, it will identify all "dirty" collections – those that have been changed and need to be flushed to the database. This method will also check all collections for entities pending removal, and it will remove (unset) these entities from the collections.

Current situation

#10485 complained that the computeAssociationChanges would first take note of the "dirty" collections and then unset entities pending removal. The consequences of this were that

  • in collections that were already dirty (changed) before flush() was called, the join-table row would be removed with a dedicated DELETE statement, while bringing the collection and database in sync
  • more importantly, previously clean collections would be dirty after flush() returns (the opposite of what flush() is supposed to achieve), and subsequent flush() operations would fail since a DELETE statement would be issued for an entity that was no longer in the identity map.

The fix in #10486 (contained in 2.15.1) tried to address this by first unsetting the to-be-removed entities before taking note of (then) dirty collections. This caused unintended behavior changes that were not spotted by tests:

Suggested fix

  • Improve ManyToManyBasicAssociationTest to make sure collections are clean after flush() returns, and include assertions regarding the expected query count. This would have spotted per-row DELETEs instead of using the optimized persister method
  • Do not modify collections during computeAssociationChanges, but instead track which entity needs to be removed from which collection in a dedicated array
  • This way, join-tables will be dealt with by the deleteJoinTableRecords method only
  • After all changes have been flushed successfully, in the finishing phase of the UoW commit, go back to the list of tracked entity removals. Remove the entities from the respective collections immediately before taking new snapshots (= defining the new "clean" state). This makes sure we only do in-memory cleanups and cause no additional database operations due to removals.
  • Improve documentation to raise awareness of how this proess works

Fixes #10752, closes #10753.

@mpdude mpdude force-pushed the defer-collection-entity-removals-to-post-commit branch 2 times, most recently from b21f7e4 to 1620137 Compare June 5, 2023 20:23
Comment on lines +937 to +945
// If this collection is dirty, schedule it for updates
if ($value instanceof PersistentCollection && $value->isDirty()) {
$coid = spl_object_id($value);

$this->collectionUpdates[$coid] = $value;
$this->visitedCollections[$coid] = $value;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reverts the main change from #10486.

@mpdude mpdude force-pushed the defer-collection-entity-removals-to-post-commit branch 2 times, most recently from b280e90 to e8055f0 Compare June 5, 2023 22:19
@mpdude mpdude marked this pull request as ready for review June 6, 2023 22:28
When collection updates/join table cleanups do not happen through specialized Entity-/CollectionPersister methods but instead as "plain" updates, we may issue a lot more queries than expected.
@mpdude mpdude force-pushed the defer-collection-entity-removals-to-post-commit branch from 78ba74c to d220494 Compare June 21, 2023 17:15
@mpdude
Copy link
Contributor Author

mpdude commented Jun 21, 2023

Do you think there should be a mapping configuration setting/switch to tell the ORM not to handle the join table itself and always leave it to the database layer?

Probably that has to be a completely new setting. Might be beneficial for performance, to get consistent behavior regardless of using uni/bidirectional associations and to be able to RESTRICT removing entities that have m2m links.

@greg0ire
Copy link
Member

@mpdude I don't know what level of flexibility people expect, or what's the sensible thing to expect here. I think we should keep the ORM as simple as possible. If people let us know they need this, we can always implement it.

@greg0ire greg0ire merged commit f2abf61 into doctrine:2.15.x Jun 22, 2023
@greg0ire greg0ire added this to the 2.15.3 milestone Jun 22, 2023
@greg0ire
Copy link
Member

Thanks @mpdude !

@mpdude mpdude deleted the defer-collection-entity-removals-to-post-commit branch June 22, 2023 12:36
mpdude added a commit that referenced this pull request Dec 22, 2023
Make sure in-memory collections have removed entities unset before the `postRemove` event is dispatched. This is related to #11098, although by itself probably not going to fix the regression.

 #### Background

When an entity is removed from the database, the UoW also takes care of removing that entity from all in-memory collections. #10763 moved this cleanup and taking snapshots of updated collections to a very late stage during the commit phase, in order to avoid other side effects.

Now, part of the issue in #11098 is that `postRemove` event listeners will be called at a point where the database-level `DELETE` has happened (although the transaction has not yet been committed), but the in-memory collections have not yet been updated.

 #### Suggested change

This PR splits taking the new collection snapshots and updating in-memory collections. The in-memory update happens along with the database-level execution, collection snapshots still happen after transaction commit.
mpdude added a commit that referenced this pull request Dec 22, 2023
Make sure in-memory collections have removed entities unset before the `postRemove` event is dispatched. This is related to #11098, although by itself probably not going to fix the regression.

 #### Background

When an entity is removed from the database, the UoW also takes care of removing that entity from all in-memory collections. #10763 moved this cleanup and taking snapshots of updated collections to a very late stage during the commit phase, in order to avoid other side effects.

Now, part of the issue in #11098 is that `postRemove` event listeners will be called at a point where the database-level `DELETE` has happened (although the transaction has not yet been committed), but the in-memory collections have not yet been updated.

 #### Suggested change

This PR splits taking the new collection snapshots and updating in-memory collections. The in-memory update happens along with the database-level execution, collection snapshots still happen after transaction commit.
mpdude added a commit that referenced this pull request Jan 2, 2024
Make sure in-memory collections have removed entities unset before the `postRemove` event is dispatched. This is related to #11098, although by itself probably not going to fix the regression.

 #### Background

When an entity is removed from the database, the UoW also takes care of removing that entity from all in-memory collections. #10763 moved this cleanup and taking snapshots of updated collections to a very late stage during the commit phase, in order to avoid other side effects.

Now, part of the issue in #11098 is that `postRemove` event listeners will be called at a point where the database-level `DELETE` has happened (although the transaction has not yet been committed), but the in-memory collections have not yet been updated.

 #### Suggested change

This PR splits taking the new collection snapshots and updating in-memory collections. The in-memory update happens along with the database-level execution, collection snapshots still happen after transaction commit.
mpdude added a commit that referenced this pull request Jan 2, 2024
…e `postRemove` event is dispatched. This is related to #11098, although by itself probably not going to fix the regression.

#### Background 

When an entity is removed from the database, the UoW also takes care of removing that entity from all in-memory collections. #10763 moved this cleanup and taking snapshots of updated collections to a very late stage during the commit phase, in order to avoid other side effects.

Now, part of the issue in #11098 is that `postRemove` event listeners will be called at a point where the database-level `DELETE` has happened (although the transaction has not yet been committed), but the in-memory collections have not yet been updated.

#### Suggested change

This PR moves the code part that unsets removed entities from in-memory collections and takes collection snapshots (makes collections "clean") from after transaction commit to before the `postRemove` event. That brings the in-memory update closer to the database-level execution.

In the case of a transaction failure/abort, this leaves us with updated and snapshotted in-memory collections, but no database-level updates. But, at this point, probably things got inconsistent anyways, the EM will be closed and we need not be too worried about the state.
mpdude added a commit that referenced this pull request Jan 2, 2024
…e `postRemove` event is dispatched. This is related to #11098, although by itself probably not going to fix the regression.

#### Background 

When an entity is removed from the database, the UoW also takes care of removing that entity from all in-memory collections. #10763 moved this cleanup and taking snapshots of updated collections to a very late stage during the commit phase, in order to avoid other side effects.

Now, part of the issue in #11098 is that `postRemove` event listeners will be called at a point where the database-level `DELETE` has happened (although the transaction has not yet been committed), but the in-memory collections have not yet been updated.

#### Suggested change

This PR moves the code part that unsets removed entities from in-memory collections and takes collection snapshots (makes collections "clean") from after transaction commit to before the `postRemove` event. That brings the in-memory update closer to the database-level execution.

In the case of a transaction failure/abort, this leaves us with updated and snapshotted in-memory collections, but no database-level updates. But, at this point, probably things got inconsistent anyways, the EM will be closed and we need not be too worried about the state.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

No ForeignKeyConstraintViolationException exception when trying to remove an entity
3 participants