Description
I have run into a very strange problem when flushing. I get the following warning and exception:
- array_map(): An error occurred while invoking the map callback in doctrine\mongodb-odm\lib\Doctrine\ODM\MongoDB\Persisters\CollectionPersister.php on line 219
- Cannot create a DBRef without an identifier. UnitOfWork::getDocumentIdentifier() did not return an identifier for class
Using git bisect, I traced the problem to the commit in which DocumentManager::createDBRef() started throwing an exception when $this->unitOfWork->getDocumentIdentifier() does not return an identifier. The underlying problem existed before that commit, but it was silent.
Finally, I found the root cause. UnitOfWork uses an array, documentIdentifiers, to store document IDs indexed by the result of spl_object_hash(). Another array, documentStates, is indexed by the same values. However, spl_object_hash() values are not unique over time: once an object is no longer referenced anywhere and gets garbage-collected, its hash value may be reused for a new object. This is clearly stated in the documentation of the function (see http://php.net/manual/en/function.spl-object-hash.php).
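A minimal sketch of this reuse in plain PHP (assumes a standard PHP 7 or 8 CLI; the exact reuse pattern is an engine implementation detail, not a documented guarantee):

```php
<?php
// spl_object_hash() is only unique among objects that are alive at the
// same time. Once an object is destroyed, a new object may get its hash.

$first = new stdClass();
$firstHash = spl_object_hash($first);

unset($first); // last reference gone: the object is destroyed immediately

$second = new stdClass(); // may receive the freed object's slot and hash
$secondHash = spl_object_hash($second);

$third = new stdClass(); // $second is still alive, so this hash must differ
$thirdHash = spl_object_hash($third);

var_dump($firstHash === $secondHash); // true if the freed slot was reused
var_dump($secondHash === $thirdHash); // false while both objects are alive
```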
This is probably less of a problem for full documents, because they are also held in identityMap, which keeps them from being garbage-collected. EmbeddedDocuments, however, don't have an ID and are therefore not stored in identityMap, yet their managed state is still stored. Consider the following scenario (the one that actually happened in my case):
- An EmbeddedDocument is persisted.
- The EmbeddedDocument is replaced by another instance via a setter in its parent document.
- The old instance is destroyed by garbage collection, but its STATE_MANAGED value is still in documentStates.
- A new document is persisted and, unfortunately, its spl_object_hash value is the same as the hash of the already destroyed EmbeddedDocument. UnitOfWork finds the status STATE_MANAGED for it and assumes the new document is already managed, so its identifier is never written into documentIdentifiers, leaving the two arrays inconsistent.
- DocumentManager::getDocumentIdentifier() is called, UnitOfWork does not return an identifier for the document, and an exception is thrown.
- If the new document is referenced in a referenceMany association, the map callback in CollectionPersister fails because of this exception.
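The scenario can be reproduced on a hypothetical, heavily simplified model of the two arrays. ToyUnitOfWork, Person, Address and the 'person-1'/'person-2' IDs are made up for illustration; this is not the real Doctrine MongoDB ODM code. It assumes PHP 7.4+ and the usual immediate reuse of a freed object's hash:

```php
<?php
// Hypothetical, simplified model of UnitOfWork's documentStates and
// documentIdentifiers arrays; not the real Doctrine MongoDB ODM code.

final class ToyUnitOfWork
{
    public const STATE_MANAGED = 1;

    /** @var array<string, int> document state, keyed by spl_object_hash() */
    public array $documentStates = [];

    /** @var array<string, string> document ID, keyed by spl_object_hash() */
    public array $documentIdentifiers = [];

    public function persist(object $document, ?string $id = null): void
    {
        $oid = spl_object_hash($document);
        if (($this->documentStates[$oid] ?? null) === self::STATE_MANAGED) {
            return; // looks already managed, so the identifier is never stored
        }
        $this->documentStates[$oid] = self::STATE_MANAGED;
        if ($id !== null) {
            $this->documentIdentifiers[$oid] = $id;
        }
    }

    public function getDocumentIdentifier(object $document): ?string
    {
        return $this->documentIdentifiers[spl_object_hash($document)] ?? null;
    }
}

class Address {} // stands in for an EmbeddedDocument (no ID)

class Person // stands in for a top-level document
{
    public ?Address $address = null;
}

$uow = new ToyUnitOfWork();

$parent = new Person();
$parent->address = new Address();
$uow->persist($parent, 'person-1');
$uow->persist($parent->address); // state stored, no identifier

$staleHash = spl_object_hash($parent->address);
$parent->address = new Address(); // "setter" replaces it; old one is freed

$other = new Person();             // may reuse the freed object's hash
$uow->persist($other, 'person-2'); // skipped: hash already looks managed

var_dump(spl_object_hash($other) === $staleHash);
var_dump($uow->getDocumentIdentifier($other)); // no identifier was recorded
```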
The solution I can think of is to prevent garbage collection of documents whose state is stored.
Since real documents are already held in identityMap anyway, this could be done by putting persisted embedded documents into a new UnitOfWork array named something like embeddedDocumentsKnown, indexed by spl_object_hash. This definitely needs to happen in doPersist, but it may be necessary in other places too: we have to guarantee that every embedded document that has a stored state is also held in this array.
Of course, this means that these EmbeddedDocuments will keep occupying memory even after they would otherwise have been garbage-collected, but this is probably not much memory; it only matters if we replace a lot of embedded documents and flush too frequently.
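A sketch of the proposed fix on the same kind of simplified model. PinningUnitOfWork and its identityMap layout are hypothetical; only the embeddedDocumentsKnown name comes from the proposal. PHP 7.4+ is assumed:

```php
<?php
// Hypothetical sketch, not the real Doctrine code: every document whose
// state is tracked is also strongly referenced, so it stays alive and its
// spl_object_hash() cannot be handed to a new object.

final class PinningUnitOfWork
{
    public const STATE_MANAGED = 1;

    public array $documentStates = [];
    public array $documentIdentifiers = [];
    public array $identityMap = [];            // ID => document
    public array $embeddedDocumentsKnown = []; // hash => embedded document

    public function persist(object $document, ?string $id = null): void
    {
        $oid = spl_object_hash($document);
        if (($this->documentStates[$oid] ?? null) === self::STATE_MANAGED) {
            return;
        }
        $this->documentStates[$oid] = self::STATE_MANAGED;
        if ($id !== null) {
            $this->documentIdentifiers[$oid] = $id;
            $this->identityMap[$id] = $document; // keeps the document alive
        } else {
            // No ID, so no identity-map entry: keep a reference here instead.
            $this->embeddedDocumentsKnown[$oid] = $document;
        }
    }

    public function getDocumentIdentifier(object $document): ?string
    {
        return $this->documentIdentifiers[spl_object_hash($document)] ?? null;
    }
}

class Address {}

class Person
{
    public ?Address $address = null;
}

$uow = new PinningUnitOfWork();

$parent = new Person();
$parent->address = new Address();
$uow->persist($parent, 'person-1');
$uow->persist($parent->address);

$parent->address = new Address(); // old instance survives in embeddedDocumentsKnown

$other = new Person();            // cannot get the pinned object's hash
$uow->persist($other, 'person-2');

var_dump($uow->getDocumentIdentifier($other));
```

Because the replaced Address is still referenced from embeddedDocumentsKnown, it is never destroyed, so the new Person gets a fresh hash and its identifier is recorded. The price is that the old instance stays in memory for as long as the UnitOfWork holds it.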
The other option would probably be to explicitly fire an event in the destructor of the embedded documents and bind to that event in UnitOfWork, removing the hash from documentStates. However, this would require everyone to add extra code to their embedded document classes, and I don't think that is a good option.
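For comparison, the destructor-based alternative could look roughly like this. The static $onDestruct hook and the forget() method are hypothetical, not an existing Doctrine API; the sketch assumes PHP 7.4+. It shows the drawback: the __destruct() logic has to live in every embedded document class.

```php
<?php
// Hypothetical sketch: each embedded document reports its own destruction
// so the UnitOfWork can drop the stale entry from documentStates.

final class ToyUnitOfWork
{
    public array $documentStates = [];

    public function persist(object $document): void
    {
        $this->documentStates[spl_object_hash($document)] = 1; // 1 = managed
    }

    public function forget(string $oid): void
    {
        unset($this->documentStates[$oid]);
    }
}

class Address
{
    /** @var callable|null hypothetical hook, set by the UnitOfWork owner */
    public static $onDestruct = null;

    // Every embedded document class would need this extra code.
    public function __destruct()
    {
        if (self::$onDestruct !== null) {
            (self::$onDestruct)(spl_object_hash($this));
        }
    }
}

$uow = new ToyUnitOfWork();
Address::$onDestruct = [$uow, 'forget'];

$address = new Address();
$uow->persist($address);
$before = count($uow->documentStates);

unset($address); // destructor runs and removes the stale state entry

var_dump(count($uow->documentStates));
```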
I am working on this in a forked repo, so I will push a PR when I am ready.