WIP: improve "live" layer performance with materialised views #5822
Draft
mhsdesign wants to merge 66 commits into
21ead2e to ab53acf
…uce internal `ContentStreamDbId` neos#5735) Adjust the hierarchy relation table to use an auto-incrementing number as an alias for the ContentStreamId, which is a UUID or rather any string. The motivation was to reduce the size of and the load on the hierarchy tables, which are huge anyway. During reading we already query the ContentStreamId for the requested workspace (see getContentGraph), so querying the internal ContentStreamDbId instead adds no overhead for read operations. Squashed commits from neos#5735. This change serves as a base to introduce content stream layers
implemented for set properties, and query hierarchy relation rows by taking multiple overlaying content stream db ids per content stream into account
as per Neos.ContentRepository.BehavioralTests/Tests/Behavior/Features/ContentStreamForking/NodeReferencesOnForkContentStream.feature
…rite (move dsp will be a partial fork:O)
… nodes they were from a time, where the Node embodied its content stream which was always fetched from hierarchies
otherwise the optimizer only uses the index for the `id` column and does not efficiently calculate the `MAX(contentstreamlayer)`. Now it states `Using index`
The `$rightmostSucceedingSiblingRelationStatement` which is generally used for creating new nodes under a parent and not stating its siblings, profits from this index as otherwise only the index for position sorting is used.
... to signal exactly ONE currentContentStream can be used for joining. That way for the `contentStreamLayer` we are `Using where`
using `doctrine/sql-formatter`
we cannot use the hierarchy id as we join on the `dh.parentnodeanchor`
this way the `WHERE` might even be better optimised as no more than one result is expected
It does not matter where to put the subtree tag filtering. Even adding `->where('h.dimensionspacepointhash = :dimensionSpacePointHash')` seems to produce the same EXPLAIN
2164469 to 820b799
820b799 to 01ad0fa
…n id ... but simply determine next value in PHP
…hy ids
- passes testsuite except `Features/D3-MoveDimensionSpacePoint/MoveDimensionSpacePoint.feature:262` where there are still 13 nodes instead of the expected 9
- parallel tests fail and show changes in live that should not have been made (`WorkspacePublicationDuringWritingTest`)
Note this is way slower as the database has to diff all entries and does not know which are new
0c3d543 to b3a8454
... because layer tables already removed and check does not pass)
94ec98e to bfafd01
Experiment on top of #5776
The Dbal adapter could accept additional configuration via its Factory and Objects.yaml to optimise a fixed set of workspaces, like the root workspace "live". This configuration creates new tables, and thus schema, based on said configuration and therefore must not be dynamic. But for workspace names, which are a stable identifier (especially for root workspaces), this is valid.
The last bit of "magic" is that we can switch out the `HierarchyRelationStatement` for the Content and Subgraph to use a more performant query for said configured workspaces. It just must be guaranteed that the result is the same as the calculation at read time.
Option A: Plain Views
Either a view of just the highest layer per hierarchy id, or a full view of all live hierarchy rows with data.
But as views are not cached, we don't see any performance improvements.
CREATE VIEW $databaseNameEscaped.{$this->tableNames->hierarchyRelationWorkspaceView(WorkspaceName::forLive())} AS
SELECT h1.id, MAX(h1.contentstreamlayer) AS contentstreamlayer
FROM {$this->tableNames->hierarchyRelation()} AS h1
INNER JOIN {$this->tableNames->contentStreamLayer()} AS l ON l.contentstreamlayer = h1.contentstreamlayer
WHERE l.contentstreamid = (
    SELECT w.currentcontentstreamid FROM {$this->tableNames->workspace()} AS w WHERE w.name = 'live' LIMIT 1
)
GROUP BY id
Option B: Tables with `CREATE TRIGGER`
A.k.a. pseudo materialised views, which are not supported by m*sql
An updater that, on insert, adds the newly created hierarchies in that layer or adjusts the layerId for a previously synchronised edge. This solution depends on `AFTER INSERT` to access the auto-incremented value and does not seem to be thread-safe.
Option C: Manually update live layerId and hierarchyId table
This behaves similarly to B, except that it is safe from race conditions. However, it does not perform well and slows writing by a factor of 4 to 7, depending on how well it is implemented.
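To make the requirement "same result as the read-time calculation" concrete, here is a minimal sketch that compares a plain view (Option A) with a trigger-maintained table (Option B). It is not the adapter's code: SQLite (via Python's `sqlite3`) stands in for MySQL/MariaDB, and the table names `contentstream_layers`, `live_view`, `live_table`, the trigger name and the content stream id `cs-live` are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical, simplified schema; SQLite stands in for MySQL/MariaDB.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE contentstream_layers (contentstreamid TEXT, contentstreamlayer INTEGER);
CREATE TABLE hierarchy (
    id INTEGER,
    contentstreamlayer INTEGER,
    PRIMARY KEY (id, contentstreamlayer)
);

-- Option A: plain view, recomputed on every read.
CREATE VIEW live_view AS
    SELECT h.id, MAX(h.contentstreamlayer) AS contentstreamlayer
    FROM hierarchy AS h
    INNER JOIN contentstream_layers AS l
        ON l.contentstreamlayer = h.contentstreamlayer
    WHERE l.contentstreamid = 'cs-live'
    GROUP BY h.id;

-- Option B: ordinary table kept in sync by an AFTER INSERT trigger.
CREATE TABLE live_table (id INTEGER PRIMARY KEY, contentstreamlayer INTEGER);

CREATE TRIGGER hierarchy_sync_live AFTER INSERT ON hierarchy
BEGIN
    -- Hierarchy id not yet known in a live layer: add it.
    INSERT INTO live_table (id, contentstreamlayer)
    SELECT NEW.id, NEW.contentstreamlayer
    WHERE NEW.contentstreamlayer IN (
            SELECT l.contentstreamlayer FROM contentstream_layers AS l
            WHERE l.contentstreamid = 'cs-live')
      AND NOT EXISTS (SELECT 1 FROM live_table WHERE id = NEW.id);
    -- Known hierarchy id overlaid by a higher live layer: adjust the layer.
    UPDATE live_table
    SET contentstreamlayer = NEW.contentstreamlayer
    WHERE id = NEW.id
      AND contentstreamlayer < NEW.contentstreamlayer
      AND NEW.contentstreamlayer IN (
            SELECT l.contentstreamlayer FROM contentstream_layers AS l
            WHERE l.contentstreamid = 'cs-live');
END;
""")

# Live consists of layers 1 and 2; a user content stream adds layer 3 on top.
db.executemany(
    "INSERT INTO contentstream_layers VALUES (?, ?)",
    [("cs-live", 1), ("cs-live", 2), ("cs-user", 1), ("cs-user", 2), ("cs-user", 3)],
)
db.executemany(
    "INSERT INTO hierarchy VALUES (?, ?)",
    [(10, 1), (11, 1), (10, 2), (11, 3), (12, 3)],
)

view_rows = db.execute(
    "SELECT id, contentstreamlayer FROM live_view ORDER BY id").fetchall()
table_rows = db.execute(
    "SELECT id, contentstreamlayer FROM live_table ORDER BY id").fetchall()
print(view_rows, table_rows)
```

The sketch only covers inserts in a single connection, so it says nothing about the thread-safety concern raised for Option B or about the write slowdown measured for Option C.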
Option D: Stored procedures
... are impractical as `CALL` cannot be used in subselects or joins.
Option E: Share the result of the hierarchy query if used multiple times.
As we cannot use functions, stored procedures or variables, because they are either not memoised or only support one value, we could use `WITH` to share a subquery. The analysis shows, though, that the query is treated EXACTLY as if the select was inlined, so this is rather cosmetic.
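As an illustration of what "sharing a subquery via `WITH`" means here, the following sketch runs the same parent/child join once with a common table expression and once with the select inlined twice. SQLite stands in for MySQL/MariaDB, and the schema (a `parentid` column, layers 1 and 2 forming "live") is a simplified assumption, not the adapter's real table layout. It only shows that both forms return the same rows; it does not reproduce the EXPLAIN analysis mentioned above.

```python
import sqlite3

# Hypothetical, simplified hierarchy table; SQLite stands in for MySQL/MariaDB.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE hierarchy (
    id INTEGER,
    parentid INTEGER,
    contentstreamlayer INTEGER,
    PRIMARY KEY (id, contentstreamlayer)
);
INSERT INTO hierarchy VALUES
    (1, NULL, 1), (2, 1, 1), (3, 2, 1),  -- base layer
    (2, 1, 2), (3, 1, 2);                -- overlay: node 3 now sits below node 1
""")

# Highest visible layer per hierarchy id (layers 1 and 2 are assumed to be live).
LIVE = (
    "SELECT id, MAX(contentstreamlayer) AS contentstreamlayer "
    "FROM hierarchy WHERE contentstreamlayer IN (1, 2) GROUP BY id"
)

# The layer lookup is needed twice: once for the child, once for the parent.
JOINS = """
SELECT child.id, parent.id
FROM hierarchy AS child
INNER JOIN {live} AS lc
    ON lc.id = child.id AND lc.contentstreamlayer = child.contentstreamlayer
INNER JOIN hierarchy AS parent ON parent.id = child.parentid
INNER JOIN {live} AS lp
    ON lp.id = parent.id AND lp.contentstreamlayer = parent.contentstreamlayer
ORDER BY child.id
"""

with_query = f"WITH live AS ({LIVE})" + JOINS.format(live="live")
inlined_query = JOINS.format(live=f"({LIVE})")

with_rows = db.execute(with_query).fetchall()
inlined_rows = db.execute(inlined_query).fetchall()
print(with_rows, inlined_rows)
```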
Option F:
In the Projection, always update the ids in the live hierarchy table in sync with the changes to the full hierarchy table. The simple part would be to adjust `Neos\ContentGraph\DoctrineDbalAdapter\Domain\Projection\HierarchyRelation::addToDatabase()` and `removeFromDatabase()` to also write to the special workspace table in accordance with the event's workspaceName.
But that only settles creating new hierarchy relations and removing them. In case they are created via copy on write, we leverage many queries performing an upsert via `INSERT INTO hierarchies SELECT * FROM ... ON DUPLICATE KEY UPDATE`
We can safely ignore any modifications to existing rows, but we definitely need to know which rows were newly added because they did not exist before. As the database handles this fully internally, we are not aware of what exactly changed. Returning the select first into PHP would be possible but would be a major downside for big write operations: it is definitely slower, and possibly impossible if memory is exhausted when, for example, tagging a siteNode.
Neither MySQL nor MariaDB supports writing to two tables at the same time based on one select, as suggested here: https://stackoverflow.com/a/68313315
But MariaDB would already support `RETURNING id, contentstreamlayer` from the insert operation, which could then be used from PHP to issue a second insert.
Now MySQL doesn't even support `RETURNING`, so all hope is lost and we would need to make a full diff of what changed, which is expensive as per Option C.
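A rough sketch of that "full diff" fallback, to show why it costs an extra pass over the data: the copy-on-write upsert runs first, and only afterwards can the live table be compared against the hierarchy table to find rows that are new. SQLite stands in for MySQL/MariaDB (its `ON CONFLICT ... DO UPDATE` plays the role of `ON DUPLICATE KEY UPDATE`), and the `tag` column, the `live_table` name and the fixed layer numbers are assumptions for illustration only.

```python
import sqlite3

# Hypothetical, simplified schema; SQLite stands in for MySQL/MariaDB.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE hierarchy (
    id INTEGER,
    contentstreamlayer INTEGER,
    tag TEXT,
    PRIMARY KEY (id, contentstreamlayer)
);
CREATE TABLE live_table (id INTEGER PRIMARY KEY, contentstreamlayer INTEGER);

-- Ids 1..3 exist in layer 1; id 2 was already copied to layer 2 earlier.
INSERT INTO hierarchy VALUES (1, 1, ''), (2, 1, ''), (3, 1, ''), (2, 2, '');
INSERT INTO live_table VALUES (1, 1), (2, 2), (3, 1);
""")

# Copy-on-write upsert: the database decides per row whether it inserts or
# updates, and the caller does not learn which rows were newly created.
db.execute("""
INSERT INTO hierarchy (id, contentstreamlayer, tag)
SELECT id, 2, 'disabled' FROM hierarchy WHERE contentstreamlayer = 1
ON CONFLICT (id, contentstreamlayer) DO UPDATE SET tag = excluded.tag
""")

# Diff step: find every row of the target layer that the live table does not
# reflect yet. This has to look at all rows of the layer, not only new ones.
stale = db.execute("""
SELECT h.id, h.contentstreamlayer
FROM hierarchy AS h
LEFT JOIN live_table AS t ON t.id = h.id
WHERE h.contentstreamlayer = 2
  AND (t.id IS NULL OR t.contentstreamlayer < h.contentstreamlayer)
ORDER BY h.id
""").fetchall()

# Second write, issued from application code (PHP in the real adapter).
db.executemany(
    "INSERT OR REPLACE INTO live_table (id, contentstreamlayer) VALUES (?, ?)",
    stale,
)

live_rows = db.execute(
    "SELECT id, contentstreamlayer FROM live_table ORDER BY id").fetchall()
print(stale, live_rows)
```

With `RETURNING` on MariaDB, the diff query could in principle be replaced by the rows reported from the insert itself; that variant is not shown here.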