WIP: improve "live" layer performance with materialised views #5822

Draft
mhsdesign wants to merge 66 commits into neos:feature/content-graph-hierachy-layers-with-workspace-changes from mhsdesign:task/improve-layer-performance-materialized-views

mhsdesign commented May 11, 2026

Experiment on top of #5776

The DBAL adapter could accept additional configuration via its factory and Objects.yaml to optimise a fixed set of workspaces, such as the root workspace "live". Because this configuration creates new tables, and thus changes the schema, it must not be dynamic. For workspace names that are stable identifiers, especially those of root workspaces, this is a valid constraint.

The last bit of "magic" is that we can swap out the HierarchyRelationStatement used by the content graph and subgraph for a more performant query for the configured workspaces. It only has to be guaranteed that the result is the same as that of the read-time calculation.
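As a rough illustration of that switch, the sketch below picks the hierarchy source by workspace name. The function name, the list of optimised workspace names and the table naming scheme are assumptions made up for this example; they are not the adapter's actual API.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the adapter): returns the table to read
 * hierarchy relations from for the given workspace.
 *
 * @param list<string> $optimisedWorkspaceNames workspace names fixed in the static configuration
 */
function hierarchySourceFor(string $workspaceName, array $optimisedWorkspaceNames, string $hierarchyTable): string
{
    if (in_array($workspaceName, $optimisedWorkspaceNames, true)) {
        // Pre-computed table, assumed to have been created from the configuration.
        return $hierarchyTable . '_' . $workspaceName;
    }
    // Fallback: the full hierarchy table, where layers are resolved at read time.
    return $hierarchyTable;
}
```

Because the set of optimised names is fixed in configuration, the schema never depends on workspaces created at runtime.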

Option A: Plain Views

Either a view of just the highest layer per hierarchy id, or a full view of all live hierarchy rows with their data.

But as views are not cached, we don't see any performance improvement.

CREATE VIEW $databaseNameEscaped.{$this->tableNames->hierarchyRelationWorkspaceView(WorkspaceName::forLive())} AS
SELECT h1.id, MAX(h1.contentstreamlayer) AS contentstreamlayer
    FROM {$this->tableNames->hierarchyRelation()} AS h1
        INNER JOIN {$this->tableNames->contentStreamLayer()} AS l
            ON l.contentstreamlayer = h1.contentstreamlayer
    WHERE l.contentstreamid = (
        SELECT w.currentcontentstreamid
        FROM {$this->tableNames->workspace()} AS w
        WHERE w.name = 'live'
        LIMIT 1
    )
GROUP BY h1.id
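To check that any optimised variant returns the same result as the read-time calculation, it can help to have the view's semantics in plain PHP. The following is a minimal sketch under assumptions: the function name and the array shapes are invented for illustration, and `$liveLayers` stands for the layers belonging to the live content stream.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical reference computation of "highest live layer per hierarchy id".
 *
 * @param list<array{id: int, contentstreamlayer: int}> $hierarchyRows
 * @param list<int> $liveLayers layers that belong to the live content stream
 * @return array<int, int> hierarchy id => highest live layer
 */
function highestLiveLayerPerHierarchyId(array $hierarchyRows, array $liveLayers): array
{
    $result = [];
    foreach ($hierarchyRows as $row) {
        $layer = $row['contentstreamlayer'];
        if (!in_array($layer, $liveLayers, true)) {
            continue; // mirrors the join and WHERE restricting rows to the live content stream
        }
        $id = $row['id'];
        // mirrors MAX(contentstreamlayer) ... GROUP BY id
        $result[$id] = max($result[$id] ?? $layer, $layer);
    }
    ksort($result);
    return $result;
}
```

Comparing this output against the rows of the view (or of a table from the later options) on a small fixture is one way to verify the "same result" requirement.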

Option B: Tables with CREATE TRIGGER

A.k.a. pseudo materialised views, as real materialised views are not supported by MySQL or MariaDB.

An updater that, on insert, adds the newly created hierarchy relations in that layer or adjusts the layer id of a previously synchronised edge. This solution depends on AFTER INSERT to access the auto-incremented value and does not seem to be thread-safe.

CREATE TRIGGER $databaseNameEscaped.hierarchy_live_updater AFTER INSERT
    ON {$this->tableNames->hierarchyRelation()} FOR EACH ROW
    BEGIN
        -- only react to rows written to the topmost layer of the live content stream
        IF NEW.contentstreamlayer = (
            SELECT l.contentstreamlayer FROM {$this->tableNames->contentStreamLayer()} AS l
                INNER JOIN {$this->tableNames->workspace()} AS w
                ON l.contentstreamid = w.currentcontentstreamid
            WHERE w.name = 'live'
            ORDER BY l.contentstreamlayer DESC
            LIMIT 1
        ) THEN
            -- NEW.id carries the auto-incremented value, hence AFTER INSERT
            INSERT INTO {$this->tableNames->hierarchyRelationForWorkspace(WorkspaceName::forLive())}
                SET id = NEW.id, contentstreamlayer = NEW.contentstreamlayer
            ON DUPLICATE KEY UPDATE contentstreamlayer = NEW.contentstreamlayer;
        END IF;
    END;
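The effect of the trigger on the live table can be modelled in a few lines of PHP. This is only a single-threaded sketch of the intended semantics; the function name and array shapes are assumptions for illustration, and it says nothing about the concurrency problems mentioned above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical model of the trigger body: upsert the inserted row into the
 * live table, but only if it was written to the live write layer.
 *
 * @param array<int, int> $liveTable hierarchy id => contentstreamlayer
 * @param array{id: int, contentstreamlayer: int} $newRow the row the trigger sees as NEW
 * @return array<int, int> the live table after the trigger ran
 */
function applyInsertToLiveTable(array $liveTable, array $newRow, int $liveWriteLayer): array
{
    if ($newRow['contentstreamlayer'] !== $liveWriteLayer) {
        return $liveTable; // the IF condition of the trigger is not met
    }
    // INSERT ... ON DUPLICATE KEY UPDATE: add the id or overwrite its layer
    $liveTable[$newRow['id']] = $newRow['contentstreamlayer'];
    return $liveTable;
}
```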

Option C: Manually update live layerId and hierarchyId table

This behaves similarly to B, except that it is safe against race conditions. It does not perform well, though, and slows down writing by a factor of 4 to 7, depending on how well it is implemented.

if ($event instanceof EmbedsWorkspaceName && $event instanceof EmbedsContentStreamId && $event->getWorkspaceName()->isLive()) {
  $contentStreamLayers = $this->getContentStreamLayers($event);

  $syncHierarchyRelationsStatement = <<<SQL
      INSERT INTO {$this->tableNames->hierarchyRelationForWorkspace(WorkspaceName::forLive())}
      (
          id,
          contentstreamlayer
      )
      SELECT
          h.id,
          h.contentstreamlayer
      FROM {$this->tableNames->hierarchyRelation()} AS h
          LEFT JOIN {$this->tableNames->hierarchyRelationForWorkspace(WorkspaceName::forLive())} wh
              ON h.contentstreamlayer = wh.contentstreamlayer
              AND h.id = wh.id
          WHERE h.contentstreamlayer = :contentStreamWriteLayer
          AND wh.contentstreamlayer IS NULL
      ON DUPLICATE KEY UPDATE contentstreamlayer = VALUES(contentstreamlayer);
  SQL;

  try {
      $this->dbal->executeStatement($syncHierarchyRelationsStatement, [
          'contentStreamWriteLayer' => $contentStreamLayers->getWriteLayer()->value,
      ]);
  } catch (DBALException $e) {
      throw new \RuntimeException(sprintf('TODO: %s', $e->getMessage()), 1776345058, $e);
  }
}

Option D: Stored procedures

... are impractical, as CALL cannot be used in subselects or joins.

Option E: Share the result of the hierarchy query if used multiple times.

We cannot use functions, stored procedures or variables, as they are either not memoised or only support a single value. Instead, we could use WITH to share a subquery.

The analysis shows, though, that the query is treated EXACTLY as if the select were inlined, so this is rather cosmetic.

WITH subquery AS (SELECT h.*
     FROM ... -- expensive calculation
)
SELECT n.*, h.subtreetags, dsp.dimensionspacepoint AS covereddimensionspacepoint
FROM cr_def_p_graph_node n
         INNER JOIN subquery as h
                    ON h.childnodeanchor = n.relationanchorpoint
         INNER JOIN subquery as ch
                    ON ch.parentnodeanchor = n.relationanchorpoint
         INNER JOIN cr_def_p_graph_dimensionspacepoints dsp ON dsp.hash = h.dimensionspacepointhash
         INNER JOIN cr_def_p_graph_node cn ON cn.relationanchorpoint = ch.childnodeanchor
WHERE cn.nodeaggregateid = '30000000-0000-0000-0000-000000000000';

Option F: Keep the live hierarchy table in sync from within the projection

In the projection, always update the ids in the live hierarchy table in sync with the changes to the full hierarchy table. The simple part would be to adjust Neos\ContentGraph\DoctrineDbalAdapter\Domain\Projection\HierarchyRelation::addToDatabase() and removeFromDatabase() to also write to the special workspace table in accordance with the event's workspace name.

But that only covers creating new hierarchy relations and removing them. When they are created via copy on write, we rely on many queries that perform an upsert via INSERT INTO hierarchies SELECT * FROM ... ON DUPLICATE KEY UPDATE.
We can safely ignore modifications to existing rows, but we definitely need to know which rows were newly added because they did not exist before. As the database handles this entirely internally, we are not aware of what exactly changed. Fetching the result of the select into PHP first would be possible, but would be a major downside for big write operations: it is definitely slower, and possibly infeasible if memory is exhausted, e.g. when tagging a site node.

Neither MySQL nor MariaDB supports writing to two tables at the same time based on one select, as suggested here: https://stackoverflow.com/a/68313315
MariaDB, however, already supports RETURNING id, contentstreamlayer on the insert operation, whose result could then be used from PHP to issue a second insert.
MySQL doesn't support RETURNING at all, so all hope is lost there and we would need to compute a full diff of what changed, which is expensive as per Option C.
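To make the "which rows are new" problem concrete, here is a minimal PHP sketch of the fallback where the select result is pulled into PHP and compared against the already known ids. The function name and array shapes are assumptions for illustration. Both inputs have to be held in memory at once, which is exactly the cost described above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical PHP-side diff: splits the rows selected for an upsert into
 * rows that did not exist before and rows that merely get updated.
 *
 * @param array<int, int> $existingLayerById hierarchy id => layer already in the table
 * @param array<int, int> $selectedLayerById hierarchy id => layer produced by the select
 * @return array{new: array<int, int>, existing: array<int, int>}
 */
function partitionUpsertRows(array $existingLayerById, array $selectedLayerById): array
{
    return [
        // only these rows would have to be mirrored into the live table
        'new' => array_diff_key($selectedLayerById, $existingLayerById),
        // modifications to existing rows can be ignored
        'existing' => array_intersect_key($selectedLayerById, $existingLayerById),
    ];
}
```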

@mhsdesign mhsdesign force-pushed the feature/content-graph-hierachy-layers-with-workspace-changes branch from 21ead2e to ab53acf Compare May 23, 2026 19:06
mhsdesign added 29 commits May 23, 2026 21:07
…uce internal `ContentStreamDbId` neos#5735)

Adjust the hierarchy relation table to use an auto-incrementing number as an alias for the ContentStreamId, which is a UUID or rather any string.
The motivation was to reduce the size of and load on the hierarchy tables, which are huge anyway.
Also, during reading we already query the ContentStreamId for the requested workspace (see getContentGraph), so querying the internal ContentStreamDbId instead adds no overhead for read operations.

Squashed commits from neos#5735

This change serves as a base to introduce content stream layers
implemented for set properties

and query hierarchy relation rows by taking multiple overlaying content stream db ids per content stream in account
as per Neos.ContentRepository.BehavioralTests/Tests/Behavior/Features/ContentStreamForking/NodeReferencesOnForkContentStream.feature
mhsdesign added 13 commits May 23, 2026 21:07
… nodes

they were from a time, where the Node embodied its content stream which was always fetched from hierarchies
otherwise the optimizer only uses the index for the `id` column and does not efficiently calculate the `MAX(contentstreamlayer)`

Now it states `Using index`
The `$rightmostSucceedingSiblingRelationStatement` which is generally used for creating new nodes under a parent and not stating its siblings, profits from this index as otherwise only the index for position sorting is used.
... to signal exactly ONE currentContentStream can be used for joining

That way for the `contentStreamLayer` we are `Using where`
we cannot use the hierarchy id as we join on the `dh.parentnodeanchor`
this way the `WHERE` might even be better optimised as no more than one result is expected
It does not matter where to put the subtree tag filtering. Even adding ->where('h.dimensionspacepointhash = :dimensionSpacePointHash')
seems to produce the same EXPLAIN
@mhsdesign mhsdesign force-pushed the feature/content-graph-hierachy-layers-with-workspace-changes branch 2 times, most recently from 2164469 to 820b799 Compare May 23, 2026 19:12
@mhsdesign mhsdesign force-pushed the feature/content-graph-hierachy-layers-with-workspace-changes branch from 820b799 to 01ad0fa Compare May 23, 2026 19:51
mhsdesign added 6 commits May 23, 2026 21:51
…n id

... but simply determine next value in PHP
…hy ids

- passes testsuite except `Features/D3-MoveDimensionSpacePoint/MoveDimensionSpacePoint.feature:262` where there are 13 nodes still instead of expected 9
- parallel tests fail and show changes in live that should not have been (`WorkspacePublicationDuringWritingTest`)
Note this is way slower as the database has to diff all entries and does not know which are new
@mhsdesign mhsdesign force-pushed the task/improve-layer-performance-materialized-views branch from 0c3d543 to b3a8454 Compare May 23, 2026 20:21
mhsdesign added 2 commits May 24, 2026 16:17
... because layer tables already removed and check does not pass)
@mhsdesign mhsdesign force-pushed the feature/content-graph-hierachy-layers-with-workspace-changes branch 2 times, most recently from 94ec98e to bfafd01 Compare June 6, 2026 21:21