Multilingual Content (PoC) #6282

Draft: wants to merge 20 commits into base: 6.2

@Cyperghost Cyperghost commented May 7, 2025

Closes #6109

TODO

  • Add a helper function for the subquery

Performance tests

Several query variants were tested; their performance diverges as the tables grow.
LATERAL is currently not available in MariaDB and also performed poorly in MySQL, so it was not investigated further.
A UNION query performed slightly worse than ROW_NUMBER and was not investigated further either.
The best results were achieved with simple subqueries on the content table wcf1_test_content.
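
For reference, the LATERAL variant mentioned above could look roughly like the following. This is only a sketch, assuming MySQL 8.0.14 or newer; it is not the exact query that was benchmarked, and the alias testContent is chosen here for illustration.

```sql
-- Sketch only: not the benchmarked query. Requires LATERAL support (MySQL 8.0.14+).
SELECT   test.*, testContent.title
FROM     wcf1_test AS test,
         LATERAL (
             -- Pick one content row per test, using the same
             -- language priority as the other queries.
             SELECT   title
             FROM     wcf1_test_content
             WHERE    testID = test.testID
             ORDER BY CASE
                 WHEN languageID = 1 THEN -2
                 WHEN languageID = 2 THEN -1
                 ELSE languageID
             END ASC
             LIMIT    1
         ) AS testContent
ORDER BY testContent.title DESC, test.testID ASC
LIMIT    20;
```

Note that this form behaves like an inner join: rows in wcf1_test without any content row are dropped, whereas a scalar subquery in ORDER BY keeps them with a NULL title.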

Test Table

CREATE TABLE wcf1_test (
    testID INT NOT NULL AUTO_INCREMENT,
    time INT NOT NULL,
    PRIMARY KEY (testID)
);

CREATE TABLE wcf1_test_content (
    contentID INT NOT NULL AUTO_INCREMENT,
    testID INT NOT NULL,
    languageID INT,
    title VARCHAR(255) NOT NULL,
    content VARCHAR(255) NOT NULL,

    PRIMARY KEY(contentID),
    KEY id (testID, languageID)
);

SELECT COUNT(*) FROM wcf1_test;
 -- 100000
SELECT COUNT(*) FROM wcf1_test_content;
 -- 149860
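
The Output blocks below have the shape of MySQL's EXPLAIN ANALYZE tree output. Assuming that is how they were produced (MySQL 8.0.18 or newer), a plan for any of the queries can be captured by prefixing it, as in this minimal sketch with a deliberately simple query:

```sql
-- Assumption: MySQL 8.0.18 or newer, where EXPLAIN ANALYZE is available.
-- The statement is executed and the measured plan is returned
-- instead of the result rows.
EXPLAIN ANALYZE
SELECT   test.*
FROM     wcf1_test test
ORDER BY test.time DESC
LIMIT    20;
```

In this output, "actual time" is given in milliseconds as first row..all rows, averaged per loop. A dependent subquery reported with actual time=0.00264..0.00266 and loops=100000 therefore accounts for roughly 266 ms in total.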

Queries only sorted by title

Query with simple subqueries

SELECT   test.*
FROM     wcf1_test test
ORDER BY (
    SELECT   title
    FROM     wcf1_test_content
    WHERE    testID = test.testID
    ORDER BY CASE
        WHEN languageID = 1 THEN -2
        WHEN languageID = 2 THEN -1
        ELSE languageID
    END ASC
    LIMIT    1
) DESC, test.testID ASC
LIMIT    20;

Output

-> Limit: 20 row(s)  (cost=10084 rows=20) (actual time=307..307 rows=20 loops=1)
    -> Sort: (select #2) DESC, test.testID, limit input to 20 row(s) per chunk  (cost=10084 rows=100275) (actual time=307..307 rows=20 loops=1)
        -> Table scan on test  (cost=10084 rows=100275) (actual time=2.54..13.5 rows=100000 loops=1)
-> Select #2 (subquery in projection; dependent)
    -> Limit: 1 row(s)  (cost=0.524 rows=1) (actual time=0.00264..0.00266 rows=1 loops=100000)
        -> Sort: (case when (wcf1_test_content.languageID = 1) then -(2) when (wcf1_test_content.languageID = 2) then -(1) else wcf1_test_content.languageID end), limit input to 1 row(s) per chunk  (cost=0.524 rows=1.5) (actual time=0.00256..0.00256 rows=1 loops=100000)
            -> Index lookup on wcf1_test_content using testID (testID = test.testID)  (cost=0.524 rows=1.5) (actual time=0.0019..0.00214 rows=1.5 loops=100000)
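
To make the ordering inside the subquery concrete, the following sketch computes the CASE value for a few sample rows. The table demo_content and its rows are hypothetical and exist only for this illustration.

```sql
-- Hypothetical sample data, not part of the benchmark.
CREATE TEMPORARY TABLE demo_content (
    testID INT NOT NULL,
    languageID INT NULL,
    title VARCHAR(255) NOT NULL
);

INSERT INTO demo_content (testID, languageID, title) VALUES
    (1, 3, 'lang 3'),
    (1, 2, 'lang 2'),
    (1, 1, 'lang 1'),
    (2, 3, 'lang 3'),
    (2, 2, 'lang 2'),
    (3, 1, 'lang 1'),
    (3, NULL, 'no language');

-- The first row listed for each testID is the one that
-- "ORDER BY CASE ... END ASC LIMIT 1" would return.
SELECT   testID, languageID, title,
         CASE
             WHEN languageID = 1 THEN -2
             WHEN languageID = 2 THEN -1
             ELSE languageID
         END AS sortKey
FROM     demo_content
ORDER BY testID ASC, sortKey ASC;
```

For testID 1 the row with languageID 1 comes first (sortKey -2), for testID 2 the row with languageID 2 (sortKey -1). For testID 3 the row with languageID NULL comes first, because MySQL and MariaDB sort NULL before all other values in ascending order.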

JOIN on ROW_NUMBER

SELECT     test.*, testContent.title, testContent.content
FROM       wcf1_test AS test
INNER JOIN (
            SELECT content.*, ROW_NUMBER() OVER (
                PARTITION BY content.testID
                ORDER BY CASE
                    WHEN content.languageID = 1 THEN -2
                    WHEN content.languageID = 2 THEN -1
                    ELSE content.languageID
                END ASC
            ) AS   rn
            FROM   wcf1_test_content AS content
      ) AS testContent
        ON testContent.testID = test.testID AND testContent.rn = 1
ORDER BY   testContent.title DESC, test.testID ASC
LIMIT      20;

Output

-> Limit: 20 row(s)  (actual time=328..328 rows=20 loops=1)
    -> Sort: testcontent.title DESC, test.testID, limit input to 20 row(s) per chunk  (actual time=328..328 rows=20 loops=1)
        -> Stream results  (cost=22053 rows=14950) (actual time=203..317 rows=100000 loops=1)
            -> Nested loop inner join  (cost=22053 rows=14950) (actual time=203..303 rows=100000 loops=1)
                -> Filter: (testcontent.rn = 1)  (cost=16821 rows=14950) (actual time=203..233 rows=100000 loops=1)
                    -> Table scan on testContent  (cost=34446..36317 rows=149495) (actual time=203..229 rows=149860 loops=1)
                        -> Materialize  (cost=34446..34446 rows=149495) (actual time=203..203 rows=149860 loops=1)
                            -> Window aggregate: row_number() OVER (PARTITION BY content.testID ORDER BY `(case when (content.languageID = 1) then -(2) when (content.languageID = 2) then -(1) else content.languageID end)` )   (cost=0 rows=149495) (actual time=86.2..129 rows=149860 loops=1)
                                -> Sort: content.testID, `(case when (content.languageID = 1) then -(2) when (content.languageID = 2) then -(1) else content.languageID end)`  (cost=15102 rows=149495) (actual time=86.2..93.3 rows=149860 loops=1)
                                    -> Table scan on content  (cost=15102 rows=149495) (actual time=0.0575..37.3 rows=149860 loops=1)
                -> Single-row index lookup on test using PRIMARY (testID = testcontent.testID)  (cost=0.25 rows=1) (actual time=592e-6..611e-6 rows=1 loops=100000)

Query with Subquery and JOIN

SELECT   testing.*
FROM     (
            SELECT testing.*, (
                SELECT   languageID
                FROM     wcf1_test_content testContent
                WHERE    testContent.testID = testing.testID
                ORDER BY CASE
                    WHEN languageID = 1 THEN -2
                    WHEN languageID = 2 THEN -1
                    ELSE languageID
                END ASC
                LIMIT 1
            ) AS   languageID
            FROM   wcf1_test testing
         ) AS testing
JOIN     wcf1_test_content testContent
  ON     testing.testID = testContent.testID
 AND     ((testing.languageID IS NULL AND testContent.languageID IS NULL) OR testing.languageID = testContent.languageID)
ORDER BY testContent.title DESC, testing.testID ASC
LIMIT    20;

Output

-> Limit: 20 row(s)  (actual time=376..376 rows=20 loops=1)
    -> Sort: testcontent.title DESC, testing.testID, limit input to 20 row(s) per chunk  (actual time=376..376 rows=20 loops=1)
        -> Stream results  (cost=151038 rows=150188) (actual time=198..366 rows=100000 loops=1)
            -> Nested loop inner join  (cost=151038 rows=150188) (actual time=198..355 rows=100000 loops=1)
                -> Table scan on testing  (cost=33189..34445 rows=100275) (actual time=198..204 rows=100000 loops=1)
                    -> Materialize  (cost=33189..33189 rows=100275) (actual time=198..198 rows=100000 loops=1)
                        -> Table scan on testing  (cost=10084 rows=100275) (actual time=1.59..12.1 rows=100000 loops=1)
                        -> Select #3 (subquery in projection; dependent)
                            -> Limit: 1 row(s)  (cost=0.4 rows=1) (actual time=0.00161..0.00163 rows=1 loops=100000)
                                -> Sort: (case when (testContent.languageID = 1) then -(2) when (testContent.languageID = 2) then -(1) else testContent.languageID end), limit input to 1 row(s) per chunk  (cost=0.4 rows=1.5) (actual time=0.00153..0.00153 rows=1 loops=100000)
                                    -> Covering index lookup on testContent using testID (testID = testing.testID)  (cost=0.4 rows=1.5) (actual time=950e-6..0.00117 rows=1.5 loops=100000)
                -> Index lookup on testContent using testID (testID = testing.testID), with index condition: (((testing.languageID is null) and (testContent.languageID is null)) or (testing.languageID = testContent.languageID))  (cost=0.374 rows=1.5) (actual time=0.00127..0.00143 rows=1 loops=100000)

Queries with sort and filter by title

Query with simple subqueries

SELECT   test.*
FROM     wcf1_test test
WHERE    (
    SELECT   title
    FROM     wcf1_test_content
    WHERE    testID = test.testID
    ORDER BY CASE
        WHEN languageID = 1 THEN -2
        WHEN languageID = 2 THEN -1
        ELSE languageID
    END ASC
    LIMIT    1
) LIKE 'title %1%'
ORDER BY (
    SELECT   title
    FROM     wcf1_test_content
    WHERE    testID = test.testID
    ORDER BY CASE
        WHEN languageID = 1 THEN -2
        WHEN languageID = 2 THEN -1
        ELSE languageID
    END ASC
    LIMIT    1
) DESC, test.testID ASC
LIMIT    20;

Output

-> Limit: 20 row(s)  (cost=10084 rows=20) (actual time=357..357 rows=20 loops=1)
    -> Sort: (select #3) DESC, test.testID, limit input to 20 row(s) per chunk  (cost=10084 rows=100275) (actual time=357..357 rows=20 loops=1)
        -> Filter: ((select #2) like 'title %1%')  (cost=10084 rows=100275) (actual time=2.4..305 rows=20577 loops=1)
            -> Table scan on test  (cost=10084 rows=100275) (actual time=2.25..13.5 rows=100000 loops=1)
            -> Select #2 (subquery in condition; dependent)
                -> Limit: 1 row(s)  (cost=0.524 rows=1) (actual time=0.00265..0.00267 rows=1 loops=100000)
                    -> Sort: (case when (wcf1_test_content.languageID = 1) then -(2) when (wcf1_test_content.languageID = 2) then -(1) else wcf1_test_content.languageID end), limit input to 1 row(s) per chunk  (cost=0.524 rows=1.5) (actual time=0.00257..0.00257 rows=1 loops=100000)
                        -> Index lookup on wcf1_test_content using testID (testID = test.testID)  (cost=0.524 rows=1.5) (actual time=0.00191..0.00215 rows=1.5 loops=100000)
-> Select #3 (subquery in projection; dependent)
    -> Limit: 1 row(s)  (cost=0.524 rows=1) (actual time=0.00221..0.00223 rows=1 loops=20577)
        -> Sort: (case when (wcf1_test_content.languageID = 1) then -(2) when (wcf1_test_content.languageID = 2) then -(1) else wcf1_test_content.languageID end), limit input to 1 row(s) per chunk  (cost=0.524 rows=1.5) (actual time=0.00213..0.00213 rows=1 loops=20577)
            -> Index lookup on wcf1_test_content using testID (testID = test.testID)  (cost=0.524 rows=1.5) (actual time=0.00157..0.00177 rows=1 loops=20577)

JOIN on ROW_NUMBER

SELECT     test.*, testContent.title, testContent.content
FROM       wcf1_test AS test
INNER JOIN (
            SELECT content.*, ROW_NUMBER() OVER (
                PARTITION BY content.testID
                ORDER BY CASE
                    WHEN content.languageID = 1 THEN -2
                    WHEN content.languageID = 2 THEN -1
                    ELSE content.languageID
                END ASC
            ) AS   rn
            FROM   wcf1_test_content AS content
      ) AS testContent
        ON testContent.testID = test.testID AND testContent.rn = 1
WHERE      testContent.title LIKE 'title %1%'
ORDER BY   testContent.title DESC, test.testID ASC
LIMIT      20;

Output

-> Limit: 20 row(s)  (actual time=268..268 rows=20 loops=1)
    -> Sort: testcontent.title DESC, test.testID, limit input to 20 row(s) per chunk  (actual time=268..268 rows=20 loops=1)
        -> Stream results  (cost=17402 rows=1661) (actual time=208..266 rows=20577 loops=1)
            -> Nested loop inner join  (cost=17402 rows=1661) (actual time=208..263 rows=20577 loops=1)
                -> Filter: ((testcontent.rn = 1) and (testcontent.title like 'title %1%'))  (cost=16821 rows=1661) (actual time=208..246 rows=20577 loops=1)
                    -> Table scan on testContent  (cost=34446..36317 rows=149495) (actual time=208..236 rows=149860 loops=1)
                        -> Materialize  (cost=34446..34446 rows=149495) (actual time=208..208 rows=149860 loops=1)
                            -> Window aggregate: row_number() OVER (PARTITION BY content.testID ORDER BY `(case when (content.languageID = 1) then -(2) when (content.languageID = 2) then -(1) else content.languageID end)` )   (cost=0 rows=149495) (actual time=82.5..130 rows=149860 loops=1)
                                -> Sort: content.testID, `(case when (content.languageID = 1) then -(2) when (content.languageID = 2) then -(1) else content.languageID end)`  (cost=15102 rows=149495) (actual time=82.5..90.5 rows=149860 loops=1)
                                    -> Table scan on content  (cost=15102 rows=149495) (actual time=0.0569..37.7 rows=149860 loops=1)
                -> Single-row index lookup on test using PRIMARY (testID = testcontent.testID)  (cost=0.25 rows=1) (actual time=707e-6..728e-6 rows=1 loops=20577)

Query with Subquery and JOIN

SELECT   testing.*
FROM     (
            SELECT testing.*, (
                SELECT   languageID
                FROM     wcf1_test_content testContent
                WHERE    testContent.testID = testing.testID
                ORDER BY CASE
                    WHEN languageID = 1 THEN -2
                    WHEN languageID = 2 THEN -1
                    ELSE languageID
                END ASC
                LIMIT 1
            ) AS   languageID
            FROM   wcf1_test testing
    ) AS testing
JOIN     wcf1_test_content testContent
  ON     testing.testID = testContent.testID
 AND     ((testing.languageID IS NULL AND testContent.languageID IS NULL) OR testing.languageID = testContent.languageID)
WHERE    testContent.title LIKE 'title %1%'
ORDER BY testContent.title DESC, testing.testID ASC
LIMIT    20;

Output

-> Limit: 20 row(s)  (actual time=376..376 rows=20 loops=1)
    -> Sort: testcontent.title DESC, testing.testID, limit input to 20 row(s) per chunk  (actual time=376..376 rows=20 loops=1)
        -> Stream results  (cost=87010 rows=5014) (actual time=192..374 rows=20577 loops=1)
            -> Nested loop inner join  (cost=87010 rows=5014) (actual time=192..371 rows=20577 loops=1)
                -> Table scan on testing  (cost=33189..34445 rows=100275) (actual time=192..197 rows=100000 loops=1)
                    -> Materialize  (cost=33189..33189 rows=100275) (actual time=192..192 rows=100000 loops=1)
                        -> Table scan on testing  (cost=10084 rows=100275) (actual time=2.09..12 rows=100000 loops=1)
                        -> Select #3 (subquery in projection; dependent)
                            -> Limit: 1 row(s)  (cost=0.4 rows=1) (actual time=0.00156..0.00158 rows=1 loops=100000)
                                -> Sort: (case when (testContent.languageID = 1) then -(2) when (testContent.languageID = 2) then -(1) else testContent.languageID end), limit input to 1 row(s) per chunk  (cost=0.4 rows=1.5) (actual time=0.00148..0.00148 rows=1 loops=100000)
                                    -> Covering index lookup on testContent using testID (testID = testing.testID)  (cost=0.4 rows=1.5) (actual time=917e-6..0.00113 rows=1.5 loops=100000)
                -> Filter: (testContent.title like 'title %1%')  (cost=0.374 rows=0.05) (actual time=0.00164..0.00168 rows=0.206 loops=100000)
                    -> Index lookup on testContent using testID (testID = testing.testID), with index condition: (((testing.languageID is null) and (testContent.languageID is null)) or (testing.languageID = testContent.languageID))  (cost=0.374 rows=1.5) (actual time=0.00137..0.00153 rows=1 loops=100000)

MariaDB delivers very similar results for these queries.

@Cyperghost Cyperghost requested a review from dtdesign May 7, 2025 08:37