Improve identifier and string escaping in the legacy driver by JanJakes · Pull Request #321 · WordPress/sqlite-database-integration

JanJakes · 2026-02-13T16:16:06Z

Improve escaping consistency in the legacy SQLite translator:

Use quote_identifier() for all identifier interpolations (table names, column names, index names, trigger names) in SQL queries, token values, and DDL reconstruction output.
Use parameterized queries and PDO::quote() for all string literals processed by the translator.

See added tests for more details.

Use parameterized queries and PDO::quote() consistently for all string literals processed by the translator.

brandonpayton

@JanJakes I scanned these changes, and they generally look good to me. I left a couple of questions.

I also asked Claude for feedback. You probably have already done so, but some of the items looked possibly legitimate, so I'll paste the feedback here. The following is from Claude Code (Opus 4.6):

Summary

This PR introduces two systemic improvements to the legacy SQLite translator:

A new quote_identifier() method that properly wraps identifiers in backticks and escapes internal backticks by doubling them.
Consistent use of PDO::quote() and parameterized binding for string literals instead of manual quoting.

It touches ~30 locations across class-wp-sqlite-translator.php and adds 7 well-targeted tests. The changes are clearly security-motivated and address real escaping gaps.

What's Good

Centralized identifier quoting. The new quote_identifier() method eliminates a scattered mix of "\"$var\"", '"' . $var . '"', and '`' . $var . '`' patterns, replacing them all with a single correct implementation. This is exactly the right approach.

Proper normalize_column_name(). The old trim($column_name, ''"')was lossy — it would mangle an identifier like ``colname` by stripping outer backticks but leaving the escaped inner ones. The new version correctly strips matching quote pairs and unescapes doubled quote characters.

extract_bound_parameter() broadened to all strings. Removing the FLAG_STRING_SINGLE_QUOTES restriction means double-quoted strings (which MySQL treats as string literals in non-ANSI mode) now get parameterized instead of being inlined into the SQL. This directly addresses SQL injection vectors through double-quoted strings — the testDoubleQuotedStringWithBackslashEscapeDoesNotCauseInjection test demonstrates this well.

escape_like_to_glob() cleanup. The old version did trim($pattern, "'\"") then manually wrapped in single quotes — a classic escaping gap if the pattern contains a literal single quote. Using $this->pdo->quote() fixes this properly.

get_autoincrement_column() regex fix. Now handles both backtick-quoted and double-quote-quoted identifiers, with proper support for escaped internal quotes ( [^`]| ``+ instead of [^"]+). This is necessary since the PR itself changes DDL output to use backticks.

Tests are well-chosen. They cover the specific edge cases that motivate the changes: backtick-in-identifier, double-quoted string parameterization, injection via backslash-escaped double quotes, single quotes in LIKE BINARY patterns, etc.

Issues / Questions

1. `extract_bound_parameter` — double-quoted identifiers risk

By broadening from FLAG_STRING_SINGLE_QUOTES to TYPE_STRING, any double-quoted token classified as TYPE_STRING will now be extracted as a bound parameter. In MySQL, "foo" is a string literal (unless ANSI_QUOTES is enabled). But SQLite treats "foo" as an identifier.

The question: does the lexer ever emit double-quoted identifiers with TYPE_STRING? If the MySQL lexer classifies `table_name` as TYPE_SYMBOL but "table_name" as TYPE_STRING, then this change could accidentally parameterize identifiers in queries like:

SELECT * FROM "my_table"

Could you confirm how the lexer classifies double-quoted identifiers vs. double-quoted string literals? The AS keyword guard only protects alias contexts.

2. Inconsistency: `escape_like_to_glob` receives unquoted value but old code expected quoted

The old code did trim($pattern, "'\"") to strip quotes, implying $pattern_token->value included surrounding quotes. The new code skips the trim and goes straight to str_replace + pdo->quote().

In translate_like_binary(), the pattern comes from $pattern_token->value. If the token value still includes surrounding quotes (e.g., 'it''s%'), then pdo->quote() will double-quote it, producing something like ''it''s%''. Could you verify that $pattern_token->value is unquoted at this point? The test testLikeBinaryWithSingleQuoteInFormat passes, which is encouraging, but I want to be sure the token's .value property is consistently unquoted.

3. `execute_create_table` — extra space before `$if_not_exists`

Line 943 in the diff:

'CREATE ' . $unique . 'INDEX ' . $if_not_exists . ' ' . $this->quote_identifier( $index_name ) . ...

If $if_not_exists is empty string, this produces CREATE INDEX `` index_name `` ... with a double space. The old code had the same issue since $if_not_exists was interpolated inline, but since you're touching this line, it might be worth a trim or conditional space. Minor nit.

4. Missing `normalize_column_name` on `execute_delete` table name (line 1420/1434)

The PR adds normalize_column_name() calls when extracting table names in execute_delete():

$this->table_name = $this->normalize_column_name( $rewriter->skip()->value );

This is good — but should the same treatment be applied in other places where $this->table_name is set from token values? A quick grep shows $this->table_name is assigned in several other methods. If those tokens can also be quoted, they'd need the same normalization. Though that may be out of scope for this PR.

5. Token type for quoted identifiers in rewriter

In several places, the PR changes token types when adding quoted identifiers:

// Old:
new WP_SQLite_Token( $this->table_name, WP_SQLite_Token::TYPE_KEYWORD, WP_SQLite_Token::FLAG_KEYWORD_RESERVED )
// New:
new WP_SQLite_Token( $this->quote_identifier( $this->table_name ), WP_SQLite_Token::TYPE_KEYWORD, WP_SQLite_Token::FLAG_KEYWORD_KEY )

The flag change from FLAG_KEYWORD_RESERVED to FLAG_KEYWORD_KEY seems intentional and correct (these are identifiers, not reserved words). Just want to confirm the rewriter doesn't filter or treat tokens differently based on these flags.

6. `$ids_to_delete` still not parameterized

In execute_delete() around the diff's line 263-268:

"DELETE FROM {$quoted_table} WHERE {$quoted_pk} IN (" . implode( ',', $ids_to_delete ) . ')'

The $ids_to_delete values come from a prior query result ($id['id_1']), so they're not user-controlled. But for defense-in-depth consistency with the rest of the PR's approach, these could also use parameterized binding. Not a blocker, just noting.

Overall Assessment

This is a solid, well-scoped security hardening PR. The quote_identifier() method and consistent PDO::quote() usage are clear improvements. The tests cover the key edge cases. The main thing I'd want clarified is item #1 (how the lexer classifies double-quoted tokens) since that determines whether broadening extract_bound_parameter is safe or could accidentally parameterize identifiers.

tests/WP_SQLite_Translator_Tests.php

wp-includes/sqlite/class-wp-sqlite-translator.php

For consistency, all $this->table_name assignments now go through normalize_column_name(). While ->value is typically already unquoted by the lexer, this makes the code easier to reason about without needing to know that detail.

The DELETE path selects primary key values and then uses them in a DELETE ... WHERE pk IN (...) query. Previously, the values were interpolated directly via implode(). While the values come from the database (not user input), they could be non-integer types (e.g., VARCHAR primary keys), which would produce broken SQL. Use bound parameters instead.

JanJakes · 2026-03-06T10:25:30Z

Thanks for the thorough review, @brandonpayton! I went through all the comments and addressed them when needed.

For the Claude-generated items:

1. extract_bound_parameter — double-quoted identifiers risk: I verified that this is safe. In the legacy lexer, a double-quoted literal is always classified as TYPE_STRING, because parse_string runs before parse_symbol.

2. escape_like_to_glob — unquoted value: This is correct, because $token->value is already unquoted by the token's extract() method. The old trim() was usually a no-op, and in fact could corrupt values with quotes in them.

3. Double space when $if_not_exists is empty: Since this is pre-existing, I'd leave it as-is in the legacy driver.

4. Missing normalize_column_name in other methods: Good point to make this consistent. Done in 84f2455.

5. Token flags: These are newly built output tokens. I can confirm this has no effect on how the output SQL is built.

6. $ids_to_delete not parameterized: Good point. Addressed in 08f0f15.

brandonpayton

Thanks for the updates and explanations, @JanJakes. This looks good to go.

JanJakes force-pushed the translator-escaping-fixes branch from 86ea87d to d02b6d1 Compare February 13, 2026 16:19

JanJakes force-pushed the translator-escaping-fixes branch from d02b6d1 to b260d23 Compare March 3, 2026 16:25

JanJakes changed the title ~~Escape all identifier interpolations in the legacy SQLite translator~~ Improve identifier and string escaping in the legacy SQLite translator Mar 3, 2026

JanJakes changed the title ~~Improve identifier and string escaping in the legacy SQLite translator~~ Improve identifier and string escaping in the legacy driver Mar 3, 2026

JanJakes force-pushed the translator-escaping-fixes branch 2 times, most recently from 2f325db to 34e9091 Compare March 5, 2026 12:56

JanJakes added 2 commits March 5, 2026 14:16

Use quote_identifier() consistently in the legacy SQLite translator

5efe334

Improve string escaping in the legacy SQLite translator

b5dfeb5

Use parameterized queries and PDO::quote() consistently for all string literals processed by the translator.

JanJakes force-pushed the translator-escaping-fixes branch from 34e9091 to b5dfeb5 Compare March 5, 2026 13:17

JanJakes marked this pull request as ready for review March 5, 2026 15:02

brandonpayton approved these changes Mar 6, 2026

View reviewed changes

tests/WP_SQLite_Translator_Tests.php Show resolved Hide resolved

wp-includes/sqlite/class-wp-sqlite-translator.php Show resolved Hide resolved

JanJakes added 3 commits March 6, 2026 10:20

Break after finding the first matching query in tests

ece1e39

JanJakes requested a review from brandonpayton March 6, 2026 10:36

brandonpayton approved these changes Mar 6, 2026

View reviewed changes

JanJakes merged commit 1b3d5e0 into trunk Mar 6, 2026
16 checks passed

JanJakes deleted the translator-escaping-fixes branch March 6, 2026 15:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve identifier and string escaping in the legacy driver#321

Improve identifier and string escaping in the legacy driver#321
JanJakes merged 5 commits intotrunkfrom
translator-escaping-fixes

JanJakes commented Feb 13, 2026 •

edited

Loading

Uh oh!

brandonpayton left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

JanJakes commented Mar 6, 2026

Uh oh!

brandonpayton left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

JanJakes commented Feb 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

brandonpayton left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Summary

What's Good

Issues / Questions

1. extract_bound_parameter — double-quoted identifiers risk

2. Inconsistency: escape_like_to_glob receives unquoted value but old code expected quoted

3. execute_create_table — extra space before $if_not_exists

4. Missing normalize_column_name on execute_delete table name (line 1420/1434)

5. Token type for quoted identifiers in rewriter

6. $ids_to_delete still not parameterized

Overall Assessment

Uh oh!

Uh oh!

Uh oh!

JanJakes commented Mar 6, 2026

Uh oh!

brandonpayton left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

JanJakes commented Feb 13, 2026 •

edited

Loading

brandonpayton left a comment •

edited

Loading

1. `extract_bound_parameter` — double-quoted identifiers risk

2. Inconsistency: `escape_like_to_glob` receives unquoted value but old code expected quoted

3. `execute_create_table` — extra space before `$if_not_exists`

4. Missing `normalize_column_name` on `execute_delete` table name (line 1420/1434)

6. `$ids_to_delete` still not parameterized