Improve identifier and string escaping in the legacy driver#321
Improve identifier and string escaping in the legacy driver#321
Conversation
86ea87d to
d02b6d1
Compare
d02b6d1 to
b260d23
Compare
2f325db to
34e9091
Compare
Use parameterized queries and PDO::quote() consistently for all string literals processed by the translator.
34e9091 to
b5dfeb5
Compare
There was a problem hiding this comment.
@JanJakes I scanned these changes, and they generally look good to me. I left a couple of questions.
I also asked Claude for feedback. You probably have already done so, but some of the items looked possibly legitimate, so I'll paste the feedback here. The following is from Claude Code (Opus 4.6):
Summary
This PR introduces two systemic improvements to the legacy SQLite translator:
- A new
quote_identifier()method that properly wraps identifiers in backticks and escapes internal backticks by doubling them. - Consistent use of
PDO::quote()and parameterized binding for string literals instead of manual quoting.
It touches ~30 locations across class-wp-sqlite-translator.php and adds 7 well-targeted tests. The changes are clearly security-motivated and address real escaping gaps.
What's Good
Centralized identifier quoting. The new quote_identifier() method eliminates a scattered mix of "\"$var\"", '"' . $var . '"', and '`' . $var . '`' patterns, replacing them all with a single correct implementation. This is exactly the right approach.
Proper normalize_column_name(). The old trim($column_name, ''"')was lossy — it would mangle an identifier like ``colname` by stripping outer backticks but leaving the escaped inner ones. The new version correctly strips matching quote pairs and unescapes doubled quote characters.
extract_bound_parameter() broadened to all strings. Removing the FLAG_STRING_SINGLE_QUOTES restriction means double-quoted strings (which MySQL treats as string literals in non-ANSI mode) now get parameterized instead of being inlined into the SQL. This directly addresses SQL injection vectors through double-quoted strings — the testDoubleQuotedStringWithBackslashEscapeDoesNotCauseInjection test demonstrates this well.
escape_like_to_glob() cleanup. The old version did trim($pattern, "'\"") then manually wrapped in single quotes — a classic escaping gap if the pattern contains a literal single quote. Using $this->pdo->quote() fixes this properly.
get_autoincrement_column() regex fix. Now handles both backtick-quoted and double-quote-quoted identifiers, with proper support for escaped internal quotes ( [^`]| ``+ instead of [^"]+). This is necessary since the PR itself changes DDL output to use backticks.
Tests are well-chosen. They cover the specific edge cases that motivate the changes: backtick-in-identifier, double-quoted string parameterization, injection via backslash-escaped double quotes, single quotes in LIKE BINARY patterns, etc.
Issues / Questions
1. extract_bound_parameter — double-quoted identifiers risk
By broadening from FLAG_STRING_SINGLE_QUOTES to TYPE_STRING, any double-quoted token classified as TYPE_STRING will now be extracted as a bound parameter. In MySQL, "foo" is a string literal (unless ANSI_QUOTES is enabled). But SQLite treats "foo" as an identifier.
The question: does the lexer ever emit double-quoted identifiers with TYPE_STRING? If the MySQL lexer classifies `table_name` as TYPE_SYMBOL but "table_name" as TYPE_STRING, then this change could accidentally parameterize identifiers in queries like:
SELECT * FROM "my_table"Could you confirm how the lexer classifies double-quoted identifiers vs. double-quoted string literals? The AS keyword guard only protects alias contexts.
2. Inconsistency: escape_like_to_glob receives unquoted value but old code expected quoted
The old code did trim($pattern, "'\"") to strip quotes, implying $pattern_token->value included surrounding quotes. The new code skips the trim and goes straight to str_replace + pdo->quote().
In translate_like_binary(), the pattern comes from $pattern_token->value. If the token value still includes surrounding quotes (e.g., 'it''s%'), then pdo->quote() will double-quote it, producing something like ''it''s%''. Could you verify that $pattern_token->value is unquoted at this point? The test testLikeBinaryWithSingleQuoteInFormat passes, which is encouraging, but I want to be sure the token's .value property is consistently unquoted.
3. execute_create_table — extra space before $if_not_exists
Line 943 in the diff:
'CREATE ' . $unique . 'INDEX ' . $if_not_exists . ' ' . $this->quote_identifier( $index_name ) . ...If $if_not_exists is empty string, this produces CREATE INDEX `` index_name `` ... with a double space. The old code had the same issue since $if_not_exists was interpolated inline, but since you're touching this line, it might be worth a trim or conditional space. Minor nit.
4. Missing normalize_column_name on execute_delete table name (line 1420/1434)
The PR adds normalize_column_name() calls when extracting table names in execute_delete():
$this->table_name = $this->normalize_column_name( $rewriter->skip()->value );This is good — but should the same treatment be applied in other places where $this->table_name is set from token values? A quick grep shows $this->table_name is assigned in several other methods. If those tokens can also be quoted, they'd need the same normalization. Though that may be out of scope for this PR.
5. Token type for quoted identifiers in rewriter
In several places, the PR changes token types when adding quoted identifiers:
// Old:
new WP_SQLite_Token( $this->table_name, WP_SQLite_Token::TYPE_KEYWORD, WP_SQLite_Token::FLAG_KEYWORD_RESERVED )
// New:
new WP_SQLite_Token( $this->quote_identifier( $this->table_name ), WP_SQLite_Token::TYPE_KEYWORD, WP_SQLite_Token::FLAG_KEYWORD_KEY )The flag change from FLAG_KEYWORD_RESERVED to FLAG_KEYWORD_KEY seems intentional and correct (these are identifiers, not reserved words). Just want to confirm the rewriter doesn't filter or treat tokens differently based on these flags.
6. $ids_to_delete still not parameterized
In execute_delete() around the diff's line 263-268:
"DELETE FROM {$quoted_table} WHERE {$quoted_pk} IN (" . implode( ',', $ids_to_delete ) . ')'The $ids_to_delete values come from a prior query result ($id['id_1']), so they're not user-controlled. But for defense-in-depth consistency with the rest of the PR's approach, these could also use parameterized binding. Not a blocker, just noting.
Overall Assessment
This is a solid, well-scoped security hardening PR. The quote_identifier() method and consistent PDO::quote() usage are clear improvements. The tests cover the key edge cases. The main thing I'd want clarified is item #1 (how the lexer classifies double-quoted tokens) since that determines whether broadening extract_bound_parameter is safe or could accidentally parameterize identifiers.
For consistency, all $this->table_name assignments now go through normalize_column_name(). While ->value is typically already unquoted by the lexer, this makes the code easier to reason about without needing to know that detail.
The DELETE path selects primary key values and then uses them in a DELETE ... WHERE pk IN (...) query. Previously, the values were interpolated directly via implode(). While the values come from the database (not user input), they could be non-integer types (e.g., VARCHAR primary keys), which would produce broken SQL. Use bound parameters instead.
|
Thanks for the thorough review, @brandonpayton! I went through all the comments and addressed them when needed. For the Claude-generated items: 1. 2. 3. Double space when 4. Missing 5. Token flags: These are newly built output tokens. I can confirm this has no effect on how the output SQL is built. 6. |
brandonpayton
left a comment
There was a problem hiding this comment.
Thanks for the updates and explanations, @JanJakes. This looks good to go.
Improve escaping consistency in the legacy SQLite translator:
quote_identifier()for all identifier interpolations (table names, column names, index names, trigger names) in SQL queries, token values, and DDL reconstruction output.PDO::quote()for all string literals processed by the translator.See added tests for more details.