Member

@JanJakes JanJakes commented Oct 1, 2025

This fixes the following error:

SQLSTATE[HY000]: General error: 1 unrecognized token: "x'1'"

For queries like the following:

SELECT 0b1;
SELECT 0b01;
SELECT 0b001;
...

SQLite doesn't have binary string literals, so we need to convert them to HEX string literals. However, HEX string literals in SQLite must be aligned to full bytes (an even number of hex digits): x'01' is valid, while x'1' is invalid.

In MySQL, binary strings don't require such alignment, and 0b1 is valid (and SELECT 0b1 = 0b01 returns true).

Additionally, the base_convert() function used in the original implementation drops leading zeros, so any padding already present in the literal is lost (for example, base_convert( '00000001', 2, 16 ) returns '1').

To solve both of these issues, this PR implements the following:

  1. Count how many full bytes the original binary string requires.
  2. Make sure the resulting HEX string has that number of bytes, using leading 0 padding.
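
As a rough sketch of these two steps (not the code from this PR): the hypothetical helper binary_digits_to_sqlite_hex() below takes the digits of a MySQL binary literal without the 0b prefix, pads them to whole bytes, and converts four bits at a time instead of calling base_convert(). The function name and the nibble-by-nibble conversion are assumptions made for illustration only.

```php
<?php
/**
 * Hypothetical helper (not part of the PR): convert the digits of a MySQL
 * binary literal, e.g. '1' for 0b1, into a byte-aligned SQLite HEX literal.
 */
function binary_digits_to_sqlite_hex( string $bits ): string {
	if ( 1 !== preg_match( '/^[01]+$/', $bits ) ) {
		throw new InvalidArgumentException( 'Expected a non-empty string of 0 and 1 digits.' );
	}

	// Step 1: count how many full bytes the binary string requires.
	$byte_count = (int) ceil( strlen( $bits ) / 8 );

	// Step 2: left-pad with zeros to that many bytes, so that every group
	// of 4 bits maps to exactly one hex digit.
	$bits = str_pad( $bits, $byte_count * 8, '0', STR_PAD_LEFT );

	$hex = '';
	foreach ( str_split( $bits, 4 ) as $nibble ) {
		$hex .= dechex( bindec( $nibble ) );
	}

	return "x'" . $hex . "'";
}

echo binary_digits_to_sqlite_hex( '1' ), "\n";         // x'01'
echo binary_digits_to_sqlite_hex( '00000001' ), "\n";  // x'01'
echo binary_digits_to_sqlite_hex( '100000001' ), "\n"; // x'0101'
```

Padding the bit string before converting keeps the sketch independent of integer size, since each 4-bit group is converted on its own.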

For HEX strings, this fix is not needed: MySQL also requires them to be byte-aligned, so throwing an error for x'1' is actually correct.

@JanJakes JanJakes requested a review from adamziel October 1, 2025 09:10
Member

@akirk akirk left a comment

I think the PR title is confusing because you first talk about hex literals and the change is about binary representation.

* to full bytes (SQLite requires HEX strings of even length).
*/
$byte_count = (int) ceil( strlen( $value ) / 8 );
$hex = str_pad( $hex, $byte_count * 2, '0', STR_PAD_LEFT );
Member

Is this really necessary or is this a requirement of the notation that starts with x? Quick test run:

sqlite> select x'1';
Parse error: unrecognized token: "x'1'"
  select x'1';
         ^--- error here
sqlite> select x'01';

sqlite> select x'02';

sqlite> select 0x1;
1
sqlite> select 0x01;
1

Member Author

@akirk Unfortunately, in SQLite the 0x1 notation is a numeric literal, while the x'01' notation is binary (SELECT 0x01 = x'01'; is false). From the original PR:

  1. In MySQL, 0x… is a binary literal, while in SQLite it’s a numeric one.
  2. In MySQL, 0b..., b'...', and B'...' syntaxes can be used to represent binary literals. SQLite only supports HEX literals (x'...').


// Verify correct padding (0b1 === 0b01 === 0b001 ... === 0x00000001).
$result = $this->assertQuery( 'SELECT 0b1' );
$this->assertEquals( array( (object) array( '0b1' => pack( 'H*', '01' ) ) ), $result );
Collaborator

Can we spell out the tested string explicitly without relying on pack? Or have two tests?

Member Author

@adamziel It's hard for things like 0b1, because that byte represents an invisible character. Converting strings like 00000001 to the actual byte value in PHP doesn't seem to be straightforward. I used pack, but maybe there's a better way?

Member Author

Another way to express it would be char by char:

pack( 'H*', '01' ) === chr(1); // true
pack( 'H*', '0001' ) === chr(0) . chr(1); // true
...

Is that more readable? I don't know.

Maybe Unicode codepoints are better?

pack( 'H*', '01' ) === "\u{1}"; // true
pack( 'H*', '0001' ) === "\u{0}\u{1}"; // true
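
For comparison, a small sketch of two more spellings that might work here: hex escapes in double-quoted strings, and comparing hex text via bin2hex() / hex2bin(). The variable names are only illustrative, and whether any of these reads better than pack is a judgment call.

```php
<?php
// Illustrative variable names; these are the same byte strings as above.
$one_byte  = pack( 'H*', '01' );
$two_bytes = pack( 'H*', '0001' );

// Hex escapes inside double-quoted strings.
var_dump( "\x01" === $one_byte );      // bool(true)
var_dump( "\x00\x01" === $two_bytes ); // bool(true)

// Compare readable hex text instead of raw bytes.
var_dump( bin2hex( $one_byte ) === '01' );    // bool(true)
var_dump( hex2bin( '0001' ) === $two_bytes ); // bool(true)
```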

Collaborator

Gotcha! I suppose pack works


Collaborator

@adamziel adamziel left a comment

looking great!

@JanJakes JanJakes changed the title from "Pad HEX literals to full bytes as required in SQLite" to "Pad binary string literals to full bytes as required by SQLite HEX notation" Oct 1, 2025
Member Author

JanJakes commented Oct 1, 2025

> I think the PR title is confusing because you first talk about hex literals and the change is about binary representation.

@akirk Thanks! I improved it, hopefully. It's all a bit confusing, as one HEX notation is a binary string in MySQL and a number in SQLite (0x...), but another notation is a binary string in both (x'...').

@JanJakes JanJakes merged commit 488b8db into develop Oct 2, 2025
14 checks passed
@JanJakes JanJakes deleted the binary-padding branch October 2, 2025 11:15