-
Notifications
You must be signed in to change notification settings - Fork 49
Pad binary string literals to full bytes as required by SQLite HEX notation #255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the PR title is confusing because you first talk about hex literals and the change is about binary representation.
* to full bytes (SQLite requires HEX strings of even length). | ||
*/ | ||
$byte_count = (int) ceil( strlen( $value ) / 8 ); | ||
$hex = str_pad( $hex, $byte_count * 2, '0', STR_PAD_LEFT ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really necessary or is this a requirement of the notation that starts with x? Quick test run:
sqlite> select x'1';
Parse error: unrecognized token: "x'1'"
select x'1';
^--- error here
sqlite> select x'01';
sqlite> select x'02';
sqlite> select 0x1;
1
sqlite> select 0x01;
1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akirk Unfortunately, in SQLite the 0x1
notation is a numeric literal, while the x'01'
notation is binary (SELECT 0x01 = x'01';
is false
). From the original PR:
- In MySQL,
0x…
is a binary literal, while in SQLite it’s a numeric one. - In MySQL,
0b...
,b'...'
, andB'...'
syntaxes can be used to represent binary literals. SQLite only supports HEX literals (x'...'
).
|
||
// Verify correct padding (0b1 === 0b01 === 0b001 ... === 0x00000001). | ||
$result = $this->assertQuery( 'SELECT 0b1' ); | ||
$this->assertEquals( array( (object) array( '0b1' => pack( 'H*', '01' ) ) ), $result ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we spell out the tested string explicitly without relying on pack
? Or have two tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adamziel It's hard for things like 0b1
, because that byte represents an invisible character. Converting strings like 00000001
to the actual byte value in PHP seems to be not so straightforward. I used pack
, maybe there's a better way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way to express it would be char by char:
pack( 'H*', '01' ) === chr(1); // true
pack( 'H*', '0001' ) === chr(0) . chr(1) // true
...
Is that more readable? I don't know.
Maybe Unicode codepoints are better?
pack( 'H*', '01' ) === "\u{1}"; // true
pack( 'H*', '0001' ) === "\u{0}\u{1}"; // true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha! I suppose pack works
|
||
// Verify correct padding (0b1 === 0b01 === 0b001 ... === 0x00000001). | ||
$result = $this->assertQuery( 'SELECT 0b1' ); | ||
$this->assertEquals( array( (object) array( '0b1' => pack( 'H*', '01' ) ) ), $result ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we spell out the tested string explicitly without relying on pack
? Or have two tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking great!
@akirk Thanks! I improved it, hopefully. It's all a bit confusing, as one HEX notation is a binary string in MySQL and a number in SQLite ( |
This fixes the following error:
SQLSTATE[HY000]: General error: 1 unrecognized token: "x'1'"
For queries like the following:
SQLite doesn't have binary string literals, so we need to convert them to HEX string literals. However, HEX string literals in SQLite must be aligned to full bytes (their length needs to be even) —
x'01'
is valid, whilex'1'
is invalid.In MySQL, binary strings don't require such alignment, and
0b1
is valid (andSELECT 0b1 = 0b01
returnstrue
).Additionally, the
base_convert()
function used in the original implementation doesn't preserve existing leading0
padding.To solve both of these issues, this PR implements the following:
0
padding.For HEX strings, this fix is not needed, as those require to be byte-aligned also in MySQL, so throwing an error for
x'1'
is actually correct.