-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix that MysqliStatement cannot handle streams #3217
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mpdude thank you, the improvement makes sense. Please see my comments.
Besides that, would it make sense to test the case when the data size is actually longer than max_allowed_packet
? Otherwise, the test scenarios would work even without the fix.
} | ||
} | ||
} else { | ||
if (! $this->_stmt->send_long_data($paramNr, $resource)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use send_long_data()
for non-streams? We should bind them as usual.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could change that like so: https://gist.github.com/mpdude/1ad1c03556d9fd4191396159b7929124
Basically, if you bind as ParameterType::LARGE_OBJECT
but do not pass in a stream, just treat it like ParameterType::STRING
. Might lose the advantage of sending the parameter in a network package by itself, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, valid point. Let's send strings via send_long_data()
too. Please rename $resource
to $value
then, since it's not always a resource and it's by design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might lose the advantage of sending the parameter in a network package by itself, though.
This did make sense since I thought one could send a string which exceeds max_allowed_packet
using send_long_data()
. Given it's not the case, what's the advantage of sending the parameter in a network package by itself? Isn't it better to just bind the string as is and let the driver decide how to send it?
'id' => 1, | ||
'clobfield' => 'ignored', | ||
'blobfield' => fopen('data://text/plain,test', 'r'), | ||
'binaryfield' => fopen('data://text/plain,test', 'r'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
binary
andblob
types mapBLOB
/BINARY
/VARBINARY
columns to PHP streams.
It's not exactly true (see 5465c6c). Only BLOB and CLOB (TEXT) fields should be populated from streams. It's fine if the statement binds streams to any columns it can, but we shouldn't be testing this for non-LOB fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I need some more explications here 🤔
Please have a look at the current test on 2.7:
clobfield
is always used withParameterType::STRING
. It does not return a stream onSELECT
, so I also did not expect streams to be bound here (could change that).blobfield
is aParameterType::LARGE_OBJECT
, should process streams when binding data and return streams fromSELECT
(as-is right now).binaryfield
is always treated likeblobfield
, including the assertions that streams are returned. Only difference is that its database type isbinary
.
Your remarks would make sense to me if clobfield
were treated as ParameterType::LARGE_OBJECT
and binaryfield
were ParameterType::STRING
, because then stream processing (on read as well as write) would only happen for the LARGE_OBJECT
type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your remark regarding changed behaviour is true for 3.x, but this is against 2.7.
{ | ||
$this->_conn->insert('blob_table', [ | ||
'id' => 1, | ||
'clobfield' => 'ignored', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please test this as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
~~~Of course we could do, but then I don't really get the purpose of the parameter types... On `SELECT`, only the `LARGE_OBJECT` type will return a stream, whereas `STRING` is a... string.~~~
See my remark in the next comment please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you should be able to bind a bind a character stream to a CLOB field. LARGE_OBJECT
doesn't imply BLOB, it implies any LOB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On
SELECT
, only theLARGE_OBJECT
type will return a stream, whereasSTRING
is a... string.
I think this is an issue we should fix in 3.0. Regardless of whether it's TEXT or BLOB, it should be returned as a stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the test covers it now?
$this->assertBinaryContains('test'); | ||
} | ||
|
||
public function testInsertCanHandleStreamLongerThanChunkSize() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't one test be enough to test a large-ish value? If a large value can be stored, a small one can too.
Thanks for the review!
I did not intend to work around the Documentation for the underlying What
No, they don't:
|
@mpdude, all above is correct. What I meant by "the test scenarios would work even without the fix", is that currently, it is already possible to store strings like "test" or 40K times "x" using the currently used MySQL API but it's not possible to store strings longer than 16M. If we don't cover this case with a test, then instead of using |
@@ -42,9 +46,7 @@ class MysqliStatement implements \IteratorAggregate, Statement | |||
ParameterType::BOOLEAN => 'i', | |||
ParameterType::NULL => 's', | |||
ParameterType::INTEGER => 'i', | |||
|
|||
// TODO Support LOB bigger then max package size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mpdude the fact that this is done is not covered by tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, maybe I should not have removed that comment. But as I explained above, there is no way in MySQL (by design) to bypass the max package size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mpdude isn't using send_long_data()
the way to store the data longer than the max package size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I tried to explain that in the comments and discussion. You cannot overcome the max_allowed_package
limitation.
send_long_data()
will – for each invocation – send out a chunk of data immediately to the MySQL server. So in the first place, it saves you from having to buffer one big query on the client side (memory allocation) before sending that out. We can read streams in small steps and send data to the server, never keeping everything at once in memory.
The MySQL server will still enforce the max_allowed_packet
limitation and reject queries exceeding it. That happens even if you have sent more data than allowed in small chunks.
It's not obvious, but the documentation and also the bug report I linked above shortly mention it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mpdude I'm sorry, I missed the part of #3217 (comment) where you mentioned the MySQL bug. Now I see why we don't cover the case where the string exceeds max_allowed_packet
. Thank you for your patience.
Let's not remove the comment in this pull request but remove it in another where we explain the reason in proper details. Otherwise, it may look like it's resolved by this ticket and this is one of the things which confused me.
I think there is no way we can exceed the What this PR actually does fix is that passing streams together with the
As I said above, we cannot process a value larger than But you are right that instead of using Any idea what a test might look like that guards against this? |
Why not? $conn->insert('blob_table', [
'id' => 1,
'blobfield' => str_repeat('x', 0x1000001),
]); Results in:
|
@morozov I think I've got all your comments covered and fixed. Regarding the |
@@ -42,9 +46,7 @@ class MysqliStatement implements \IteratorAggregate, Statement | |||
ParameterType::BOOLEAN => 'i', | |||
ParameterType::NULL => 's', | |||
ParameterType::INTEGER => 'i', | |||
|
|||
// TODO Support LOB bigger then max package size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mpdude I'm sorry, I missed the part of #3217 (comment) where you mentioned the MySQL bug. Now I see why we don't cover the case where the string exceeds max_allowed_packet
. Thank you for your patience.
Let's not remove the comment in this pull request but remove it in another where we explain the reason in proper details. Otherwise, it may look like it's resolved by this ticket and this is one of the things which confused me.
@mpdude please retarget this against |
c71d64c
to
91516bf
Compare
@morozov This is now based on master and I hope I did not miss any of your comments. One open question remains regarding the use of underscore-prefixed names for non-public methods and fields. I just followed the way the class was written before, so...? The other is about the chunk size of 8 vs 64 KB, see #3217 (comment). |
Is there anything I can do here to make the builds pass? |
As discussed in doctrine#3217 (see doctrine#3217 (comment) in particular), there is no way in MySQL to get around the `max_allowed_packet` limitation.
You can follow #3217 (comment). DB2 and Oracle capitalize column names. Given that you only fetch one column, you don't need to refer to it by name, just use its value. |
Please remove them. The Doctrine coding standard is based on PSR-2 which in turn doesn't allow using underscores.
Answered above. |
Regarding the failing builds I meant that appveyor and continuousphp seem to fail almost immediately and I don't understand what's wrong. |
27014da
to
beaaa79
Compare
Also rebased against master since #3285 has been merged. |
DB2 was like |
Right now, AppVeyor passes, and ContinuousPHP has failures in |
Okay, for the record, our CI ignores method name violations because they are reported as warnings, and it ignores warnings (I'll file a separate ticket to track this): Line 10 in 41f437a
This is what the report would look like w/o suppressing warnings prior to the moment when the names were fixed:
|
Update: See #3288. |
@morozov what would be the right |
It should be |
Would that also pass static code analysis? I cannot really make sense of https://travis-ci.org/doctrine/dbal/jobs/433673047#L543. |
It should. You can run |
.gitignore
Outdated
@@ -8,3 +8,4 @@ vendor/ | |||
/phpunit.xml | |||
/.phpcs-cache | |||
/phpstan.neon | |||
/.idea |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move this to your own .gitignore
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry
The blob type maps BLOB (and also TEXT) columns to PHP streams. Internally, they use the ParameterType::LARGE_OBJECT (i. e. \PDO::PARAM_LOB) binding type, which suggests that efficient handling of PHP stream resources was intended. However, at least when using the mysqli driver, stream resources passed into insert() or update() are simply cast to strings. As a result, a literal string like "Resource id doctrine#126" will end up in the database. This PR fixes the issue by correctly processing streams in the MysqliStatement when they are passed with the ParameterType::LARGE_OBJECT binding type. It uses the mysqli::send_long_data() method to pass stream data in chunks to the MySQL server, thus keeping the memory footprint low. This method does not (despite claims to the contrary) allow to bypass the max_allowed_package size! The pdo_mysql driver was already capable of handling streams this way. Now this is covered by tests. Helpful documentation: - http://php.net/manual/en/mysqli-stmt.send-long-data.php - http://php.net/manual/en/mysqli-stmt.bind-param.php - see first "Note" - http://php.net/manual/en/pdo.lobs.php - https://blogs.oracle.com/oswald/phps-mysqli-extension:-storing-and-retrieving-blobs Additional notes on MySQL's max_allowed_packet: This change does not not intend to work around the max_allowed_packet setting, and quick tests show that this is not possible: When MySQL is configured to use a low max_allowed_packet value, an error will be triggered stating Parameter of prepared statement which is set through mysql_send_long_data() is longer than 'max_allowed_packet' bytes. Documentation for the underlying mysql_stmt_send_long_data() C API function suggests that max_allowed_packet is always a hard limit. References: - https://dev.mysql.com/doc/refman/8.0/en/mysql-stmt-send-long-data.html - https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_max_allowed_packet - https://bugs.mysql.com/bug.php?id=83958 What mysqli::send_long_data() seems to do is that every data chunk of data passed to it is immediately sent out to the network. I have confirmed this using tcpdump, and so the advantage might be that we can keep the memory footprint low on the PHP side while processing streams.
33cf105
to
72dcd04
Compare
Squashed and 🤞🏻also for CS builds. We should get this finalized as our (comments/LOC changed) ratio approaches 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me 👍 Unless @kimhemsoe has objections, I'll merge it later today.
Looks good to me, great work. About the spelling mistakes in variables.. those are mine :D Hope to fix them in dbal 3 .. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Awesome work @mpdude! |
Please don’t forget to merge this back into the lowest possible 2.x release. |
@mpdude this is targeting 2.9, as it is an improvement |
Ok, not going to argue about your definitions ;-) Just wanted to mention it because the PR is against master and not a 2.x branch. |
We expect all pull requests to target |
Release v2.9.0 This is a minor release of Doctrine DBAL that aggregates over 40 fixes and improvements developed by 18 contributors over the last 5 months. This release includes all changes of the 2.8.x series, as well as feature additions and improvements that couldn't land in patch releases. ## Backwards Compatibility Breaks This doesn't contain any intentional Backwards Compatibility (BC) breaks. ## Deprecations * The usage of `NULL` to specify the absence of an offset in `LIMIT`ed queries is deprecated. Use `0` instead. * It's not recommended to rely on the default length specified by implementations of `Type`. These values are not used by the library and will be removed. * It's not recommended to rely on the string representation of `Type` objects. * Regular-expression based asset filters are deprecated in favor of callback-based as more extensible. * Calling `Statement::fetchColumn()` with an invalid column index is deprecated. * The `dbal:import` CLI command is deprecated. Please use other database client applications for import. Please see details in the [UPGRADE.md](UPGRADE.md) documentation. ## New Features * Added support for MariaDB 10.3. * Added support for Windows authentication for SQL Server. * Added support for column length in index definitions on MySQL. ## Improvements and Fixes * Implemented handling BLOB objects represented as streams in the MySQL (`mysqli`) driver. * Implemented handling BLOB objects represented as streams in the IDM DB2 driver. * DBAL is now continuously tested with the PDO driver for Oracle. * Implemented handling of URLs in master-slave and pooling-shard connection configuration. * The codebase is now fully compatible with the Doctrine Coding Standard v5.0. Total issues resolved: **45** **Deprecations:** - [3244: Deprecated dbal:import CLI command](doctrine#3244) thanks to @morozov - [3253: Deprecated usage of the NULL offset in LIMITed queries](doctrine#3253) thanks to @morozov - [3256: Deprecate Doctrine\DBAL\Types\Type::getDefaultLength()](doctrine#3256) thanks to @Majkl578 - [3258: Deprecate Doctrine\DBAL\Types\Type::&doctrine#95;&doctrine#95;toString()](doctrine#3258) thanks to @Majkl578 - [3316: Deprecated regex-based asset filters](doctrine#3316) thanks to @morozov - [3359: Removed DataAccessTest::testFetchColumnNonExistingIndex() since it covers a bug in PDO](doctrine#3359) thanks to @morozov **New Features:** - [2412: Add mysql specific indexes with lengths](doctrine#2412) thanks to @bburnichon - [3278: Add support for MariaDB 10.3](doctrine#3278) thanks to @javiereguiluz - [3283: MariaDB improvements, support 10.3](doctrine#3283) thanks to @sidz - [3333: Allow windows (userless/passwordless) authentication for SQL Server](doctrine#3333) thanks to @odinsey **Bug Fixes:** - [3355: Implemented comparison of default values as strings regardless of their PHP types](doctrine#3355) thanks to @morozov and @Majkl578 **Improvements:** - [3201: Fix support for URL to account for master-slave and pooling-shard connections](doctrine#3201) thanks to @stof - [3217: Fix that MysqliStatement cannot handle streams](doctrine#3217) thanks to @mpdude - [3235: Use PSR-4 autoloader](doctrine#3235) thanks to @Majkl578 - [3254: Throw ConversionException when unserialization fail for array and object types](doctrine#3254) thanks to @seferov - [3259: Update export ignores](doctrine#3259) thanks to @Majkl578 - [3309: Implemented handling BLOBs represented as stream resources for IBM DB2](doctrine#3309) thanks to @morozov and @mpdude - [3331: Fetch all should use the driver statement's fetchAll method](doctrine#3331) thanks to @michaelcullum **Documentation Improvements:** - [3223: GitHub template grammar/spelling fixes](doctrine#3223) thanks to @GawainLynch - [3232: Removed NOW() from QueryBuilder usage examples](doctrine#3232) thanks to @morozov - [3239: 2.8 in README & branch alias to 2.9](doctrine#3239) thanks to @Majkl578 - [3269: Fixed type hints in DockBlocks](doctrine#3269) thanks to @marforon - [3275: Add .doctrine-project.json to root of the project.](doctrine#3275) thanks to @jwage - [3276: Update homepage](doctrine#3276) thanks to @Majkl578 - [3280: Use behaviuor instead of behavior](doctrine#3280) thanks to @BackEndTea - [3285: Remove old comment from MysqliStatement](doctrine#3285) thanks to @mpdude - [3318: Removed link to www.doctrine-project.org from doc blocks](doctrine#3318) thanks to @morozov - [3319: remove ClassLoader](doctrine#3319) thanks to @garak - [3337: Fix of links in documentation](doctrine#3337) thanks to @SenseException - [3350: Remove pdo&doctrine#95;sqlsrv from known vendor issues list](doctrine#3350) thanks to @ostrolucky - [3357: Fix typo](doctrine#3357) thanks to @BenMorel - [3370: Removed 2.7 from README](doctrine#3370) thanks to @morozov **Code Quality Improvements:** - [3252: Replaced call&doctrine#95;user&doctrine#95;func&doctrine#95;array() of a fixed method with the usage of variadic arguments](doctrine#3252) thanks to @morozov - [3306: Fixed coding standard violations in the codebase](doctrine#3306) thanks to @morozov - [3303: Updated doctrine/coding-standard to 5.0, ](doctrine#3303) thanks to @morozov - [3317: Implemented proper escaping of string literals in platforms and schema managers](doctrine#3317) thanks to @morozov - [3363: Remove redundant implements](doctrine#3363) thanks to @BenMorel **Continuous Integration Improvements:** - [3307: Test against the latest stable sqlsrv extension](doctrine#3307) thanks to @morozov - [3320: Trying to fix failing DB builds](doctrine#3320) thanks to @morozov - [3325: Updated PHPUnit to 7.4](doctrine#3325) thanks to @morozov - [3339: ContinuousPHP configuration for PDO Oracle driver](doctrine#3339) thanks to @morozov - [3365: Reorganize Travis build matrix](doctrine#3365) thanks to @BenMorel # gpg: Signature made Tue Dec 4 05:44:06 2018 # gpg: using RSA key 374EADAF543AE995 # gpg: Can't check signature: public key not found # Conflicts: # README.md # lib/Doctrine/DBAL/Driver/AbstractOracleDriver.php # lib/Doctrine/DBAL/Version.php
The
binary
andblob
types mapBLOB
/BINARY
/VARBINARY
columns to PHP streams. Internally, they use theParameterType::LARGE_OBJECT
(i. e.\PDO::PARAM_LOB
) binding type, which suggests that efficient handling of PHP stream resources was intended.However, at least when using the
mysqli
driver, stream resources passed intoinsert()
orupdate()
are simply cast to strings. As a result, a literal string like"Resource id #126"
will end up in the database.This PR fixes the issue by correctly processing streams in the
MysqliStatement
when they are passed with theParameterType::LARGE_OBJECT
binding type. It uses themysqli::send_long_data()
method to pass stream data in chunks to the MySQL server, thus keeping the memory footprint low. This method does not (despite claims to the contrary) allow to bypass themax_allowed_package
size.Update: The
pdo_mysql
driver was already capable of handling streams this way. Now this is covered by tests.Helpful documentation