-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
4.042 improperly encoding blobs when sql_type is SQL_UNKNOWN_TYPE #117
Comments
The whole I'm not sure what we can done here. Basically applications which trying to use experimental broken option provided by DBD::mysql < 4.042 and just broken. Due to some fundamentals of perl scalars and also MySQL protocol itself, DBD::mysql does not type of bind parameter, so driver can use logic which you already described. When mysql_enable_utf8 is 0, then nothing is encoded/decoded and so treated as binary sequence of octets. |
In other words: DBD::mysql does not know that a parameter is being used for insert into a blob column. Since you set mysql_enable_utf8 it will encode it, unless bound with the SQL_BLOB or similar type. On retrieval it knows that the column is a binary type, so it does not decode it. The behavior you describe only worked accidentally before. |
I used it for 10 years without problems. I’m sure I would have had problems if I had tried things where the incongruity of its design got exposed, but I never needed to. So my code had no bugs – even though DBD::mysql was designed incorrectly. Worse: http://grep.cpan.me/?q=mysql_enable_utf8+file%3A.pm There’s plenty of infrastructure-level code just on CPAN alone which sets this flag on behalf of the user. In the face of these realities, the fact that the documentation declared the feature experimental back when it was introduced, 11 years ago, is simply irrelevant today. But now |
So... what do you suggest? As wrote I have no idea what can be done here. Probably the best solution would be to fix broken code. And about |
Of course many of these CPAN modules will produce broken results when used in any somewhat unusual configuration where the broken design actually matters. But a lot of people don’t try that, and for them it ends up not mattering that the design is not sound. For these people, there is no gain from fixing the broken design, it just creates busywork. I’m not sure what I would suggest, specifically. The general idea is always the same though: don’t change how How exactly you do that depends on the code and what you need to achieve long term. New flag e.g. (Of course when it comes to the docs, the old approach should be de-emphasised, and the new way should be shown as the obvious/standard way of doing things, to steer new users in that direction. Also, to get people away from the old way it is important to make sure that the new way doesn’t make too many things harder to do than the old way – esp. commonly needed things. Note however the old way must stay documented – because there is still code that uses it, and people who have to maintain that code need to be able to figure out what it does.) Something along these lines. |
Problem with old If users (and authors of other modules) are not going to fix broken code and need current behavior they should stay at specific version of perl and all cpan modules. Otherwise it is not guaranteed that also old DBD::mysql version (prior to 4.042) provide same results. It can lead to results: after updating cpan module XYZ to new version MySQL server in user application started returning wrong result, where XYZ can be module which has nothing to do with MySQL. Supporting such pseudo-random behavior is not easy and funny. And I really suggest to conserve such fragile code and disallow updating code around (which can fully break it). What is probably more important: how to write I created this pull request which adds "hacks" into documentation: #119 Please comment, test or provide another suggestion for it. To improve situation. |
Any CPAN module? How? |
If a CPAN module provides a string with a different internal representation, even if it's the same string, previous versions of DBD::mysql will treat it differently. We had this issue with output from Spreadsheet::ParseExcel that returned (perfectly normal) un-upgraded strings with byte values between 128 and 255, probably due to some dependency, causing errors when used as params to DBD::mysql with mysql_enable_utf8 set, as the internal representation it used was not UTF-8 encoded. Any module that provides you with a string could change what representation it gives you and break DBD::mysql while the string value is unchanged. A more complete discussion of the issue is in https://rt.cpan.org/Ticket/Display.html?id=87428 |
@Grinnz is right. And moreover un-upgraded strings are created by default from string constants if you do not So please look at pull request #119 if it helps you or not... |
Yeah, but that isn’t DBD::mysql changing its behaviour. That’s some other module changing its behaviour. If the user of both modules compensates for the change in the other module, DBD::mysql will get the same inputs as before, and will behave the same way as before. The behaviour of DBD::mysql is not changing, just because of an upgrade to some other module. It’s perfectly predictable what it does, even if it’s not correct. “When the inputs change, the behaviour changes; when the inputs are the same, the behaviour is the same” is the opposite of random.
Funny, there’s an almost verbatim precursor of my suggestion (but better thought out) already in there. As far as I can tell, doing it that would have worked just fine… with no breakage. Taking that approach would have given you what you wanted, without causing anyone else any pain. So why choose to cause pain when you don’t even gain anything from it? @pali: So…
… if this meant what @Grinnz was talking about, then I disagree that that behaviour is random (even just pseudo), I disagree that it’s hard to support solely on that basis, and I disagree about what’s not funny here (namely, changing it).
That is indeed useful information in order to cope with the problems of the old interface. But I don’t see anything in there which would argue against adding the correct interface under a different flag – rather than changing Has there been any technical argument for not using a different flag? So far, all I have seen are arguments about the correctness of not using the UTF8 flag – which I do not disagree with at all. |
I cannot agree here with you, because calling utf8::upgrade() or composing string via What is changing here is low level representation, and you can compare it compiling program with different optimization flags, e.g. -O0 or -O2 in gcc. In both cases you get same program, just compiled with different assembler instructions. And in perl usage of utf8::upgrade() or Note that perl operators like I wrote it is pseudo-random (not fully-random), which means it is deterministic. So I could agree with you that behavior can be predicted. Originally I talk with DBD::mysql maintainers (half a year ago?) and wrote first proposed solution that introduction of new attribute for utf8 with hacking code around with supporting both of them can be possible. But IIRC we dismiss it due to introduction of big mess in code and reason that current behaviour does not make sense in perl. Due to above fact and also that calling utf8::upgrade() is legal and already done by perl itself at lot of places and also that in documentation was explicitely written "experimental and may change" state, I introduced patches which fixes |
Please reconsider this stance. In as much as the new behavior is "correct", the old behavior was widely deployed and has a tremendous amount of inertia. Migrating to the new behavior will require coordination and substantial effort. We (Request Tracker) are willing to undergo that effort, but we do need to prepare for it. Given that we are in the middle of a stable series and DBD-mysql 4.042 is getting traction, we are unable to do such a migration right now. Our only recourse right now is to refuse to install under 4.042, forcing users to install 4.041 by hand. |
What we learnt from this is that developers do not read documentation nor care about states like "This option is experimental and may change in future versions.". Now I'm thinking how could be experimental features which are not expected to work correctly or which can be removed/changed at any time introduced into modules... |
How seriously can you expect them to take that after the feature has been “experimental” without changes in behaviour for 10 years? |
Furthermore see this commit from a year ago and especially its message: 026fa02 |
Yeah, I removed the 'this is experimental' message last year because I knew lots of software depended at that point on this switch because it was the only way to get some form of sane behaviour from DBD::mysql - which had lots of other issues, but also mostly worked for most people. I'm really not so happy with the situation that emerged. I though we fixed lots of smaller issues but I also thought we had enough unit test coverage to not break existing software using DBD::mysql for people. We explicitly sollicited feedback after posting development releases to CPAN by making blog posts on b.p.o, sending out mails to mailing lists, and of course we only got actual feedback (like this issue) after we released the stable. @CaptTofu please advise! I think we need to look at doing what the original code did and do whatever worked for most users. |
It doesn’t seem too late to add a |
Because there were open bugs that it does not work for 10 years? |
What software doesn’t have “some open bugs”? I do not believe you hold even yourself to that standard. If that was the bar for what software is safe to use… what would be left? |
Hi, I'm an end user that would like to share with you the impact of this change. Manual text:
However, the following code inserts corrupt binary data: use DBI qw(:sql_types); open FILE, "test.bin"; $dbh = DBI->connect($dsn,$user,$password,{mysql_enable_utf8 => 1}); This change also make it impossible to use the following one-liner as it requires the data type of all placeholders to be specified: |
This parameter is not bound as binary type.
This one is.
It is limitation of MySQL. If |
I understand the underlying motivation to make this change. You are right that is a bug that needs to be fixed. |
@pali turning on mysql_enable_utf8 has been the least broken option for handling unicode with DBD::mysql for over a decade. It's been basically standard for e.g. any DBIx::Class app that uses unicode w/mysql for most of that time. The only thing you can do here that won't cause massive corruption of people's production data is to keep it behaving the old way, even if the old way is stupid. You can add a warning that the old way is stupid and that people should consider migrating to the new way, but the old code must stay bugwards compatible or you're going to trash people's production data. This is the sort of backcompat problem that most sucks to have as a maintainer, and I do very much feel for you, but there really ain't any alternative here. Sorry. |
Dear @pali Are you the maintainer of this module? What do you think about the below proposal together with mentioning in the documentation that a value 1 is not sufficient as it lacks encoding of the input statement+parameters but that it remains there for backward compatibility, and value 2 is preferred for correct UTF-8 handling in combination with the note that specifying the datatype of each placeholder value is mandatory for this setting? Working of 4.041:
Working of 4.042:
Proposal for 4.043:
|
@nightlight1 unfortunately it is not that simple. In the old behavior, the decision whether input parameters were encoded or not is based on the internal state of the perl string, not a clear user visible reason. This is part of the bug that was fixed. Also, retrieved columns were decoded in different circumstances as well. Regardless, you are on the right track, the backwards compatible fix to this would be to restore the behavior of mysql_enable_utf8 and mysql_enable_utf8mb4, and add a new option or option value for the correct behavior. |
@Grinnz And that is still not enough! Even without
You put With new DBD::mysql you get correct So you cannot support both old buggy behavior and also new perl-correct. I have no idea what else can be done there. Trying to support old buggy behavior would mean to introduce either new bugs or break non-UTF-8 support at all. |
Then it sounds like the bugfix with mysql_enable_utf8 disabled has not caused any problems so far, and that can remain. It is only the behavior with mysql_enable_utf8/mb4 changing that has caused the issues in this thread. Would it be possible to restore the old behavior with those switches, and add a new switch or value for the fixed UTF-8 behavior? |
Every day this current version is distributed it causes database corruption in production environments (every INSERT/UPDATE statement involving BLOB's causes corruption). |
It is not possible to avoid this "side effect" or "fix" it to recognize blob inputs, for reasons that were discussed earlier in this thread. The old way only accidentally worked for this use case and cannot be relied upon for a real fix. The only way to make it both correct for new code and bugwards compatible for old code is to put the correct behavior on a separate option. |
Absolutely. And this is what I want to propose at this point and actually
put out there soon.
…--
Michiel
Op wo 28 jun. 2017 om 19:06 schreef Dan Book <notifications@github.com>
It is not possible to avoid this "side effect" or "fix" it to recognize
blob inputs, for reasons that were discussed earlier in this thread. The
old way only accidentally worked for this use case and cannot be relied
upon for a real fix. The only way to make it both correct for new code and
bugwards compatible for old code is to put the correct behavior on a
separate option.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#117 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAoQMBFUQPh0WCRKycrFwcgzDzU7hMsYks5sIogdgaJpZM4M2DZu>
.
|
@nightlight1 you can disagree... but if something is not possible to achieve and fully implement then you would need to take it as is. The only option which I know is freeze of whole module. |
Which is why the new ("current") code needs to be introduced in a different way than it has been done for 4.0.42. For which it first needs to be backed out and then probably reintroduced in a different manner. |
Well, @shadowcat-mst was not the only one to get that impression ... |
I also read the wording, but I also noticed the many exclamation marks that were added to the manual page of this module. It convinced me that if a person uses these in neutral sentences it may just be his/her common way of expressing and not necessarily meant to be offensive. |
So, @CaptTofu and I just released version https://metacpan.org/source/MICHIELB/DBD-mysql-4.043/Changes 4.043 which is the same as 4.041 but with a higher version number. In the next period, we'll be going to add back fixes that were added after 4.042 and put that out as a development release. We're explicitly going to fix bugs but NOT wanting to break things. If we're going to make changes to how unicode is handled, and if it's going to (also) require changes to existing code, it will be only after users explicitly opting in. We don't want people upgrading their perls or doing an OS upgrade and finding they get data corruption. The one good thing is there are now lots of eyeballs here. Of course we're going to be extra careful now, but some of you testing upcoming development releases would be more than welcome. |
Thank you. I hope that a new version/option of the corrected unicode handling will be released quickly however. Until then I will not be able to upgrade our DBD::mysql from 4.042 as I do not want to go back to the broken behavior. |
Of course, if you want you're free to pin to 4.042 if that works for you.
Op do 29 jun. 2017 om 23:23 schreef Dan Book <notifications@github.com>
… Thank you. I hope that a new version/option of the corrected unicode
handling will be released quickly however. Until then I will not be able to
upgrade our DBD::mysql from 4.042 as I do not want to go back to the broken
behavior.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#117 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAoQMAnKz66NXnKA9UjBhIPoT4H9_zlEks5sJBW_gaJpZM4M2DZu>
.
|
@mbeijen , @CaptTofu From what I understand, existing end-user code will now work again.
I'd be more than happy to help with testing. Where's the best point to be looking (list/blog to be tracking etc.) for changes related to this issue? |
@markuswernig typically we post at dbi-announce (http://lists.perl.org/list/dbi-announce.html) which I'll do today. |
Does it mean that DBD::mysql is now freezed forever? As i have no idea how you want to do future development & still provide that buggy backward compatibility. |
Btw, now look what happened and what you did:
Ask question yourself: Based on above facts, who can be interested in use of DBD::mysql which does not work as expected in perl? New developers? Sure not. Active developers? Not sure, why anymore? Now I asked for two CVEs for DBD::mysql. We would see who would start fixing security issues based on above fact that buggy behavior cannot be fixed. And basically everything just because two, tree, four, five (?) people are usable to report bugs to broken application and just started to want have old, broken, non-working and unmaintainable behavior of mysql_enable_utf8 in DBD::mysql driver. You now killed DBD::mysql. Like @Grinnz said, I already know more people who would need to pin DBD::mysql to 4.042, just because they started to use utf8 and unicode strings correctly, like in whole perl word. all: Instead of asking for help how you fix your or someone's else buggy application (which I or any other people could help), you rather forced maintainers to go with DBD::mysql to hell. |
You're looking at it wrong. This broke so much code that representatives from 3 different companies had to tell you you'd broken their code. You have to always see your feedback as the tip of an iceberg, because it is. And there are probably many more companies out there currently corrupting their data, and won't know about it because they're not checking yet. Only the very top echelon of developers involve themselves in opensource enough that they actually communicate with opensource. The vast majority are silent consumers When you're dealing with software that's purpose in life is to not corrupt data, and have data there tomorrow, you go out of your way not to break that promise. There's no point in being involved in this sector if you don't care to deliver on that promise. |
Companies which dealing with such data are running on verified infrastructure and every upgrade of core system is tested and reviewed. And in most common scenario they are running some enterprise distribution (like RHEL or their clones) where are versions already freezed. Nobody in companies is updating random software from random internet places directly on production without any testing. |
No, only the best, most careful, and most expensive of such companies do. You'd be disturbed how common bad practices are, and practices that exist purely out of lack of time or funding. Heaven help anyone who dares to make a small business and not really understand their technology.
There's a saying, and its truer than you want to believe: Everyone has a testing rig. |
@kentfredric if you silently introduce the changes of course data will be corrupted! When introducing breaking changes you should deprecate and make sure that old code do not even run. Then developers can choose to stay on an old version or upgrade. |
@Row - that's exactly what we're all saying in this thread and it's what the very first sentence of the issue report says:
A regression with unexpected side effects is by definition a silent introduction of changes, and we're saying the approach taken (that has since been reverted) needs to be rethought. |
Indeed, its not so much about "no change", its about creating changes such that they avoid breaking code. And even this is not a "death to DBD::mysql", because there are multiple ways of getting the desired result from different syntax. There's lots of ways to avoid breaking backwards compatibility. The most unfortunate thing here is this step wasn't made before 4.042 was released. "Opt in to change[1]" is much more reliable than "make sure upstream didn't break your code every release". The latter makes sense only if you assume upstream don't care about your data, which is a bad default assumption to need for a Database Module. 1: Keeping in mind, "upgrading" and "opting in" are very different things in practice. |
Not intending to hijack this issue, but I've run into exactly the same type of problem recently: In this case something in core's behaviour silently changed. |
You keep saying these, and they keep not being true. I ran MySQL in production with “broken” DBD::mysql versions for >10 years, and the content in the database was never wrong. How is that not working as expected? Upgrading to your “fixes” would have made the content in my database wrong. How is that working as expected? I was glad that the place I work at switched away from MySQL just before your fixes landed in DBD::mysql. You sure did not increase my interest in DBD::mysql… sorry to say that. There is a difference between a bug in the implementation and a bug in the design of an interface. In this particular case, there was a design bug, not an implementation bug. The old interface took an implementation detail (the UTF8 flag in scalars) and made that part of is interface semantics. That’s a bad choice. But it’s an interface design choice. It’s not an implementation bug. And it can be worked with perfectly reliably if you understand the interface. You and several others wrote repeatedly that the old interface could be broken by upgrades of unrelated modules, and that you had no idea how to write code against such an interface. First of all, that’s only a “could” – there are plenty of cases where that wouldn’t happen. But even then, you absolutely can write robust code against an interface that relies on the UTF8 flag. I hope you have heard of What you did was to break the contract of a long-standing interface, because its design is bad. The resulting interface design may be better, no dispute – but that’s not a bugfix. It’s interface breakage. You want a better interface? (Yes, it’s a better interface.) Add it. Don’t break one that already has tons of users. |
@pali I know this situation is frustrating. I think I can speak for others too when I say that I greatly appreciate your contributions of fixes to this very broken distribution. I hope that they can all be reapplied now, just without changing the broken behavior of mysql_enable_utf8/mb4 that so many are relying upon. I also think that mysql_enable_utf8/mb4 should be marked as deprecated when the fixed behavior is added. |
We will work out a fix that both solves the issue long-term and doesn't
break things for existing users. I really appreciate the work Pali has
done and take seriously his concerns as I do those in the community who
need their code to work. It certainly seems that there is a lot of
interest in this which will help get perhaps people testing it.
…On 7/1/17 1:29 PM, Dan Book wrote:
@pali <https://github.com/pali> I know this situation is frustrating.
I think I can speak for others too when I say that I greatly
appreciate your contributions of fixes to this very broken
distribution. I hope that they can all be reapplied now, just without
changing the broken behavior of mysql_enable_utf8/mb4 that so many are
relying upon. I also think that mysql_enable_utf8/mb4 should be marked
as deprecated when the fixed behavior is added.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#117 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAGgrkT8imQabiVYIeFyVfsvkKEoFgtpks5sJoHqgaJpZM4M2DZu>.
|
Is it still planned to reapply the reverted fixes? This distribution now has all of the issues again that were previously resolved. |
Discussion about this seems to have moved to the perl.dbi.users mailing list: https://www.nntp.perl.org/group/perl.dbi.users/2017/08/msg37429.html More importantly: https://www.nntp.perl.org/group/perl.dbi.users/2017/09/msg37443.html |
The utf8 encoding changes have resulted in a regression that has unexpected side effects.
Consider the following table:
create table foo (foo longblob);
and script:
use DBI;
my $dbh = DBI->connect('DBI:mysql:database=test','desktop','',{ mysql_enable_utf8 => 1 })
or die $DBI::errstr;
my $sth = $dbh->prepare('INSERT INTO foo (foo) values (?)');
$sth->execute("I18N Web Testing æøå");
my $rth = $dbh->prepare('SELECT foo FROM foo');
$rth->execute();
while (@Row = $rth->fetchrow_array) {
print $row[0],"\n";
}
Now the longblob mysql type should be considered a SQL_BLOB, but because the exec call doesn't call bind_param with the attrib set to SQL_BLOB, the default value of 0 is used SQL_UNKNOWN_TYPE and passed to bind_param. If you sv_dump at 939 in dbdimp.c you'll see:
SV = PV(0x162d950) at 0x15f8630
REFCNT = 1
FLAGS = (POK,IsCOW,READONLY,PROTECT,pPOK)
PV = 0x15dc200 "I18N Web Testing \303\246\303\270\303\245"\0
CUR = 23
LEN = 25
COW_REFCNT = 0
And sql_type is the default value of 0.
the output of the scrip will print the value stored in the database:
I18N Web Testing æøå`
because it is double encoding the characters, and not preserving the blob value because the
sql_type_is_binary check returns false because the SQL_UNKNOWN_TYPE value of sql_type.
One example of widely used code that exhibit this behavior is Apache::Session (Apache::Session::MySQL), which calls bind_param explicitly but never passes the SQL_BLOB at and of the call sites. Since Apache::Session is using a Storable (which is binary data stored in a blob) the additional UTF8 encoding can result in data corruption which can cause perl to crash with an OOM error when materialize is invoked. This is a potential security threat.
The text was updated successfully, but these errors were encountered: