Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

4.042 improperly encoding blobs when sql_type is SQL_UNKNOWN_TYPE #117

Open
cthulhuology opened this issue Apr 6, 2017 · 110 comments
Open
Labels
utf8 Unicode and UTF-8 handling

Comments

@cthulhuology
Copy link

cthulhuology commented Apr 6, 2017

The utf8 encoding changes have resulted in a regression that has unexpected side effects.
Consider the following table:

create table foo (foo longblob);

and script:

use DBI;

my $dbh = DBI->connect('DBI:mysql:database=test','desktop','',{ mysql_enable_utf8 => 1 })
or die $DBI::errstr;

my $sth = $dbh->prepare('INSERT INTO foo (foo) values (?)');
$sth->execute("I18N Web Testing æøå");

my $rth = $dbh->prepare('SELECT foo FROM foo');
$rth->execute();
while (@Row = $rth->fetchrow_array) {
print $row[0],"\n";
}

Now the longblob mysql type should be considered a SQL_BLOB, but because the exec call doesn't call bind_param with the attrib set to SQL_BLOB, the default value of 0 is used SQL_UNKNOWN_TYPE and passed to bind_param. If you sv_dump at 939 in dbdimp.c you'll see:

SV = PV(0x162d950) at 0x15f8630
REFCNT = 1
FLAGS = (POK,IsCOW,READONLY,PROTECT,pPOK)
PV = 0x15dc200 "I18N Web Testing \303\246\303\270\303\245"\0
CUR = 23
LEN = 25
COW_REFCNT = 0

And sql_type is the default value of 0.

the output of the scrip will print the value stored in the database:

I18N Web Testing æøå`

because it is double encoding the characters, and not preserving the blob value because the
sql_type_is_binary check returns false because the SQL_UNKNOWN_TYPE value of sql_type.

One example of widely used code that exhibit this behavior is Apache::Session (Apache::Session::MySQL), which calls bind_param explicitly but never passes the SQL_BLOB at and of the call sites. Since Apache::Session is using a Storable (which is binary data stored in a blob) the additional UTF8 encoding can result in data corruption which can cause perl to crash with an OOM error when materialize is invoked. This is a potential security threat.

@pali
Copy link
Member

pali commented Apr 7, 2017

The whole mysql_enable_utf8 was just experimental and totally broken prior to DBD::mysql 4.042. Experimental status was also written in documentation, see: https://metacpan.org/pod/release/CAPTTOFU/DBD-mysql-4.033/lib/DBD/mysql.pm#mysql_enable_utf8

I'm not sure what we can done here. Basically applications which trying to use experimental broken option provided by DBD::mysql < 4.042 and just broken.

Due to some fundamentals of perl scalars and also MySQL protocol itself, DBD::mysql does not type of bind parameter, so driver can use logic which you already described. When mysql_enable_utf8 is 0, then nothing is encoded/decoded and so treated as binary sequence of octets.

@Grinnz
Copy link
Contributor

Grinnz commented Apr 7, 2017

In other words: DBD::mysql does not know that a parameter is being used for insert into a blob column. Since you set mysql_enable_utf8 it will encode it, unless bound with the SQL_BLOB or similar type. On retrieval it knows that the column is a binary type, so it does not decode it. The behavior you describe only worked accidentally before.

@ap
Copy link

ap commented Apr 8, 2017

The whole mysql_enable_utf8 was just experimental and totally broken prior to DBD::mysql 4.042.

I used it for 10 years without problems. I’m sure I would have had problems if I had tried things where the incongruity of its design got exposed, but I never needed to. So my code had no bugs – even though DBD::mysql was designed incorrectly.

Worse: http://grep.cpan.me/?q=mysql_enable_utf8+file%3A.pm There’s plenty of infrastructure-level code just on CPAN alone which sets this flag on behalf of the user.

In the face of these realities, the fact that the documentation declared the feature experimental back when it was introduced, 11 years ago, is simply irrelevant today.

But now mysql_enable_utf8 is fixed – I‘m not going argue about that. But users’ code that wasn’t broken before now is. And this was simply unnecessary.

@pali
Copy link
Member

pali commented Apr 8, 2017

So... what do you suggest? As wrote I have no idea what can be done here. Probably the best solution would be to fix broken code.

And about mysql_enable_utf8 usage on cpan, you do not know how many of those modules are broken and how many expect that mysql_enable_utf8 is not buggy? If module is multi-dbd and set mysql_enable_utf8, pg_enable_utf8 and also sqlite_unicode then it expects that mysql_enable_utf8 is working (which means needs DBD::mysql >= 4.042).

@ap
Copy link

ap commented Apr 8, 2017

Of course many of these CPAN modules will produce broken results when used in any somewhat unusual configuration where the broken design actually matters. But a lot of people don’t try that, and for them it ends up not mattering that the design is not sound. For these people, there is no gain from fixing the broken design, it just creates busywork.

I’m not sure what I would suggest, specifically. The general idea is always the same though: don’t change how mysql_enable_utf8 works – for now at least –, but add the more correct encoding approach, and then give users a way to ask for it.

How exactly you do that depends on the code and what you need to achieve long term. New flag e.g. mysql_enable_encoding? New flag value e.g. mysql_enable_utf8 => 2? Some other approach maybe? Also, do you warn when mysql_enable_utf8 is used? Maybe detect problematic uses and only warn then? Depends on what’s easier to implement and cleaner as an interface. Overall I would weigh that choice based on how difficult it is to support both the old and new thing long-term. The easier it is to keep the old stuff around, the less old users should be pushed away from it.

(Of course when it comes to the docs, the old approach should be de-emphasised, and the new way should be shown as the obvious/standard way of doing things, to steer new users in that direction. Also, to get people away from the old way it is important to make sure that the new way doesn’t make too many things harder to do than the old way – esp. commonly needed things. Note however the old way must stay documented – because there is still code that uses it, and people who have to maintain that code need to be able to figure out what it does.)

Something along these lines.

@pali
Copy link
Member

pali commented Apr 10, 2017

Problem with old mysql_enable_utf8 implementation is that it behave "pseudo-randomly". For frozen combination of perl at specific version, all cpan modules at specific version and user application it acts deterministic. But once either perl or any cpan module is updated old mysql_enable_utf8 (prior to 4.042) can change result of DBD::mysql. So result is basically unpredictable.

If users (and authors of other modules) are not going to fix broken code and need current behavior they should stay at specific version of perl and all cpan modules. Otherwise it is not guaranteed that also old DBD::mysql version (prior to 4.042) provide same results. It can lead to results: after updating cpan module XYZ to new version MySQL server in user application started returning wrong result, where XYZ can be module which has nothing to do with MySQL.

Supporting such pseudo-random behavior is not easy and funny.

And I really suggest to conserve such fragile code and disallow updating code around (which can fully break it).

What is probably more important: how to write mysql_enable_utf8 code which tries to be compatible with both old buggy DBD::mysql versions and new.

I created this pull request which adds "hacks" into documentation: #119

Please comment, test or provide another suggestion for it. To improve situation.

@ap
Copy link

ap commented Apr 12, 2017

But once either perl or any cpan module is updated old mysql_enable_utf8 (prior to 4.042) can change result of DBD::mysql.

Any CPAN module? How?

@Grinnz
Copy link
Contributor

Grinnz commented Apr 12, 2017

If a CPAN module provides a string with a different internal representation, even if it's the same string, previous versions of DBD::mysql will treat it differently. We had this issue with output from Spreadsheet::ParseExcel that returned (perfectly normal) un-upgraded strings with byte values between 128 and 255, probably due to some dependency, causing errors when used as params to DBD::mysql with mysql_enable_utf8 set, as the internal representation it used was not UTF-8 encoded. Any module that provides you with a string could change what representation it gives you and break DBD::mysql while the string value is unchanged.

A more complete discussion of the issue is in https://rt.cpan.org/Ticket/Display.html?id=87428

@pali
Copy link
Member

pali commented Apr 15, 2017

@Grinnz is right. And moreover un-upgraded strings are created by default from string constants if you do not use utf8 in your file from which those strings come from. Also some older perl versions generate un-upgraded string even if \N{U+XX} construction is used. And any perl function can upgrade or downgrade string as needed.

So please look at pull request #119 if it helps you or not...

@ap
Copy link

ap commented Apr 15, 2017

@Grinnz:

If a CPAN module provides a string with a different internal representation, even if it's the same string, previous versions of DBD::mysql will treat it differently.

Yeah, but that isn’t DBD::mysql changing its behaviour. That’s some other module changing its behaviour. If the user of both modules compensates for the change in the other module, DBD::mysql will get the same inputs as before, and will behave the same way as before. The behaviour of DBD::mysql is not changing, just because of an upgrade to some other module.

It’s perfectly predictable what it does, even if it’s not correct. “When the inputs change, the behaviour changes; when the inputs are the same, the behaviour is the same” is the opposite of random.

A more complete discussion of the issue is in https://rt.cpan.org/Ticket/Display.html?id=87428

Funny, there’s an almost verbatim precursor of my suggestion (but better thought out) already in there. As far as I can tell, doing it that would have worked just fine… with no breakage. Taking that approach would have given you what you wanted, without causing anyone else any pain. So why choose to cause pain when you don’t even gain anything from it?

@pali: So…

Supporting such pseudo-random behavior is not easy and funny.

… if this meant what @Grinnz was talking about, then I disagree that that behaviour is random (even just pseudo), I disagree that it’s hard to support solely on that basis, and I disagree about what’s not funny here (namely, changing it).

So please look at pull request #119 if it helps you or not...

That is indeed useful information in order to cope with the problems of the old interface. But I don’t see anything in there which would argue against adding the correct interface under a different flag – rather than changing mysql_enable_utf8 and thereby breaking already-working code.

Has there been any technical argument for not using a different flag? So far, all I have seen are arguments about the correctness of not using the UTF8 flag – which I do not disagree with at all.

@pali
Copy link
Member

pali commented Apr 15, 2017

That’s some other module changing its behaviour.

I cannot agree here with you, because calling utf8::upgrade() or composing string via \x{...}\x{...} sequences, or via \N{...} is not behavior change. It is same behavior with exactly same function API.

What is changing here is low level representation, and you can compare it compiling program with different optimization flags, e.g. -O0 or -O2 in gcc. In both cases you get same program, just compiled with different assembler instructions.

And in perl usage of utf8::upgrade() or \N{...} is similar behavior. Perl just translate source file to different opcodes, but meaning of program is same...

Note that perl operators like . (dot, concatenation) or perl functions like substr calls utf8::upgrade() in some cases. And there is no guarantee that due to optimization in future it would not be changed (under which conditions is utf8::upgrade() called).

I wrote it is pseudo-random (not fully-random), which means it is deterministic. So I could agree with you that behavior can be predicted.

Originally I talk with DBD::mysql maintainers (half a year ago?) and wrote first proposed solution that introduction of new attribute for utf8 with hacking code around with supporting both of them can be possible. But IIRC we dismiss it due to introduction of big mess in code and reason that current behaviour does not make sense in perl.

Due to above fact and also that calling utf8::upgrade() is legal and already done by perl itself at lot of places and also that in documentation was explicitely written "experimental and may change" state, I introduced patches which fixes mysql_enable_utf8, rather then having two different UTF-8 supports... Note that I explicitely stated in pull request #67 that it would change behavior for more then month in review state and about 3 months there was on cpan engineering version. So I though it would be OK as nobody was against.

@sartak
Copy link

sartak commented May 22, 2017

Originally I talk with DBD::mysql maintainers (half a year ago?) and wrote first proposed solution that
introduction of new attribute for utf8 with hacking code around with supporting both of them can be
possible. But IIRC we dismiss it due to introduction of big mess in code and reason that current
behaviour does not make sense in perl.

Please reconsider this stance. In as much as the new behavior is "correct", the old behavior was widely deployed and has a tremendous amount of inertia. Migrating to the new behavior will require coordination and substantial effort. We (Request Tracker) are willing to undergo that effort, but we do need to prepare for it. Given that we are in the middle of a stable series and DBD-mysql 4.042 is getting traction, we are unable to do such a migration right now. Our only recourse right now is to refuse to install under 4.042, forcing users to install 4.041 by hand.

@pali
Copy link
Member

pali commented Jun 16, 2017

old behavior was widely deployed

What we learnt from this is that developers do not read documentation nor care about states like "This option is experimental and may change in future versions.".

Now I'm thinking how could be experimental features which are not expected to work correctly or which can be removed/changed at any time introduced into modules...

@ap
Copy link

ap commented Jun 16, 2017

What we learnt from this is that developers do not read documentation nor care about states like "This option is experimental and may change in future versions.".

How seriously can you expect them to take that after the feature has been “experimental” without changes in behaviour for 10 years?

@sartak
Copy link

sartak commented Jun 16, 2017

How seriously can you expect them to take that after the feature has been “experimental” without changes in behaviour for 10 years?

Furthermore see this commit from a year ago and especially its message: 026fa02

@mbeijen
Copy link
Contributor

mbeijen commented Jun 16, 2017

Yeah, I removed the 'this is experimental' message last year because I knew lots of software depended at that point on this switch because it was the only way to get some form of sane behaviour from DBD::mysql - which had lots of other issues, but also mostly worked for most people.

I'm really not so happy with the situation that emerged. I though we fixed lots of smaller issues but I also thought we had enough unit test coverage to not break existing software using DBD::mysql for people. We explicitly sollicited feedback after posting development releases to CPAN by making blog posts on b.p.o, sending out mails to mailing lists, and of course we only got actual feedback (like this issue) after we released the stable.

@CaptTofu please advise!

I think we need to look at doing what the original code did and do whatever worked for most users.

@ap
Copy link

ap commented Jun 17, 2017

It doesn’t seem too late to add a mysql_enable_unicode with the new behaviour and put mysql_enable_utf8 back the way it was. It’s been a couple months, but a couple months is not long: the cascade of packagers and package upgrades is much longer than that, so a lot of users are not yet affected.

@pali
Copy link
Member

pali commented Jun 17, 2017

How seriously can you expect them to take that after the feature has been “experimental” without changes in behaviour for 10 years?

Because there were open bugs that it does not work for 10 years?

@ap
Copy link

ap commented Jun 17, 2017

What software doesn’t have “some open bugs”? I do not believe you hold even yourself to that standard. If that was the bar for what software is safe to use… what would be left?

@nightlight1
Copy link

Hi, I'm an end user that would like to share with you the impact of this change.
I work at a company that has 100GB of files stored in MySQL databases of 5+ years old.
After upgrading this module all all user files that are inserted into tables with LONGBLOB columns are corrupt when they are retrieved.
The manual says that BLOB files are not encoded but it does (see testing code below).
Now we are challenged to find out which records are corrupt, and which ones are not.
Unsetting the flag is not an option as all our text data is utf8 mixed with LONGBLOBs.
The proposed solutions asks for a review of every written SQL statement and start specifying for each column its DataType.
This is not user friendly as it is more time consuming to write code.
This change breaks with Perl's paradigm that backwards compatibility is safeguarded by the habbit to test all changes against all exiting CPAN modules before releasing it.
The arguments from this change are understandable from a technical point of view, but we prefer to maintain usability by keeping the module backwards compatible.
Thank you for your understanding.

Manual text:

Input bind parameters of binary types (SQL_BIT, SQL_BLOB, SQL_BINARY, SQL_VARBINARY and SQL_LONGVARBINARY) are not touched regardless of the mysql_enable_utf8 attribute state.

However, the following code inserts corrupt binary data:

use DBI qw(:sql_types);

open FILE, "test.bin";
binmode FILE;
read(FILE, my $file, 100000000);
close FILE;

$dbh = DBI->connect($dsn,$user,$password,{mysql_enable_utf8 => 1});
$sth = $dbh->prepare("INSERT INTO pdo_blob (the_blob) VALUES(?)");
$sth->bind_param( 1, $file); # corruption
#$sth->bind_param(1, $file, SQL_BLOB); # no corruption
$sth->execute();
($fileNew) = $dbh->selectrow_array("SELECT the_blob FROM pdo_blob ORDER BY id DESC LIMIT 1");
print ($file eq $fileNew ? "Equal" : "Not equal");

This change also make it impossible to use the following one-liner as it requires the data type of all placeholders to be specified:
$dbh->do($SQL,undef,@$placeholderValues);

@pali
Copy link
Member

pali commented Jun 28, 2017

Input bind parameters of binary types ...

$sth->bind_param( 1, $file); # corruption

This parameter is not bound as binary type.

#$sth->bind_param(1, $file, SQL_BLOB); # no corruption

This one is.

This change also make it impossible to use the following one-liner as it requires the data type of all placeholders to be specified:
$dbh->do($SQL,undef,@$placeholderValues);

It is limitation of MySQL. If mysql_server_prepare is turned off (by default), then all question marks are replaced by bind values in DBI client code. And DBI send to MySQL server one non-parametrized SQL string. If mysql_server_prepare is turned on, then DBI client send SQL statement with question marks to prepare on MySQL server and after that it send parameters. But MySQL server does not provide information if value assigned to question mark is binary or not. Therefore MySQL client has no idea if it is binary or not and user is responsible for it. It is limitation of MySQL.

@nightlight1
Copy link

I understand the underlying motivation to make this change. You are right that is a bug that needs to be fixed.
I wanted to share with you the impact of this change from a user perspective as I believe that this necessary fix comes with the side effect of corrupting data without the user/developer noticing it.
I hope this will make the maintainer of this module consider to change the module in such a way that backwards compatibility is maintained.

@shadowcat-mst
Copy link

@pali turning on mysql_enable_utf8 has been the least broken option for handling unicode with DBD::mysql for over a decade. It's been basically standard for e.g. any DBIx::Class app that uses unicode w/mysql for most of that time.

The only thing you can do here that won't cause massive corruption of people's production data is to keep it behaving the old way, even if the old way is stupid. You can add a warning that the old way is stupid and that people should consider migrating to the new way, but the old code must stay bugwards compatible or you're going to trash people's production data.

This is the sort of backcompat problem that most sucks to have as a maintainer, and I do very much feel for you, but there really ain't any alternative here. Sorry.

@nightlight1
Copy link

Dear @pali Are you the maintainer of this module? What do you think about the below proposal together with mentioning in the documentation that a value 1 is not sufficient as it lacks encoding of the input statement+parameters but that it remains there for backward compatibility, and value 2 is preferred for correct UTF-8 handling in combination with the note that specifying the datatype of each placeholder value is mandatory for this setting?

Working of 4.041:
mysql_enable_utf8 not set or 0: nothing (default)
mysql_enable_utf8 set to 1:

  1. "SET NAMES utf8" at connect? Yes;
  2. Input statement and bind parameters encoded? No
  3. Retrieved columns decoded? Yes

Working of 4.042:
mysql_enable_utf8 not set or 0: nothing (default)
mysql_enable_utf8 set to 1:

  1. "SET NAMES utf8" at connect? Yes;
  2. Input statement and bind parameters encoded? Yes
  3. Retrieved columns decoded? Yes

Proposal for 4.043:
mysql_enable_utf8 not set or 0: nothing (default)
mysql_enable_utf8 set to 1:

  1. "SET NAMES utf8" at connect? Yes;
  2. Input statement and bind parameters enoded? No
  3. Retrieved columns decoded? Yes
    mysql_enable_utf8 set to 2:
  4. "SET NAMES utf8" at connect? Yes;
  5. Input statement and bind parameters enoded? Yes
  6. Retrieved columns decoded? Yes

@Grinnz
Copy link
Contributor

Grinnz commented Jun 28, 2017

@nightlight1 unfortunately it is not that simple. In the old behavior, the decision whether input parameters were encoded or not is based on the internal state of the perl string, not a clear user visible reason. This is part of the bug that was fixed. Also, retrieved columns were decoded in different circumstances as well. Regardless, you are on the right track, the backwards compatible fix to this would be to restore the behavior of mysql_enable_utf8 and mysql_enable_utf8mb4, and add a new option or option value for the correct behavior.

@pali
Copy link
Member

pali commented Jun 28, 2017

@Grinnz And that is still not enough! Even without mysql_enable_utf8 DBD::mysql damage input strings. See this example with old DBD::mysql:

$ perl -MDBI -MData::Dumper -e '$Data::Dumper::Useqq = 1; my $dbh = DBI->connect("dbi:mysql:test", "test", undef); my $str2 = "\301" . "\N{U+10FF}"; my $str = substr($str2, 0, 1); print Dumper $dbh->selectall_arrayref("SELECT ?", {}, $str)'
$VAR1 = [
          [
            "\303\201"
          ]
        ];

You put "\301" on input and you woud get "\303\201" on output.

With new DBD::mysql you get correct "\301".

So you cannot support both old buggy behavior and also new perl-correct. I have no idea what else can be done there. Trying to support old buggy behavior would mean to introduce either new bugs or break non-UTF-8 support at all.

@Grinnz
Copy link
Contributor

Grinnz commented Jun 28, 2017

Then it sounds like the bugfix with mysql_enable_utf8 disabled has not caused any problems so far, and that can remain. It is only the behavior with mysql_enable_utf8/mb4 changing that has caused the issues in this thread. Would it be possible to restore the old behavior with those switches, and add a new switch or value for the fixed UTF-8 behavior?

@nightlight1
Copy link

Every day this current version is distributed it causes database corruption in production environments (every INSERT/UPDATE statement involving BLOB's causes corruption).
In the event it is a challenge to find a solution it would be better for the time being to rollback the current (better) UTF8 implementation and put these changes back in a "development release" until all undesired side effects are covered.

@Grinnz
Copy link
Contributor

Grinnz commented Jun 28, 2017

It is not possible to avoid this "side effect" or "fix" it to recognize blob inputs, for reasons that were discussed earlier in this thread. The old way only accidentally worked for this use case and cannot be relied upon for a real fix. The only way to make it both correct for new code and bugwards compatible for old code is to put the correct behavior on a separate option.

@mbeijen
Copy link
Contributor

mbeijen commented Jun 28, 2017 via email

@pali
Copy link
Member

pali commented Jun 29, 2017

@nightlight1 you can disagree... but if something is not possible to achieve and fully implement then you would need to take it as is. The only option which I know is freeze of whole module.

@markuswernig
Copy link

@pali

It is not possible to have old behavior working in current code.

Which is why the new ("current") code needs to be introduced in a different way than it has been done for 4.0.42. For which it first needs to be backed out and then probably reintroduced in a different manner.

@markuswernig
Copy link

@pali

I'm sorry if you understood my words too aggressive or as attack.

Well, @shadowcat-mst was not the only one to get that impression ...

@nightlight1
Copy link

I also read the wording, but I also noticed the many exclamation marks that were added to the manual page of this module. It convinced me that if a person uses these in neutral sentences it may just be his/her common way of expressing and not necessarily meant to be offensive.
For example: people from the Mediterranean appear more enthusiastic in their language than people from Northern Europe.

@mbeijen
Copy link
Contributor

mbeijen commented Jun 29, 2017

So, @CaptTofu and I just released version https://metacpan.org/source/MICHIELB/DBD-mysql-4.043/Changes 4.043 which is the same as 4.041 but with a higher version number.

In the next period, we'll be going to add back fixes that were added after 4.042 and put that out as a development release. We're explicitly going to fix bugs but NOT wanting to break things. If we're going to make changes to how unicode is handled, and if it's going to (also) require changes to existing code, it will be only after users explicitly opting in. We don't want people upgrading their perls or doing an OS upgrade and finding they get data corruption.

The one good thing is there are now lots of eyeballs here. Of course we're going to be extra careful now, but some of you testing upcoming development releases would be more than welcome.

@Grinnz
Copy link
Contributor

Grinnz commented Jun 29, 2017

Thank you. I hope that a new version/option of the corrected unicode handling will be released quickly however. Until then I will not be able to upgrade our DBD::mysql from 4.042 as I do not want to go back to the broken behavior.

@mbeijen
Copy link
Contributor

mbeijen commented Jun 29, 2017 via email

@markuswernig
Copy link

@mbeijen , @CaptTofu
Thank you very much! (And of course: Thanks to you and all the other devs for all the work you have already put into DBD::mysql.
U rock!

From what I understand, existing end-user code will now work again.

Of course we're going to be extra careful now, but some of you testing upcoming development releases would be more than welcome.

I'd be more than happy to help with testing. Where's the best point to be looking (list/blog to be tracking etc.) for changes related to this issue?

@mbeijen
Copy link
Contributor

mbeijen commented Jun 30, 2017

@markuswernig typically we post at dbi-announce (http://lists.perl.org/list/dbi-announce.html) which I'll do today.

@pali
Copy link
Member

pali commented Jun 30, 2017

Does it mean that DBD::mysql is now freezed forever? As i have no idea how you want to do future development & still provide that buggy backward compatibility.

@pali
Copy link
Member

pali commented Jul 1, 2017

Btw, now look what happened and what you did:

  • DBD::mysql has non-working Unicode support forever
  • DBD::mysql is incompatible with DBD::Pg and DBD::SQLite which fixed Unicode support in past (in same way as DBD::mysql 4.042)
  • 15+ bug fixes were reverted, including fix for amavis
  • development of DBD::mysql is going to stagnate, probably full freeze as above nonsense buggy behavior is not possible to achieve with bigger code changes
  • nobody would start using DBD::mysql anymore in 2017+ as Unicode support is required

Ask question yourself: Based on above facts, who can be interested in use of DBD::mysql which does not work as expected in perl? New developers? Sure not. Active developers? Not sure, why anymore?

Now I asked for two CVEs for DBD::mysql. We would see who would start fixing security issues based on above fact that buggy behavior cannot be fixed.

And basically everything just because two, tree, four, five (?) people are usable to report bugs to broken application and just started to want have old, broken, non-working and unmaintainable behavior of mysql_enable_utf8 in DBD::mysql driver.

You now killed DBD::mysql.

Like @Grinnz said, I already know more people who would need to pin DBD::mysql to 4.042, just because they started to use utf8 and unicode strings correctly, like in whole perl word.

all: Instead of asking for help how you fix your or someone's else buggy application (which I or any other people could help), you rather forced maintainers to go with DBD::mysql to hell.

@kentfredric
Copy link

And basically everything just because two, tree, four, five (?) people are usable to report bugs to broken application and just started to want have old, broken, non-working and unmaintainable behavior of mysql_enable_utf8 in DBD::mysql driver.

You're looking at it wrong.

This broke so much code that representatives from 3 different companies had to tell you you'd broken their code.

You have to always see your feedback as the tip of an iceberg, because it is.

And there are probably many more companies out there currently corrupting their data, and won't know about it because they're not checking yet.

Only the very top echelon of developers involve themselves in opensource enough that they actually communicate with opensource. The vast majority are silent consumers

When you're dealing with software that's purpose in life is to not corrupt data, and have data there tomorrow, you go out of your way not to break that promise.

There's no point in being involved in this sector if you don't care to deliver on that promise.

@pali
Copy link
Member

pali commented Jul 1, 2017

Companies which dealing with such data are running on verified infrastructure and every upgrade of core system is tested and reviewed. And in most common scenario they are running some enterprise distribution (like RHEL or their clones) where are versions already freezed.

Nobody in companies is updating random software from random internet places directly on production without any testing.

@kentfredric
Copy link

Companies which dealing with such data are running on verified infrastructure and every upgrade of core system is tested and reviewed

No, only the best, most careful, and most expensive of such companies do.

You'd be disturbed how common bad practices are, and practices that exist purely out of lack of time or funding.

Heaven help anyone who dares to make a small business and not really understand their technology.

Nobody in companies

There's a saying, and its truer than you want to believe: Everyone has a testing rig.
But only some people also have a production server.

@Row
Copy link

Row commented Jul 1, 2017

@kentfredric if you silently introduce the changes of course data will be corrupted! When introducing breaking changes you should deprecate and make sure that old code do not even run. Then developers can choose to stay on an old version or upgrade.

@leejo
Copy link

leejo commented Jul 1, 2017

@kentfredric if you silently introduce the changes of course data will be corrupted! When introducing breaking changes you should deprecate and make sure that old code do not even run. Then developers can choose to stay on an old version or upgrade.

@Row - that's exactly what we're all saying in this thread and it's what the very first sentence of the issue report says:

The utf8 encoding changes have resulted in a regression that has unexpected side effects.

A regression with unexpected side effects is by definition a silent introduction of changes, and we're saying the approach taken (that has since been reverted) needs to be rethought.

@kentfredric
Copy link

kentfredric commented Jul 1, 2017

Indeed, its not so much about "no change", its about creating changes such that they avoid breaking code.

And even this is not a "death to DBD::mysql", because there are multiple ways of getting the desired result from different syntax.

There's lots of ways to avoid breaking backwards compatibility.

The most unfortunate thing here is this step wasn't made before 4.042 was released.

"Opt in to change[1]" is much more reliable than "make sure upstream didn't break your code every release".

The latter makes sense only if you assume upstream don't care about your data, which is a bad default assumption to need for a Database Module.

1: Keeping in mind, "upgrading" and "opting in" are very different things in practice.

@jacquesg
Copy link

jacquesg commented Jul 1, 2017

Not intending to hijack this issue, but I've run into exactly the same type of problem recently:

dankogai/p5-encode#105

In this case something in core's behaviour silently changed.

@ap
Copy link

ap commented Jul 1, 2017

Based on above facts, who can be interested in use of DBD::mysql which does not work as expected in perl?

You keep saying these, and they keep not being true.

I ran MySQL in production with “broken” DBD::mysql versions for >10 years, and the content in the database was never wrong. How is that not working as expected?

Upgrading to your “fixes” would have made the content in my database wrong. How is that working as expected?

I was glad that the place I work at switched away from MySQL just before your fixes landed in DBD::mysql. You sure did not increase my interest in DBD::mysql… sorry to say that.

There is a difference between a bug in the implementation and a bug in the design of an interface. In this particular case, there was a design bug, not an implementation bug. The old interface took an implementation detail (the UTF8 flag in scalars) and made that part of is interface semantics. That’s a bad choice. But it’s an interface design choice. It’s not an implementation bug.

And it can be worked with perfectly reliably if you understand the interface. You and several others wrote repeatedly that the old interface could be broken by upgrades of unrelated modules, and that you had no idea how to write code against such an interface. First of all, that’s only a “could” – there are plenty of cases where that wouldn’t happen. But even then, you absolutely can write robust code against an interface that relies on the UTF8 flag. I hope you have heard of utf8::upgrade and utf8::downgrade? They make it perfectly possible to use a UTF8-flag-reliant interface in a way that is robust against future changes in the dataflow within a program. It’s just onerous.

What you did was to break the contract of a long-standing interface, because its design is bad. The resulting interface design may be better, no dispute – but that’s not a bugfix. It’s interface breakage.

You want a better interface? (Yes, it’s a better interface.) Add it. Don’t break one that already has tons of users.

@Grinnz
Copy link
Contributor

Grinnz commented Jul 1, 2017

@pali I know this situation is frustrating. I think I can speak for others too when I say that I greatly appreciate your contributions of fixes to this very broken distribution. I hope that they can all be reapplied now, just without changing the broken behavior of mysql_enable_utf8/mb4 that so many are relying upon. I also think that mysql_enable_utf8/mb4 should be marked as deprecated when the fixed behavior is added.

@CaptTofu
Copy link
Member

CaptTofu commented Jul 2, 2017 via email

@Grinnz
Copy link
Contributor

Grinnz commented Jul 25, 2017

Is it still planned to reapply the reverted fixes? This distribution now has all of the issues again that were previously resolved.

@leejo
Copy link

leejo commented Sep 12, 2017

Discussion about this seems to have moved to the perl.dbi.users mailing list: https://www.nntp.perl.org/group/perl.dbi.users/2017/08/msg37429.html

More importantly: https://www.nntp.perl.org/group/perl.dbi.users/2017/09/msg37443.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
utf8 Unicode and UTF-8 handling
Projects
None yet
Development

No branches or pull requests