mysql_enable_utf8 => 1 does not encode to utf8 when using some special characters like 'á' or 'ú' #409

Open
Motoko23 opened this issue Jan 25, 2024 · 6 comments
Labels
bug utf8 Unicode and UTF-8 handling

Comments

@Motoko23

Motoko23 commented Jan 25, 2024

DBD::mysql version

5.1.0

MySQL client version

8.0.32

Server version

8.0.34

Operating system version

Linux Gentoo (kernel 6.6.8)

What happened?

DBD::mysql should encode text as UTF-8 when the mysql_enable_utf8 => 1 flag is set.

It works only in these cases:

  1. With DBD::mysql, when the text contains no diacritics at all.
  2. With DBD::mysql, when the text contains more complex special characters such as „ž“ or „š“ (characters with a caron, or a combination of more complex characters).
  3. With DBD::mysql, when I force-encode every SQL statement and every execute parameter to UTF-8 just before passing it to DBD::mysql. But that is wrong.
  4. With DBD::MariaDB, always. DBD::MariaDB is compatible and works 100% without this bug, so I am now using DBD::MariaDB with a MySQL database.

It does not work in this case:

  1. With DBD::mysql, when the text contains only certain characters, for example "á" in the word „Informátor“. It simply ignores the "á" and does not encode it. On the database side, „SHOW PROCESSLIST“ shows „Inform?tor“, so an INSERT saves the word as „Inform?tor“, and a SELECT ... WHERE searches for „Inform?tor“. That is wrong.

Other information

No response

@Motoko23 Motoko23 added the bug label Jan 25, 2024
@dveeden dveeden added the utf8 Unicode and UTF-8 handling label Jan 25, 2024
@jafd

jafd commented Jan 29, 2024

Ever tried doing

SET NAMES utf8mb4;

at the beginning of your session?

@Motoko23
Author

> Ever tried doing
>
> SET NAMES utf8mb4;
>
> at the beginning of your session?

Yes.
I have also tried:

SET NAMES utf8mb4;
set character set utf8mb4;

or

SET NAMES utf8;
set character set utf8;

I must repeat that it should work for ALL characters or for NONE. It is working for SOME.

@jafd

jafd commented Jan 29, 2024

Is it, by chance, working with characters that are historically in latin-1 but not in whatever 8-bit codepage was used to serve your language (looks like Slovak or Czech, so iso8859-2 or cp1250)? Or vice versa? The thing is that your code should be UTF-8 across the board, and if there is Unicode text in your Perl source, you need to put use utf8; somewhere at the top too.
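
One way to check that hypothesis is to list which of the reported characters have a Latin-1 code point (below 256). This is only a minimal sketch: fits_latin1 is a hypothetical helper, not part of any module, and the characters are written as escapes so the encoding of the file itself does not matter.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the character has a Latin-1 code point.
sub fits_latin1 {
    my ($char) = @_;
    return ord($char) < 256 ? 1 : 0;
}

# The characters from the report, in order: á, ú, ž, š.
my @chars = ("\x{E1}", "\x{FA}", "\x{17E}", "\x{161}");

for my $char (@chars) {
    printf "U+%04X %s\n", ord($char),
        fits_latin1($char) ? "in Latin-1" : "outside Latin-1";
}
```

In this sketch „á“ and „ú“ fall inside Latin-1 while „ž“ and „š“ do not, which matches the split between failing and working characters in the report.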

@Motoko23
Author

Motoko23 commented Jan 29, 2024

Everything is UTF-8 on input and output.
Everything is Unicode inside the script.

I am using:

use utf8;                        # this script's source text is UTF-8 (decoded automatically)
use feature 'unicode_strings';   # use Unicode semantics for string operations, including regexps

in every file.

I am using
utf8::decode($query);
for every input.

I am using
utf8::encode($page);
just before the final print.
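
For readers following along, the decode-on-input, encode-on-output pattern described here can be sketched roughly as below. The variable names and the sample word are illustrative assumptions, not the actual script from this report.

```perl
use strict;
use warnings;

# "Informátor" as UTF-8 bytes, the way it would arrive from outside.
my $input = "Inform\xC3\xA1tor";

# Decode once at the input boundary: bytes become characters.
my $text = $input;
utf8::decode($text) or die "input is not valid UTF-8";

# Encode once at the output boundary: characters become bytes again.
my $output = $text;
utf8::encode($output);

printf "bytes in: %d, characters: %d, bytes out: %d\n",
    length($input), length($text), length($output);
```

Between the two boundaries the script only ever handles character strings, so the round trip returns exactly the bytes it started with.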

Also, everything is working fine with DBD::MariaDB.

I am mostly using this source:
https://perldoc.perl.org/perlunifaq

Thank you for your help, Yaroslav.
I will be using only MariaDB in the future, so I will not use MySQL, but I have reported this bug to warn and help others.

@jafd

jafd commented Jan 29, 2024

It's interesting because I haven't run into this, and I'm using Unicode a lot. Granted, I'm still using DBD::mysql 4.050, so there might have been a regression since then, because I can see that, barring some system setting along the way, you haven't left a stone unturned here (the system locale on both client and server? The character set declared on the table columns themselves, maybe?).

Sorry that I had to ask all of this stuff, but since I've had my own problems with Unicode, I just know how hard it can be to make sure all ducks are really in a row and how one stupid setting can ruin a week.

@Grinnz
Contributor

Grinnz commented Jan 29, 2024

This is likely an instance of the Unicode bug present in this distribution, which is why you should use DBD::MariaDB; see https://github.com/perl5-dbi/DBD-mysql/issues?q=is%3Aissue+is%3Aopen+label%3Autf8 and https://blogs.perl.org/users/grinnz/2023/12/migrating-from-dbdmysql-to-dbdmariadb.html. If so, the workaround would be to call utf8::upgrade on any Unicode strings immediately before passing them as parameters (it operates on the string in place and leaves the string unchanged as far as Perl is concerned, but changes how broken interpretations like DBD::mysql's see it).
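
A minimal sketch of that workaround, assuming the affected string is held in Perl's downgraded (one byte per character) internal form: upgraded_copy is a hypothetical helper name, and the sample word is built from an escape below 0x100 so that it starts out downgraded.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the string in upgraded internal form.
sub upgraded_copy {
    my ($text) = @_;
    utf8::upgrade($text);   # in place on the copy; the caller's string is untouched
    return $text;
}

my $word  = "Inform\x{E1}tor";      # "Informátor", stored downgraded
my $fixed = upgraded_copy($word);

printf "original: %s\n", utf8::is_utf8($word)  ? "upgraded" : "downgraded";
printf "copy:     %s\n", utf8::is_utf8($fixed) ? "upgraded" : "downgraded";
printf "same string to Perl: %s\n", $word eq $fixed ? "yes" : "no";
```

In real code the upgraded value would be what gets bound, for example $sth->execute(upgraded_copy($value)) with a hypothetical statement handle $sth. Whether this resolves the behaviour reported here has not been confirmed in this thread.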


4 participants