expand mangles unicode characters stored in JSON column #57
Serializing with j is incorrect. Since the query and parameters are being encoded to UTF-8 already, you want to use to_json so as not to double-encode. Alternatively, use j or encode_json and bind it as a SQL_BLOB as shown in https://metacpan.org/pod/Mojo::mysql::Database#query. But you might as well just use the json generator. |
Cool. Not to disagree with you, but to make sure I understand what's happening. I'm using $db->insert() because passing a hash reference with the column names and values is better for my application. Because I didn't see a way to use the json generator without using $db->query, I encoded the json column with j and then inserted with $db->insert(). Selecting the json column with the mysql CLI and via a graphical interface both show the ñ character correctly, so it appears it was not double-encoded. Only when I use ->expand does it get mangled. Using $db->select() and j to deserialize returns the proper character. What am I missing in this process? I'm just curious how using j in and j out works fine, but mixing j with ->expand doesn't. Thank you for your time, I appreciate the learning process. |
It is hard to debug encoding issues because the tools you use to debug also work in a certain encoding. When you select (with utf8mb4 enabled) and then use j to deserialize, both of those steps decoded, so it successfully double-decodes the double encoded data. |
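A minimal model of that double encode/decode round trip, sketched in Python rather than Perl purely to illustrate the byte-level mechanics (variable names are illustrative; the test word is the one used elsewhere in this thread):

```python
# Illustrative sketch (Python, not the thread's Perl): how UTF-8 data gets
# double-encoded on the way in, and why decoding twice appears to "fix" it.
word = 'Szőlőlé'

once = word.encode('utf-8')  # correct: encode the character string exactly once

# Buggy path: a serializer already produced UTF-8 bytes, and a later layer
# treats those bytes as characters and encodes them to UTF-8 again.
as_chars = once.decode('latin-1')   # bytes misread as one character per byte
twice = as_chars.encode('utf-8')    # double-encoded mojibake reaches the table

assert twice != once

# One decode (roughly what ->expand does) yields garbage...
assert twice.decode('utf-8') != word
# ...but decoding twice (select + j) round-trips back to the original,
# masking the fact that the stored bytes were wrong all along.
assert twice.decode('utf-8').encode('latin-1').decode('utf-8') == word
```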
You are correct that you can't use the json generator with the SQL::Abstract functions. SQL::Abstract::Pg added this feature (https://metacpan.org/pod/SQL::Abstract::Pg#JSON) but it hasn't been added to SQL::Abstract or SQL::Abstract::mysql yet. |
@ViktorNacht: You might want to have a look at #51. I'll merge the changes once @Tekki is happy with the implementation. Any feedback would be very much appreciated, so please take the PR for a spin. |
The idea with my changes is that you won't have to JSON encode and decode anymore, just work with Perl hash and array references. |
I installed #51 and I loved that it automagically $db->insert[ed] my arrayref of hashrefs as JSON. That's so awesome. Using the mysql CLI and graphical interface the unicode characters were correctly inserted. Using $result->expand(1) gave me the same mangled results as before. I also tried removing ->expand to see if it would auto-decode and it did not. I still had to use j. I couldn't tell from the docs if there's a syntax I'm supposed to use to select an entire json column as an array/hashref, rather than the nested data. |
I'll keep an eye on the encoding when I modify the expand function. |
So far I can't find encoding issues with
produces either broken characters or a 'wide character' warning. The next one
prints the correct word. |
For sure, I included grape juice :) in my test data and get the same results. So if I use #51 to insert the data (by letting the module convert to JSON), and then I use $result->expand(1) and still get mangled unicode, where would you suggest looking for issues? There's no j or double-encoding (that I'm aware of). Would running the tests in #51 possibly help? Keep in mind, all other unicode looks perfect, so it's not an encoding issue with the HTML. It's definitely the roundtrip of the JSON to and from the database. |
The question is whether you get this error with the published code. #51 is still an experiment. Can you show us how to reproduce this issue, please? |
@ViktorNacht: There's really no way we can reproduce what you are saying without a small example application or a small test. Saying "unicode looks perfect" sounds to us like you're using your eyes, which isn't proof of anything. |
Wow, you're right about not trusting your eyes. This is a small app I made mostly for myself to test the different ways of inserting and selecting JSON, and my results. This issue is probably specific to my system, so I have to be pragmatic and go with what works. But if this is helpful, or you suggest further testing that could be helpful, that's cool.

#!/usr/bin/env perl
use Mojolicious::Lite -signatures;
use Mojo::mysql;
use Mojo::JSON 'j';
app->secrets(['Poop']);
helper mysql => sub {
state $mysql = Mojo::mysql->new('mysql://foo:bar@aws.rds.us-west-2.rds.amazonaws.com/table_name?mysql_enable_utf8mb4=1');
};
get '/query' => sub ($c) {
my $drink = 'Szőlőlé';
my $id = $c->mysql->db->query('INSERT INTO unicode_json_test SET my_json = ?', {json => {drink => $drink}})->last_insert_id;
my $unicode_json_test = $c->mysql->db->select(unicode_json_test => '' => { id => $id })->expand(1)->hash;
say $unicode_json_test->{my_json}->{drink} eq $drink ? 'Matches' : 'Fail';
say $unicode_json_test->{my_json}->{drink};
$c->render(text => $unicode_json_test->{my_json}->{drink});
};
get '/j' => sub ($c) {
my $drink = 'Szőlőlé';
my $object = j { drink => $drink };
my $id = $c->mysql->db->insert(unicode_json_test => { my_json => $object })->last_insert_id;
my $unicode_json_test = $c->mysql->db->select(unicode_json_test => '' => { id => $id })->hash;
my $my_json = j $unicode_json_test->{my_json};
say $my_json->{drink} eq $drink ? 'Matches' : 'Fail';
say $my_json->{drink};
$c->render(text => $my_json->{drink});
};
app->start; '/query' uses the internal engine of the module, while '/j' using j going in and out of the DB. The results are
With /query:
With /j:
(I understand why the wide character warning is there, I just left it for completeness) |
To diagnose your issue, what is the column type and charset of the my_json column? |
CREATE TABLE `unicode_json_test` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`my_json` json NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=25 DEFAULT CHARSET=utf8mb4;

(The column charset is the default table charset.) |
I suspect this is the intersection of two DBD::mysql bugs. First, the well-known encoding bug where it incorrectly uses input strings as parameters; you could verify this by running

That would leave us with why |
And you'd be 100% correct:
|
Since DBD::MariaDB support is now merged, could you try with Mojo::mysql 1.13, DBD::MariaDB installed, and mariadb:// as the protocol? (You would also need to remove the mysql_enable_utf8mb4 option; this is the default behavior in DBD::MariaDB.) |
Yes, I’d be happy to in a couple hours.
Stupid question: Use the same server or spin up a MariaDB instance?
On Mar 7, 2019, at 15:52, Dan Book wrote:
Since MariaDB support is now merged, could you try with Mojo::mysql 1.13, DBD::MariaDB installed, and mariadb:// as the protocol?
|
Using the same server. DBD::MariaDB is just an unfortunate naming choice for the fork. |
FYI, using 'mariadb://' with 'mysql_enable_utf8mb4' removed:
|
I unfortunately don't have access to a new enough MySQL or MariaDB server to test this, but to try to isolate the response case further to a query that could demonstrate the problem for bug reports, what happens if you compare the response of something like this? |
|
Interesting, so it decodes that correctly but not from the JSON column... To make sure it's selecting from the column that's the problem, can you try one last thing: |
FYI:
|
@ViktorNacht, @Grinnz thanks to both of you for the additional information. I still had some difficulty reproducing it with
I see no difference if I use the 'enable...' flags or not, but I have set everything to 'utf8mb4' on the server. Change the query in the last test to |
Hi! If you think that DBD::MariaDB does not correctly return JSON columns as Unicode strings, can you please apply the following patch to DBD::MariaDB and provide the output of a SELECT that just returns that JSON column?

diff --git a/dbdimp.c b/dbdimp.c
index 264a2f14..035c8fed 100644
--- a/dbdimp.c
+++ b/dbdimp.c
@@ -5782,6 +5782,7 @@ process:
/* TEXT columns can be returned as MYSQL_TYPE_BLOB, so always check for charset */
if (mysql_charsetnr_is_utf8(fields[i].charsetnr))
sv_utf8_decode(sv);
+ printf("i=%u name=%s charset=%u type=%d value=%s\n", i, fields[i].name, fields[i].charsetnr, (int)fields[i].type, col);
break;
}
}

It prints information about the column, charset, and type to stdout. |
See my comment in perl5-dbi/DBD-MariaDB#142 (comment) . I do not see anything wrong with current behavior of DBD::MariaDB. UTF-8 JSONs are correctly decoded. |
@pali and all the others: I've no time to look at this before Sunday. |
Here is a test, similar to those from Mojo::mysql, but using DBD::mysql and DBD::MariaDB directly:
Message if failed:
Overview over the results:
|
@jhthorsen It seems we have to deal with the situation that the drivers return inconsistent encodings for JSON columns. We'll have to test for 'szőlőlé' wherever possible.
it succeeds in all 8 situations. |
Note that DBD::mysql has a Unicode bug which is never going to be fixed (perl5-dbi/DBD-mysql#117) due to legacy reasons, and it is the reason why the DBD::MariaDB fork was created. So if you need correct Unicode support, just do not use DBD::mysql... It will never work correctly and just causes problems, like double encoding or damaged data. @Tekki, can you apply the patch from #57 (comment) to DBD::MariaDB and provide the output for your tests? From your testing it looks like DBD::MariaDB works correctly with a MariaDB server, but incorrectly with a MySQL server. It also confirms the observation (that DBD::MariaDB with a MariaDB server works fine) from perl5-dbi/DBD-MariaDB#142 |
We will solve all problems in Mojo::mysql and nobody will use DBI modules directly anymore. 💪 😄 |
|
@Grinnz wrote:
And @Tekki wrote:
Charset 63 is binary.
But the MariaDB server correctly returns charset 45 (utf8mb4). |
"Strings produced by converting JSON values have a character set of utf8mb4 and a collation of utf8mb4_bin" This means that the JSON will be returned as a utf8mb4 string (and apparently, with the output of JSON_OBJECT() or similar it is, but not with a JSON column) |
It will be returned as |
The MySQL protocol supports just strings. |
@Tekki The main problem is that MySQL says the returned string is not in UTF-8 (but in binary), see: #57 (comment) So the DBD::MariaDB driver does not treat the string returned from MySQL as a Unicode string. It does not matter how the values are stored on the server (or on disk). The only thing that matters is how the server sends them to the client (DBD::MariaDB) and what charset it reports to the client. The client (DBD::MariaDB) does not see or know how the values are stored on the server; it only sees the information the server sends. And if the server sends wrong information, then of course the client cannot process it correctly. |
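The decision described above can be sketched in a few lines of Python (an illustration, not the driver's actual code; charset numbers 63 for binary and 45 for utf8mb4_general_ci come from the MySQL/MariaDB charset tables, and the real dbdimp.c check accepts any UTF-8 charset number, not just 45):

```python
# Sketch of the client-side rule: the driver decodes a column's bytes purely
# based on the charset number the server attaches to the result metadata.
BINARY, UTF8MB4 = 63, 45  # charset ids: binary, utf8mb4_general_ci

def column_value(raw: bytes, charsetnr: int):
    # Simplified stand-in for dbdimp.c's mysql_charsetnr_is_utf8() check.
    if charsetnr == UTF8MB4:
        return raw.decode('utf-8')  # server says UTF-8: return a Unicode string
    return raw                      # server says binary: leave bytes untouched

payload = 'szőlőlé'.encode('utf-8')

# MariaDB labels the JSON column utf8mb4, so the client decodes it correctly.
assert column_value(payload, UTF8MB4) == 'szőlőlé'
# MySQL labels the same column binary, so the client has no basis to decode.
assert column_value(payload, BINARY) == payload
```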
This situation is a bit unfortunate. We have contact with the developers of DBD::mysql and DBD::MariaDB and can decide which version needs to be installed. But when it comes to errors in MySQL, it gets difficult. First of all, I'm not the one who can discuss technical details with the developers on bugs.mysql.com. I don't understand enough of these protocols to decide if there is really an error in MySQL or if the differences are due to the different implementations of JSON in MySQL and MariaDB. Then, even if there is an error and they correct it, we still have no influence on the version that runs on the machines of our users. Most of them will simply install what is shipped with their distribution. |
@Tekki Hacking Mojo::mysql is not a solution. Once upstream decides to fix it, you would again have a broken Mojo::mysql, and in a very bad, irreparable way: the new version would start returning double encoded/decoded strings... Hacking around server errors on the client side, without knowing when and how the problem will be fixed on the server, is the road to hell. You need to know how upstream fixes the server and implement the workaround for older servers based on those details. But this cannot be done without communicating with the upstream server developers. |
I disagree @pali. I don't see how we're not able to fix it in the client. We could even have code that checks the server version if we have to. |
You add a workaround in the client; the server gets fixed, and all existing code which uses the client is broken after upgrading the server. People would not be happy to use a client that gets periodically broken by such changes... But the main question: why would you work around server problems without even communicating with the server developers and coordinating what the correct workaround for the particular problem is? This does not make much sense. |
@pali: Because our users might not be able to upgrade to the latest and greatest version of the server. That said, I'm not going to fix this, but I'm backing @Tekki's comment above. @Tekki: What do you think we should do? Should we close this as "Won't fix"? This thread is getting too long, and I think it's diverging. (And at least I'm having difficulty sorting the information.) |
I do not think it is feasible for Mojo::mysql to completely handle this unfortunate situation. Aside from whether this actually gets fixed in MySQL, MariaDB (the server) does not appear to have this bug, so the workaround would be wrong to apply in that case. There is no way to tell that you are getting JSON column data back rather than some other binary data (or even JSON stored in a blob column). I also do not know if anyone has tested with DBD::mysql and MariaDB as the server. |
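To make that ambiguity concrete, here is a hypothetical sketch in Python (column names and values are invented): once MySQL labels a JSON column with charset 63, the result metadata cannot distinguish it from a real BLOB, so unconditionally decoding charset-63 data would corrupt genuine binary values.

```python
# Hypothetical result columns, modeled as (name, charset number, raw bytes).
json_col = ('my_json', 63, '{"drink":"szőlőlé"}'.encode('utf-8'))
blob_col = ('raw_data', 63, bytes([0xFF, 0x00, 0xC3]))  # not valid UTF-8

# The driver sees the same charset number for both; nothing tells them apart.
assert json_col[1] == blob_col[1] == 63

# Blindly decoding everything labeled 63 as UTF-8 fails on the real blob.
try:
    blob_col[2].decode('utf-8')
    blob_decodes = True
except UnicodeDecodeError:
    blob_decodes = False
assert blob_decodes is False
```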
FYI: I have opened a bug against MySQL here, but I still have no way to reproduce or test it myself. https://bugs.mysql.com/bug.php?id=95698 |
Thanks @Grinnz, will be interesting to read their answer. We should all be able to test these things, we'll have to find a solution for that. |
The MySQL bug is now set to 'verified'. |
What does verified mean? That it was accepted as a bug? Or that it was already fixed and it was verified that the fix is correct? |
Verified that the bug exists, I believe. |
I believe this conversation has gone stale. Please open a fresh issue if changes should be made in Mojo::mysql. |
I can create a working example, but just a quick heads-up that I'm getting mangled unicode when I use the expand function selecting data from a JSON column (technically I was using expand(1)).
I confirmed this by removing expand and then de-serializing it with j, and the unicode characters remained intact.
FYI, my DSN is something like:
mysql://foo:bar@db.us-west-2.rds.amazonaws.com/table_name?mysql_enable_utf8mb4=1
And I don't know if it matters, but I'm inserting the data by serializing it with j first. I haven't figured out how to use the json generator with SQL::Abstract::mysql.