
Possible utf8 <> utf8mb4 confusion #1240

Open
@AtiX


Hi,

While debugging a bug in my application, I stumbled across something in mysql2 that looks odd to me, and I would like to clarify it.

The bug:

  • Context: I am using typeorm and cannot save an emoji (4 bytes in UTF-8) in a string field. mysql2 version: 2.2.5.
  • The server (MySQL 5.7) has its collation set to utf8mb4_unicode_ci, as do the table and the column.
  • I pass the connection option charset: "utf8mb4_unicode_ci" to mysql2, but I still get an error when trying to save the string.
  • It only works when I manually execute the queries SET NAMES 'utf8mb4'; and SET CHARACTER SET utf8mb4; after connecting.
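To make the size problem concrete: MySQL's legacy utf8 charset (an alias for utf8mb3 in MySQL 5.7) stores at most 3 bytes per character, while an emoji outside the Basic Multilingual Plane takes 4 bytes in UTF-8. The TypeScript sketch below counts UTF-8 bytes per code point and checks whether a string would fit in a 3-byte charset. The helper names utf8ByteLength and fitsInUtf8mb3 are illustrative only and are not part of mysql2 or typeorm.

```typescript
// Number of bytes a single Unicode code point occupies in UTF-8.
function utf8Width(codePoint: number): number {
  if (codePoint < 0x80) return 1; // ASCII
  if (codePoint < 0x800) return 2; // e.g. Latin-1 supplement, Greek, Cyrillic
  if (codePoint < 0x10000) return 3; // rest of the Basic Multilingual Plane
  return 4; // supplementary planes, including most emoji
}

// Total UTF-8 byte length of a string.
// for...of iterates by code point, so a surrogate pair is counted once.
function utf8ByteLength(s: string): number {
  let total = 0;
  for (const ch of s) {
    total += utf8Width(ch.codePointAt(0)!);
  }
  return total;
}

// MySQL's legacy `utf8` (utf8mb3) allows at most 3 bytes per character,
// so any character needing 4 bytes requires a utf8mb4 connection/column.
function fitsInUtf8mb3(s: string): boolean {
  for (const ch of s) {
    if (utf8Width(ch.codePointAt(0)!) > 3) return false;
  }
  return true;
}
```

For example, utf8ByteLength("😀") is 4 and fitsInUtf8mb3("hi 😀") is false, which is consistent with the save failing unless the session really uses utf8mb4.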

Here is what I found odd in mysql2 when the connection is configured and instantiated:

  • In charsets.js, the string UTF8MB4_UNICODE_CI is mapped to charset number 224.
  • However, this number is later mapped to this.clientEncoding = "utf8" via charset_encoding.js. That looks strange to me, because charset_encoding.js states that it is "basically the result of SHOW COLLATION query", in which case the encoding should be utf8mb4, not utf8.
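The two-step lookup described above (option string to collation id, collation id to client encoding) can be sketched as follows. Both tables are hypothetical, heavily trimmed stand-ins for charsets.js and charset_encoding.js: only the 224 entry comes from this report, while 33 and 45 are the standard MySQL ids for utf8_general_ci and utf8mb4_general_ci, and resolveClientEncoding is an illustrative name, not mysql2's API. One point worth keeping in mind when reading the mapping: "utf8" on the client side is a Node.js encoding name, and Node's utf8 covers the full Unicode range including 4-byte sequences, so the name by itself does not mean the client is limited to MySQL's 3-byte utf8.

```typescript
// Hypothetical, trimmed stand-in for charsets.js: collation name -> MySQL collation id.
const COLLATION_IDS: { [name: string]: number | undefined } = {
  UTF8_GENERAL_CI: 33,
  UTF8MB4_GENERAL_CI: 45,
  UTF8MB4_UNICODE_CI: 224,
};

// Hypothetical, trimmed stand-in for charset_encoding.js: collation id -> Node.js encoding name.
// Both the utf8 and utf8mb4 collations land on Node's "utf8" in this sketch.
const NODE_ENCODINGS: { [id: number]: string | undefined } = {
  33: "utf8",
  45: "utf8",
  224: "utf8",
};

interface ResolvedCharset {
  collationId: number;
  encoding: string;
}

// Follows the two lookups described in the issue: option string -> id -> encoding.
function resolveClientEncoding(charsetOption: string): ResolvedCharset {
  const collationId = COLLATION_IDS[charsetOption.toUpperCase()];
  if (collationId === undefined) {
    throw new Error(`Unknown charset option: ${charsetOption}`);
  }
  const encoding = NODE_ENCODINGS[collationId];
  if (encoding === undefined) {
    throw new Error(`No encoding mapped for collation id ${collationId}`);
  }
  return { collationId, encoding };
}
```

With these tables, resolveClientEncoding("utf8mb4_unicode_ci") returns { collationId: 224, encoding: "utf8" }, which mirrors the behaviour reported here. The open question is then whether the collation id 224 is actually sent to the server during the handshake, not whether the client-side encoding name is "utf8".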

I see that there is special mapping code in generate-charset-mapping.js that maps utf8mb4 onto utf8, so this is probably deliberate (?).

But in the end, there still seems to be a bug, as I cannot use the utf8mb4 charsets without manually patching the connection after it has been established.
