Skip to content

Conversation

@nicktobey
Copy link
Contributor

dolthub/vitess#293 should be merged before this.

This PR does two main things:

Parse and validate the collate option, even on binary columns.

Currently the collate option is ignored on columns of binary type, an we just assume binary collation because it's the only one allowed. This is usually correct but causes some problems.

CREATE TABLE t (pk varbinary(10) collate utf8mb4_0900_bin); shouldn't parse, because it's attempting to use an illege collation for column pk. However, we currently ignore the option and parse it anyway.

CREATE TABLE t (pk varbinary(10)) collate utf8mb4_0900_bin; on the other hand, needs to succeed. Binary columns do not inherit the default collation of the table.

Reject the charset option except on columns that allow it.

According to MySQL, only text, sets, and enums are allowed to have character sets. Attempting to specify a character set for any other column type is an error. Before this PR, we were simply ignoring the character set where it didn't make sense.

A good way to think of it is that varbinary is like a shorthand for varchar charset binary. In fact, you're even allowed to write CREATE TABLE t (pk varchar(10) collate binary); and MySQL will generate a varbinary(10) column. Since the column already has a specified char set, it doesn't default to the table charset/collation. And you can't supply an explicit charset to the column because it already has one implicitly.

@nicktobey nicktobey force-pushed the nicktobey/conversion branch from c0bb29f to 9920666 Compare December 2, 2023 01:34
@nicktobey nicktobey requested a review from max-hoffman December 2, 2023 01:34
At the same time, reject the charset option except on columns that allow it.
@nicktobey nicktobey force-pushed the nicktobey/conversion branch from 9920666 to 76ec008 Compare December 2, 2023 01:35
Copy link
Contributor

@max-hoffman max-hoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correctness fixes sound good, i like the organizational improvements, tests look good. @Hydrocharged might be able to comment on any collation/charset specific details that he understand's a lot better

Copy link
Contributor

@Hydrocharged Hydrocharged left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as well!

@nicktobey nicktobey merged commit 7274cc9 into main Dec 6, 2023
@Hydrocharged Hydrocharged deleted the nicktobey/conversion branch February 7, 2024 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants