Add support of utf8mb4 for mysql #6992

Merged
merged 8 commits into from
May 24, 2019

lunny
Member

@lunny lunny commented May 19, 2019

This PR allows you to change MySQL's charset to utf8mb4. This may require an InnoDB version greater than 5.6, as @philfry said on #3516 (comment), but Gitea does not currently check MySQL's version. Users should verify their database version themselves.

should fix #5660, #6988 & #2711

@lunny lunny added the type/enhancement An improvement of existing functionality label May 19, 2019
@lunny lunny added this to the 1.9.0 milestone May 19, 2019
@codecov-io

codecov-io commented May 19, 2019

Codecov Report

Merging #6992 into master will decrease coverage by <.01%.
The diff coverage is 75%.

@@            Coverage Diff            @@
##           master   #6992      +/-   ##
=========================================
- Coverage   41.51%   41.5%   -0.01%     
=========================================
  Files         440     440              
  Lines       59457   59459       +2     
=========================================
- Hits        24683   24678       -5     
- Misses      31556   31561       +5     
- Partials     3218    3220       +2
| Impacted Files | Coverage Δ |
|---|---|
| modules/auth/user_form.go | 42.85% <ø> (ø) ⬆️ |
| routers/install.go | 0% <0%> (ø) ⬆️ |
| models/models.go | 56.77% <100%> (+0.18%) ⬆️ |
| modules/avatar/avatar.go | 81.25% <0%> (-18.75%) ⬇️ |
| modules/process/manager.go | 76.81% <0%> (-4.35%) ⬇️ |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 181b7c9...44543d6. Read the comment docs.

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label May 19, 2019
@lunny lunny mentioned this pull request May 19, 2019
6 tasks
- connStr = fmt.Sprintf("%s:%s@%s(%s)/%s%scharset=utf8&parseTime=true&tls=%s",
-     DbCfg.User, DbCfg.Passwd, connType, DbCfg.Host, DbCfg.Name, Param, tls)
+ connStr = fmt.Sprintf("%s:%s@%s(%s)/%s%scharset=%s&parseTime=true&tls=%s",
+     DbCfg.User, DbCfg.Passwd, connType, DbCfg.Host, DbCfg.Name, Param, DbCfg.Charset, tls)
Contributor

@zeripath zeripath May 19, 2019

Whilst we're changing this it might be sensible to escape these parts.
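
A minimal sketch of what escaping these parts could look like. It is not the change that was made in this PR. `buildConnStr` is a hypothetical helper, the two-entry charset allow-list is an assumption, and so is the reading of `Param` as the separator placed before the first option. Whether the MySQL driver decodes each escaped part again depends on the driver and its version, so that needs checking before anything like this is used.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildConnStr is a hypothetical helper, not Gitea's actual code.
// param is assumed to be the separator before the first option ("?" or "&").
func buildConnStr(user, passwd, connType, host, name, param, charset, tls string) (string, error) {
	// Assumption: only these two charsets are accepted, so the value
	// never needs escaping.
	switch charset {
	case "utf8", "utf8mb4":
	default:
		return "", fmt.Errorf("unsupported charset %q", charset)
	}
	// Percent-encode the database name and the tls value so that
	// characters such as "/" or "&" cannot change the shape of the string.
	return fmt.Sprintf("%s:%s@%s(%s)/%s%scharset=%s&parseTime=true&tls=%s",
		user, passwd, connType, host,
		url.PathEscape(name), param, charset, url.QueryEscape(tls)), nil
}

func main() {
	s, err := buildConnStr("gitea", "secret", "tcp", "127.0.0.1:3306",
		"gitea/db", "?", "utf8mb4", "false")
	fmt.Println(s, err)
}
```

User and password are left untouched here on purpose, since it is not clear from this thread that the driver would decode them.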

@lunny
Member Author

lunny commented May 20, 2019

@zeripath done.

@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels May 20, 2019
@silverwind
Member

I take it there is no way to migrate existing utf8mb3 data because it may already be in a corrupt state?

@lunny
Member Author

lunny commented May 21, 2019

@silverwind migrating from utf8 is possible, but it's out of scope for this PR.

@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels May 24, 2019
@techknowlogick
Member

Approved, but should we make a breaking change and default to utf8mb4?

@lunny
Member Author

lunny commented May 24, 2019

@techknowlogick That could be another PR. Of course, I think utf8mb4 as default is more reasonable.

@techknowlogick techknowlogick merged commit d5a98a2 into go-gitea:master May 24, 2019
@lunny lunny deleted the lunny/utf8mb4 branch May 24, 2019 04:31
@immanuelfodor

Simplest InnoDB version check according to: https://www.fromdual.com/innodb-version

SHOW GLOBAL VARIABLES LIKE 'innodb_ver%';
+----------------+-------------+
| Variable_name  | Value       |
+----------------+-------------+
| innodb_version | 5.6.42-84.2 |
+----------------+-------------+
1 row in set (0.00 sec)
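
If Gitea were to perform this check itself, the value returned above would need parsing first. The following is only a sketch under assumptions: `parseInnoDBVersion` and `atLeast` are hypothetical helpers, and the version string is assumed to have the form `major.minor.patch` with an optional vendor suffix after a `-`, as in the output above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseInnoDBVersion is a hypothetical helper that extracts major, minor and
// patch from a value such as "5.6.42-84.2". Anything after the first "-" is
// treated as a vendor suffix and ignored.
func parseInnoDBVersion(v string) (major, minor, patch int, err error) {
	core := strings.SplitN(v, "-", 2)[0]
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return 0, 0, 0, fmt.Errorf("unexpected version %q", v)
	}
	nums := make([]int, 3)
	for i, p := range parts {
		if nums[i], err = strconv.Atoi(p); err != nil {
			return 0, 0, 0, fmt.Errorf("unexpected version %q: %v", v, err)
		}
	}
	return nums[0], nums[1], nums[2], nil
}

// atLeast reports whether major.minor is at least wantMajor.wantMinor.
func atLeast(major, minor, wantMajor, wantMinor int) bool {
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

func main() {
	major, minor, patch, err := parseInnoDBVersion("5.6.42-84.2")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(major, minor, patch)
	fmt.Println("at least 5.6:", atLeast(major, minor, 5, 6))
	fmt.Println("at least 5.7:", atLeast(major, minor, 5, 7))
}
```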

@zeripath
Contributor

OK, fancy putting in a PR to test this?

@mrsdizzie
Member

mrsdizzie commented May 24, 2019

The real thing to test is:

By default there is a 767-byte limit on indexes, which means that you will exceed the index length limit if you have a varchar(255) column with a 4-byte encoding like utf8mb4 (255 * 4 = 1020). This is fixed by the innodb_large_prefix setting, which was introduced in 5.6.3:

https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_large_prefix

This is still off by default, but many hosts turn it on.
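
The arithmetic behind the 767-byte limit can be spelled out in a few lines. This is an illustrative sketch only; the function and constant names are made up for the example.

```go
package main

import "fmt"

const (
	indexLimitBytes = 767 // default index limit quoted above
	utf8MaxBytes    = 3   // MySQL's "utf8" (utf8mb3) uses at most 3 bytes per character
	utf8mb4MaxBytes = 4   // utf8mb4 uses at most 4 bytes per character
)

// indexBytes returns the worst-case byte size of an indexed column
// holding chars characters.
func indexBytes(chars, bytesPerChar int) int {
	return chars * bytesPerChar
}

// maxIndexableChars returns how many characters fit under the byte limit.
func maxIndexableChars(limitBytes, bytesPerChar int) int {
	return limitBytes / bytesPerChar
}

func main() {
	fmt.Println(indexBytes(255, utf8MaxBytes))                       // 765, fits under 767
	fmt.Println(indexBytes(255, utf8mb4MaxBytes))                    // 1020, exceeds 767
	fmt.Println(maxIndexableChars(indexLimitBytes, utf8mb4MaxBytes)) // 191
}
```

This is where a 191-character ceiling for indexed utf8mb4 columns comes from: 191 * 4 = 764 is the largest multiple of 4 that stays under 767.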

If we test this, I think the right check is whether innodb_large_prefix is set to either 1 or ON (both syntaxes are supported).

MariaDB [test]> SHOW GLOBAL VARIABLES LIKE 'innodb_large_prefix%';
+---------------------+-------+
| Variable_name       | Value |
+---------------------+-------+
| innodb_large_prefix | ON    |
+---------------------+-------+

To complicate matters, this feature only works when the ROW_FORMAT is COMPRESSED or DYNAMIC.

The default for MySQL 5.6 was COMPACT and for 5.7 it is DYNAMIC, so it wouldn't work right on 5.6 unless whatever created the tables also specified ROW_FORMAT=DYNAMIC|COMPRESSED in the create statement (in addition to using innodb_large_prefix).

To simplify, it might be easiest to require 5.7+ and then check the innodb_large_prefix value to make sure it is 1|ON.
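
The two conditions described above (the setting is 1 or ON, and the row format is DYNAMIC or COMPRESSED) could be expressed roughly as follows. This is a sketch, not code from Gitea; all function names are hypothetical, and the inputs are assumed to be the raw strings read from the database.

```go
package main

import (
	"fmt"
	"strings"
)

// isLargePrefixOn reports whether an innodb_large_prefix value means
// "enabled". Both "1" and "ON" are accepted, case-insensitively.
func isLargePrefixOn(value string) bool {
	v := strings.ToUpper(strings.TrimSpace(value))
	return v == "1" || v == "ON"
}

// rowFormatAllowsLargePrefix reports whether the row format is one of the
// two formats with which the large prefix feature works.
func rowFormatAllowsLargePrefix(rowFormat string) bool {
	switch strings.ToUpper(strings.TrimSpace(rowFormat)) {
	case "DYNAMIC", "COMPRESSED":
		return true
	}
	return false
}

// longIndexKeysUsable combines both conditions.
func longIndexKeysUsable(largePrefix, rowFormat string) bool {
	return isLargePrefixOn(largePrefix) && rowFormatAllowsLargePrefix(rowFormat)
}

func main() {
	fmt.Println(longIndexKeysUsable("ON", "Dynamic")) // setting on, suitable row format
	fmt.Println(longIndexKeysUsable("ON", "Compact")) // setting on, unsuitable row format
	fmt.Println(longIndexKeysUsable("OFF", "Dynamic"))
}
```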

Also, it will just work without these settings; the failure would then only happen if somebody entered a string of more than 191 characters into one of the indexed fields, with the error "Index column size too large. The maximum column size is 767 bytes." So maybe that is something that can be tested too.

@immanuelfodor

A few weeks ago, I followed the conversion steps for the Nextcloud database described here: https://docs.nextcloud.com/server/16/admin_manual/configuration_database/mysql_4byte_support.html Maybe it can be useful for you guys; there were a few SQL statements I needed to run, and also an SQL query that generates SQL queries 😃 FYI, the InnoDB version shown in my previous comment is from a MariaDB install on Ubuntu. I don't know how to get 5.7; apt installs the 5.6 version by default.

@silverwind
Member

silverwind commented May 24, 2019

Note that innodb_large_prefix was removed in MySQL 8.0.0. From https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-0.html:

The following InnoDB file format configuration options were deprecated in MySQL 5.7.7 and are now removed: innodb_large_prefix
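
One consequence for any automated check is that a missing innodb_large_prefix variable must not be treated as a failure on 8.0 and later. A sketch under assumptions: `largePrefixStatus` and its result strings are hypothetical, `vars` is assumed to hold the rows of a SHOW GLOBAL VARIABLES query keyed by variable name, and `major` is the server's major version obtained elsewhere.

```go
package main

import (
	"fmt"
	"strings"
)

// largePrefixStatus is a hypothetical helper that classifies the setting.
// It returns "on", "off", "removed" (variable absent on 8.0 or later)
// or "unknown" (variable absent on an older server).
func largePrefixStatus(major int, vars map[string]string) string {
	v, ok := vars["innodb_large_prefix"]
	if !ok {
		if major >= 8 {
			// The option no longer exists, so there is nothing to check.
			return "removed"
		}
		return "unknown"
	}
	switch strings.ToUpper(strings.TrimSpace(v)) {
	case "1", "ON":
		return "on"
	}
	return "off"
}

func main() {
	fmt.Println(largePrefixStatus(5, map[string]string{"innodb_large_prefix": "ON"}))
	fmt.Println(largePrefixStatus(5, map[string]string{"innodb_large_prefix": "OFF"}))
	fmt.Println(largePrefixStatus(8, map[string]string{}))
}
```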

Labels
lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. type/enhancement An improvement of existing functionality
Successfully merging this pull request may close these issues.

can't add emoji to issues
8 participants