Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add utf8 to utfmb8 conversion api command #15969

Merged
merged 1 commit into from
Jan 8, 2020

Conversation

eileenmcnaughton
Copy link
Contributor

Overview

Adds an api to convert the db from utf8 to utf8mb4 - this is intended to help early adopters of the utf8mb4 standard with CiviCRM. utf8mb4 supports emjois. This is work done by @mfb & all discussion is on #13633 . This change simply brings a subset of that change into what should be an easily mergeable form.

Before

No easy way to convert a DB from utf8 to utf8MB4

After

civicrm_api3('System', 'utf8conversion', []); will do the conversion (supports the is_revert param to undo, mostly for testing purposes.

Technical Details

This extracts from #13633 the portion that deals with the conversion
and switches it from an upgrade script (which is not agreed at this stage as there are concerns about
imposing this change) to an api command (which can be added with relatively little discussion as
it only affects those who choose to use it).

We don't expect this command to outlive apiv3 so adding there & not creating insta-tech-debt on
apiv4.

In order to add a unit test I had to alter to support reverting - doesn't seem like a bad thing all round.

As an aside - I like the way it changes the DB level charset & collation. I think I've seen these fail to be set
in the wild

Comments

@civibot
Copy link

civibot bot commented Nov 27, 2019

(Standard links)

@artfulrobot
Copy link
Contributor

  • General standards
    • Explain (r-explain)
      • PASS : The goal/problem/solution have been adequately explained in the PR.
    • User impact (r-user)
      • ISSUE: The change would noticeably impact the user-experience (eg requiring retraining), and we need a better transition/communication plan.
      • COMMENTS: I have a slight reservation about including this extremely low-level db operation in a point and click interface (API Explorer). I realise it's supposed to be run on the command line, but the UI's name makes it sound safe to "Explore"... I think this could be mitigated by a parameter with a name like confirm, and a title like "I confirm that I know what I'm doing and have made a backup"! And/or I think the doing code should start off with a test for utf8mb4 and row format dynamic support - e.g. try to create a table like that before touching the other tables. Finally I'd point out that this could take much longer than the timeout on a typical http config - so if someone was to run this through the API explorer they could be in trouble.
    • Documentation (r-doc)
      • UNREVIEWED
    • Run it (r-run)
      • UNREVIEWED
  • Developer standards
    • Technical impact (r-tech)
      • PASS: The change preserves compatibility with existing callers/code/downstream.
    • Code quality (r-code)
      • PASS: The functionality, purpose, and style of the code seems clear+sensible.
      • COMMENTS: I'm not qualified to say that every non InnoDB db engine accepts charset and collations of utf8mb4 as I only have experience with InnoDB (and faded experience with MyIsam), and I know there's lots more out there now. Would it be safer to whitelist engines?
    • Maintainability (r-maint)
      • UNREVIEWED
    • Test results (r-test)
      • UNREVIEWED

}
$dsn = defined('CIVICRM_LOGGING_DSN') ? DB::parseDSN(CIVICRM_LOGGING_DSN) : DB::parseDSN(CIVICRM_DSN);
$logging_database = $dsn['database'];
$dao = CRM_Core_DAO::executeQuery("SHOW TABLE STATUS FROM `$logging_database` WHERE Engine <> 'MyISAM' AND Name LIKE 'log\_civicrm\_%'");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest whitelisting engines we know can handle utf8mb4?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm in practice people should only have INNODB or archive - anything else & they have done something weird & wonderful

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so I suggest whitelist InnoDB, archive if that's the only thing this is supposed to work with, rather than <> MyISAM - I've heard quite afew people using weird and wonderful stuff!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - we can revisit if people hit issues on trying it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eileenmcnaughton @artfulrobot sorry for chiming in late - we use TokuDB for log tables (via the alterLogTables hook), so those would not be caught by the query. Some earlier discussion for reference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pfigel yes, I thought of you when Eileen said "wonderful" :-)
so do you agree we should have a whitelist, and should that include TokuDB?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@artfulrobot I'd probably lean towards blacklisting known-bad engines. There are lots of weird engines out there, and if anyone wants to use them, I feel like testing compatibility should be up to them. If we add TokuDB here, it feels like that's Civi saying "we support TokuDB" - not sure if that's an accurate assessment.

There's lots of uncertainty there for me though, I'm not sure how common it is for other engines to not support utf8mb4 properly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After considering @pfigel's comment I've switched back to the blacklist.

We should note that this is a function we'd expect people with some some knowledge to use - especially in instances where they have done something weird and wonderful.

I agree it's odd that it's exposed in the api explorer. However it's not the only System api in there & we haven't seen support requests on the others. A casual clicker should be put off as the entity is 'System', it has a 'weird name' & I would expect few people would have thee courage to click confirm will-nilly on this
Screen Shot 2019-12-10 at 10 56 07 PM

@eileenmcnaughton
Copy link
Contributor Author

@colemanw there is an issue pointed out here that on one hand we create apis to do various tasks. On the other some of them are expected only to be used by somewhat experienced / knowledgeable users. This is not the first / only so I'm not sure it should be blocking but perhaps we need a convention to suppress from the api things like this & a handful of others?

This extracts from civicrm#13633 the portion that deals with the conversion
and switches it from an upgrade script (which is not agreed at this stage as there are concerns about
imposing this change) to an api command (which can be added with relatively little discussion as
it only affects those who choose to use it).

We don't expect this command to outlive apiv3 so adding there & not creating insta-tech-debt on
apiv4.

In order to add a unit test I had to alter to support reverting - doesn't seem like a bad thing all round.

As an aside - I like the way it changes the DB level charset & collation. I think I've seen these fail to be set
in the wild
@seamuslee001
Copy link
Contributor

I'm going to merge this as is, APIv3 Explorer can already do some relatively destructive stuff anyway such as disabling logging etc turning on and off multilingual so

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants