Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wtforms] Using wtforms-json which prevents encoding empty strings in the database #5445

Merged
merged 1 commit into from
Feb 4, 2019

Conversation

john-bodley
Copy link
Member

@john-bodley john-bodley commented Jul 20, 2018

This issue has been plaguing me for some time, though the actual fix was somewhat trivial. FAB uses WTForms which regrettably coerces a None value into an empty string per here. This means that for the database tables associated with forms string/text columns include a combination of NULL of empty strings (depending on how they were populated) which creates inconsistencies in the the schema, i.e., to identify non-custom SQL tables one would need to use:

COALESCE(sql, '') = ''

as opposed to:

sql IS NULL

This is also troublesome for uniqueness constraints as currently the following schema, table tuples would be considered different (even thought the represent the same entity):

  • (NULL, my_table)
  • (, my_table)

The fix is to use the wtforms-json package which adds supports for None types per here. The package monkey patches the processing of the field and assigns the default value (None) when the value evaluates to false (which includes empty strings).

This PR also performs the necessary migration to retroactively resolve all forms which have encoded empty strings in the database.

to: @betodealmeida @fabianmenges @graceguo-supercat @michellethomas @mistercrunch @timifasubaa

@codecov-io
Copy link

codecov-io commented Jul 20, 2018

Codecov Report

Merging #5445 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5445      +/-   ##
==========================================
+ Coverage   56.07%   56.07%   +<.01%     
==========================================
  Files         526      526              
  Lines       23253    23255       +2     
  Branches     2783     2783              
==========================================
+ Hits        13038    13040       +2     
  Misses       9806     9806              
  Partials      409      409
Impacted Files Coverage Δ
superset/__init__.py 73.17% <100%> (+0.44%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5669a82...e9acc81. Read the comment docs.

@mistercrunch
Copy link
Member

When you say sub-optimal, it seems to me like the difference'' vs NULL wouldn't make much of a difference from a perf standpoint.

@john-bodley
Copy link
Member Author

@mistercrunch I updated the description.

@john-bodley
Copy link
Member Author

ping @mistercrunch

@john-bodley
Copy link
Member Author

@mistercrunch @betodealmeida would you mind looking at this PR? We have numerous inconsistencies in our production database due to lack of uniqueness constraints (#5449, #5451, #5452, #5453) which we would like to resolve and this PR is a necessary upstream requirement.

@john-bodley john-bodley force-pushed the john-bodley-wtforms-json branch 2 times, most recently from 4726575 to 4eb8389 Compare October 2, 2018 03:48
@betodealmeida
Copy link
Member

Taking a look.

@mistercrunch
Copy link
Member

This PR fell in a crack. Let's revive it. Seems like the long term solution is not using WTForms at all as we rip out more of the MVC and FAB scaffolding over time.

So I get the NULL vs '' problem with unique constraints not applying properly, but other than that, does it affect users directly? I'm guessing it may affect the filtering functionality in the CRUD list view UI?

@john-bodley
Copy link
Member Author

@mistercrunch as far as I'm aware it doesn't impact the users however it does pollute the database records if querying and/or writing a migration, i.e, at times one needs to write logic to handle both None (NULL) and an empty string. I felt this was a fairly benign change, though I could be wrong.

@mistercrunch
Copy link
Member

I'm just afraid of the db migration that updates a significant portion of all the cells in the database. Clearly we should recommend scheduling downtime for this one in UPGRADING.md

if getattr(record, col.name) == '':
setattr(record, col.name, None)

session.commit()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a hell of a commit. I'd suggest proceeding carefully and committing on a per-record (only if needed) basis. Maybe on a per-table basis, but that could represent very large commits too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mistercrunch I move the commit to occur after each table. I've also updated the UPDATING.md.

@john-bodley john-bodley force-pushed the john-bodley-wtforms-json branch 3 times, most recently from e4ecfb9 to e8af925 Compare January 31, 2019 17:55
@john-bodley john-bodley added the risk:db-migration PRs that require a DB migration label Jan 31, 2019
@mistercrunch
Copy link
Member

LGTM once it passes the build. Curious to see how long that migration will take in your environment.

@john-bodley
Copy link
Member Author

@mistercrunch I need to update the downgrade version to the laster on master. I’ll test again on our development environment.

@john-bodley
Copy link
Member Author

@mistercrunch the migration took ~ 90 seconds in our environment.

@mistercrunch
Copy link
Member

Sweet. I think this is ready to go!

@john-bodley john-bodley merged commit e1b9077 into apache:master Feb 4, 2019
@john-bodley john-bodley deleted the john-bodley-wtforms-json branch February 4, 2019 17:35
xtinec pushed a commit that referenced this pull request Feb 6, 2019
@xtinec xtinec added the v0.31 label Feb 6, 2019
@betodealmeida
Copy link
Member

@john-bodley @mistercrunch, heads up: we had 2 dashboards with title '', they got changed to None, and it broke some of the CRUD forms. Trying to edit a chart would raise an exception, eg, since it has an input form for dashboards, and these 2 dashboards wouldn't renderer as options.

@john-bodley
Copy link
Member Author

john-bodley commented Feb 21, 2019

@betodealmeida apologies for the breakage and thanks for the heads up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels risk:db-migration PRs that require a DB migration v0.31 🚢 0.34.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants