Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(migration): Ensure key_value LargeBinary is encoded as a MEDIUMBLOB as opposed to BLOB for MySQL #20385

Conversation

john-bodley
Copy link
Member

SUMMARY

At Airbnb we use MySQL for Superset's metadata database where the SQLAlchemy LargeBinary type—which is used for the key_value.value column—is encoded as a BLOB which can handle up to 64 KiB of data. This isn't sufficient for storing large form-data blobs and thus this PR adds a migration which encodes the column as a MEDIUMBLOB which can handle up to 16 MiB of data.

The solution uses the suggestion here, note 16 MiB is equivalent to 2^24 - 1 bytes.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@john-bodley john-bodley requested a review from a team as a code owner June 14, 2022 23:23
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""empty message
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please describe the revision

@@ -0,0 +1,50 @@
# Licensed to the Apache Software Foundation (ASF) under one
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please name the migration file descriptively

@codecov
Copy link

codecov bot commented Jun 14, 2022

Codecov Report

Merging #20385 (69bdb78) into master (160e674) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master   #20385   +/-   ##
=======================================
  Coverage   66.68%   66.68%           
=======================================
  Files        1739     1739           
  Lines       65143    65143           
  Branches     6902     6902           
=======================================
  Hits        43443    43443           
  Misses      19947    19947           
  Partials     1753     1753           
Flag Coverage Δ
hive 53.70% <100.00%> (ø)
mysql 82.29% <100.00%> (ø)
postgres 82.36% <100.00%> (ø)
presto 53.56% <100.00%> (ø)
python 82.78% <100.00%> (ø)
sqlite 82.09% <100.00%> (ø)
unit 50.15% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/key_value/models.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 160e674...69bdb78. Read the comment docs.

@john-bodley john-bodley force-pushed the john-bodley--migration-key_value-blob branch from 2aad4ec to 89b890a Compare June 14, 2022 23:30
Copy link
Member

@etr2460 etr2460 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks for making this fix in opensource!

@ktmud
Copy link
Member

ktmud commented Jun 15, 2022

I'm pretty sure me and @villebro have discussed about this before... #19805

Unfortunately the fix was only added to the original migration and not to the model. We didn't create a new migration for existing users, either---because the feature was still not in official releases yet.

@ktmud
Copy link
Member

ktmud commented Jun 15, 2022

I'm also wondering whether it's worth creating a new type like MediumText and update key_value/models to point to this type as well. It's risky when the declarative model is out of sync with the actual db schema.

Highlighting this comment since it obviously got lost in the original PR.

@etr2460
Copy link
Member

etr2460 commented Jun 15, 2022

I'm pretty sure me and @villebro have discussed about this before... #19805

Unfortunately the fix was only added to the original migration and not to the model. We didn't create a new migration for existing users, either---because the feature was still not in official releases yet.

🤦 well we ran into it at least. I think it's probably best practice to not modify a merged migration unless there's a bug with it that can't be resolved in a future migration. And even then, maybe it's better to have another migration that undoes the bad one and then performs the same thing with the bug fix

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this went totally sideways ☹️ Sorry for the mess. One comment regarding keeping the the length of the migration consistent with revision 6766938c6065. Also, it's good to know for similar future issues that the migration apparently succeeds even if the existing_type isn't accurate (when running migrations from scratch on the current codebase it'll be performing the migration on a column with length=2**31).

"value",
existing_nullable=False,
existing_type=sa.LargeBinary(),
type_=sa.LargeBinary(length=2**24 - 1),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep this in line with #19805, should we make it length=2**31?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@villebro aren't we over provisioning the column if we opt for ~ 2 GiB as opposed to 16 MiB? Maybe per #19805 the existing_type should be sa.LargeBinary(length=2**31)—and something similar in the downgrade.

@michael-s-molina
Copy link
Member

@john-bodley @villebro Merging this to include it in v2.0. We can add more changes in a follow-up if necessary 😉

@michael-s-molina michael-s-molina merged commit f5cb23e into apache:master Jun 23, 2022
michael-s-molina pushed a commit that referenced this pull request Jun 23, 2022
…OB as opposed to BLOB for MySQL (#20385)

* fix(migration): Ensure key_value LargeBinary is encoded as a MEDIUMBLOB as opposed to BLOB for MySQL

* Update 2022-06-14_15-28_e09b4ae78457_resize_key_value_blob.py

Co-authored-by: John Bodley <john.bodley@airbnb.com>
(cherry picked from commit f5cb23e)
michael-s-molina pushed a commit that referenced this pull request Jun 28, 2022
…OB as opposed to BLOB for MySQL (#20385)

* fix(migration): Ensure key_value LargeBinary is encoded as a MEDIUMBLOB as opposed to BLOB for MySQL

* Update 2022-06-14_15-28_e09b4ae78457_resize_key_value_blob.py

Co-authored-by: John Bodley <john.bodley@airbnb.com>
(cherry picked from commit f5cb23e)
@mistercrunch mistercrunch added 🍒 2.0.0 🍒 2.0.1 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.1.0 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/M v2.0 🍒 2.0.0 🍒 2.0.1 🚢 2.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants