-
Notifications
You must be signed in to change notification settings - Fork 13.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: presto table schema expansion when nested rows contain columns of map
and/or array
types
#26892
base: master
Are you sure you want to change the base?
fix: presto table schema expansion when nested rows contain columns of map
and/or array
types
#26892
Conversation
map
type
Thanks @brouberol for the PR. I've added a few reviewers who are likely familiar with this feature. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #26892 +/- ##
==========================================
+ Coverage 60.48% 65.08% +4.59%
==========================================
Files 1931 526 -1405
Lines 76236 37907 -38329
Branches 8568 0 -8568
==========================================
- Hits 46114 24672 -21442
+ Misses 28017 13235 -14782
+ Partials 2105 0 -2105
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI flagged the following required change.
0c3f6ea
to
ccbe4e9
Compare
map
typemap
and/or array
types
e6879bd
to
e357679
Compare
(cherry picked from commit 8fbaf84)
(cherry picked from commit 983a164)
(cherry picked from commit ac8c283)
(cherry picked from commit cc2f6f1)
(cherry picked from commit 2a47edc)
(cherry picked from commit c54fbe6)
(cherry picked from commit 62cf036)
(cherry picked from commit 6d88701)
(cherry picked from commit 6278315)
(cherry picked from commit 11760d3)
…pache#27258) (cherry picked from commit 8b4dce7)
Happy to review if/when this PR gets rebased |
SUMMARY
When the
PRESTO_EXPAND_DATA
feature flag was set toTrue
, we've noticed that some of our Presto tables were able to be previewed, while some other failed, with the following tracebackThe discriminating factor turned out to be the presence of
map
fields in nested rows, causing theparent_data_type
argument passed toPrestoEngineSpec._parse_structural_column
classmethod to be mis-parsed, as demonstrated:The
"request_headers" map(varchar, varchar)
token was parsed into['"request_headers" map', 'varchar, varchar))']
, which later on breaks on['varchar, varchar))'][1]
, as demonstrated in the previous traceback.I've decided to fix this bug by removing the key/value types of the map itself from the
parent_data_type
string, as assuming they were parsed correctly, the types do not seem to be used by SQLAlchemy anyway:This way, the
parent_data_type
string'row("protocol" varchar, "request_headers" map(varchar, varchar))'
becomes'row("protocol" varchar, "request_headers" map)', which gets successfully parsed as
['test_column row', '"protocol" varchar, "request_headers" map)']`.Note: We also had the same issue with
array
types as well. A Prestoarray(string)
column corresponds to asqlalchemy.types.ARRAY
type, so we can drop the(string)
part of the column type.BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
Preview the schema of a Presto table contained nested rows with the
PRESTO_EXPAND_DATA
feature flag enabled.ADDITIONAL INFORMATION