-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(mysql): handle string typed decimal results #24241
Conversation
8899ca2
to
f5a2f73
Compare
Codecov Report
@@ Coverage Diff @@
## master #24241 +/- ##
=======================================
Coverage 68.30% 68.31%
=======================================
Files 1957 1957
Lines 75594 75611 +17
Branches 8224 8224
=======================================
+ Hits 51637 51654 +17
Misses 21849 21849
Partials 2108 2108
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
[["1.23456", "abc"]], | ||
[("dec", "decimal(12,6)"), ("str", "varchar(3)")], | ||
[(Decimal("1.23456"), "abc")], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice how we're changing from list[list[Any]]
to list[tuple[Any, ...]]
here - this is because this is what SQLA drivers should return. This will save us from having to do the conversion later here:
superset/superset/result_set.py
Lines 132 to 134 in 50535e4
# only do expensive recasting if datatype is not standard list of tuples | |
if data and (not isinstance(data, list) or not isinstance(data[0], tuple)): | |
data = [tuple(row) for row in data] |
for row in description | ||
if ( | ||
func := cls.column_type_mutators.get( | ||
type(cls.get_sqla_column_type(cls.get_datatype(row[1]))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a comment that explains the logic here? This dict comprehension might be hard to understand without the context of this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM + a small nitpick about adding comments
superset/db_engine_specs/base.py
Outdated
) | ||
) | ||
} | ||
if column_type_mutators: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically decimal is a fixed point data structure but native float
in Python is not fixed point data, so some drivers return a string to represent fixed point. column_type_mutators
adds some extra transformation on the decimal column so the query performance might become lower than before.
From the screenshot before and after, I saw the coltypes
field changed from 1
-> 0
, do we have some other approach which that just change the coltypes
instead to mutate entire values of column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch about the coltypes
, let me investigate why that's gone missing.
Wrt to the query performance hit, another option is to make the frontend more resilient to receiving various types. In this case we could cast the value to number
in the frontend if the coltype is GenericDataType.NUMERIC
. However, I'd prefer to do this type of transformation in the backend to ensure that the plugins don't have to deal with all the nuances that SQLAlchemy throws our way.
I'd be curious to hear other peoples thoughts, @michael-s-molina @john-bodley any thoughts here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It hit me, that without this change, the linked issue won't be fixed, as CSV export happens purely in the backend. With this new pattern we could even normalize non-standard timestamps early in the flow, eliminating the need for doing it in the frontend, further decoupling the quirks of the analytical databases from the downstream logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, I just noticed we already have logic like this in the codebase:
superset/superset/db_engine_specs/ocient.py
Lines 330 to 334 in 50535e4
# Peek at the schema to determine which column values, if any, | |
# require sanitization. | |
columns_to_sanitize: List[PlacedSanitizeFunc] = _find_columns_to_sanitize( | |
cursor | |
) |
@villebro for Trino the Furthermore, regarding MySQL, doesn't the handling of the |
Oh nice, thanks for the heads up. Let me confirm that by bumping the version 👍 I notice the original PR was changing other return types, too, have we confirmed that Superset handles them well?
That may well be - let me check with the author of the issue which dialect they were using. |
Oh nice, I didn't know it had this many improvements. Let me also check if those pesky non-standard timestamp strings are now replaced with proper datetimes that I had to work around in #23339 |
@villebro do you plan on resurrecting this PR? |
@john-bodley I do, if there's still apetite for it. But as far as Trino is concerned, this doesn't seem needed anymore. So not sure if we want to introduce this additional complexity if it's not really needed. |
FYI @john-bodley I removed the Trino logic as it's now redundant, but for MySQL this does seem to be needed. I'm going to follow-up with the author of the Ocient connector to see if we can migrate their custom mutator logic to this more standardized model. |
(cherry picked from commit 7eab59a)
* fix: is_select with UNION (apache#25290) (cherry picked from commit bb002d6) * fix: Add explicit ON DELETE CASCADE for dashboard_roles (apache#25320) (cherry picked from commit d54e827) * fix(chart): Supporting custom SQL as temporal x-axis column with filter (apache#25126) Co-authored-by: Kamil Gabryjelski <kamil.gabryjelski@gmail.com> * fix: Use RLS clause instead of ID for cache key (apache#25229) (cherry picked from commit fba66c6) * fix: Improve the reliability of alerts & reports (apache#25239) (cherry picked from commit f672d5d) * fix: DashboardRoles cascade operation (apache#25349) (cherry picked from commit a971a28) * fix: datetime with timezone excel export (apache#25318) Co-authored-by: Michael S. Molina <70410625+michael-s-molina@users.noreply.github.com> (cherry picked from commit 5ebcd2a) * fix: Workaround for Cypress ECONNRESET error (apache#25399) (cherry picked from commit d76ff39) * fix(sqllab): invalid persisted tab state (apache#25308) (apache#25398) * fix: Rename on_delete parameter to ondelete (apache#25424) (cherry picked from commit 893b45f) * fix: preventing save button from flickering in SQL Lab (apache#25106) (cherry picked from commit 296ff17) * fix: chart import (apache#25425) (cherry picked from commit a4d8f36) * fix: swagger UI CSP error (apache#25368) (cherry picked from commit 1716b9f) * fix: smarter date formatter (apache#25404) (cherry picked from commit f0080f9) * fix(sqllab): invalid start date (apache#25437) * fix(nativeFilters): Speed up native filters by removing unnecessary rerenders (apache#25282) Co-authored-by: JUST.in DO IT <justin.park@airbnb.com> (cherry picked from commit a0eeb4d) * fix(SqlLab): make icon placement even (apache#25372) (cherry picked from commit 11b49a6) * fix: Duplicate items when pasting into Select (apache#25447) (cherry picked from commit 7cf96cd) * fix: update the SQLAlchemy model definition at json column for Log table (apache#25445) (cherry picked from commit e83a76a) * fix(helm chart): set chart appVersion to 3.0.0 (apache#25373) * fix(mysql): handle string typed decimal results (apache#24241) (cherry picked from commit 7eab59a) * fix: Styles not loading because of faulty CSP setting (apache#25468) (cherry picked from commit 0cebffd) * fix(sqllab): error with lazy_gettext for tab titles (apache#25469) (cherry picked from commit ddde178) * fix: Address Mypy issue which is causing CI to fail (apache#25494) (cherry picked from commit 36ed617) * chore: Adds 3.0.1 CHANGELOG * fix: Unable to sync columns when database or dataset name contains `+` (apache#25390) (cherry picked from commit dbe0838) * fix(sqllab): Broken query containing 'children' (apache#25490) (cherry picked from commit b92957e) * chore: Expand error detail on screencapture (apache#25519) (cherry picked from commit ba541e8) * fix: tags permissions error message (apache#25516) (cherry picked from commit 50b0816) * fix: Apply normalization to all dttm columns (apache#25147) (cherry picked from commit 58fcd29) * fix: REST API CSRF exempt list (apache#25590) (cherry picked from commit 549abb5) * fix(RLS): Fix Info Tooltip + Button Alignment on RLS Modal (apache#25400) (cherry picked from commit a6d0e6f) * fix: thubmnails loading - Talisman default config (apache#25486) (cherry picked from commit 52f631a) * fix(Presto): catch DatabaseError when testing Presto views (apache#25559) Co-authored-by: Rui Zhao <zhaorui@dropbox.com> (cherry picked from commit be3714e) * fix(Charts): Set max row limit + removed the option to use an empty row limit value (apache#25579) (cherry picked from commit f556ef5) * fix(window): unavailable localStorage and sessionStorage (apache#25599) * fix: finestTemporalGrainFormatter (apache#25618) (cherry picked from commit 62bffaf) * fix: revert fix(sqllab): Force trino client async execution (apache#24859) (apache#25541) (cherry picked from commit e56e0de) * chore: Updates 3.0.1 CHANGELOG * fix(sqllab): Mistitled for new tab after rename (apache#25523) (cherry picked from commit a520124) * fix(sqllab): template validation error within comments (apache#25626) (cherry picked from commit b370c66) * fix: avoid 500 errors with SQLLAB_BACKEND_PERSISTENCE (apache#25553) (cherry picked from commit 99f79f5) * fix(import): Make sure query context is overwritten for overwriting imports (apache#25493) (cherry picked from commit a0a0d80) * fix: permalink save/overwrites in explore (apache#25112) Co-authored-by: Elizabeth Thompson <eschutho@gmail.com> (cherry picked from commit e58a3ab) * fix(header navlinks): link navlinks to path prefix (apache#25495) (cherry picked from commit 51c56dd) * fix: improve upload ZIP file validation (apache#25658) * fix: warning of nth-child (apache#23638) (cherry picked from commit 16cc089) * fix(dremio): Fixes issue with Dremio SQL generation for Charts with Series Limit (apache#25657) (cherry picked from commit be82657) --------- Co-authored-by: Beto Dealmeida <roberto@dealmeida.net> Co-authored-by: John Bodley <4567245+john-bodley@users.noreply.github.com> Co-authored-by: Zef Lin <zef@preset.io> Co-authored-by: Kamil Gabryjelski <kamil.gabryjelski@gmail.com> Co-authored-by: Jack Fragassi <jfragassi98@gmail.com> Co-authored-by: Michael S. Molina <70410625+michael-s-molina@users.noreply.github.com> Co-authored-by: JUST.in DO IT <justin.park@airbnb.com> Co-authored-by: Jack <41238731+fisjac@users.noreply.github.com> Co-authored-by: Daniel Vaz Gaspar <danielvazgaspar@gmail.com> Co-authored-by: Stepan <66589759+Always-prog@users.noreply.github.com> Co-authored-by: Corbin Bullard <corbindbullard@gmail.com> Co-authored-by: Gyuil Han <cnabro91@gmail.com> Co-authored-by: Celalettin Calis <celalettin1286@gmail.com> Co-authored-by: Ville Brofeldt <33317356+villebro@users.noreply.github.com> Co-authored-by: ʈᵃᵢ <tdupreetan@gmail.com> Co-authored-by: Michael S. Molina <michael.s.molina@gmail.com> Co-authored-by: mapledan <mapledan829@gmail.com> Co-authored-by: Igor Khrol <khroliz@gmail.com> Co-authored-by: Rui Zhao <105950525+zhaorui2022@users.noreply.github.com> Co-authored-by: Fabien <18534166+frassinier@users.noreply.github.com> Co-authored-by: Hugh A. Miles II <hughmil3s@gmail.com> Co-authored-by: OskarNS <soerensen.oskar@gmail.com>
SUMMARY
MySQL returns
DECIMAL
typed values as strings, rather thanDecimal
orfloat
. This causes issues in the frontend, as most plugins expectGenericDataType.NUMERIC
values to be of typenumber
, notstring
. To get around this we introduce a new propertycolumn_type_mutators
onBaseEngineSpec
, where type-specific callbacks can be introduced to mutate the results coming back from the database. Here we implement logic on the MySQL spec to change string-based decimals toDecimal
, similar to what Postgres already does automatically (I assume most connectors do the same).The new logic can be expensive for big datasets, due to the need to iterate over the full result set. For this reason the for-loop is only triggered if there are columns present that need to be mutated.
AFTER
Now the Big Number chart works as expected on Trino Decimal types (notice the value is now
number
in the response payload):BEFORE
Previously Big Number chart would display "No data" due to the data type being
string
:TESTING INSTRUCTIONS
ADDITIONAL INFORMATION