Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: SQL Lab sorting of non-numbers #18006

Merged
merged 1 commit into from
Jan 12, 2022

Conversation

etr2460
Copy link
Member

@etr2460 etr2460 commented Jan 11, 2022

SUMMARY

#17573 fixed the sorting of floating point numbers, but broke sorting strings that start with numbers. The most common of these are dses like 2022-01-01, and parseFloat("2022-01-01") === 2022, so sorting got all janky.

I've fixed this by ensuring that we only try to parse numbers if they pass a regex that matches JS's format of numbers.

Note that this solution isn't 100% accurate, and will still cause undefined behavior when sorting a list containing both numeric and non numeric strings. I don't have a good answer for fixing this, but perhaps in the future we'll need to refactor this further. I'd imagine a per column parameter that defines the sorting function (convert to number and then sort, do string sort, sort in hexadecimal because all the strings start with 0x and contain only 0-f, something else) but that was very much out of scope of this bug fix.

TESTING INSTRUCTIONS

New unit test (and ensure the unit test added in #17573 still works)

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

to: @lyndsiWilliams @eschutho @graceguo-supercat @ktmud

@etr2460 etr2460 force-pushed the erik-ritter--fix-sqllab-sorting branch 2 times, most recently from 27549a7 to 4fb1780 Compare January 11, 2022 22:45
@codecov
Copy link

codecov bot commented Jan 11, 2022

Codecov Report

Merging #18006 (2159734) into master (bdc35a2) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #18006   +/-   ##
=======================================
  Coverage   66.32%   66.32%           
=======================================
  Files        1569     1569           
  Lines       61658    61661    +3     
  Branches     6232     6233    +1     
=======================================
+ Hits        40892    40895    +3     
  Misses      19167    19167           
  Partials     1599     1599           
Flag Coverage Δ
javascript 50.87% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...src/components/FilterableTable/FilterableTable.tsx 72.02% <100.00%> (+0.50%) ⬆️
...s/legacy-plugin-chart-country-map/src/countries.ts 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bdc35a2...2159734. Read the comment docs.

Copy link
Member

@ktmud ktmud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a small suggestion.

Did you check whether the performance is acceptable when table has the maximum number of rows? I'd imagine regex match for every row to be quite expensive. In regards to the RegEx used, it's probably more important what the database returns other than what JavaScript accepts. For example, some database may return Infinity as Inf.

I think in long run we would want to apply different sorting function based on column types. We should pass down the column type prop to this component and apply number parsing only to numeric columns. We may also want to migrate this table to react-table as well. It has a sortType configuration we can use for free.

// exponential notation, NaN, and Infinity.
// See https://stackoverflow.com/a/30987109 for more details
const onlyNumberRegEx =
/^(NaN|-?((\d*\.\d+|\d+)([Ee][+-]?\d+)?|Infinity))$/;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this to a global constant or create a helper function?

@etr2460
Copy link
Member Author

etr2460 commented Jan 12, 2022

Did you check whether the performance is acceptable when table has the maximum number of rows? I'd imagine regex match for every row to be quite expensive. In regards to the RegEx used, it's probably more important what the database returns other than what JavaScript accepts. For example, some database may return Infinity as Inf.

@ktmud I did not check yet, although aren't regexes pretty fast usually? I suppose i could do a faster check first before the more expensive regex, but my thought was that the parseFloat was already taking a while (that we run on each iteration) so maybe it didn't matter. Regardless I'll test a large result set and report back.

As for matching what databases may return, I suppose i could try to handle everything every database returns, but i figured having something consistent with a single language would be worthwhile. If this was breaking for you with a specific database, then you could always write if(column_name = Inf, 'Infinity', column_name) as column_name in your query, while if we don't have this handling, it could never sort properly (or only would if you did something like if(column_name = Inf, 99999999999999999999, column_name) as column_name

So in summary, i'll test perf and if that looks good, will merge (with the moving of the regex to a global constant, sorry for not doing that in the first place)

Copy link
Member

@eschutho eschutho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching/fixing this issue @etr2460!

@etr2460
Copy link
Member Author

etr2460 commented Jan 12, 2022

/testenv up

@github-actions
Copy link
Contributor

@etr2460 Ephemeral environment spinning up at http://54.69.38.190:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.

@etr2460
Copy link
Member Author

etr2460 commented Jan 12, 2022

verified that 10k rows still sort pretty much instantly, both for numeric and non numeric columns

@etr2460 etr2460 force-pushed the erik-ritter--fix-sqllab-sorting branch from 4fb1780 to 2159734 Compare January 12, 2022 18:28
@etr2460 etr2460 merged commit 27000da into apache:master Jan 12, 2022
@github-actions
Copy link
Contributor

Ephemeral environment shutdown and build artifacts deleted.

eschutho pushed a commit that referenced this pull request Jan 27, 2022
shcoderAlex pushed a commit to casual-precision/superset that referenced this pull request Feb 7, 2022
bwang221 pushed a commit to casual-precision/superset that referenced this pull request Feb 10, 2022
@mistercrunch mistercrunch added 🍒 1.4.1 🍒 1.4.2 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.5.0 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/M v1.4.1 🍒 1.4.1 🍒 1.4.2 🚢 1.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants