Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data analysts should be able to reorder columns into name order using sort_columns functions #3250

Merged
merged 8 commits into from
Feb 8, 2022

Conversation

radeusgd
Copy link
Member

@radeusgd radeusgd commented Feb 3, 2022

Pull Request Description

Closes https://www.pivotaltracker.com/story/show/181034280

Important Notes

Checklist

Please include the following checklist in your PR:

  • The documentation has been updated if necessary.
  • All code conforms to the Scala, Java, and Rust style guides.
  • All documentation and configuration conforms to the markdown and YAML style guides.
  • All code has been tested where possible.

@radeusgd radeusgd force-pushed the wip/radeusgd/sort-columns-181034280 branch 2 times, most recently from b8191c0 to 80e2020 Compare February 4, 2022 15:47
@radeusgd radeusgd marked this pull request as ready for review February 4, 2022 15:47
Copy link
Member

@wdanilo wdanilo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a bunch of comments, but they are applicable to more places. Please be sure that all TODO comments are always well described and linked to the right place.

@@ -171,7 +172,7 @@ type Table
table.remove_columns (By_Name.new ["bar", "foo"])

## TODO [RW] default arguments do not work on atoms, once this is fixed,
the above should be replaced with just `Matching.Exact`.
the above should be replaced with just `By_Name`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these TODOs do not have links to tickets. Please add them. Every TODO has to have link to a relevant ticket.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The link is below: See: https://github.com/enso-org/enso/issues/1600.
Or should we create a Pivotal ticket for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The link is below: See: https://github.com/enso-org/enso/issues/1600.
Or should we create a Pivotal ticket for this?

Copy link
Member Author

@radeusgd radeusgd Feb 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The link is just below, as part of this comment: See: https://github.com/enso-org/enso/issues/1600.
Or should we link to a Pivotal ticket instead? (I think one does not exist for this issue currently, so shall it be created?)

@@ -406,6 +407,32 @@ type Table
new_columns = Table_Helpers.reorder_columns internal_columns=this.columns selector=columns position=position on_problems=on_problems warnings=warnings
here.new new_columns

## Returns a new table with the columns sorted by name according to the
specified sort method. By default, sorting will be according to
case-sensitive ascending order based on code points.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly, in our design document we've been talking aobut grapheme clusters, not code points. Sorting by code points will work bad, especially for international cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair. Well, what this option does is using the default _.compare_to on strings.
IMO changing the way we compare Texts is out of scope of this change (which just adds a new function on Tables), so I'd suggest modifying the default comparison method as part of the work on improving the Text API.
In that case that comment would stay as-is for now and be updated accordingly once that happens. Does that work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the docs to say that the order is according to Text.compare_to.

Will create a separate task to ensure that Text.compare_to is according to grapheme clusters.

@radeusgd radeusgd force-pushed the wip/radeusgd/sort-columns-181034280 branch from 80e2020 to c174fcd Compare February 7, 2022 15:04
@radeusgd radeusgd requested a review from wdanilo February 7, 2022 15:48
@jdunkerley
Copy link
Member

jdunkerley commented Feb 7, 2022

  • Need a ticket for Integer.parse
  • Specific need for is_digit
    • Need a Character class which is an extended grapheme cluster.
    • Need to have various Unicode is statements.
  • Add a ticket for a faster non-regex compare_to for natural order as current would suffer at scale.
    • a123b456d789 ==> a 123 b ..
    • 5 ==> 5
    • a vs 5 (job done)
    • Currently need to compile a regex every time and materialise entire split

@jdunkerley jdunkerley self-requested a review February 7, 2022 16:21
@radeusgd radeusgd force-pushed the wip/radeusgd/sort-columns-181034280 branch from f296c14 to 2dc6ef4 Compare February 7, 2022 17:28
@radeusgd radeusgd force-pushed the wip/radeusgd/sort-columns-181034280 branch from 2dc6ef4 to ed7affb Compare February 8, 2022 11:08
@mwu-tow mwu-tow merged commit 8b24336 into develop Feb 8, 2022
@mwu-tow mwu-tow deleted the wip/radeusgd/sort-columns-181034280 branch February 8, 2022 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants