-
Notifications
You must be signed in to change notification settings - Fork 323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Aggregate Column #3349
Conversation
Still to support Order By on First/Last Need to refactor tests and Table calls
b2726c6
to
761aff6
Compare
distribution/lib/Standard/Table/0.0.0-dev/src/Data/Aggregate_Column.enso
Outdated
Show resolved
Hide resolved
distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Aggregate_Column_Helper.enso
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, some thoughts on warning-handling.
I think we should replace create_closure
for resolve_columns
, if we chose to go that way - as would be good to be consistent in how we do it across functions/sub-libs.
distribution/lib/Standard/Table/0.0.0-dev/src/Data/Aggregate_Column.enso
Show resolved
Hide resolved
distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Aggregate_Column_Aggregator.enso
Outdated
Show resolved
Hide resolved
distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Aggregate_Column_Aggregator.enso
Show resolved
Hide resolved
## Given a Table and a Column create an aggregator | ||
new : Table->Aggregate_Column->Aggregate_Column_Aggregator | ||
new table column = | ||
create_closure c function:(Column->Any->Integer->Any) = function (column.resolve_column table c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmmm, I don't like this name - it doesn't really tell what it does.
More importantly - this function is not necessary anymore, at least once we also merge the resolve_columns
.
Maybe you could borrow the resolve_columns
from my PR to have it here already. Because I think we both agreed that the pattern to use here is to first do aggregate_column.resolve_columns
and then all the other helper functions can assume that they only get these already-resolved aggregates - and thus the create_closure
function would be obsolete because it would boil down to function c
.
Test.specify "should raise warnings when an invalid column index and no valid output" <| | ||
action = table.aggregate [Group_By -3] on_problems=_ | ||
problems = [Column_Indexes_Out_Of_Range [-3], No_Output_Columns] | ||
tester = expect_column_names [] | ||
Problems.test_problem_handling action problems tester | ||
|
||
Test.specify "should raise a warning when an invalid output name" <| | ||
action = table.aggregate [Group_By "Index" ""] on_problems=_ | ||
problems = [Invalid_Output_Column_Names [""]] | ||
tester = expect_column_names ["Column"] | ||
Problems.test_problem_handling action problems tester | ||
|
||
Test.specify "should raise a warning when a duplicate column name" <| | ||
action = table.aggregate [Group_By "Index", Group_By 0] on_problems=_ | ||
problems = [Duplicate_Output_Column_Names ["Index"]] | ||
tester = expect_column_names ["Index", "Index_1"] | ||
Problems.test_problem_handling action problems tester | ||
|
||
Test.specify "should raise a warning when a duplicate column name and rename default names first" <| | ||
action = table.aggregate [Group_By "Value", Group_By "Index" "Value"] on_problems=_ | ||
problems = [Duplicate_Output_Column_Names ["Value"]] | ||
tester = expect_column_names ["Value_1", "Value"] | ||
Problems.test_problem_handling action problems tester |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seem to be no tests for aggregates besides the special one Group_By
and it is a bit different from others, so I think would be worth to make some tests use Sum
or Count
.
More importantly - what do we do in situations like [Count_Distinct "Foo" ignore_nothing=True, Count_Distinct "Foo" ignore_nothing=False]
? They get the same name (fair, makes sense) but compute conceptually different things - so the warning may be confusing to users.
Pull Request Description
Checklist
Please include the following checklist in your PR:
./run dist
and./run watch
.