-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Data] Fixing aggregation protocol to be appropriately associative #50757
Conversation
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
…impls Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Fixed Pandas/Arrow impls; Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
8045b91
to
874ec56
Compare
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
cae46a6
to
746b480
Compare
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
…ntainer) Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
python/ray/data/aggregate.py
Outdated
""" | ||
|
||
def _safe_zero_factory(_): | ||
if ignore_nulls: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, we can move ignore_null
check out of this function. So it doesn't need to be checked for each single record
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call! Fixed for combination seq as well
Unique: lambda col, ignore_nulls: set(pac.unique(col).to_pylist()), | ||
} | ||
|
||
return _map[agg_cls] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need add the following at the end of this file.
if __name__ == "__main__":
sys.exit(pytest.main(["-v", __file__]))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, keep forgetting about it
return _safe_finalize | ||
|
||
|
||
def _null_safe_combine( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add some unit test for these wrappers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are all tested explicitly in test_null_safe_aggregation_protocol
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
…ecution Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Why are these changes needed?
This is a follow-up for #50585, addressing an issue of its combination sequence not being appropriately associative.
Primary hurdle for implementing properly associative aggregation in the presence null values is to be able to distinguish between:
To achieve that in the presence of null values following semantic is established.
Case of ignore_nulls=True:
Combination protocol
Identity (zero) value is
None
(ie simulating 'empty' sequence)Case of ignore_nulls=False:
Combination protocol
NOT ignoring nulls), return it
b/c we're NOT ignoring nulls), return it
Identity (zero) value is an actual zero value for the operation (0 for count, sum, -inf for max, +inf for min, etc)
Changes
_Optional
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.