-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improvements to the Merge routine #5778
Conversation
@@ -1152,7 +1152,7 @@ public void TestGroupBy() | |||
if (originalColumn.Name == "Bool") | |||
continue; | |||
DataFrameColumn headColumn = head.Columns[originalColumn.Name]; | |||
Assert.Equal(originalColumn[5], headColumn[verify[5]]); | |||
Assert.Equal(originalColumn[7], headColumn[verify[5]]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5
-> 7
because null
is not equivalent to default
in the GroupBy
method anymore
@@ -1658,7 +1658,7 @@ public void TestMerge() | |||
Assert.Equal(16, merge.Rows.Count); | |||
Assert.Equal(merge.Columns.Count, left.Columns.Count + right.Columns.Count); | |||
Assert.Null(merge.Columns["Int_left"][12]); | |||
Assert.Null(merge.Columns["Int_left"][5]); | |||
Assert.Null(merge.Columns["Int_left"][15]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5
-> 15
because we don't preserve the ordering of rows in OuterJoin
, so rows with null
values in the Column
we are merging on will always go to the end. I don't think this needs to be a requirement. If somebody requests it at some point, we can think about preserving the ordering then.
Codecov Report
@@ Coverage Diff @@
## main #5778 +/- ##
==========================================
+ Coverage 68.33% 68.40% +0.07%
==========================================
Files 1131 1131
Lines 241008 241156 +148
Branches 25025 25033 +8
==========================================
+ Hits 164686 164957 +271
+ Misses 69824 69716 -108
+ Partials 6498 6483 -15
Flags with carried forward coverage won't be shown. Click here to find out more.
|
for (long i = 0; i < matchedFullRows.Length; i++) | ||
{ | ||
int rowIndex = matchedFullRows[i]; | ||
MatchRowsOnMergedDataFrame(merge, left, right, rowIndex, rowIndex, rowIndex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually a part of TestMergeEdgeCases_Left, but I added one now anyway
/azp run |
Azure Pipelines successfully started running 2 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 2 pipeline(s). |
Currently, we don't differentiate between
null
values anddefault
values in columns. As a result, these values are considered equivalent in theMerge
routine. This PR fixes that by handling them separately. Also added unit tests so we don't regress in the future.Fixes #5765 and contributes to #5691