-
Notifications
You must be signed in to change notification settings - Fork 666
FIX-#3619: Fix ValueError when doing loc on empty dataframe #3631
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3631 +/- ##
===========================================
- Coverage 85.32% 36.90% -48.43%
===========================================
Files 194 181 -13
Lines 16194 15589 -605
===========================================
- Hits 13818 5753 -8065
- Misses 2376 9836 +7460
Continue to review full report at Codecov.
|
This pull request fixes 7 alerts when merging 66ae3f9b2248fb8fac3d1ceea2f2539019b30b52 into 648b6a0 - view on LGTM.com fixed alerts:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can do something like:
if self.df.empty:
return self.df._default_to_pandas(lambda df: df.loc[key])
It's similar to your implementation, but will work on row label objects too I believe
Signed-off-by: Naren Krishna <naren@ponder.io>
Signed-off-by: Naren Krishna <naren@ponder.io>
Signed-off-by: Naren Krishna <naren@ponder.io>
83ef26e
to
3724694
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, a couple of comments:
1.) Can we do this for iloc
as well?
2.) Let's add a test to make sure we have the correct behavior
Signed-off-by: Naren Krishna <naren@ponder.io>
Signed-off-by: Naren Krishna <naren@ponder.io>
Signed-off-by: Naren Krishna <naren@ponder.io>
Signed-off-by: Naren Krishna <naren@ponder.io>
do the |
@dchigarev Can you re-run the TeamCity Dask test? I believe it's a flake. |
done, now it's a green |
Signed-off-by: Naren Krishna <naren@ponder.io>
@dchigarev Added tests for |
BTW, the actual problem of why the reported bug occurs is the following way of computing a column lookup for modin/modin/pandas/indexing.py Lines 698 to 701 in 4121358
Pandas backend is failing to process [] instead of a slice(None) since for non-full-axis grabs it tries to do some partitioning stuff under an empty frame and fails.
It's actually pretty lucky, that the scenario from reproducer works well for the modin/modin/pandas/indexing.py Lines 817 to 826 in 4121358
So, actually, the reason why
I'm actually working on aligning these inconsistencies in #3520 and #3513, but since they aren't merged, this all looks kinda unsafe. We actually don't know how other backends would behave on an attempt to index an empty frame since there is no essential implementation for an empty core modin frame. Maybe we should bring some safety and consistency in this PR by defaulting to pandas for both: Sorry, for probably being too picky for a small-change PR, I would also be okay with the "if it works, don't touch it" approach 😄 |
I was wondering about this myself, thanks for the explanation! Let me make that change -- I think it would be better stability-wise :) |
Signed-off-by: Naren Krishna <naren@ponder.io>
Signed-off-by: Naren Krishna <naren@ponder.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the changes!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Naren Krishna naren@ponder.io
What do these changes do?
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
loc
on empty dataframe #3619docs/developer/architecture.rst
is up-to-date