ARROW-1976: [Python] Handling unicode pandas columns on parquet.read_table #1476
python/pyarrow/pandas_compat.py
Outdated
This should also be six.text_type so that we get unicode in Python 2. Probably, using .decode('utf-8') might also be the better option.
There is also frombytes in pyarrow.compat
Thanks @xhochy, fixed!
python/pyarrow/pandas_compat.py
Outdated
frombytes works only on bytes. Thus the above code is valid in Python 2 but breaks the unittests for Python 3. Removing the str should fix this.
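To illustrate the point above, here is a minimal sketch of the Python 3 failure mode and one way around it. The names `decode_bytes_only` and `to_text` are hypothetical stand-ins written for this example. They are not the `pyarrow.compat` implementation, and the code assumes only that a bytes-only helper decodes its argument as UTF-8.

```python
def decode_bytes_only(value):
    # Hypothetical stand-in for a bytes-only helper: it assumes the
    # argument has a .decode method, which Python 3 str does not.
    return value.decode('utf-8')


def to_text(name):
    """Return name as text, decoding only when it is actually bytes."""
    if isinstance(name, bytes):
        return name.decode('utf-8')
    # Already text (str in Python 3): return unchanged.
    return name


# Bytes column names are decoded; text column names pass through.
encoded_name = 'café'.encode('utf-8')
decoded_name = to_text(encoded_name)
unchanged_name = to_text('café')

# Passing text to the bytes-only helper fails in Python 3.
try:
    decode_bytes_only('café')
    text_input_failed = False
except AttributeError:
    text_input_failed = True
```

Guarding on `isinstance(name, bytes)` keeps the helper safe to call on values that may already have been decoded, which is the situation the review comment describes.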
97f003b to e95b5f1
I'm looking at this patch to get it passing
Sorry @wesm! This totally slipped my mind!
I'd like to help. If it's fine with @Licht-T, I can pull his branch and create a new PR to add the required changes there.
Together with the changes from @simnyatsanga this is good to go.
Removing additional instances of using frombytes with str.
15a2366 to 0a12652
Cool, I just merged the changes, will await the CI to run
This is still failing
I'm looking at getting the failing tests to pass. Specifically, in one of the failing tests the pandas roundtrip is failing in this branch on this line: https://github.com/apache/arrow/blob/master/python/pyarrow/pandas_compat.py#L166
@Licht-T @wesm @simnyatsanga I'm taking a look at this now.
This is failing because of an assumption about the behavior of |
This closes ARROW-1976.