@hedeershowk (Contributor) commented Sep 13, 2023

Any suggestions where to add a test for this bug (and fix)?

Edit: I took a shot and put it in tests/io/parser/test_header.py.

@mroeschke added the "IO CSV" (read_csv, to_csv) and "Arrow" (pyarrow functionality) labels Sep 18, 2023
StringIO(data),
header=None,
usecols=[0, 1],
dtype="object",
Member

Can you test with string[pyarrow] here

Contributor Author

Sure thing. It works fine, as long as the comparison DataFrame is also string[pyarrow]:

        StringIO(data),
        header=None,
        usecols=[0, 1],
        dtype="string[pyarrow]",
        dtype_backend="pyarrow",
        engine="pyarrow",
    )
    expected = DataFrame([["a", "i"], ["b", "j"]], dtype="string[pyarrow]")
    tm.assert_frame_equal(result, expected)

I'll update the PR with the change.
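
For readers who want to try the behavior outside the pandas test suite, here is a minimal standalone sketch of the check discussed above. It makes several assumptions: the sample data and the two function names are hypothetical (the thread does not show the test's input), the public `pd.testing.assert_frame_equal` stands in for the internal `tm` helper, and both pyarrow and a pandas release that includes this fix (milestone 2.2) are installed.

```python
from io import StringIO


def read_first_two_columns(csv_text):
    """Read a headerless CSV with the pyarrow engine, keeping columns 0 and 1."""
    # Imported lazily so this module still loads where pandas is not installed.
    import pandas as pd

    return pd.read_csv(
        StringIO(csv_text),
        header=None,
        usecols=[0, 1],
        dtype="string[pyarrow]",
        dtype_backend="pyarrow",
        engine="pyarrow",
    )


def check_usecols_without_header():
    """Compare the parsed frame against an expected string[pyarrow] frame."""
    import pandas as pd

    # Hypothetical sample data: three columns, no header row.
    data = "a,i,x\nb,j,y\n"
    result = read_first_two_columns(data)

    # The expected frame must use the same dtype, as noted in the comment above.
    expected = pd.DataFrame([["a", "i"], ["b", "j"]], dtype="string[pyarrow]")
    pd.testing.assert_frame_equal(result, expected)
    return result


if __name__ == "__main__":
    print(check_usecols_without_header())
```

On a pandas version without the fix, the comparison would be expected to fail, which is the bug this PR addresses.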

hedeershowk and others added 3 commits September 19, 2023 22:36
… `DataFrame` to ignore passed arguments) (pandas-dev#55089)

* fixes pandas-dev#55009

* update documentation

* write documentation

* add test

* change formatting

* cite DataFrame directly in docs

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@mroeschke
Member

Please review and address the pre-commit failure.

@hedeershowk
Contributor Author

> Please review and address the pre-commit failure

I wasn't seeing any issues in my local pre-commit run for some reason.

@mroeschke added this to the 2.2 milestone Sep 27, 2023
@mroeschke merged commit 824a273 into pandas-dev:main Sep 27, 2023
@mroeschke
Member

Thanks @hedeershowk

Labels

"Arrow" (pyarrow functionality), "IO CSV" (read_csv, to_csv)

Development

Successfully merging this pull request may close these issues.

BUG: usecols in pandas.read_csv has incorrect behavior when using pyarrow engine