Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise error in read_csv when arguments header and prefix both are not None #31383

Merged
merged 17 commits into from
Feb 3, 2020

Conversation

rushabh-v
Copy link
Contributor

@rushabh-v rushabh-v commented Jan 28, 2020

@simonjayhawkins simonjayhawkins added Error Reporting Incorrect or improved errors from pandas IO CSV read_csv, to_csv labels Jan 28, 2020
Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rushabh-v. can you add a what's new note. What's the current behaviour? Does this need a deprecation cycle?

pandas/tests/io/parser/test_header.py Outdated Show resolved Hide resolved
pandas/tests/io/parser/test_header.py Outdated Show resolved Hide resolved
@rushabh-v
Copy link
Contributor Author

rushabh-v commented Jan 28, 2020

All checks successful.
No, in my view it doesn't need depreciation cycle.

And I have made all the requested changes @simonjayhawkins

@simonjayhawkins
Copy link
Member

No, it doesn't need depreciation cycle.

IIUC then currently prefix is ignored if header passed.

>>> from io import StringIO
>>> s = StringIO("0,1\n2,3")
>>>
>>> import pandas as pd
>>> pd.__version__
'1.0.0rc0+228.g4edcc5541'
>>>
>>> pd.read_csv(s, header=0, prefix="_X")
   0  1
0  2  3
>>>

This would now raise a ValueError.

IMO this is a breaking change. see what others think.

@rushabh-v
Copy link
Contributor Author

IMO this is a breaking change

oh, I think you are right.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. The intention here is to raise a ValueError (its a bug and confusing to silently ignore)

@TomAugspurger

pandas/io/parsers.py Outdated Show resolved Hide resolved
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look OK, though the test is likely in the wrong file. Should be in pandas/tests/io/parser/test_common.py probably.

@@ -575,6 +575,13 @@ def test_to_csv_headers(self):
recons.reset_index(inplace=True)
tm.assert_frame_equal(to_df, recons)

def test_to_csv_raises_on_header_prefix(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this testing to_csv or read_csv? Looks like it's in the to_csv file, but is testing read_csv?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's testing read_csv that's a mistake in naming it.

But I actually tried putting this test in pandas/tests/io/parser/test_common.py. But the problem there is that all the tests use io.parsers.read_csv there. And io.parsers.read_csv doesn't seem to be going into CParserWrapper. So I tried putting it here. But now I am putting it back in test_common.py and will import the pandas.read_csv there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you take a look, please?

@@ -2040,6 +2040,14 @@ def test_read_csv_memory_growth_chunksize(all_parsers):
pass


def test_read_csv_raises_on_header_prefix():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use the all_parsers fixture here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I first tried to do that in c053a8f but looks likey parsers.read_csv is not going to CParserWrapper. Which is what we want to test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure it is, what exactly was failing? this is the standard pattern for parser tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is to check whether the error is being raised or not, and the error is raised from CParserWrapper So, the whole test fails here.

Copy link
Member

@gfyoung gfyoung Feb 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to apply your fix to all engines. You just applied to CParserWrapper it seems.

I think the fix should actually be in ParserBase so that all engines get the fix.

c053a8f is the correct test.

@jreback jreback requested a review from gfyoung February 1, 2020 15:01
Copy link
Member

@gfyoung gfyoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rushabh-v : You're on the right track, but I think you tried too hard to adjust the tests to your fixes, when it really should be the other way around. See my earlier comments.

@pep8speaks
Copy link

pep8speaks commented Feb 2, 2020

Hello @rushabh-v! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-02-03 09:25:48 UTC

Copy link
Member

@gfyoung gfyoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good. Just want to make sure that tests pass.

If they fail, we can update those accordingly.

@rushabh-v
Copy link
Contributor Author

@gfyoung 2 tests are failing, I don't see any obvious connection of them with this PR.
can you take a look, please? https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=27452&view=logs&j=a3a13ea8-7cf0-5bdb-71bb-6ac8830ae35c&t=add65f64-6c25-5783-8fd6-d9aa1b63d9d4&l=119

@gfyoung
Copy link
Member

gfyoung commented Feb 2, 2020

Indeed, those test failures look unrelated. I just restarted the build.

@rushabh-v
Copy link
Contributor Author

rushabh-v commented Feb 2, 2020

Indeed, those test failures look unrelated. I just restarted the build.

same test failure again.

@gfyoung
Copy link
Member

gfyoung commented Feb 2, 2020

You may need to rebase or merge. Unclear why these warnings are popping up.

cc @pandas-dev/pandas-core : is there something with matplotlib that changed recently?

@rushabh-v
Copy link
Contributor Author

rushabh-v commented Feb 2, 2020

Fails after merge too

@rushabh-v
Copy link
Contributor Author

All greens now @gfyoung

@gfyoung gfyoung merged commit a2721fd into pandas-dev:master Feb 3, 2020
@gfyoung
Copy link
Member

gfyoung commented Feb 3, 2020

Thanks @rushabh-v !

@gfyoung gfyoung added this to the 1.1 milestone Feb 3, 2020
@rushabh-v rushabh-v deleted the pre-head branch February 3, 2020 16:13
vinceatbluelabs added a commit to bluelabsio/records-mover that referenced this pull request Aug 12, 2020
Pandas 1.1 has broken Record Mover's usage of the read_csv() function by adding error checking in cases where a certain argument would be unused.

Details of the Pandas change:

* pandas-dev/pandas#27394
* pandas-dev/pandas#31383

See Records Mover test failures here: 
* https://app.circleci.com/pipelines/github/bluelabsio/records-mover/1089/workflows/e62f1cf0-f8d0-4e22-9652-112df72b02b8/jobs/9439
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pd.read_csv prefix parameter seems do not works
7 participants