-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: parsing multi-column headers in read_csv (GH6051) #6170
Conversation
@waitingkuo IIUC, the last case is to handle an 'incorrect' multi-index? |
@jreback Yes, multi-index cannot be appended the sequence number in the original version. |
I think at the very least this should be a warning (and then effectively |
Should it also give a warning for the single column case? |
can you post what that case is? |
@waitingkuo can you rebase and see if their are any more issues? |
can you rebase this? |
@waitingkuo what's the state of this? |
@jreback, |
I think you need to give warning for your case 2) above |
What if I use |
I think you still should do a warning, because you are 'expectnig' a multi-index on the columns, but you won't get one because of the incorrect format. (mangle is the default), kind of an older option (now that can parse multi-column). It was supposed to be defaulted to False in future versions but that broke things so didn't go forward with that. |
now it warns if duplicated columns have been mangled woo.csv
woo_mul.csv
|
4,5,6 | ||
7,8,9 | ||
""" | ||
df = self.read_csv(StringIO(data), header=[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use the assert_produces_warning
here to assert that you are actually showing the warning (around the read_csv)
also need a one-liner in release notes, API change (and same line in v0.14.0.txt) |
Could you give me a hand? i use
But seems it doesn't catch my warning
Anything went wrong? That warning is warned in parser.pyx, a cython file. Does it matter? |
add this to the
this will then raise on the error and you can see where its coming from |
@waitingkuo update on this? |
@waitingkuo can you address this soon? |
looks good, can you rebase, and add a whatsnew (0.14.1) entry (bug fixes section) |
can you address the comments above. thanks |
@waitingkuo status? |
closing as stale |
@jreback : I'd like to maybe resurrect this given that I've had my share of fun with duplicate columns and indices (which appears to be the cause of this behavior). What do you think of these changes now? |
i thought this is much better nowadays - sure go for revival |
Sadly not...the test written here fails on |
Further testing indicates that this PR was not as ready to be merged as previously thought. It ran into issues when nested duplicates were encountered. Also, no support for the Python engine was provided. |
closes #6051
Bug:
mangle_dupe_cols
cannot work while the header is a list.Originally,
has_mi_columns
will be set as 1 while the header is a list. And themangle_dupe_cols
things would not work whenhas_mi_columns
== 1.This pull request is to
has_mi_columns
will be set as 0, and the header will do the originalmangle_dupe_cols
things as the header is a single integer.For example
would be converted to