Closed
Description
xref #10577 (has test for duplicates with empty data)
I don't expect this is the correct behavior, although it's always possible I'm doing something wrong. Importing data using the names
keyword will clobber the values of columns where the name is duplicated. For example:
from StringIO import StringIO
import pandas as pd
data = """a,1
b,2
c,3"""
names = ['field', 'field']
print pd.read_csv(StringIO(data), names=names, mangle_dupe_cols=True)
print pd.read_csv(StringIO(data), names=names, mangle_dupe_cols=False)
returns
field field
0 1 1
1 2 2
2 3 3
field field
0 1 1
1 2 2
2 3 3
However, this produces the correct result:
df = pd.read_csv(StringIO(data), header=None)
df.columns = names
print df
field field
0 a 1
1 b 2
2 c 3
Interestingly, it works if the field names are in the header:
data_with_header = "field,field\n" + data
print pd.read_csv(StringIO(data_with_header))
field field.1
0 a 1
1 b 2
2 c 3
Is this a bug or am I doing something wrong?