Skip to content

read_csv clobbers values of columns with duplicate names #9424

Closed
@stevenmanton

Description

@stevenmanton

xref #10577 (has test for duplicates with empty data)

I don't expect this is the correct behavior, although it's always possible I'm doing something wrong. Importing data using the names keyword will clobber the values of columns where the name is duplicated. For example:

from StringIO import StringIO
import pandas as pd

data = """a,1
b,2
c,3"""
names = ['field', 'field']

print pd.read_csv(StringIO(data), names=names, mangle_dupe_cols=True)
print pd.read_csv(StringIO(data), names=names, mangle_dupe_cols=False)

returns

   field  field
0      1      1
1      2      2
2      3      3
   field  field
0      1      1
1      2      2
2      3      3

However, this produces the correct result:

df = pd.read_csv(StringIO(data), header=None)
df.columns = names
print df
   field  field
0      a      1
1      b      2
2      c      3

Interestingly, it works if the field names are in the header:

data_with_header = "field,field\n" + data
print pd.read_csv(StringIO(data_with_header))
  field  field.1
0     a        1
1     b        2
2     c        3

Is this a bug or am I doing something wrong?

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions