Odd behavior from read_csv with na_values set to non-string values #3611

@rhstanton

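read_csv behaves inconsistently when na_values contains non-string (numeric) values: sometimes it replaces the given number with NaN, and sometimes it doesn't. In the examples below, the integer -999 is replaced only in the integer column A, and the float -999.0 only in the float column B. When both are listed, only the first one takes effect. Note in particular the different behavior of the last two statements: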

Create file

from pandas import DataFrame, read_csv

df = DataFrame({'A': [-999, 2, 3], 'B': [1.2, -999, 4.5]})
df.to_csv('test2.csv', sep=' ', index=False)

print(read_csv('test2.csv', sep=' ', header=0, na_values=[-999]))

     A       B
0  NaN     1.2
1    2  -999.0
2    3     4.5

print(read_csv('test2.csv', sep=' ', header=0, na_values=[-999.0]))

      A    B
0  -999  1.2
1     2  NaN
2     3  4.5

print(read_csv('test2.csv', sep=' ', header=0, na_values=[-999.0, -999]))

      A    B
0  -999  1.2
1     2  NaN
2     3  4.5

print(read_csv('test2.csv', sep=' ', header=0, na_values=[-999, -999.0]))

     A       B
0  NaN     1.2
1    2  -999.0
2    3     4.5
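Since the title singles out non-string values, a possible workaround (an assumption on my part, not something confirmed in this report) is to pass the sentinels as strings that match the text in the file exactly. The sketch below uses an in-memory copy of the data instead of test2.csv. The names `CSV_TEXT` and `read_with` are made up for illustration, and `isna` assumes a reasonably recent pandas. The loop at the end reruns the four numeric cases so the results can be compared on whatever pandas version is installed.

```python
import io

import pandas as pd

# Same content that to_csv wrote to test2.csv above, kept in memory.
CSV_TEXT = "A B\n-999 1.2\n2 -999.0\n3 4.5\n"


def read_with(na_values):
    """Parse the sample data with the given na_values list (hypothetical helper)."""
    return pd.read_csv(io.StringIO(CSV_TEXT), sep=" ", header=0, na_values=na_values)


# String sentinels matching the file text: "-999" in column A, "-999.0" in column B.
as_strings = read_with(["-999", "-999.0"])
print(as_strings)

# The numeric forms from the report; the NaN counts depend on the pandas version.
for na_values in ([-999], [-999.0], [-999.0, -999], [-999, -999.0]):
    print(na_values, read_with(na_values).isna().sum().to_dict())
```

With string sentinels, each value is compared against the raw text of the field, so the integer and float spellings have to be listed separately.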

Labels: Bug, IO Data
