
read_csv problem with delim_whitespace, skiprows and trailing spaces in skipped rows #8661

Closed
@selasley


Given this input file, with linefeeds indicated by <lf>:

skip1<lf>
skip2<lf>
0    1    2<lf>
3    4    5<lf>

Reading it with read_csv() in pandas 0.15.0-42-g20be789 and Python 3.4.2 works:

df = pd.read_csv('test.txt', skiprows=2, delim_whitespace=True, header=None)
df
   0  1  2
0  0  1  2
1  3  4  5

If I add a space after skip1 so the skipped lines are

skip1 <lf>
skip2<lf>

then the same read_csv() call (skiprows=2) throws an error:
CParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 3

Adding 1 to skiprows
df = pd.read_csv('test.txt', skiprows=3, delim_whitespace=True, header=None)
does not throw an exception and gives the expected DataFrame

Reading with skiprows=2 but without header=None does not throw an exception. Instead it produces a DataFrame with a MultiIndex:

     skip2
0 1      2
3 4      5

If there is a space after skip2 so the skipped lines are

skip1<lf>
skip2 <lf>

then
df = pd.read_csv('test.txt', skiprows=2, delim_whitespace=True, header=None)
does not throw an exception but it does not include the 0 1 2 row in the DataFrame

If there are spaces after skip1 and skip2 so the skipped lines are

skip1 <lf>
skip2 <lf>

then
df = pd.read_csv('test.txt', skiprows=2, delim_whitespace=True, header=None)
throws the CParserError exception but
df = pd.read_csv('test.txt', skiprows=3, delim_whitespace=True, header=None)
does not and returns the expected DataFrame
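
The four trailing-space cases described above can be generated programmatically, which may make the behaviour easier to reproduce. This is only a sketch using the standard library; the helper name write_variants and the file names test_00.txt through test_11.txt are hypothetical and do not come from the report.

```python
import itertools
from pathlib import Path

# The two data rows from the report, separated by runs of spaces.
DATA_ROWS = ["0    1    2", "3    4    5"]


def write_variants(directory="."):
    """Write one file per combination of trailing space after skip1 / skip2.

    Returns a dict mapping (space_after_skip1, space_after_skip2) to the path.
    """
    paths = {}
    for space1, space2 in itertools.product([False, True], repeat=2):
        lines = [
            "skip1" + (" " if space1 else ""),
            "skip2" + (" " if space2 else ""),
            *DATA_ROWS,
        ]
        path = Path(directory) / f"test_{int(space1)}{int(space2)}.txt"
        # newline="\n" writes plain linefeeds on every platform.
        with open(path, "w", newline="\n") as fh:
            fh.write("\n".join(lines) + "\n")
        paths[(space1, space2)] = path
    return paths
```

Each generated file can then be passed to the read_csv() calls shown above to compare skiprows=2 against skiprows=3.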

I would expect skiprows to skip the number of lines specified whether or not there are trailing spaces in those lines.
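
Until skiprows behaves that way, one possible workaround is to drop the leading lines in Python and hand the remainder to the parser without skiprows. This is a minimal sketch, not part of pandas; drop_leading_lines is a hypothetical helper, and it assumes the file is small enough to read into memory.

```python
import io


def drop_leading_lines(path, n):
    """Return a text buffer with the file's contents minus its first n lines.

    Lines are counted by their line terminators only, so trailing spaces
    in the skipped lines do not affect how many lines are dropped.
    """
    with open(path) as fh:
        for _ in range(n):
            fh.readline()  # discard one physical line
        return io.StringIO(fh.read())


# Assumed usage with pandas (read_csv accepts file-like objects):
#   buf = drop_leading_lines('test.txt', 2)
#   df = pd.read_csv(buf, delim_whitespace=True, header=None)
# Newer pandas releases deprecate delim_whitespace; sep=r"\s+" is the
# documented equivalent there.
```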
