Description
Given this input file, with linefeeds indicated by <lf>:
skip1<lf>
skip2<lf>
0 1 2<lf>
3 4 5<lf>
reading it with read_csv() in pandas 0.15.0-42-g20be789 and Python 3.4.2 works:
df = pd.read_csv('test.txt', skiprows=2, delim_whitespace=True, header=None)
df
   0  1  2
0  0  1  2
1  3  4  5
If I add a space after skip1, so that the skipped lines are:
skip1 <lf>
skip2<lf>
then read_csv() throws an error:
CParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 3
Adding 1 to skiprows:
df = pd.read_csv('test.txt', skiprows=3, delim_whitespace=True, header=None)
does not throw an exception and gives the expected DataFrame.
Reading with skiprows=2 but without header=None does not throw an exception; it produces a DataFrame with a MultiIndex:
skip2
0 1 2
3 4 5
If there is a space after skip2, so that the skipped lines are:
skip1<lf>
skip2 <lf>
then
df = pd.read_csv('test.txt', skiprows=2, delim_whitespace=True, header=None)
does not throw an exception, but the resulting DataFrame is missing the 0 1 2 row.
If there are spaces after both skip1 and skip2, so that the skipped lines are:
skip1 <lf>
skip2 <lf>
then
df = pd.read_csv('test.txt', skiprows=2, delim_whitespace=True, header=None)
throws the CParserError exception, but
df = pd.read_csv('test.txt', skiprows=3, delim_whitespace=True, header=None)
does not, and returns the expected DataFrame.
I would expect skiprows to skip the number of lines specified whether or not there are trailing spaces in those lines.
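To make the expected behaviour concrete, here is a minimal sketch in plain Python. It is not pandas' implementation, and read_whitespace_table is a hypothetical helper written only for this illustration. It skips the first skiprows physical lines unconditionally and splits the remaining lines on whitespace, so all four variants of the skipped lines described above should yield the same rows:

```python
import io


def read_whitespace_table(text, skiprows=0):
    """Skip the first `skiprows` physical lines, then split the rest on whitespace.

    Hypothetical reference helper, not part of pandas.
    """
    rows = []
    for lineno, line in enumerate(io.StringIO(text)):
        if lineno < skiprows:
            # Skipped unconditionally: trailing spaces on these lines are irrelevant.
            continue
        fields = line.split()
        if fields:  # ignore blank lines
            rows.append([int(f) for f in fields])
    return rows


DATA = "0 1 2\n3 4 5\n"

# The four variants from the report: no trailing space, a space after skip1,
# a space after skip2, and a space after both.
VARIANTS = [
    "skip1\nskip2\n",
    "skip1 \nskip2\n",
    "skip1\nskip2 \n",
    "skip1 \nskip2 \n",
]

results = [read_whitespace_table(header + DATA, skiprows=2) for header in VARIANTS]
for rows in results:
    print(rows)
```

With skiprows=2, each variant gives the two data rows 0 1 2 and 3 4 5, which is what I would expect read_csv() to return as well.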