Description
I'm reading the file available at ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt. The data start on line 73.
If I use the default C engine with read_table, I have to specify skiprows=85 to properly load the table:
pd.read_table(
    'co2_mm_mlo.txt', sep=r'\s+', header=None, skiprows=85, engine='c',
    names=['year', 'month', 'dec_year', 'average', 'interpolated', 'trend', 'days'])
But if I use the Python engine, then the expected skiprows=72 works:
pd.read_table(
    'co2_mm_mlo.txt', sep=r'\s+', header=None, skiprows=72, engine='python',
    names=['year', 'month', 'dec_year', 'average', 'interpolated', 'trend', 'days'])
The resulting DataFrame is expected to have 679 rows. If I use skiprows=72 with the C engine, it instead has 691 rows and includes data from the header.
I've confirmed this behavior on Mac OS X Yosemite with Pandas 0.15.0 and a checkout of master@5cf3d85a7d4c448519fa08f918a114209cfbdf2b.
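To double-check where the data start without relying on either parser engine, here is a minimal standard-library sketch. The helper name first_data_line is hypothetical (it is not part of pandas), and it assumes a data row is any line with exactly seven whitespace-separated numeric fields, matching the seven column names above. The sample lines are made up for illustration and are not taken from the NOAA file.

```python
def first_data_line(lines, n_fields=7):
    """Return the 1-based number of the first line made of n_fields numbers.

    Hypothetical helper: a "data line" is assumed to be any line whose
    whitespace-separated fields all parse as floats. Returns None if no
    such line exists.
    """
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) != n_fields:
            continue  # wrong field count, so not a data row
        try:
            for field in fields:
                float(field)
        except ValueError:
            continue  # at least one field is not numeric
        return lineno
    return None


# Made-up sample: two header lines followed by two data rows.
sample = [
    "# some descriptive header text",
    "# a b c d e f",
    "1958 3 1958.208 315.71 315.71 314.62 -1",
    "1958 4 1958.292 317.45 317.45 315.29 -1",
]

start = first_data_line(sample)
skiprows = start - 1  # number of lines before the first data row
print(start, skiprows)
```

Passing an open file handle for co2_mm_mlo.txt instead of the sample list should give the line number to compare against the value reported above, and subtracting one gives the skiprows value one would expect both engines to accept.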