
Bug with read_table, skiprows, and C engine #8679

Closed
@jiffyclub

Description


I'm reading the file available at ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt. The data start on line 73.
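If the data start on line 73, the expected skiprows value is 72, which is the number of lines before the first data row. The sketch below computes that number instead of counting by hand. It assumes every header line begins with '#' (or is blank). That is an assumption about the file, not something stated in this report. The helper name first_data_line is hypothetical.

```python
import io


def first_data_line(lines, comment='#'):
    """Return the 1-based number of the first non-blank, non-comment line.

    Hypothetical helper: assumes header lines start with `comment`
    (after optional whitespace) and treats blank lines as header too.
    """
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped and not stripped.startswith(comment):
            return number
    raise ValueError('no data lines found')


# Small in-memory stand-in: two header lines, then data.
sample = io.StringIO("# header\n# more header\n2000 1 2000.042\n")
start = first_data_line(sample)
skiprows = start - 1  # lines to skip before the data
print(start, skiprows)  # → 3 2
```

With the real file, the same call would be `first_data_line(open('co2_mm_mlo.txt'))`. If the header lines do start with '#', it should return 73 and give skiprows=72.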

If I use the default C engine with read_table, I have to specify skiprows=85 to load the table properly:

import pandas as pd

# C engine: only loads correctly with skiprows=85, not the expected 72
pd.read_table(
        'co2_mm_mlo.txt', sep=r'\s+', header=None, skiprows=85, engine='c',
        names=['year', 'month', 'dec_year', 'average', 'interpolated', 'trend', 'days'])

But with the Python engine, the expected skiprows=72 works:

# Python engine: skiprows=72 works as expected
pd.read_table(
        'co2_mm_mlo.txt', sep=r'\s+', header=None, skiprows=72, engine='python',
        names=['year', 'month', 'dec_year', 'average', 'interpolated', 'trend', 'days'])

The resulting DataFrame should have 679 rows. However, if I use skiprows=72 with the C engine, it has 691 rows and includes data from the file's header.

I've confirmed this behavior on Mac OS X Yosemite with Pandas 0.15.0 and a checkout of master@5cf3d85a7d4c448519fa08f918a114209cfbdf2b.
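A regression check for this kind of discrepancy reads the same text with both engines and compares the results. The sketch below does this against a small hypothetical in-memory file rather than the NOAA download. It has three comment lines followed by made-up rows in the same seven-column layout, not real measurements. It is not claimed to reproduce the bug. On a pandas version where the engines agree, both frames should have the same shape and contents.

```python
import io

import pandas as pd

# Hypothetical stand-in for co2_mm_mlo.txt: three header lines, then
# made-up rows in the same seven-column, whitespace-separated layout.
SAMPLE = """\
# header line 1
# header line 2
# header line 3
2000 1 2000.042 370.00 370.00 369.50 -1
2000 2 2000.125 370.50 370.50 369.80 -1
2000 3 2000.208 371.00 371.00 370.10 -1
"""

NAMES = ['year', 'month', 'dec_year', 'average', 'interpolated', 'trend', 'days']
SKIPROWS = 3  # number of header lines before the first data row


def read_with(engine):
    """Parse SAMPLE with the given parser engine ('c' or 'python')."""
    return pd.read_table(
        io.StringIO(SAMPLE), sep=r'\s+', header=None,
        skiprows=SKIPROWS, names=NAMES, engine=engine)


df_c = read_with('c')
df_py = read_with('python')

# Both engines should skip the same physical lines and produce equal frames.
print(df_c.shape, df_py.shape)  # → (3, 7) (3, 7)
print(df_c.equals(df_py))       # → True
```

Applied to the real file with skiprows=72, the same comparison would flag the 691 vs. 679 row mismatch described above.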
