read_csv returns wrong DataFrame when setting skiprows. #12775

Closed
@strnam

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> from StringIO import StringIO
>>> data = """id,text,num_lines
1,"line 11
line 12",2
2,"line 21
line 22",2
3,"line 31",1"""

>>> pd.read_csv(StringIO(data))
Out[2]: 
   id              text  num_lines
0   1  'line 11\nline 12'          2
1   2  'line 21\nline 22'          2
2   3           'line 31'          1

>>> pd.read_csv(StringIO(data), skiprows=[1])
Out[3]: 
         id              text  num_lines
0  'line 12"'                 2        NaN
1         2  'line 21\nline 22'        2.0
2         3           'line 31'        1.0
...

Expected Output

>>> pd.read_csv(StringIO(data), skiprows=[1])
Out[3]: 
   id              text  num_lines
0   2  'line 21\nline 22'          2
1   3           'line 31'          1
...

It should skip the whole record '1,"line 11\nline 12",2' instead of skipping only the physical line '1,"line 11'.
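To make the difference between physical lines and CSV records concrete, here is a minimal sketch that uses only the standard-library csv module. It is not how pandas implements skiprows; the helper `skip_records` is a hypothetical name introduced for illustration, and it simply shows the record-based skipping I expected.

```python
import csv
import io

data = '''id,text,num_lines
1,"line 11
line 12",2
2,"line 21
line 22",2
3,"line 31",1'''

# Physical lines: splitting on "\n" ignores quoting, so each quoted
# multi-line field is cut in two.
physical_lines = data.split("\n")

# Logical records: csv.reader keeps a newline inside quotes as part of
# the field, so each row of the table is one record.
records = list(csv.reader(io.StringIO(data)))


def skip_records(rows, skip):
    """Hypothetical helper: drop whole records by 0-based record index
    (the header is record 0)."""
    return [row for i, row in enumerate(rows) if i not in skip]


# Skipping record 1 removes all of '1,"line 11\nline 12",2'.
kept = skip_records(records, {1})

print(len(physical_lines), len(records))
for row in kept:
    print(row)
```

With this input there are 6 physical lines but only 4 records (header included), which is why skipping "row 1" by physical line leaves the dangling 'line 12",2' fragment shown above.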

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.3-300.fc23.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 18.0.1
Cython: None
numpy: 1.11.0
scipy: 0.14.1
statsmodels: 0.6.1
xarray: None
IPython: 3.2.1
sphinx: 1.2.3
patsy: 0.4.1
dateutil: 2.5.2
pytz: 2016.3
blosc: None
bottleneck: 0.6.0
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.4.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
