Skip to content

BUG: read_csv raises exception if first row is incomplete #6710

Closed
@altaurog

Description

@altaurog

xref #4749
xref #8985

This is a bit like issue #4749, but the conditions are different.

I have been able to reproduce it by specifying usecols:

In [98]: mydata='1,2,3\n1,2\n1,2\n'
In [99]: pd.read_csv(StringIO(mydata), names=['a', 'b', 'c'], usecols=['a', 'c'])
Out[99]: 
   a   c
0  1   3
1  1 NaN
2  1 NaN

[3 rows x 2 columns]
In [100]: mydata='1,2\n1,2,3\n4,5,6\n'
In [101]: pd.read_csv(StringIO(mydata), names=['a', 'b', 'c'], usecols=['a', 'c'])
---------------------------------------------------------------------------
CParserError                              Traceback (most recent call last)
<ipython-input-101-f60127771eeb> in <module>()
----> 1 pd.read_csv(StringIO(mydata), names=['a', 'b', 'c'], usecols=['a', 'c'])

/home/altaurog/venv/p27/local/lib/python2.7/site-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, na_fvalues, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format)
    418                     infer_datetime_format=infer_datetime_format)
    419 
--> 420         return _read(filepath_or_buffer, kwds)
    421 
    422     parser_f.__name__ = name

/home/altaurog/venv/p27/local/lib/python2.7/site-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
    223         return parser
    224 
--> 225     return parser.read()
    226 
    227 _parser_defaults = {

/home/altaurog/venv/p27/local/lib/python2.7/site-packages/pandas/io/parsers.pyc in read(self, nrows)
    624                 raise ValueError('skip_footer not supported for iteration')
    625 
--> 626         ret = self._engine.read(nrows)
    627 
    628         if self.options.get('as_recarray'):

/home/altaurog/venv/p27/local/lib/python2.7/site-packages/pandas/io/parsers.pyc in read(self, nrows)
   1068 
   1069         try:
-> 1070             data = self._reader.read(nrows)
   1071         except StopIteration:
   1072             if nrows is None:

/home/altaurog/venv/p27/local/lib/python2.7/site-packages/pandas/parser.so in pandas.parser.TextReader.read (pandas/parser.c:6866)()

/home/altaurog/venv/p27/local/lib/python2.7/site-packages/pandas/parser.so in pandas.parser.TextReader._read_low_memory (pandas/parser.c:7086)()

/home/altaurog/venv/p27/local/lib/python2.7/site-packages/pandas/parser.so in pandas.parser.TextReader._read_rows (pandas/parser.c:7691)()

/home/altaurog/venv/p27/local/lib/python2.7/site-packages/pandas/parser.so in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:7575)()

/home/altaurog/venv/p27/local/lib/python2.7/site-packages/pandas/parser.so in pandas.parser.raise_parser_error (pandas/parser.c:19038)()

CParserError: Error tokenizing data. C error: Expected 2 fields in line 2, saw 3

In [102]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.2.0-4-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.13.1
Cython: 0.20
numpy: 1.8.0
scipy: None
statsmodels: None
IPython: 1.2.1
sphinx: 1.1.3
patsy: None
scikits.timeseries: None
dateutil: 1.5
pytz: 2012c
bottleneck: 0.8.0
tables: 3.1.0
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: None
xlrd: None
xlwt: 0.7.4
xlsxwriter: None
sqlalchemy: None
lxml: None
bs4: None
html5lib: 0.999
bq: None
apiclient: None

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions