Merge pull request pandas-dev#10023 from jblackburne/read_csv-newline-chunk

read_csv newline fix
jreback committed May 8, 2015
2 parents e686387 + e693c3a commit 2840bea
Showing 3 changed files with 9 additions and 2 deletions.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.16.1.txt
@@ -298,6 +298,7 @@ Bug Fixes
 - Bug ``GroupBy.size`` doesn't attach index name properly if grouped by ``TimeGrouper`` (:issue:`9925`)
 - Bug causing an exception in slice assignments because ``length_of_indexer`` returns wrong results (:issue:`9995`)
 - Bug in csv parser causing lines with initial whitespace plus one non-space character to be skipped. (:issue:`9710`)
+- Bug in C csv parser causing spurious NaNs when data started with newline followed by whitespace. (:issue:`10022`)
 
 - Bug causing elements with a null group to spill into the final group when grouping by a ``Categorical`` (:issue:`9603`)
 - Bug where .iloc and .loc behavior is not consistent on empty dataframes (:issue:`9964`)
6 changes: 6 additions & 0 deletions pandas/io/tests/test_parsers.py
@@ -2287,6 +2287,12 @@ def test_single_char_leading_whitespace(self):
         result = self.read_csv(StringIO(data), skipinitialspace=True)
         tm.assert_frame_equal(result, expected)
 
+    def test_chunk_begins_with_newline_whitespace(self):
+        # GH 10022
+        data = '\n hello\nworld\n'
+        result = self.read_csv(StringIO(data), header=None)
+        self.assertEqual(len(result), 2)
+
 
 class TestPythonParser(ParserTests, tm.TestCase):
     def test_negative_skipfooter_raises(self):
4 changes: 2 additions & 2 deletions pandas/src/parser/tokenizer.c
@@ -854,7 +854,7 @@ int tokenize_delimited(parser_t *self, size_t line_limit)
     --i;
 } while (i + 1 > self->datapos && *buf != '\n');
 
-if (i + 1 > self->datapos) // reached a newline rather than the beginning
+if (*buf == '\n') // reached a newline rather than the beginning
 {
     ++buf; // move pointer to first char after newline
     ++i;
@@ -1172,7 +1172,7 @@ int tokenize_delim_customterm(parser_t *self, size_t line_limit)
     --i;
 } while (i + 1 > self->datapos && *buf != self->lineterminator);
 
-if (i + 1 > self->datapos) // reached a newline rather than the beginning
+if (*buf == self->lineterminator) // reached a newline rather than the beginning
 {
     ++buf; // move pointer to first char after newline
     ++i;
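
The effect of the changed condition can be sketched with a small Python model of the backtracking step shown in the two tokenizer.c hunks. This is a simplified illustration, not the tokenizer itself. The function name `rewind_to_line_start` and the `fixed` flag are hypothetical, and the model assumes that `buf` always sits one position past `i` (so `i + 1 > self->datapos` is the same test as "the `buf` index is greater than `datapos`"), which the hunks suggest but do not show.

```python
def rewind_to_line_start(data, buf, datapos, fixed=True):
    """Model of the backtracking step; returns the index where parsing resumes.

    data    -- the buffered chunk as a string
    buf     -- index one past the character just read (assumed equal to i + 1)
    datapos -- index of the first unconsumed character in the chunk
    fixed   -- True for the condition after this commit, False for the old one
    """
    # Mirrors: do { ...; --i; } while (i + 1 > datapos && *buf != '\n');
    while True:
        buf -= 1
        if not (buf > datapos and data[buf] != "\n"):
            break

    if fixed:
        reached_newline = data[buf] == "\n"  # new check: inspect the character
    else:
        reached_newline = buf > datapos      # old check: inspect only the position

    if reached_newline:
        buf += 1  # move to the first character after the newline
    return buf


# Input taken from test_chunk_begins_with_newline_whitespace.
data = "\n hello\nworld\n"
h_read = data.index("h") + 1  # buf sits one past 'h' when the rewind starts

old_resume = rewind_to_line_start(data, h_read, 0, fixed=False)
new_resume = rewind_to_line_start(data, h_read, 0, fixed=True)
print(old_resume, new_resume)  # 0 1
```

In this model the old condition leaves the resume position on the leading newline at index 0, because the scan stopped at the start of the chunk and at a newline simultaneously. The new condition steps past it to index 1. When the newline is not the first character of the chunk, both conditions agree.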