BUG: Read CSV on python engine fails when skiprows and chunk size are specified (#55677, #56323) #56250

Merged: 11 commits, Dec 5, 2023
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
pre-commit-ci[bot] committed Nov 30, 2023
commit 35a6929f8306f67ed7cf33ad336768500421a373
4 changes: 3 additions & 1 deletion pandas/io/parsers/python_parser.py
@@ -1149,7 +1149,9 @@ def _get_lines(self, rows: int | None = None) -> list[list[Scalar]]:
rows = 0

while True:
-new_row: list[Scalar] = self._next_iter_line(row_num=self.pos + rows + 1)
+new_row: list[Scalar] = self._next_iter_line(
+    row_num=self.pos + rows + 1
+)
rows += 1

if new_row is not None:
3 changes: 2 additions & 1 deletion pandas/tests/io/parser/test_skiprows.py
@@ -302,6 +302,7 @@ def test_skip_rows_and_n_rows(all_parsers):
expected = DataFrame({"a": [1, 3, 5, 7, 8], "b": ["a", "c", "e", "g", "h"]})
tm.assert_frame_equal(result, expected)


@xfail_pyarrow
def test_skip_rows_with_chunks(all_parsers):
# GH 55677
@@ -327,4 +328,4 @@ def test_skip_rows_with_chunks(all_parsers):
tm.assert_frame_equal(
df1, DataFrame({"col_a": [20, 30, 60, 70]}, index=[0, 1, 2, 3])
Review comment (Member): nit, but you don't need to specify index here; dropping it will help condense this to one line.

)
tm.assert_frame_equal(df2, DataFrame({"col_a": [80, 90, 100]}, index=[4, 5, 6]))
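
For readers following the test expectations above, here is a minimal pure-Python sketch of the behaviour they describe: rows are skipped by their absolute row number in the file, so the result does not depend on where a chunk boundary falls, and the second chunk continues the index at 4. This is not the pandas implementation. `read_in_chunks`, the input data, and the skip predicate are hypothetical, chosen only to be consistent with the expected frames in `test_skip_rows_with_chunks`.

```python
from __future__ import annotations

from io import StringIO
from typing import Callable, Iterator

# Assumed input: a header row followed by ten values (not taken from the PR).
DATA = "col_a\n10\n20\n30\n40\n50\n60\n70\n80\n90\n100\n"


def read_in_chunks(
    buffer: StringIO, skip: Callable[[int], bool], chunksize: int
) -> Iterator[list[tuple[int, int]]]:
    """Yield chunks of (index, value) pairs, skipping rows by absolute row number."""
    numbered = enumerate(line.strip() for line in buffer)
    # The skip predicate always sees the row number counted from the start
    # of the file, never a position relative to the current chunk.
    kept = (text for row_num, text in numbered if not skip(row_num))
    next(kept, None)  # the first kept row is treated as the header
    chunk: list[tuple[int, int]] = []
    for index, text in enumerate(kept):
        chunk.append((index, int(text)))
        if len(chunk) == chunksize:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly shorter chunk


# Assumed skip predicate: drop file rows 1, 4 and 5 (values 10, 40, 50).
chunks = list(read_in_chunks(StringIO(DATA), lambda x: x in [1, 4, 5], 4))
first_values = [value for _, value in chunks[0]]
second_values = [value for _, value in chunks[1]]
second_index = [index for index, _ in chunks[1]]
```

With these assumed inputs, the first chunk holds the values the test expects in `df1` and the second chunk holds those expected in `df2`, with the running index carried across the chunk boundary.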