readcsv performance #3350

Closed
@ViralBShah

Description

I am loading a 100 MB CSV file using readcsv, and it takes 70 seconds. The file contains comma-separated values that are a mix of integers and strings. It has 1,600,000 rows and 9 columns. Some rows have a 10th column as well, but that just becomes part of the last column.

The profiler reveals that a majority of the time is spent in split, which is not unexpected, but it would be nice to load such files quickly.
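
For illustration only, here is a rough sketch of the kind of per-row work involved, written against current Julia syntax. It scans each line for commas directly and returns views into the line instead of calling split, and it folds anything past the 9th column into the last field, as described above. The name `split_fields` is hypothetical and not part of Base, this is not how readcsv is implemented, and it ignores quoted fields. Whether it is actually faster would need to be measured.

```julia
# Hypothetical helper (not part of Base): cut one line into exactly `ncols`
# fields, folding any extra commas into the last field.
# Assumes no quoted fields and at least `ncols - 1` commas per line.
function split_fields(line::String, ncols::Int)
    fields = Vector{SubString{String}}(undef, ncols)
    start = firstindex(line)
    for col in 1:ncols-1
        stop = findnext(isequal(','), line, start)
        stop === nothing && error("expected at least $ncols fields")
        # View into `line` up to (not including) the comma; no copy is made.
        fields[col] = SubString(line, start, prevind(line, stop))
        start = nextind(line, stop)
    end
    # Everything after the (ncols-1)th comma becomes the last field.
    fields[ncols] = SubString(line, start)
    return fields
end

# Mixed integer/string columns: tryparse returns `nothing` for non-integers,
# so the caller can fall back to keeping the field as a string.
row = split_fields("1,2,abc,4,5,6,7,8,x,y", 9)
values = [something(tryparse(Int, f), f) for f in row]
```

Here `row[9]` is the view `"x,y"`, so a row with a 10th column still yields 9 fields.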
