Open
Labels
Component: C++, Component: Python, Status: stale-warning (issues and PRs flagged as stale which are due to be closed if no indication otherwise), Type: enhancement
Description
I'm trying to read CSV files as-is, with all columns as strings. I don't know the schema of these CSVs in advance, and it varies because the files are provided by users.
Right now I'm using pandas.read_csv(dtype=str), which works great. However, since the final destination of these CSVs is Parquet files, it seems much more efficient to use pyarrow.csv.read_csv in the future, as soon as this becomes available :)
I tried things like pyarrow.csv.read_csv(convert_options=ConvertOptions(column_types=defaultdict(lambda: 'string'))), but it doesn't work.
Maybe I just didn't find something that already exists? :)
Environment: Ubuntu Xenial
Reporter: Bogdan Klichuk
Note: This issue was originally created as ARROW-5811. Please see the migration documentation for further details.