Background [Optional]
We have a 76 GB variable-width, variable-length (BDW+RDW) multi-segment file (47 segments) containing 470,000,000 records with 700 columns. I am trying to convert it to Parquet, but the job creates only a single partition. The file parses correctly, and I can see the data with df.show().
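For reference, a minimal sketch of the pipeline described above, assuming the Cobrix (spark-cobol) data source (inferred from the BDW+RDW format and the input_split options mentioned below); the copybook and data paths are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("MainframeToParquet")
  .getOrCreate()

// Assumed Cobrix ("cobol") data source; all paths are placeholders.
val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy") // placeholder copybook path
  .option("record_format", "VB")               // variable-length records with BDW + RDW
  .load("/path/to/ebcdic/data")                // placeholder input path

df.show()                           // parsing works; data displays correctly
df.write.parquet("/path/to/output") // but the write runs as a single task
```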
Question
How do I parallelize the job across executors?
df.write runs as a single task while writing the Parquet file.
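One workaround, independent of how the reader splits the input, is to repartition before the write so the Parquet job runs with multiple tasks; the partition count below is illustrative, not a value from this issue:

```scala
// Repartition before writing so the Parquet job runs as many tasks.
// 200 is an illustrative value; tune it to cluster size and data volume.
// Note that repartition introduces a shuffle over the full 76 GB.
df.repartition(200)
  .write
  .mode("overwrite")
  .parquet("/path/to/output") // placeholder output path
```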
Options used:
- input_split_records
- input_split_size_mb
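With either split option set, it is worth verifying that the reader actually produced more than one partition before the write; a quick check:

```scala
// If this prints 1, the reader produced a single input split and the
// subsequent Parquet write will run as one task.
println(df.rdd.getNumPartitions)
```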