This repository was archived by the owner on Sep 3, 2022. It is now read-only.
Diagnose, benchmark and provide guidance for loading large dataframe from BigQuery #329
Open
Description
Customer query (via Tahir F.)
Do we have any benchmarks or recommendations for sizing the GCE set-up relative to the amount of data being brought into a pandas dataframe?
His question is as follows:
Could you advise me on the appropriate GCE spec?
We upgraded the GCE spec, and it seems to use only 2% of CPU, yet it takes 5 minutes to handle 500,000 rows of data in pandas.
Do you have any ideas for improving the performance of Datalab?
Is it related to a network or disk issue?
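One way to start diagnosing this might be to compare wall-clock time with CPU time for the load call. If the process is mostly waiting (on network or disk), CPU time will be a small fraction of wall time, which would be consistent with the 2% CPU figure reported above. The sketch below uses only the Python standard library; `profile_load` and `fake_load` are hypothetical names for illustration, and `fake_load` is a stand-in that should be swapped for the actual call that builds the dataframe.

```python
import time


def profile_load(load_fn):
    """Run load_fn() once and report wall-clock time vs. CPU time.

    load_fn is any zero-argument callable returning an object that
    supports len(), such as a pandas DataFrame or a list of rows.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of this process only
    result = load_fn()
    wall_s = time.perf_counter() - wall_start
    cpu_s = time.process_time() - cpu_start
    return {
        "rows": len(result),
        "wall_s": wall_s,
        "cpu_s": cpu_s,
        # A low value suggests the process spent most of its time waiting.
        "cpu_fraction": cpu_s / wall_s if wall_s > 0 else float("nan"),
    }


def fake_load(n_rows=500_000, wait_s=0.5):
    """Hypothetical stand-in for the real BigQuery-to-dataframe call."""
    time.sleep(wait_s)  # simulates waiting on network or disk
    return [(i, 0.5) for i in range(n_rows)]


if __name__ == "__main__":
    stats = profile_load(fake_load)
    print(stats)
```

Running this against the real load on a couple of machine types would show whether wall time changes with the GCE spec at all. Note that `time.process_time()` does not count CPU used by child processes, so a loader that shells out would look more idle than it is.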