Commit

adjust threshold
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
dchigarev committed Oct 11, 2023
1 parent 1e591b5 commit ea79b79
Showing 1 changed file with 2 additions and 2 deletions.
@@ -73,11 +73,11 @@ def split_pandas_df_into_partitions(
     # 3. The distributed splitting consumes more memory that the sequential one.
     # It was estimated that it requires ~2.5x of the dataframe size, so to avoid
     # OOM problems, we fall back to sequential implementation in case it doesn't
-    # fit into memory (using 3x threshold to be on the safe side).
+    # fit into memory (using 3.5x threshold to be on the safe side).
     enough_elements = (len(df) * len(df.columns)) > 6_000_000
     all_numeric_types = all(is_numeric_dtype(dtype) for dtype in df.dtypes)
     three_copies_fits_into_memory = psutil.virtual_memory().available > (
-        df.memory_usage().sum() * 3
+        df.memory_usage().sum() * 3.5
     )
     distributed_splitting = (
         enough_elements and all_numeric_types and three_copies_fits_into_memory
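
A minimal sketch of the decision this diff tunes, assuming the measurements are passed in as plain numbers instead of being read from `psutil.virtual_memory().available` and `df.memory_usage().sum()` as the real code does. The names `use_distributed_splitting` and `headroom` are hypothetical and not part of the project; only the constants (6,000,000 elements, the ~2.5x estimate, the 3.5x threshold) come from the diff. The example uses a 1 GiB all-numeric frame with 3.2 GiB of free memory, which passed the old 3x check but falls back to the sequential path under the new 3.5x check.

```python
# Hypothetical restatement of the three conditions in the diff as a pure
# function; real code obtains these values from pandas and psutil.
ELEMENT_THRESHOLD = 6_000_000  # element count from the diff
ESTIMATED_OVERHEAD = 2.5       # ~2.5x of the frame size, per the code comment
SAFETY_FACTOR = 3.5            # threshold after this commit (previously 3)


def use_distributed_splitting(
    n_rows: int,
    n_cols: int,
    all_numeric: bool,
    frame_nbytes: int,
    available_bytes: int,
    safety_factor: float = SAFETY_FACTOR,
) -> bool:
    """Return True when the distributed splitting path would be chosen."""
    enough_elements = n_rows * n_cols > ELEMENT_THRESHOLD
    fits_into_memory = available_bytes > frame_nbytes * safety_factor
    return enough_elements and all_numeric and fits_into_memory


def headroom(frame_nbytes: int, safety_factor: float) -> float:
    """Bytes the threshold reserves beyond the estimated ~2.5x peak usage."""
    return frame_nbytes * (safety_factor - ESTIMATED_OVERHEAD)


if __name__ == "__main__":
    gib = 1024 ** 3
    rows, cols = 16_777_216, 8
    # Assumes 8-byte columns (e.g. float64) and ignores the index: exactly 1 GiB.
    frame_nbytes = rows * cols * 8
    free = int(3.2 * gib)
    print(use_distributed_splitting(rows, cols, True, frame_nbytes, free, safety_factor=3.0))  # True
    print(use_distributed_splitting(rows, cols, True, frame_nbytes, free))  # False
    print(headroom(frame_nbytes, 3.0) / gib, headroom(frame_nbytes, 3.5) / gib)  # 0.5 1.0
```

Keeping the predicate free of I/O makes the threshold easy to test in isolation. Raising the factor from 3x to 3.5x widens the margin over the estimated ~2.5x peak from 0.5x to 1x of the frame size, at the cost of choosing the sequential path on machines with somewhat less free memory.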