DRILL-4411: hash join should limit batch based on size and number of records #381

minji-kim wants to merge 1 commit into apache:master
Conversation
```diff
  private HashJoinBatch outgoingJoinBatch = null;
  ...
- private static final int TARGET_RECORDS_PER_BATCH = 4000;
+ private int targetRecordsPerBatch = 4000;
```
I would add a comment here about when this value will be mutated as we are moving it away from immutability, especially since most of the other operators currently have this as an immutable value.
I changed it so that the batch size is not adjusted only once, and the minimum batch size (in number of records) is kept at least 1.
```java
  container.clear();
  outputAllocator.close();
```
outputAllocator can be null, right?
Yes, it could be null if we close before calling next()/buildSchema(). I will add a check for null.
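The guard discussed above can be sketched as follows. This is a minimal illustration, not Drill's actual code: the field and allocator types are hypothetical stand-ins for the real `outputAllocator`, which stays null if `close()` runs before `next()`/`buildSchema()` ever allocated it.

```java
// Hypothetical sketch of a null-guarded close(); names are stand-ins,
// not Drill's actual classes.
class OutputHolder implements AutoCloseable {
    static class StubAllocator implements AutoCloseable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    // Remains null when close() is called before next()/buildSchema()
    // ever created the allocator.
    StubAllocator outputAllocator = null;

    @Override
    public void close() {
        if (outputAllocator != null) {  // guard: may never have been allocated
            outputAllocator.close();
        }
    }
}
```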
…ber of records

- Make batch size (in bytes and records) adjustments more than once
- Minor change: check for null outputAllocator before closing
@minji-kim Thanks for putting the effort into making this change. Unfortunately, a more sophisticated version of this was recently completed in PR #1227 and will be merged soon. Since the objectives of the two changes are the same, I will close this PR. Sorry that your change did not receive any attention earlier.
Right now, hash joins can run out of memory when records are large, since a batch is limited only by record count (4000 records), not by size in bytes. This patch implements a simple heuristic: if the output allocator grows beyond 10 MB before 4000 records have been output (say, after 2000 records), then the record limit is set to 2000 for future batches.
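The heuristic above can be sketched as a small sizing helper. This is an illustrative sketch only, not the patch's actual code: the class, method names, and the exact adjustment policy (shrink the record cap to the count reached when the 10 MB limit was exceeded, never below 1, re-checked after every batch) are assumptions drawn from the description and the review discussion.

```java
// Hypothetical sketch of the batch-sizing heuristic described above.
public class BatchSizer {
    static final int INITIAL_RECORDS_PER_BATCH = 4000;     // hard record cap
    static final long MAX_BATCH_BYTES = 10L * 1024 * 1024; // 10 MB size cap

    private int targetRecordsPerBatch = INITIAL_RECORDS_PER_BATCH;

    // Called after each batch with the bytes held by the output allocator
    // and the number of records emitted in that batch. May fire more than
    // once over the life of the operator, per the review discussion.
    void adjust(long allocatedBytes, int recordsEmitted) {
        if (allocatedBytes > MAX_BATCH_BYTES) {
            // Cap future batches at the record count reached when the size
            // limit was exceeded; never let the cap drop below 1 record.
            targetRecordsPerBatch =
                Math.max(1, Math.min(targetRecordsPerBatch, recordsEmitted));
        }
    }

    int targetRecordsPerBatch() {
        return targetRecordsPerBatch;
    }
}
```

For example, a batch that reaches 11 MB after only 2000 records lowers the cap to 2000; later batches that stay under 10 MB leave the cap unchanged.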