Skip to content

DRILL-5516: Limit memory usage for Hbase reader#839

Closed
arina-ielchiieva wants to merge 1 commit intoapache:masterfrom
arina-ielchiieva:DRILL-5516
Closed

DRILL-5516: Limit memory usage for Hbase reader#839
arina-ielchiieva wants to merge 1 commit intoapache:masterfrom
arina-ielchiieva:DRILL-5516

Conversation

@arina-ielchiieva
Copy link
Member

@arina-ielchiieva arina-ielchiieva commented May 16, 2017

To limit memory usage for Hbase reader we are adding max allowed allocated memory contant which will default to 64 mb. Thus batch size will be limited to 4000 (as before if memory limit does not exceed) or to number of records that are within max allowed memory limit. If first row in batch is larger than allowed default, it will be written in batch but batch will contain only this row.

@paul-rogers
Copy link
Contributor

The right approach is not to simply allow HBase to use more memory. The right approach is to limit memory.

Fortunately, another project is underway to do just that. Let's collaborate. In the next week or so I'll do a PR for the framework to limit batch sizes in readers, along with an implementation for the "compliant" text readers.

Maybe you can use that framework to retrofit the HBase reader to also limit it's batch size. Basically, we limit the length of the longest vector to 16 MB.

The present patch, using unlimited memory, has all kinds of other problems -- the very problems we are trying to solve, so it is not helpful to move forward in one area, backward in another.

@arina-ielchiieva
Copy link
Member Author

arina-ielchiieva commented May 17, 2017

This approach does not allow Hbase to use more memory, actually it limits memory usage. Previously when batch size was limited to 4000 rows, we could have one batch using ~3 GB.

Copy link
Contributor

@paul-rogers paul-rogers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the earlier explanation. Let's revisit this fix once the bounded-size vector mutator work is available in master.

One minor correction, then good to go.

done:
for (; rowCount < TARGET_RECORD_COUNT; rowCount++) {
// if first row is larger than allowed max size in batch, it will be added anyway
do {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to monitor row count: row count cannot exceed 64K. So, loop termination is either exceeds memory limit OR reaches max row count. For row count, might as well use the original limit, unless we know enough to pick a better limit.

@paul-rogers
Copy link
Contributor

General suggestion, perhaps change the title to more clearly describe the fix. Maybe "Limit memory usage for Hive reader" or some such. I originally read "use max allowable memory" as perhaps meaning to use the full 10 GB that the allocator gives to each operator...

@paul-rogers
Copy link
Contributor

One more comment: any way to unit test this improvement?

@arina-ielchiieva arina-ielchiieva changed the title DRILL-5516: Use max allowed allocated memory when defining batch size… DRILL-5516: Limit memory usage for Hbase reader May 18, 2017
@arina-ielchiieva
Copy link
Member Author

@paul-rogers

  1. Renamed PR as you suggested to better convey changes idea. Thank you for suggestion.
  2. Included row count into batch size determination. Thus batch size will be limited to max allowed rows (as before if memory limit does not exceed) or to number of records that are within max allowed memory limit.
  3. Unfortunately did not add unit tests, Not sure if there is a way to test Hbase reader in isolation for true unit testing (I know you have been working to some changes in this area but I don't know the details).
    But I have tested the fix manually, changing in code max rows size and memory allocation. Could not write Drill "unit tests" for this since it's not easy to mock static final int fields and adding such large data sets to test on real numbers may extend our unit tests running time. I guess in this case it's better to add such test to Functional tests suite. So if possible let's leave this change without unit tests for now.

@sudheeshkatkam
Copy link
Contributor

+1

@asfgit asfgit closed this in 7f98400 May 20, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants