DRILL-5516: Limit memory usage for HBase reader #839
arina-ielchiieva wants to merge 1 commit into apache:master from
The right approach is not simply to allow HBase to use more memory; it is to limit memory. Fortunately, another project is underway to do just that, so let's collaborate. In the next week or so I'll submit a PR for a framework that limits batch sizes in readers, along with an implementation for the "compliant" text readers. Maybe you can use that framework to retrofit the HBase reader so that it also limits its batch size. Basically, we limit the length of the longest vector to 16 MB. The present patch, by using unlimited memory, has all kinds of other problems (the very problems we are trying to solve), so it does not help to move forward in one area and backward in another.
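A minimal sketch of what a per-vector cap like the 16 MB one described above might look like, assuming a reader can estimate how many bytes each row adds to each column's vector. The class `VectorSizeLimiter` and its methods are hypothetical and are not the API of the batch-size framework mentioned in the comment; 16 MB is read here as 16 × 1024 × 1024 bytes.

```java
import java.util.Arrays;

/** Hypothetical per-batch tracker that keeps every column vector within a fixed byte cap. */
class VectorSizeLimiter {
    // Cap on the longest vector in a batch (16 MB, read as 16 * 1024 * 1024 bytes).
    static final long MAX_VECTOR_BYTES = 16L * 1024 * 1024;

    private final long[] vectorBytes; // running byte count per column vector
    private int rowCount;

    VectorSizeLimiter(int columnCount) {
        this.vectorBytes = new long[columnCount];
    }

    /**
     * Records the row and returns true if adding it keeps every vector within the cap;
     * otherwise returns false and leaves the batch unchanged, so the caller can end the
     * batch and retry the row in the next one. The first row of a batch is always accepted.
     */
    boolean tryAddRow(long[] rowBytesPerColumn) {
        if (rowBytesPerColumn.length != vectorBytes.length) {
            throw new IllegalArgumentException("expected one size per column");
        }
        if (rowCount > 0) {
            for (int c = 0; c < vectorBytes.length; c++) {
                if (vectorBytes[c] + rowBytesPerColumn[c] > MAX_VECTOR_BYTES) {
                    return false;
                }
            }
        }
        for (int c = 0; c < vectorBytes.length; c++) {
            vectorBytes[c] += rowBytesPerColumn[c];
        }
        rowCount++;
        return true;
    }

    int rowCount() {
        return rowCount;
    }

    /** Clears the counters when a new batch starts. */
    void reset() {
        Arrays.fill(vectorBytes, 0L);
        rowCount = 0;
    }

    public static void main(String[] args) {
        VectorSizeLimiter limiter = new VectorSizeLimiter(2);
        long[] row = {100L, 1024L * 1024}; // small key column, 1 MB value column
        while (limiter.tryAddRow(row)) {
            // keep adding until the value vector would pass 16 MB
        }
        System.out.println(limiter.rowCount()); // 16
    }
}
```

In this sketch the cap bounds the largest single vector rather than the batch as a whole, so a batch with many narrow columns can still hold many rows.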
This approach does not allow HBase to use more memory; it actually limits memory usage. Previously, when the batch size was limited only to 4000 rows, a single batch could use ~3 GB.
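For scale, the ~3 GB figure implies a very wide average row. Assuming rows of roughly uniform size and reading GB as $2^{30}$ bytes (both are assumptions, since the comment gives only the total):

$$
\frac{3 \times 2^{30}\ \text{bytes}}{4000\ \text{rows}} = \frac{3{,}221{,}225{,}472}{4000} \approx 805{,}306\ \text{bytes/row} \approx 786\ \text{KiB/row}
$$

At that row size, a cap of 4000 rows alone does not bound batch memory in any useful way.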
paul-rogers left a comment
Thanks for the earlier explanation. Let's revisit this fix once the bounded-size vector mutator work is available in master.
One minor correction, then good to go.
done:
for (; rowCount < TARGET_RECORD_COUNT; rowCount++) {
// if first row is larger than allowed max size in batch, it will be added anyway
do {
The row count still needs to be monitored: a batch cannot exceed 64K rows. So the loop should terminate when it either exceeds the memory limit OR reaches the maximum row count. For the row count, we might as well use the original limit, unless we know enough to pick a better one.
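A minimal sketch of the loop termination described in this comment, assuming each row is represented only by the number of bytes it adds and that a row which does not fit can be held back for the next batch. `fillBatch`, `BatchLoopSketch`, and the constant names are hypothetical (the 4000-row and 64 MB values come from this PR, with 64 MB read as 64 × 1024 × 1024 bytes); this is not the HBase reader's actual code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class BatchLoopSketch {
    // Row limit kept from the original code; must stay below the 64K-rows-per-batch ceiling.
    static final int TARGET_RECORD_COUNT = 4000;
    // Memory limit per batch (64 MB, read as 64 * 1024 * 1024 bytes).
    static final long MAX_ALLOCATED_MEMORY_PER_BATCH = 64L * 1024 * 1024;

    /**
     * Moves rows from the front of pendingRowSizes into one batch and returns the
     * number of rows taken. Rows that do not fit stay queued for the next batch.
     */
    static int fillBatch(Deque<Long> pendingRowSizes) {
        int rowCount = 0;
        long allocatedBytes = 0;
        // Terminate on whichever comes first: row limit, memory limit, or end of input.
        while (rowCount < TARGET_RECORD_COUNT && !pendingRowSizes.isEmpty()) {
            long nextRowBytes = pendingRowSizes.peekFirst();
            // The first row is always accepted, even if it alone exceeds the memory limit.
            if (rowCount > 0 && allocatedBytes + nextRowBytes > MAX_ALLOCATED_MEMORY_PER_BATCH) {
                break;
            }
            pendingRowSizes.pollFirst();
            allocatedBytes += nextRowBytes;
            rowCount++;
        }
        return rowCount;
    }

    public static void main(String[] args) {
        Deque<Long> smallRows = new ArrayDeque<>();
        for (int i = 0; i < 5000; i++) {
            smallRows.addLast(1024L); // 1 KB rows: the row limit is hit first
        }
        System.out.println(fillBatch(smallRows)); // 4000
        System.out.println(fillBatch(smallRows)); // 1000

        Deque<Long> wideRows = new ArrayDeque<>();
        for (int i = 0; i < 3; i++) {
            wideRows.addLast(30L * 1024 * 1024); // 30 MB rows: the memory limit is hit first
        }
        System.out.println(fillBatch(wideRows)); // 2
        System.out.println(fillBatch(wideRows)); // 1
    }
}
```

Checking the memory limit before taking a row keeps every multi-row batch within the limit; the only batch allowed to exceed it is one holding a single oversized row.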
General suggestion: perhaps change the title to describe the fix more clearly, maybe "Limit memory usage for HBase reader" or some such. I originally read "use max allowable memory" as possibly meaning using the full 10 GB that the allocator gives to each operator.
One more comment: is there any way to unit test this improvement?
ce3f227 to 2ee209b
+1
To limit memory usage for the HBase reader, we are adding a maximum allowed allocated memory constant, which defaults to 64 MB. Thus the batch size will be limited either to 4000 rows (as before, if the memory limit is not exceeded) or to the number of records that fit within the maximum allowed memory. If the first row in a batch is larger than the allowed limit, it will still be written to the batch, but that batch will contain only this row.
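As a rough check of which limit applies, here is a sketch that assumes every row has the same size and applies the policy described above: a batch holds as many whole rows as fit in the memory limit, capped at 4000, but never fewer than one. `BatchLimitEstimate` and `rowsPerBatch` are hypothetical names, and 64 MB is read as 64 × 1024 × 1024 bytes. Checks along these lines could also be a starting point for the unit test asked about in the review.

```java
class BatchLimitEstimate {
    static final int TARGET_RECORD_COUNT = 4000;
    static final long MAX_ALLOCATED_MEMORY_PER_BATCH = 64L * 1024 * 1024;

    /** Rows per batch when every row takes rowBytes bytes (rowBytes must be positive). */
    static long rowsPerBatch(long rowBytes) {
        if (rowBytes <= 0) {
            throw new IllegalArgumentException("rowBytes must be positive");
        }
        long fitInMemory = MAX_ALLOCATED_MEMORY_PER_BATCH / rowBytes;
        // An oversized row still forms a batch of its own; otherwise the row cap applies.
        return Math.max(1, Math.min(TARGET_RECORD_COUNT, fitInMemory));
    }

    public static void main(String[] args) {
        System.out.println(rowsPerBatch(16_777));     // 4000: row limit still binds
        System.out.println(rowsPerBatch(16_778));     // 3999: memory limit starts to bind
        System.out.println(rowsPerBatch(805_306));    // 83
        System.out.println(rowsPerBatch(100L << 20)); // 1: oversized row alone
    }
}
```

Under these assumptions, rows averaging more than about 16.4 KiB (67,108,864 / 4000 ≈ 16,777 bytes) are bounded by memory rather than by row count; the 805,306-byte case corresponds to the ~3 GB per 4000 rows mentioned earlier in the conversation.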