DRILL-5429: Improve query performance for MapR DB JSON Tables#817
DRILL-5429: Improve query performance for MapR DB JSON Tables#817ppadma wants to merge 1 commit intoapache:masterfrom
Conversation
| newScanSpec, | ||
| groupScan.getColumns()); | ||
| groupScan.getColumns(), | ||
| groupScan.getTableStats()); |
There was a problem hiding this comment.
We should try to use clone() here. All we are doing is copying stuff from one groupscan to another. JsonTableGroupScan already has a clone which clones everything except columns.
@Override public GroupScan clone(List<SchemaPath> columns) { JsonTableGroupScan newScan = new JsonTableGroupScan(this); newScan.columns = columns; return newScan; }
We can create another which would clone everything except scanSpec. This can be used to pass in the newScanSpec generated here. Doing this would also clone the regionsToScan saving the call to init().
| tableStats = new MapRDBTableStats(conf, scanSpec.getTableName()); | ||
|
|
||
| // Fetch tableStats only once and cache it. | ||
| if (tableStats == null) { |
There was a problem hiding this comment.
This can probably be removed if we call clone(). However, it may be a useful check if we end up calling it from some other code-paths. Maybe add some logging to ensure we are not recreating the tableStats?
|
I changed as per your suggestion to use clone. We should recompute regionsToScan as it depends upon scanSpec. We can skip init by copying table and tabletInfo from old scan. Also, we can skip getting tableStats altogether as rowCount can be obtained from tabletInfo. |
gparai
left a comment
There was a problem hiding this comment.
Please address the minor comments regarding adding comments.
Otherwise, LGTM +1
| newScan.computeRegionsToScan(); | ||
| return newScan; | ||
| } | ||
|
|
There was a problem hiding this comment.
Please add comments describing the function
| } | ||
| totalRowCount += tabletInfo.getEstimatedNumRows(); | ||
| } | ||
|
|
There was a problem hiding this comment.
Please add your explanation as a comment
We should recompute regionsToScan as it depends upon scanSpec
| table = MapRDB.getTable(scanSpec.getTableName()); | ||
| tabletInfos = table.getTabletInfos(scanSpec.getCondition()); | ||
|
|
||
| // Calculate totalRowCount for the table |
There was a problem hiding this comment.
Please add a comment explaining why we compute the totalRowCount like so?
totalRowCount += tabletInfo.getEstimatedNumRows();
Cache and reuse table and tabletInfo per query instead of fetching them multiple times. Compute rowCount from tabletInfo instead of expensive tableStats call.
|
+1 |
No description provided.