Skip to content

DRILL-5429: Improve query performance for MapR DB JSON Tables#817

Closed
ppadma wants to merge 1 commit intoapache:masterfrom
ppadma:DRILL-5429
Closed

DRILL-5429: Improve query performance for MapR DB JSON Tables#817
ppadma wants to merge 1 commit intoapache:masterfrom
ppadma:DRILL-5429

Conversation

@ppadma
Copy link
Contributor

@ppadma ppadma commented Apr 13, 2017

No description provided.

newScanSpec,
groupScan.getColumns());
groupScan.getColumns(),
groupScan.getTableStats());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try to use clone() here. All we are doing is copying stuff from one groupscan to another. JsonTableGroupScan already has a clone which clones everything except columns.

@Override public GroupScan clone(List<SchemaPath> columns) { JsonTableGroupScan newScan = new JsonTableGroupScan(this); newScan.columns = columns; return newScan; }
We can create another which would clone everything except scanSpec. This can be used to pass in the newScanSpec generated here. Doing this would also clone the regionsToScan saving the call to init().

tableStats = new MapRDBTableStats(conf, scanSpec.getTableName());

// Fetch tableStats only once and cache it.
if (tableStats == null) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can probably be removed if we call clone(). However, it may be a useful check if we end up calling it from some other code-paths. Maybe add some logging to ensure we are not recreating the tableStats?

@ppadma ppadma changed the title DRILL-5429: Cache tableStats per query for MapR DB JSON Tables DRILL-5429: Improve query performance for MapR DB JSON Tables Apr 28, 2017
@ppadma
Copy link
Contributor Author

ppadma commented Apr 28, 2017

I changed as per your suggestion to use clone. We should recompute regionsToScan as it depends upon scanSpec. We can skip init by copying table and tabletInfo from old scan. Also, we can skip getting tableStats altogether as rowCount can be obtained from tabletInfo.
Please review new diffs.

Copy link

@gparai gparai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the minor comments regarding adding comments.
Otherwise, LGTM +1

newScan.computeRegionsToScan();
return newScan;
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comments describing the function

}
totalRowCount += tabletInfo.getEstimatedNumRows();
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add your explanation as a comment

We should recompute regionsToScan as it depends upon scanSpec

table = MapRDB.getTable(scanSpec.getTableName());
tabletInfos = table.getTabletInfos(scanSpec.getCondition());

// Calculate totalRowCount for the table
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment explaining why we compute the totalRowCount like so?
totalRowCount += tabletInfo.getEstimatedNumRows();

Cache and reuse table and tabletInfo per query instead of fetching them multiple times.
Compute rowCount from tabletInfo instead of expensive tableStats call.
@parthchandra
Copy link
Contributor

+1

@asfgit asfgit closed this in 27c5f45 May 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants