DRILL-5429: Improve query performance for MapR DB JSON Tables by ppadma · Pull Request #817 · apache/drill

ppadma · 2017-04-13T01:30:24Z

No description provided.

gparai · 2017-04-18T21:46:40Z

...ormat-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushFilterIntoScan.java

                                                                    newScanSpec,
-                                                                    groupScan.getColumns());
+                                                                    groupScan.getColumns(),
+                                                                    groupScan.getTableStats());


We should try to use clone() here, since all we are doing is copying state from one GroupScan to another. JsonTableGroupScan already has a clone() that copies everything except columns:

@Override
public GroupScan clone(List<SchemaPath> columns) {
  JsonTableGroupScan newScan = new JsonTableGroupScan(this);
  newScan.columns = columns;
  return newScan;
}
We can create another one that copies everything except scanSpec, and use it to pass in the newScanSpec generated here. This would also copy regionsToScan, saving the call to init().
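To make the suggestion concrete, here is a minimal sketch of the two clone variants side by side. ScanSpec and JsonTableGroupScan below are hypothetical, simplified stand-ins, not Drill's real classes; the initCalls counter is sketch-only instrumentation showing that the copy constructor carries regionsToScan over without running init() again.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: simplified, hypothetical stand-ins for JsonScanSpec and JsonTableGroupScan.
public class GroupScanCloneSketch {

  public static void main(String[] args) {
    JsonTableGroupScan scan =
        new JsonTableGroupScan(new ScanSpec("/tables/t", null), List.of("a", "b"));
    // Push a filter down by cloning with a new scan spec instead of rebuilding the scan.
    JsonTableGroupScan filtered = scan.clone(new ScanSpec("/tables/t", "a > 10"));
    System.out.println(filtered.scanSpec.condition + " " + filtered.columns
        + " initCalls=" + filtered.initCalls);
  }

  static final class ScanSpec {
    final String tableName;
    final String condition; // stand-in for the pushed-down filter; null = no filter

    ScanSpec(String tableName, String condition) {
      this.tableName = tableName;
      this.condition = condition;
    }
  }

  static final class JsonTableGroupScan {
    ScanSpec scanSpec;
    List<String> columns;
    List<String> regionsToScan;
    int initCalls; // counts calls to the expensive init(); illustration only

    JsonTableGroupScan(ScanSpec scanSpec, List<String> columns) {
      this.scanSpec = scanSpec;
      this.columns = columns;
      init();
    }

    // Copy constructor: carries over every field, including regionsToScan.
    JsonTableGroupScan(JsonTableGroupScan that) {
      this.scanSpec = that.scanSpec;
      this.columns = that.columns;
      this.regionsToScan = that.regionsToScan;
      this.initCalls = that.initCalls;
    }

    // Stand-in for the costly metadata lookup done when a scan is first built.
    private void init() {
      initCalls++;
      regionsToScan = new ArrayList<>(List.of("region-1", "region-2"));
    }

    // Existing variant: copy everything except columns.
    JsonTableGroupScan clone(List<String> columns) {
      JsonTableGroupScan newScan = new JsonTableGroupScan(this);
      newScan.columns = columns;
      return newScan;
    }

    // Suggested variant: copy everything except scanSpec.
    JsonTableGroupScan clone(ScanSpec scanSpec) {
      JsonTableGroupScan newScan = new JsonTableGroupScan(this);
      newScan.scanSpec = scanSpec;
      return newScan;
    }
  }
}
```

Running main prints `a > 10 [a, b] initCalls=1`: the copy has the new condition and the original columns, and init() ran only once.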

gparai · 2017-04-18T21:48:27Z

...format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/JsonTableGroupScan.java

-      tableStats = new MapRDBTableStats(conf, scanSpec.getTableName());
+
+      // Fetch tableStats only once and cache it.
+      if (tableStats == null) {


This can probably be removed if we call clone(). However, it may still be a useful check if this code is reached from other code paths. Maybe add some logging to make sure we are not recreating tableStats?
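A minimal sketch of the fetch-once cache with the suggested logging, assuming a hypothetical TableStats class whose constructor stands in for the expensive MapRDBTableStats lookup. java.util.logging is used only to keep the sketch self-contained; each real fetch is logged, so an accidental re-creation would show up as a repeated "Fetching" message.

```java
import java.util.logging.Logger;

// Sketch only: hypothetical stand-ins showing a fetch-once cache with logging.
public class TableStatsCacheSketch {

  private static final Logger logger =
      Logger.getLogger(TableStatsCacheSketch.class.getName());

  public static void main(String[] args) {
    GroupScan scan = new GroupScan("/tables/t");
    scan.getTableStats();
    scan.getTableStats(); // second call reuses the cached instance
    System.out.println("fetches=" + TableStats.fetchCount); // prints "fetches=1"
  }

  // Hypothetical stand-in for MapRDBTableStats; the constructor represents the costly lookup.
  static final class TableStats {
    static int fetchCount = 0; // instrumentation for this sketch only
    final String tableName;

    TableStats(String tableName) {
      fetchCount++;
      this.tableName = tableName;
    }
  }

  static final class GroupScan {
    private final String tableName;
    private TableStats tableStats; // fetched at most once per scan instance

    GroupScan(String tableName) {
      this.tableName = tableName;
    }

    TableStats getTableStats() {
      if (tableStats == null) {
        // Logged only on a real fetch, so repeated messages would reveal re-creation.
        logger.fine(() -> "Fetching table stats for " + tableName);
        tableStats = new TableStats(tableName);
      }
      return tableStats;
    }
  }
}
```

With default logging settings FINE messages are not printed; the static fetchCount counter is what demonstrates that the second call reuses the cached object.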

ppadma · 2017-04-28T15:09:30Z

I changed the code to use clone(), as you suggested. However, regionsToScan still has to be recomputed because it depends on scanSpec. We can skip init() by copying table and tabletInfo from the old scan. We can also skip fetching tableStats altogether, since the row count can be obtained from tabletInfo.
Please review the new diffs.
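A sketch of the revised approach under simplifying assumptions: a hypothetical Tablet class covers a row-key range [startRow, endRow), and ScanSpec is reduced to a [startRow, stopRow) range, with an empty string meaning unbounded. The clone shares the cached tablet list (standing in for table and tabletInfo) but calls computeRegionsToScan() again, because which tablets overlap the scan range depends on scanSpec. This illustrates the idea; it is not the code in this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: hypothetical stand-ins, not Drill's or MapR-DB's classes.
public class RecomputeRegionsSketch {

  public static void main(String[] args) {
    List<Tablet> tablets =
        List.of(new Tablet("", "g"), new Tablet("g", "p"), new Tablet("p", ""));
    GroupScan full = new GroupScan(new ScanSpec("", ""), tablets);
    GroupScan narrowed = full.clone(new ScanSpec("h", "k"));
    System.out.println(full.regionsToScan);     // prints [[,g), [g,p), [p,)]
    System.out.println(narrowed.regionsToScan); // prints [[g,p)]
  }

  // A tablet covering row keys in [startRow, endRow); "" means unbounded on that side.
  static final class Tablet {
    final String startRow;
    final String endRow;

    Tablet(String startRow, String endRow) {
      this.startRow = startRow;
      this.endRow = endRow;
    }

    @Override
    public String toString() {
      return "[" + startRow + "," + endRow + ")";
    }
  }

  // A scan over row keys in [startRow, stopRow); "" means unbounded on that side.
  static final class ScanSpec {
    final String startRow;
    final String stopRow;

    ScanSpec(String startRow, String stopRow) {
      this.startRow = startRow;
      this.stopRow = stopRow;
    }
  }

  static final class GroupScan {
    ScanSpec scanSpec;
    List<Tablet> tablets;       // cached tablet metadata, fetched once
    List<Tablet> regionsToScan; // derived from scanSpec

    GroupScan(ScanSpec scanSpec, List<Tablet> tablets) {
      this.scanSpec = scanSpec;
      this.tablets = tablets;
      computeRegionsToScan();
    }

    private GroupScan(GroupScan that) {
      this.scanSpec = that.scanSpec;
      this.tablets = that.tablets; // reuse metadata instead of fetching it again
    }

    /**
     * Returns a copy of this scan that uses newSpec. Tablet metadata is shared,
     * but regionsToScan is recomputed because it depends on the scan's row range.
     */
    GroupScan clone(ScanSpec newSpec) {
      GroupScan newScan = new GroupScan(this);
      newScan.scanSpec = newSpec;
      newScan.computeRegionsToScan();
      return newScan;
    }

    // Keeps only the tablets whose row range overlaps [startRow, stopRow).
    private void computeRegionsToScan() {
      List<Tablet> selected = new ArrayList<>();
      for (Tablet t : tablets) {
        boolean endsBeforeScan = !t.endRow.isEmpty() && !scanSpec.startRow.isEmpty()
            && t.endRow.compareTo(scanSpec.startRow) <= 0;
        boolean startsAtOrAfterStop = !scanSpec.stopRow.isEmpty()
            && t.startRow.compareTo(scanSpec.stopRow) >= 0;
        if (!endsBeforeScan && !startsAtOrAfterStop) {
          selected.add(t);
        }
      }
      regionsToScan = selected;
    }
  }
}
```

The unfiltered scan selects all three tablets, while the clone with range [h, k) selects only the tablet [g, p), without refetching any tablet metadata.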

gparai

Please address the minor review comments about adding code comments.
Otherwise, LGTM +1

gparai · 2017-05-02T21:24:57Z

...format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/JsonTableGroupScan.java

+    newScan.computeRegionsToScan();
+    return newScan;
+  }
+


Please add a comment describing this function.

gparai · 2017-05-02T21:28:09Z

...format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/JsonTableGroupScan.java

-        }
+        totalRowCount += tabletInfo.getEstimatedNumRows();
      }
+


Please add your explanation as a comment: "We should recompute regionsToScan as it depends upon scanSpec."

gparai · 2017-05-02T21:30:13Z

...format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/JsonTableGroupScan.java

+      table = MapRDB.getTable(scanSpec.getTableName());
+      tabletInfos = table.getTabletInfos(scanSpec.getCondition());
+
+      // Calculate totalRowCount for the table


Please add a comment explaining why we compute totalRowCount this way:
totalRowCount += tabletInfo.getEstimatedNumRows();
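For reference, the computation in question amounts to summing per-tablet estimates, which avoids a separate table-statistics call. In this sketch TabletInfo is a hypothetical one-method interface that mirrors only the getEstimatedNumRows() call used in the diff; it is not the MapR-DB type. The total is itself an estimate, since each per-tablet value is.

```java
import java.util.List;

// Sketch only: TabletInfo here is a hypothetical one-method interface, not the MapR-DB type.
public class RowCountSketch {

  interface TabletInfo {
    long getEstimatedNumRows();
  }

  // Sums the per-tablet estimates, so no separate table-statistics call is needed.
  static long totalRowCount(List<? extends TabletInfo> tabletInfos) {
    long totalRowCount = 0;
    for (TabletInfo tabletInfo : tabletInfos) {
      totalRowCount += tabletInfo.getEstimatedNumRows();
    }
    return totalRowCount;
  }

  public static void main(String[] args) {
    TabletInfo t1 = () -> 1200L;
    TabletInfo t2 = () -> 800L;
    TabletInfo t3 = () -> 0L; // an empty tablet contributes nothing
    System.out.println(totalRowCount(List.of(t1, t2, t3))); // prints 2000
  }
}
```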


parthchandra · 2017-05-12T23:54:20Z

+1

gparai reviewed Apr 18, 2017

ppadma force-pushed the DRILL-5429 branch from bb1f716 to a8d6af0 on April 27, 2017 23:00

ppadma changed the title from "DRILL-5429: Cache tableStats per query for MapR DB JSON Tables" to "DRILL-5429: Improve query performance for MapR DB JSON Tables" Apr 28, 2017

gparai approved these changes May 2, 2017

DRILL-5429: Improve query performance for MapR DB JSON Tables

c1ae74c

Cache and reuse table and tabletInfo per query instead of fetching them multiple times. Compute rowCount from tabletInfo instead of expensive tableStats call.

ppadma force-pushed the DRILL-5429 branch from a8d6af0 to c1ae74c on May 2, 2017 22:52

asfgit closed this in 27c5f45 May 13, 2017

ppadma wants to merge 1 commit into apache:master from ppadma:DRILL-5429
