DRILL-5140: Fix CompileException in run-time generated code when record batch has… by vvysotskyi · Pull Request #818 · apache/drill

vvysotskyi · 2017-04-24T16:50:31Z

…rd batch has large number of fields.

Implement recursive splitting of run-time generated code into inner classes when the number of class members approaches the size limit of the class constant pool.
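As a rough illustration of the idea (not the code in this PR), the sketch below keeps at most a fixed number of members in the current class and pushes the remainder into a nested inner class, recursively. `MemberSplitter`, `split`, and the `maxPerClass` parameter are hypothetical names chosen for this example; the real change works on generated code blocks inside `ClassGenerator`, not on lists of strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Drill's ClassGenerator.
final class MemberSplitter {

  /**
   * Keeps at most maxPerClass members in the current class and hands the rest
   * to a nested inner class, recursively. Element 0 of the result models the
   * outer class; element i models the inner class at nesting depth i.
   */
  static List<List<String>> split(List<String> members, int maxPerClass) {
    if (maxPerClass <= 0) {
      throw new IllegalArgumentException("maxPerClass must be positive");
    }
    List<List<String>> classes = new ArrayList<>();
    splitInto(members, maxPerClass, classes);
    return classes;
  }

  private static void splitInto(List<String> remaining, int maxPerClass,
                                List<List<String>> out) {
    int take = Math.min(maxPerClass, remaining.size());
    // Members that still fit into the current class.
    out.add(new ArrayList<>(remaining.subList(0, take)));
    if (take < remaining.size()) {
      // Everything else goes one nesting level deeper.
      splitInto(remaining.subList(take, remaining.size()), maxPerClass, out);
    }
  }
}
```

With seven members and a limit of three per class, this model yields an outer class and two nested inner classes.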

jinfengni · 2017-04-27T23:37:44Z

exec/java-exec/src/main/java/org/apache/drill/exec/expr/ClassGenerator.java

-  public static enum BlockType {SETUP, EVAL, RESET, CLEANUP};
+
+  /**
+   * Field has 2 indexes within the constant pull: field item + name and type item.


I could not make sense of how you calculated this number, 26767. According to the JVM spec [1], each item in the constant pool table has an "info" section of 2 or more bytes. In some cases, such as a class name, one item uses 2 bytes, while in other cases, such as a field or method, one item uses 4 bytes. An Integer constant uses 4 bytes, and a Long constant uses 8 bytes.

That is, even if I have a class with fewer than 26767 fields, we may still hit the "too many constants" complaint.

[1] http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.4
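For reference, the limit under discussion comes from the class file header: `constant_pool_count` is an unsigned 16-bit value, entries are indexed from 1 to `constant_pool_count - 1`, and `long` and `double` constants occupy two indexes each. A minimal sketch for checking how close an already compiled class is to that limit follows; `ConstantPoolCount` is a hypothetical helper written for this example and is not part of Drill.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of Drill.
final class ConstantPoolCount {

  /** Reads constant_pool_count from the header of the class file backing clazz. */
  static int of(Class<?> clazz) {
    String resource = "/" + clazz.getName().replace('.', '/') + ".class";
    try (InputStream raw = clazz.getResourceAsStream(resource)) {
      if (raw == null) {
        throw new IOException("class file not found: " + resource);
      }
      DataInputStream in = new DataInputStream(raw);
      if (in.readInt() != 0xCAFEBABE) {   // u4 magic
        throw new IOException("not a class file: " + resource);
      }
      in.readUnsignedShort();             // u2 minor_version
      in.readUnsignedShort();             // u2 major_version
      return in.readUnsignedShort();      // u2 constant_pool_count
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Calling `ConstantPoolCount.of(String.class)` returns the count for `java.lang.String`. Classes that exist only in memory, such as run-time generated ones, have no class file resource, so this sketch would need their byte arrays instead.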

jinfengni · 2017-04-28T00:28:04Z

exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassTransformer.java

        final ClassNames nextGenerated = nextSet.generated;
-        final ClassNode generatedNode = classesToMerge.get(nextGenerated.slash);
+        Pair<byte[], ClassNode> classNodePair = classesToMerge.remove(nextGenerated.slash);
+        final ClassNode generatedNode;


Can you add comments to explain why you are calling remove() on classesToMerge?

jinfengni · 2017-04-28T00:29:32Z

exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassTransformer.java

+        if (classNodePair != null) {
+          generatedNode = classNodePair.getValue();
+        } else {
+          generatedNode = null;


What does "generatedNode = null" mean? Will it cause a problem in the subsequent call at line 288?

jinfengni · 2017-04-28T00:30:15Z

exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassTransformer.java

      }

+      for (Map.Entry<String, Pair<byte[], ClassNode>> clazz : classesToMerge.entrySet()) {
+        classLoader.injectByteCode(clazz.getKey().replace(FileUtils.separatorChar, '.'), clazz.getValue().getKey());


Add comments?

paul-rogers · 2017-04-28T01:31:43Z

Was this improvement tested with the "plain Java" mode turned on? Most of the work is in the byte-code fixup, but the generated Java must also be validated to work when compiled directly. Let me know if you need assistance.

vvysotskyi

@jinfengni I have added comments and made the requested changes. Could you please take a look?

vvysotskyi · 2017-05-03T17:11:48Z

exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassTransformer.java

        final ClassNames nextGenerated = nextSet.generated;
-        final ClassNode generatedNode = classesToMerge.get(nextGenerated.slash);
+        Pair<byte[], ClassNode> classNodePair = classesToMerge.remove(nextGenerated.slash);
+        final ClassNode generatedNode;


vvysotskyi · 2017-05-03T17:13:25Z

exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassTransformer.java

+        if (classNodePair != null) {
+          generatedNode = classNodePair.getValue();
+        } else {
+          generatedNode = null;


It is needed for the case when the generated class is empty. In that case classesToMerge is empty too, and classesToMerge.remove(nextGenerated.slash) returns null.
No, it will not cause a problem. The method MergeAdapter.getMergedClass() contains many checks for this value.
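The behaviour described here relies on `Map.remove` returning `null` when the key is absent. The sketch below models that bookkeeping under simplifying assumptions: the map holds raw byte arrays rather than `Pair<byte[], ClassNode>`, class names use `/` as the separator, and `MergeBookkeeping` and `leftoverClassNames` are hypothetical names. It is meant only to show why removing merged entries leaves exactly the classes that still need to be injected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of the bookkeeping; not Drill's ClassTransformer.
final class MergeBookkeeping {

  /**
   * Removes every merged class from classesToMerge and returns the dotted
   * names of the classes that remain and therefore still need to be loaded.
   */
  static List<String> leftoverClassNames(Map<String, byte[]> classesToMerge,
                                         List<String> mergedSlashNames) {
    for (String slashName : mergedSlashNames) {
      // remove() returns null when nothing was generated under this name,
      // so the caller has to tolerate a missing class.
      byte[] generated = classesToMerge.remove(slashName);
      if (generated == null) {
        continue; // empty generated class: nothing to merge
      }
      // ... merging of 'generated' would happen here ...
    }
    List<String> leftovers = new ArrayList<>();
    for (String slashName : classesToMerge.keySet()) {
      // Assumption: '/' separates packages in the stored names.
      leftovers.add(slashName.replace('/', '.'));
    }
    return leftovers;
  }
}
```

Because merged entries are removed as they are consumed, the final loop never injects a class twice.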

vvysotskyi · 2017-05-03T17:13:32Z

exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassTransformer.java

      }

+      for (Map.Entry<String, Pair<byte[], ClassNode>> clazz : classesToMerge.entrySet()) {
+        classLoader.injectByteCode(clazz.getKey().replace(FileUtils.separatorChar, '.'), clazz.getValue().getKey());


vvysotskyi · 2017-05-03T17:22:49Z

exec/java-exec/src/main/java/org/apache/drill/exec/expr/ClassGenerator.java

-  public static enum BlockType {SETUP, EVAL, RESET, CLEANUP};
+
+  /**
+   * Field has 2 indexes within the constant pull: field item + name and type item.


I missed the third index. I fixed the estimate of the maximum index value, thanks. I also added a comment that explains how it is calculated.

vvysotskyi · 2017-05-03T17:38:31Z

@paul-rogers Yes, I tested this fix with the "plain Java" mode turned on. Debugging the generated code in an IDE helped me considerably. Thanks for offering assistance.

jinfengni · 2017-05-19T20:57:42Z

exec/java-exec/src/main/java/org/apache/drill/exec/expr/ClassGenerator.java

+  private LinkedList<SizedJBlock>[] oldBlocks;
+
+  /**
+   * Assumed that field has 3 indexes within the constant pull: index of the CONSTANT_Fieldref_info +


I'm not entirely sure the calculation is correct in terms of the number of constant pool entries per class field.

Per the JVM spec, each class field has a CONSTANT_Fieldref_info (1 entry), which has a class_index and a name_and_type_index. The class_index points to a CONSTANT_Class_info, which is shared across all the class fields. The name_and_type_index points to a CONSTANT_NameAndType_info (1 entry), which points to the name (1 entry) and the descriptor (1 entry). Therefore, each class field requires at least 4 entries in the constant pool. Similarly, each method would take 4 entries.

Besides fields and methods, we also have to take constant literals into account, such as int, float, and string constants. Since we copy the source code of built-in functions and UDFs, it's hard to figure out exactly how many constant literals are used in the generated class.

Given the above reasons, I'm not sure whether it makes sense to try to come up with a formula to estimate the maximum number of fields a generated class could have. If the estimate is not accurate anyway, what if we just provide a ballpark estimate and put some 'magic' number here?
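The entry counting above can be checked against a deliberately simplified model of the constant pool that deduplicates identical entries. `ConstantPoolModel` and `addFieldRef` are hypothetical names for this sketch; it covers only field references and ignores methods, literals, and attribute names, so it says nothing about what a real compiler emits for a full class.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Simplified, hypothetical model: identical entries are stored once.
final class ConstantPoolModel {
  private final Set<String> entries = new LinkedHashSet<>();

  /** Adds the entries needed to reference field owner.name with the given descriptor. */
  void addFieldRef(String owner, String name, String descriptor) {
    entries.add("Utf8:" + owner);         // class name, shared by all fields of owner
    entries.add("Class:" + owner);        // CONSTANT_Class_info, shared as well
    entries.add("Utf8:" + name);          // field name, normally unique per field
    entries.add("Utf8:" + descriptor);    // type descriptor, shared by same-typed fields
    entries.add("NameAndType:" + name + ":" + descriptor);
    entries.add("Fieldref:" + owner + "." + name + ":" + descriptor);
  }

  int size() {
    return entries.size();
  }
}
```

In this model, 100 `int` fields of one class take 303 entries: three per field plus three shared ones. A field whose descriptor appears nowhere else costs four entries, which matches the per-field count given above.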

This calculation takes into account that CONSTANT_NameAndType_info.descriptor has a limited range of values, so 3 entries were counted for each class field and method.

The formula assumes that each class field and local variable uses a different literal value that takes two entries. I agree with you that there may be cases this formula does not cover.

The formula is needed at least to account for the number of generated methods and for the difference in entry counts between class fields and local variables. The 'magic' number 1000 is used in the formula to reserve constant pool space for class references and unaccounted cases.
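To make the effect of the per-member entry count and the reserve concrete, here is a back-of-the-envelope budget calculation. It is not the formula from this PR: `ConstantPoolBudget` and `maxMembers` are hypothetical, and the sketch counts only a flat number of entries per member minus a fixed reserve.

```java
// Hypothetical budget estimate; not the formula used in ClassGenerator.
final class ConstantPoolBudget {

  // constant_pool_count is a u2 (at most 65535) and valid indexes run from
  // 1 to constant_pool_count - 1, so at most 65534 entries are usable.
  static final int MAX_ENTRIES = 65535 - 1;

  /** Upper bound on members that fit when each costs entriesPerMember entries. */
  static int maxMembers(int entriesPerMember, int reservedEntries) {
    if (entriesPerMember <= 0) {
      throw new IllegalArgumentException("entriesPerMember must be positive");
    }
    if (reservedEntries < 0 || reservedEntries > MAX_ENTRIES) {
      throw new IllegalArgumentException("reservedEntries out of range");
    }
    return (MAX_ENTRIES - reservedEntries) / entriesPerMember;
  }
}
```

With a reserve of 1000 entries, this gives 21511 members at 3 entries each and 16133 at 4 entries each, which shows how sensitive the threshold is to the per-member count debated in this thread.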

Although the proposed formula might still hit problems in certain cases, it seems to work fine for normal cases. Also, for normal queries over hundreds of columns, the run-time generated code remains the same as before. Given that, it makes sense to me to merge this code change.

jinfengni · 2017-05-26T17:47:17Z

+1

…rd batch has large number of fields. - Changed estimation of max index value and added comments. close apache#818

jinfengni reviewed Apr 27, 2017

jinfengni reviewed Apr 28, 2017

DRILL-5140: Fix CompileException in run-time generated code when reco…

57c1e6a

…rd batch has large number of fields.

vvysotskyi commented May 3, 2017

vvysotskyi force-pushed the DRILL-5140 branch from 2dd81eb to 3bacd4f Compare May 3, 2017 17:28

vvysotskyi force-pushed the DRILL-5140 branch from 3bacd4f to c24c7b4 Compare May 4, 2017 07:15

DRILL-5140: Changed estimation of max index value and added comments.

c24c7b4

jinfengni reviewed May 19, 2017

asfgit closed this in b14e30b Jun 3, 2017
