DRILL-5140: Fix CompileException in run-time generated code when record batch has…#818
DRILL-5140: Fix CompileException in run-time generated code when record batch has…#818vvysotskyi wants to merge 2 commits intoapache:masterfrom
Conversation
| public static enum BlockType {SETUP, EVAL, RESET, CLEANUP}; | ||
|
|
||
| /** | ||
| * Field has 2 indexes within the constant pull: field item + name and type item. |
There was a problem hiding this comment.
I could not make sense how you calculate this number of 26767. According to JVM spec [1], the constant pool table consists of 2 or more bytes "info" section for each item. In some case like class name, one item could use 2 bytes, while in other cases like field, method, one item could use 4 bytes. Integer uses 4 bytes, and Long constants uses 8 bytes.
That is, if I have a class with fields < 26767, we may still hit the complain of "too many constants".
There was a problem hiding this comment.
I missed the third index. Fixed estimation of max index value, thanks. Also added comment that explains the way, how it is calculating.
| final ClassNames nextGenerated = nextSet.generated; | ||
| final ClassNode generatedNode = classesToMerge.get(nextGenerated.slash); | ||
| Pair<byte[], ClassNode> classNodePair = classesToMerge.remove(nextGenerated.slash); | ||
| final ClassNode generatedNode; |
There was a problem hiding this comment.
Can you add comments to explain why you are calling remove() on classesToMerge?
| if (classNodePair != null) { | ||
| generatedNode = classNodePair.getValue(); | ||
| } else { | ||
| generatedNode = null; |
There was a problem hiding this comment.
What does it mean by "generatedNode = null"? Will it cause problem in subsequent call at line 288?
There was a problem hiding this comment.
It is needed for the case when generated class was empty. Then classesToMerge would be empty too and classesToMerge.remove(nextGenerated.slash) would return null.
No, it wouldn't. In the method MergeAdapter.getMergedClass() there are a many checks for this value.
| } | ||
|
|
||
| for (Map.Entry<String, Pair<byte[], ClassNode>> clazz : classesToMerge.entrySet()) { | ||
| classLoader.injectByteCode(clazz.getKey().replace(FileUtils.separatorChar, '.'), clazz.getValue().getKey()); |
|
Was this improvement tested with the "plain Java" mode turned on? Most of the work is in the byte-code fixup, but the generated Java must be validated that it works when compiled directly. Let me know if you need assistance. |
…rd batch has large number of fields.
vvysotskyi
left a comment
There was a problem hiding this comment.
@jinfengni I have added comments, and made changes. Could you please take a look?
| final ClassNames nextGenerated = nextSet.generated; | ||
| final ClassNode generatedNode = classesToMerge.get(nextGenerated.slash); | ||
| Pair<byte[], ClassNode> classNodePair = classesToMerge.remove(nextGenerated.slash); | ||
| final ClassNode generatedNode; |
| if (classNodePair != null) { | ||
| generatedNode = classNodePair.getValue(); | ||
| } else { | ||
| generatedNode = null; |
There was a problem hiding this comment.
It is needed for the case when generated class was empty. Then classesToMerge would be empty too and classesToMerge.remove(nextGenerated.slash) would return null.
No, it wouldn't. In the method MergeAdapter.getMergedClass() there are a many checks for this value.
| } | ||
|
|
||
| for (Map.Entry<String, Pair<byte[], ClassNode>> clazz : classesToMerge.entrySet()) { | ||
| classLoader.injectByteCode(clazz.getKey().replace(FileUtils.separatorChar, '.'), clazz.getValue().getKey()); |
| public static enum BlockType {SETUP, EVAL, RESET, CLEANUP}; | ||
|
|
||
| /** | ||
| * Field has 2 indexes within the constant pull: field item + name and type item. |
There was a problem hiding this comment.
I missed the third index. Fixed estimation of max index value, thanks. Also added comment that explains the way, how it is calculating.
|
@paul-rogers Yes, I tested this fix with turned on the "plain Java" mode. Debugging generated code using an IDE considerably helped me. Thanks for offering assistance. |
| private LinkedList<SizedJBlock>[] oldBlocks; | ||
|
|
||
| /** | ||
| * Assumed that field has 3 indexes within the constant pull: index of the CONSTANT_Fieldref_info + |
There was a problem hiding this comment.
I'm not entirely sure the calculation is correct, in terms of # of entries per field in constant pool of a class.
Per JVM spec, each class field has CONSTANT_Fieldref_info (1 entry), which has class_index and name_and_type_index. The class_index points CONSTANT_Class_info, which is shared by across all the class fields. The second points to CONSTANT_NameAndType_info (1 entry), which points to name (1 entry) and descriptor (1 entry). Therefore, for each class field, at least 4 entries are required in constant pool. Similarly, we could get 4 entries for each method.
Besides fields and methods, we also have to take constant literal into account, like int, float , string ... constant. For constant literals, since we apply source-code copy for build-in-function /udf, it's hard to figure out exactly how many constants are used in the generated class.
Given the above reasons, I'm not sure whether it makes sense to try to come up with a formula to estimate the maximum # of fields a generated class could have. If the estimation is not accurate, then what if we just provides a ballpark estimation and put some 'magic' number here?
There was a problem hiding this comment.
In this calculation is taken into account that CONSTANT_NameAndType_info.descriptor has limited range of its values, so it was taken 3 entries for the each class field and method.
In this formula supposed that each class field and local variable use different literal values that have two entries. I am agree with you that there may be cases that have not been covered by this formula.
The formula is needed for at least to consider the number of generated methods, difference between entries count for class fields and local variables. The 'magic' number 1000 was used in this formula to reserve constant pool for class references and unaccounted cases.
There was a problem hiding this comment.
Although the proposed formula might still hit problem in certain cases, it seems to work fine for normal cases. Also, for normal queries over hundreds of columns, the run-time generated code remains same as before. Given that, it makes sense to me to merge this code change.
|
+1 |
…rd batch has large number of fields. - Changed estimation of max index value and added comments. close apache#818
…rd batch has large number of fields. - Changed estimation of max index value and added comments. close apache#818
…rd batch has large number of fields.
Implement recursive splitting of run-time generated code to the inner classes when class members number is close to the class constant pool size.