Skip to content

DRILL-5140: Fix CompileException in run-time generated code when record batch has…#818

Closed
vvysotskyi wants to merge 2 commits intoapache:masterfrom
vvysotskyi:DRILL-5140
Closed

DRILL-5140: Fix CompileException in run-time generated code when record batch has…#818
vvysotskyi wants to merge 2 commits intoapache:masterfrom
vvysotskyi:DRILL-5140

Conversation

@vvysotskyi
Copy link
Member

…rd batch has large number of fields.

Implement recursive splitting of run-time generated code to the inner classes when class members number is close to the class constant pool size.

public static enum BlockType {SETUP, EVAL, RESET, CLEANUP};

/**
* Field has 2 indexes within the constant pull: field item + name and type item.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not make sense how you calculate this number of 26767. According to JVM spec [1], the constant pool table consists of 2 or more bytes "info" section for each item. In some case like class name, one item could use 2 bytes, while in other cases like field, method, one item could use 4 bytes. Integer uses 4 bytes, and Long constants uses 8 bytes.

That is, if I have a class with fields < 26767, we may still hit the complain of "too many constants".

  1. http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.4

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed the third index. Fixed estimation of max index value, thanks. Also added comment that explains the way, how it is calculating.

final ClassNames nextGenerated = nextSet.generated;
final ClassNode generatedNode = classesToMerge.get(nextGenerated.slash);
Pair<byte[], ClassNode> classNodePair = classesToMerge.remove(nextGenerated.slash);
final ClassNode generatedNode;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add comments to explain why you are calling remove() on classesToMerge?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if (classNodePair != null) {
generatedNode = classNodePair.getValue();
} else {
generatedNode = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean by "generatedNode = null"? Will it cause problem in subsequent call at line 288?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed for the case when generated class was empty. Then classesToMerge would be empty too and classesToMerge.remove(nextGenerated.slash) would return null.
No, it wouldn't. In the method MergeAdapter.getMergedClass() there are a many checks for this value.

}

for (Map.Entry<String, Pair<byte[], ClassNode>> clazz : classesToMerge.entrySet()) {
classLoader.injectByteCode(clazz.getKey().replace(FileUtils.separatorChar, '.'), clazz.getValue().getKey());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

@paul-rogers
Copy link
Contributor

Was this improvement tested with the "plain Java" mode turned on? Most of the work is in the byte-code fixup, but the generated Java must be validated that it works when compiled directly. Let me know if you need assistance.

Copy link
Member Author

@vvysotskyi vvysotskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jinfengni I have added comments, and made changes. Could you please take a look?

final ClassNames nextGenerated = nextSet.generated;
final ClassNode generatedNode = classesToMerge.get(nextGenerated.slash);
Pair<byte[], ClassNode> classNodePair = classesToMerge.remove(nextGenerated.slash);
final ClassNode generatedNode;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if (classNodePair != null) {
generatedNode = classNodePair.getValue();
} else {
generatedNode = null;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed for the case when generated class was empty. Then classesToMerge would be empty too and classesToMerge.remove(nextGenerated.slash) would return null.
No, it wouldn't. In the method MergeAdapter.getMergedClass() there are a many checks for this value.

}

for (Map.Entry<String, Pair<byte[], ClassNode>> clazz : classesToMerge.entrySet()) {
classLoader.injectByteCode(clazz.getKey().replace(FileUtils.separatorChar, '.'), clazz.getValue().getKey());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

public static enum BlockType {SETUP, EVAL, RESET, CLEANUP};

/**
* Field has 2 indexes within the constant pull: field item + name and type item.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed the third index. Fixed estimation of max index value, thanks. Also added comment that explains the way, how it is calculating.

@vvysotskyi
Copy link
Member Author

@paul-rogers Yes, I tested this fix with turned on the "plain Java" mode. Debugging generated code using an IDE considerably helped me. Thanks for offering assistance.

private LinkedList<SizedJBlock>[] oldBlocks;

/**
* Assumed that field has 3 indexes within the constant pull: index of the CONSTANT_Fieldref_info +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure the calculation is correct, in terms of # of entries per field in constant pool of a class.

Per JVM spec, each class field has CONSTANT_Fieldref_info (1 entry), which has class_index and name_and_type_index. The class_index points CONSTANT_Class_info, which is shared by across all the class fields. The second points to CONSTANT_NameAndType_info (1 entry), which points to name (1 entry) and descriptor (1 entry). Therefore, for each class field, at least 4 entries are required in constant pool. Similarly, we could get 4 entries for each method.

Besides fields and methods, we also have to take constant literal into account, like int, float , string ... constant. For constant literals, since we apply source-code copy for build-in-function /udf, it's hard to figure out exactly how many constants are used in the generated class.

Given the above reasons, I'm not sure whether it makes sense to try to come up with a formula to estimate the maximum # of fields a generated class could have. If the estimation is not accurate, then what if we just provides a ballpark estimation and put some 'magic' number here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this calculation is taken into account that CONSTANT_NameAndType_info.descriptor has limited range of its values, so it was taken 3 entries for the each class field and method.

In this formula supposed that each class field and local variable use different literal values that have two entries. I am agree with you that there may be cases that have not been covered by this formula.

The formula is needed for at least to consider the number of generated methods, difference between entries count for class fields and local variables. The 'magic' number 1000 was used in this formula to reserve constant pool for class references and unaccounted cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the proposed formula might still hit problem in certain cases, it seems to work fine for normal cases. Also, for normal queries over hundreds of columns, the run-time generated code remains same as before. Given that, it makes sense to me to merge this code change.

@jinfengni
Copy link
Contributor

+1

jinfengni pushed a commit to jinfengni/incubator-drill that referenced this pull request Jun 1, 2017
…rd batch has large number of fields.

- Changed estimation of max index value and added comments.

close apache#818
jinfengni pushed a commit to jinfengni/incubator-drill that referenced this pull request Jun 1, 2017
…rd batch has large number of fields.

- Changed estimation of max index value and added comments.

close apache#818
@asfgit asfgit closed this in b14e30b Jun 3, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants