[SPARK-21751][SQL] CodeGeneraor.splitExpressions counts code size more precisely#18966
[SPARK-21751][SQL] CodeGeneraor.splitExpressions counts code size more precisely#18966kiszk wants to merge 9 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Could you add an internal SQLConf for it and make it adjustable?
|
Test build #80749 has finished for PR 18966 at commit
|
|
Test build #80776 has finished for PR 18966 at commit
|
|
ping @gatorsmile |
There was a problem hiding this comment.
Based on my understanding, this is not the number of characters? This is the length of source codes, right?
There was a problem hiding this comment.
If the length of source code does not mean the number of lines of the source code, you are right. This is because we check the sum of String.length.
If it is correct, I will update them based on the length of source codes.
Here are two examples.
abc -> 3
ab
c
-> 4
There was a problem hiding this comment.
To be honest, using the number of characters looks pretty fragile. It is even worse than using the number of lines.
There was a problem hiding this comment.
In this PR, do we change this parameter to use number of lines instead of number of characters, too? It is possible technically.
|
Test build #81100 has finished for PR 18966 at commit
|
There was a problem hiding this comment.
Does this threshold impact this test case?
There was a problem hiding this comment.
@gatorsmile Sorry, I updated the result. This threshold impacts this test case.
|
Test build #81109 has finished for PR 18966 at commit
|
There was a problem hiding this comment.
This flag does not take an effect at runtime.
There was a problem hiding this comment.
What do you mean? Could you elaborate this review comment?
There was a problem hiding this comment.
If you set the conf to a new value at runtime, can you get the value from SQLConf.get.maxCodegenLinesPerFunction?
There was a problem hiding this comment.
I see. I had another interpretation that "this value may not change performance".
Let me check this while I did it like other flags.
There was a problem hiding this comment.
Got it. Depends on calling context, it may take an effect or not. Should we pass I am thinking whether we can pass SQLConf to this method?SQLConf to a constructor or not.
Is there any other thoughts?
There was a problem hiding this comment.
I made this option effective by executing SparkEnv.get.conf
There was a problem hiding this comment.
@gatorsmile Since to make it configurable takes long time, can we do it using hard-coded parameter?
Even in this case, this PR makes better since the estimation does not include characters for comment.
There was a problem hiding this comment.
@kiszk You know, I am just afraid new regression could be introduced due to this change. Sorry for the delay. I really do not have a better solution. I kind of agree on your original solution. Just exclude the characters for comment. At least, it becomes better and take a less risk to hit a regression.
There was a problem hiding this comment.
@gatorsmile I understand your concern about the possibility of new performance regression. I will use the original threshold (max characters) as hard-coded value.
There was a problem hiding this comment.
We might need to emphasize this is not for whole-stage codegen.
There was a problem hiding this comment.
My understanding is not correct. Explain it for expressions only? rename it to
spark.sql.codegen.expressions.maxCodegenLinesPerFunction
|
Test build #81154 has finished for PR 18966 at commit
|
|
retest this please |
|
Test build #81160 has finished for PR 18966 at commit
|
|
ping @gatorsmile |
There was a problem hiding this comment.
This is not following what we are doing for the other SQLConf. I am also thinking if we should just put it into StaticSQLConf. Let me check it with others.
There was a problem hiding this comment.
There are multiple related PRs. Maybe we can wait until the reviews are finished there?
There was a problem hiding this comment.
Here, we do not know precise bytecode size. Will we use estimated bytecode size based on characters per line? Or, other ideas to get precise bytecode size before compiling a method?
|
Test build #82455 has finished for PR 18966 at commit
|
|
Test build #82518 has finished for PR 18966 at commit
|
| """([ |\t]*?\/\/[\s\S]*?\n)""").r // strip //comment | ||
| val codeWithoutComment = commentReg.replaceAllIn(input, "") | ||
| codeWithoutComment.replaceAll("""\n\s*\n""", "\n") // strip ExtraNewLines | ||
| } |
There was a problem hiding this comment.
Could you also add back the test case of this function?
|
LGTM pending Jenkins |
|
Test build #82596 has finished for PR 18966 at commit
|
|
Test build #82598 has finished for PR 18966 at commit
|
|
Thanks! Merged to master. |
What changes were proposed in this pull request?
Current
CodeGeneraor.splitExpressionssplits statements into methods if the total length of statements is more than 1024 characters. The length may include comments or empty line.This PR excludes comment or empty line from the length to reduce the number of generated methods in a class, by using
CodeFormatter.stripExtraNewLinesAndComments()method.How was this patch tested?
Existing tests