-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce amount of expression objects created during evaluations #15552
Reduce amount of expression objects created during evaluations #15552
Conversation
Benchmark (rowsPerSegment) Mode Cnt Score Error Units ExpressionSelectorBenchmark.arithmeticOnLong 1000000 avgt 5 83.731 ± 0.979 ms/op ExpressionSelectorBenchmark.strlenUsingExpressionAsLong 1000000 avgt 5 93.103 ± 15.030 ms/op ExpressionSelectorBenchmark.strlenUsingExpressionAsString 1000000 avgt 5 66.854 ± 0.532 ms/op ExpressionSelectorBenchmark.strlenUsingExtractionFn 1000000 avgt 5 32.302 ± 0.150 ms/op ExpressionSelectorBenchmark.timeFloorUsingCursor 1000000 avgt 5 83.572 ± 0.092 ms/op ExpressionSelectorBenchmark.timeFloorUsingExpression 1000000 avgt 5 91.505 ± 0.646 ms/op ExpressionSelectorBenchmark.timeFloorUsingExtractionFn 1000000 avgt 5 77.279 ± 0.757 ms/op ExpressionSelectorBenchmark.timeFormatUsingExpression 1000000 avgt 5 100.554 ± 1.365 ms/op ExpressionSelectorBenchmark.timeFormatUsingExtractionFn 1000000 avgt 5 77.211 ± 0.555 ms/op
Benchmark (rowsPerSegment) Mode Cnt Score Error Units ExpressionSelectorBenchmark.caseSearched1 1000000 avgt 10 9.264 ± 0.115 ms/op ExpressionSelectorBenchmark.caseSearched2 1000000 avgt 10 272.054 ± 9.239 ms/op ExpressionSelectorBenchmark.caseSearchedWithLookup 1000000 avgt 10 307.767 ± 5.537 ms/op
Benchmark (rowsPerSegment) Mode Cnt Score Error Units ExpressionSelectorBenchmark.caseSearched1 1000000 avgt 10 9.243 ± 0.030 ms/op ExpressionSelectorBenchmark.caseSearched2 1000000 avgt 10 215.796 ± 1.799 ms/op ExpressionSelectorBenchmark.caseSearchedWithLookup 1000000 avgt 10 285.949 ± 8.445 ms/op
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM minor nits
@@ -464,15 +473,18 @@ public String toString() | |||
|
|||
class ComplexExpr extends ConstantExpr<Object> | |||
{ | |||
final ExprEval expr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious since we are evaluating this early, does the equals and hash method need to honor this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its not included in equals / hashCode as its a computed value from the value
constructor argument;
as a matter of fact... ConstantExpr
also has an outputType
field which is also ignored in the equals
/ hashCode
stuff - as that is passed in as a constructor parameter - actually this could probably be refactored to remove that field; however I'm not sure what benefits that would give...
the other way around could be to build the ContantExpr
on top of an ExprEval
- and use ExprEval.value()
to get the inner value - but that approach would not like how the current BigIntegerExpr
works...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how often Expr
are used in equals/hashcode, my gut says not often since the thing that makes them also usually has the original string which was parsed into Expr
and is likely cheaper to just use the string
processing/src/main/java/org/apache/druid/math/expr/ConstantExpr.java
Outdated
Show resolved
Hide resolved
@@ -464,15 +473,18 @@ public String toString() | |||
|
|||
class ComplexExpr extends ConstantExpr<Object> | |||
{ | |||
final ExprEval expr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how often Expr
are used in equals/hashcode, my gut says not often since the thing that makes them also usually has the original string which was parsed into Expr
and is likely cheaper to just use the string
@@ -55,6 +55,9 @@ public static ExpressionType autoDetect(ExprEval eval, ExprEval otherEval) | |||
{ | |||
ExpressionType type = eval.type(); | |||
ExpressionType otherType = otherEval.type(); | |||
if (type == otherType && eval.value() != null && otherEval.value() != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this need more than type == otherType
? Later on we do
type = eval.value() != null ? type : otherType;
otherType = otherEval.value() != null ? otherType : type;
so if the value is null then it just takes on the other type, so i think being null or not here doesn't matter, because if the value is null and the same type then it is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure - I was too cautious...if it will be null
; the type will be String
as per the comment above the method - so it will be only ==
if the other is also a String
In this mode, null values are {@link ExprType#STRING} typed, despite
potentially coming from an underlying numeric column, or when an underlying column was completely missing and so all values are null
regarding those lines:
if eval.value() == null
; after the 1st line type = otherType
; independently from the the 2nd line that will mean that effectively numeric(otherType, otherType)
will be called...
if(eval.value() == null) {
return otherType;
}
if(otherEval.value() == null) {
return type;
}
return numeric(type, otherType);
which made me wonder... can ExpressionType.getType() == COMPLEX
for this method? if it could this change could alter its behaviour - as earlier DOUBLE
was returned in that case
to avoid changing that behaviour - for now I've changed it to:
if (type == otherType && type.getType().isPrimitive()) {
return type;
}
should these pass / fail / or shouldn't exists at all?
final ExprEval<?> complexEval = ExprEval.ofComplex(ExpressionType.UNKNOWN_COMPLEX, new Object());
Assert.assertEquals(ExpressionType.DOUBLE, ExpressionTypeConversion.autoDetect(longEval, complexEval));
Assert.assertEquals(ExpressionType.DOUBLE, ExpressionTypeConversion.autoDetect(doubleEval, complexEval));
Assert.assertEquals(ExpressionType.DOUBLE, ExpressionTypeConversion.autoDetect(arrayEval, complexEval));
Assert.assertEquals(ExpressionType.DOUBLE, ExpressionTypeConversion.autoDetect(complexEval, complexEval));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good question, so this function predates expressions supporting complex types, and shouldn't be used by anything that handles arrays or complex types.
additionally, the comment about nulls always being strings isn't quite as true as it once was, null values coming from columns use the correctly typed null constants, so its mostly null constants and nulls created without additional type information (bestEffortOf) which are string types.
I guess autoDetect should be an invalid exception if it is trying to process complex types, but autoDetect is also only used inside of eval methods as far as I can tell, Expr.getOutputType
all use other methods in this class like function
and operator
which do handle complex and array types and ensure that they are compatible, so I'm not sure it proves to be much of a problem in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure the isPrimitive check is needed, thinking about since this method is only used inside of eval and so will be called for every row the cheaper it can be the better, but its probably fine too if want to be conservative about the change
I wanted to remove the
|
If you really want to fix This would be a slight change of behavior from today where they are basically always handled by the functions that call |
This part is problematic since
To safely cache the One is to ensure that the Another is to add a class DoubleExpr extends ConstantExpr<Double>
{
@Nullable
private final ExprEval<?> eval;
// constructor used by Parser
DoubleExpr(Double value)
{
super(ExpressionType.DOUBLE, Preconditions.checkNotNull(value, "value"));
}
// constructor used by singleThreaded
DoubleExpr(ExprEval<?> eval)
{
super(ExpressionType.DOUBLE, Preconditions.checkNotNull(eval.value(), "value"));
this.eval = eval;
}
@Override // overrides method from Expr
public Expr singleThreaded()
{
return new DoubleExpr(ExprEval.ofDouble(value));
}
@Override
public ExprEval eval(ObjectBindings bindings)
{
// thread-safe no matter what, but more optimized if someone had called singleThreaded()
return eval != null ? eval : ExprEval.ofDouble(value);
}
} I'm partial to the second one, since I am thinking we might want to add a |
honestly: I was not thinking that should be counted in when I was making this change; I personally think that this will not end in erroreous behaviour
this reasoning could be wrong and even if it may not happen - it would be better to fix it; even thru the Did it caused any real trouble (I deeply sorry for the it) - but I would be very interested in learning more about it if it did! "makeItReallyImmutable" approachit could achieved with a call to the
|
I noticed this through reading the code, not through anything that happened in production. But this code hasn't shipped in a release yet, so probably not many people are running it in production yet. As to whether it's a real issue I would refer to: https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/#wishful-benign-is-resilient There are now a bunch of races in various However, I do suspect the
I think it's possible for thread 1 to write Anyway, IMO even benign races are best avoided— since they are easy to mess up— so even if it was benign, I would still suggest changing the design of the class. |
Yeah I was thinking of something like
Ah, I forgot to add
That's a good question… it's definitely better to do this if we have the info about what type we're going to want from each Expr. Certainly sometimes we do. I'm not sure if we always have it. I think it's related to |
apache#15552)" This reverts commit 7552dc4.
I was looking into a query which was performing a bit poorly because the
case_searched
was touching more than1
columns (if there is only 1 column there is a cache based evaluator).While I was doing that I've noticed that there are a few simple things which could help a bit:
TRUE
/FALSE
instead of creating a new object every timeExprEval
early forConstantExpr
-s (except the one forBigInteger
which seem to have some odd contract)these changes mostly reduce the amount of garbage the query creates during
case_searched
evaluation; althoughExpressionSelectorBenchmark
shows some improvements~15%
- but my manual trials on the taxi dataset with 60M rows showed more improvements - probably due to the fact that these changes mostly only reduce gc pressure.q1.3
execution time improved from
6.69s
to4.08s
ExpressionSelectorBenchmark results