Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only use ExprEval in ConstantExpr if its known that it will be safe #15694

Merged
merged 36 commits into from
Mar 5, 2024

Conversation

kgyrtkirk
Copy link
Member

As discussed after #15552 was merged (relevant conversation starts from this comment ) there might be a chance that the patch have caused the ConstantExpr classes to be unsafe due to the fact that ExprEval is unsafe as it contains a string-caching logic.

To become thread-safe for sure - but still retain the benefits of not producing ExprEval objects every time a constant is evaluated; this patch adds:

  • Expr#singleThreaded which creates a singleThreaded version of the actual expression (caching ExprEval is allowed)
  • Expr#makeSingleThreaded to make a whole subtree of expressions singleThreaded
    • uses Shuttle to create the new expression tree
  • ConstantExpr#singleThreaded creates a specialized ConstantExpr which does cache the ExprEval
  • some @Immutable annotations were added to make it more likely to notice that there might be something off if a similar change will be made around here for some reason

@kgyrtkirk kgyrtkirk marked this pull request as ready for review January 16, 2024 20:50
@LakshSingla
Copy link
Contributor

I had a question regarding this: The original PR aimed to optimize the case with Expr creating a new ExprEval for every constant expression, which we can do away with. WDYT about creating a precached ExprEval (which is thread-safe) for constant expressions, instead of creating a thread-unsafe Expr.
Like in the original patch, it isn't on the caller to figure out if the usage will be thread-unsafe, and remember to call it and the code won't have to traverse the expression tree. The only downside will be pre-caching computing unnecessary stuff, however in one of the comments I read that we do call toString() most of the time, so perhaps it's not as bad.

@clintropolis
Copy link
Member

clintropolis commented Jan 17, 2024

WDYT about creating a precached ExprEval (which is thread-safe) for constant expressions, instead of creating a thread-unsafe Expr.

Expr are threadsafe typically since they are immutable, the problem is that Expr for constants was modified to contain an ExprEval which are not thread-safe. I think I would agree it probably a bit nicer to make a thread-safe ExprEval for that Expr to cache instead of making Expr itself have the thread-safety stuff baked in. Though, it is sort of contrary to the original discussion, so I'm wondering what besides constants might benefit from singleThreaded.

@LakshSingla
Copy link
Contributor

LakshSingla commented Jan 17, 2024

the problem is that Expr for constants was modified to contain an ExprEval which are not thread-safe.

I meant the ConstantExprs would hold a thread-safe version of ExprEval by having the ExprEval's have pre-cached string values in the constructor. Lemme see if I can put my thoughts into some pseudo-code.

@clintropolis
Copy link
Member

clintropolis commented Jan 17, 2024

sorry accidentally hit comment button while i was typing out previous comment 😅 I think i was more or less agreeing with you that it feels a bit off that the Expr itself would not be threadsafe, but maybe its just being used to the way things are right now.

@kgyrtkirk
Copy link
Member Author

The reason I've stuck to this approach beause @gianm have mentioned that this approach might be usefull in the future around at the end of this comment.

Making an immutable ExprEval for this would be also a valid approach for sure - by creating one which doesn't rely on those fields by pre-computing all values and just returning it

partial code
  public ExprEval<T> makeImmutableExprEval(ExprEval e) {
    int intVal = e.asInt();
    long longVal = e.asLong();
    String stringVal = e.asString();
    

    return new ExprEval<T>() {

      @Override
      public int asInt()
      {
        return intVal;
      }

      @Override
      public long asLong()
      {
        return longVal;
      }

      @Override
      public String asString()
      {
        return asString();
      }
      //[...]
    }
  }

Another alternate approach could be to change the way caching works in ExprEval to not hold 2 fields; instead just one :

  • something like Optional<Supplier<String>> stringValue (can be a custom class as well - but I guess this type tells the story) - so that the field is either seen as assigned ; or unassigned (and could possibly be filled in by multiple threads - which I guess is not an issue here).
  • similarily for other cached stuff in StringExprEval

@kgyrtkirk
Copy link
Member Author

note I was looking a bit into the above alternate approaches:

I think the valueHolder might also worth considering - or remain with the #singleThreaded

* For top level expressions use {@link Expr#makeSingleThreaded(Expr)} to
* obtain the single-threaded version of the expression tree.
*/
default Expr singleThreaded()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is a usage mistake waiting to happen. People will call expr.singleThreaded() and then be surprised that it doesn't actually make a single-threaded expr tree. IMO it's better for this method to apply a visitor that goes and replaces all the ConstantExprs, even though it takes some control away from non-constant Exprs to customize their behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made changes to this and replaced it with a better approach

@kgyrtkirk
Copy link
Member Author

its kinda the same what we had in the original PR; but the impact of this highly depends on the number of constants at play....and although it has performance benefits - it really just reduces the amount of garbage being created and by that it even reduces the error of the execution times

I've done the following runs:

I always wanted to look into what tools are available to compare jmh results - the ui I've found is not really designed to compare 4 results...but I think its usable:

@gianm
Copy link
Contributor

gianm commented Feb 20, 2024

@kgyrtkirk what is your analysis of the benchmarking results— what are the take-aways, and do they tell us something about what is the best path?

@kgyrtkirk
Copy link
Member Author

I've made some minor changes the way this single threaded version is created/supported by exprs.

Before replying on perf/etc I wanted to see how a more convoluted case performs: I've added a case statement with a 100 branches.
The results are not that different from earlier ones:

  • It performs around 10% better for these expressions
  • Measurement error is reduced because it reduces GC pressure.
  • I remember a bit more relevant differences for a real query, but never digged into very deeply

revert Vs this branch

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general idea LGTM. Had a few comments.

Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i still find this change to be sad because its basically a forced function since there is no good reason to be calling eval without first going through the single threaded stuff, but doing it better would just be too much work that it doesn't feel worth it and hopefully we can just drop all non-vectorized evaluation someday so this can just go away

@soumyava
Copy link
Contributor

soumyava commented Mar 5, 2024

The doc failure is unrelated, we can merge this

@clintropolis clintropolis merged commit 27d7c30 into apache:master Mar 5, 2024
82 of 83 checks passed
@cryptoe cryptoe added this to the 29.0.1 milestone Mar 6, 2024
cryptoe added a commit to cryptoe/druid that referenced this pull request Mar 11, 2024
…pache#15694)

*  which creates a singleThreaded version of the actual expression (caching ExprEval is allowed)
*  to make a whole subtree of expressions 'singleThreaded' - uses  to create the new expression tree
*  creates a specialized  which does cache the
* some  annotations were added to make it more likely to notice that there might be something off if a similar change will be made around here for some reason

(cherry picked from commit 27d7c30)
cryptoe added a commit that referenced this pull request Mar 12, 2024
…l be safe (#15694) (#16100)

*  which creates a singleThreaded version of the actual expression (caching ExprEval is allowed)
*  to make a whole subtree of expressions 'singleThreaded' - uses  to create the new expression tree
*  creates a specialized  which does cache the
* some  annotations were added to make it more likely to notice that there might be something off if a similar change will be made around here for some reason

(cherry picked from commit 27d7c30)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants