-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Expressions Engine using ExprTk #1354
Conversation
@ArashPartow ExprTk is an awesome library—I just wanted to say thank you for all the hard work you've put/continue to put into it. It's been awesome using it and integrating it with our API, and I'm excited to be bringing its breadth of functionality into our product. |
Interesting approach using t_scalar as the template arg for exprtk. I didn't think of that when I implemented this in my original fork of perspective, instead typing directly as t_float64 and upcasting when calculating the computed columns in gnode->process(). Looks like your approach supports string-returning computations too. Are you still thinking of implementing autocomplete on the javascript side? |
It would also be interesting to think about how to use exprtk for custom aggregates - could use the vector processing (section 14 of exprtk docs - https://github.com/ArashPartow/exprtk/blob/master/readme.txt#L1712) and hook it up here - https://github.com/finos/perspective/blob/master/cpp/perspective/src/cpp/sparse_tree.cpp#L1144-L1147 using median as an prototype (which processes a vector of values representing all the values in that sparse tree node). Given you've already hooked up exprtk evaluation to t_scalars should be pretty straightforward. |
@nmichaud We just started experimenting today with the idea of re-purposing ExprTK work to replace the existing filters. A nice side effect of this is, without any other changes to the engine or UI, this very cleanly allows you to use filters both to reduce an e.g. bar chart to aggregates satisfying a predicate (as is possible in current perspective), but also alternatively paint the same predicate and chart using the boolean-returning ExprTK column as a column-pivot (not currently possible through the UI or a single table). We need to look into this further for aggregates - vector processing makes it easy to represent Regarding vector processing specifically, something I'm very excited about that we can likely do much sooner is implement hard-coded vector-returning functions within the ExprTK DSL itself, such as |
Not sure I follow what you mean by "paint[ing] the same predicate and chart". Another benefit to using exprtk for filtering is to support compound statements across multiple columns and drop the whole single-level combiner mess.
That's true - the exprtk function would be a bit contorted to support returning multiple types - https://github.com/ArashPartow/exprtk/blob/master/readme.txt#L3580-L3591, but it is possible.
Ah yeah, that's a great use case. |
@nmichaud this is a good example of "paint[ing] the same predicate and chart": Applying a In terms of autocomplete - column name autocompletion will be easy to implement, but since this expression language is far more complicated than our previous one, I think we'll need to look at Monaco or some other more managed text editor solution instead of trying to hand-roll our own. A giant chunk of this PR is just ripping out the old autocomplete + JS recursive descent parser from the first iteration. |
@sc1f Got it, thanks. I misunderstood what @texodus was saying about using it for filtering. Agreed that it is very useful! It's even more powerful if you can use it to 'project' selections across different pivots of the same table (ie brushing and linking): https://twitter.com/naveenma/status/804137377186856963 |
116411c
to
c052437
Compare
4cca820
to
2c964b8
Compare
One thing I noticed - the implementation builds the symbol table and reparses the computed expressions on each gnode update cycle (https://github.com/finos/perspective/pull/1354/files#diff-ecb1d4be36447785a2e972405912b5c20743a0502a2527f23bd9f8b5613f6a38R218-R266). While parsing is fast, it's duplicate work which can be done once on definition of the expression and stored on the t_computed_expression. Also once the expression is parsed, it's possible to track all dependent expression variables (see section 16 - https://github.com/ArashPartow/exprtk/blob/master/readme.txt#L2493) and only recompute if any dependent columns are in the flattened update table. |
60a5246
to
926868b
Compare
WIP: exprtk WIP: col() fn WIP: working WIP WIP: UI fix clean script WIP: exprtk takes scalars WIP: more scalars WIP reorg and add super basic type checking WIP cleanup type checking and string parser WIP WIP WIP: quick math implement recompute reorganize
add get dtype, type checker Real type checking start on UI WIP
Rewrite tests for expressions Fixes and remove old files post rebase WIP: use sc1f/emsdk, fix tests after rebase, fix regular-table to use expr schema recompute uses the right ridxs
Add dbkt() fn Add upper() WIP: store intermediate strings in str fns refactors add intern(), concat() Multiline editing
impl all remaining fns, use mknone() for unimpld start implementing python API
remove old computed_columns API fix date_bucket() use double quotes for colname
start fixing JS tests Fix update tests WIP WIP rewrite type-checking to use STATUS_CLEAR
… exprtk compatibility Comparisons return 1 or 0 WIP
add min, max, null, notnull, start testing for/if/while etc. more tests fix string handling
Add order(), length(), tests Add remaining datetime functions add percent a of b Add invariant tests
Add alias handling in the engine fix alias handling WIP: alias generated by viewer Alias tests and fix behavior
Fix python tests, rebase on master Fix python tests Fix tests and compile with bigobj fix tests post-rebase WIP: fix column add remove, add benchmarks
bin() fixed tests bucket() and test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a lot more work to be done on this feature, but it's already a massive improvement over what's in v0.8.x
and quite stable; what remains is mostly UX. ExprTK is the sole focus of v0.9.0
, and the next few PRs will refine the integration before this release.
Thanks @nmichaud and @ArashPartow for your feedback, and thanks @sc1f for this monumental PR (nearly 30k lines)!
This PR removes the old
computed_columns
API and replaces it with anexpressions
API that uses ExprTk for expression parsing and evaluation. This is the first step towards massively enhancing Perspective's functionality for cleaning, transforming, and augmenting data passed into it, and allows for a greater feature set than what was allowed for before.The use of ExprTk, specifically, enables a massive amount of new features, including:
"Sales" + (10 + 20 / sqrt(144))
if ("State" == "Texas) 1; else 0;
var x := 0; for (var i := 0; i < 10; i += 1) { x += i };
bucket()
, which combines the numeric and date bucket functions offered in the old APIorder(column, a, b, c...)
function, which allows custom sort orders on string columnsThis fully overhauls the functionality of the previous API, which used a custom-written parser in Javascript and a clunky, object-based API, with a simple API that takes a list of expression strings, which is now unified across Javascript:
And Python:
Examples of the new UI and expressions API in action:
There is still some work left to do before this is ready for a new Perspective release, but merging this large change before the next release allows work to be more effectively parallelized between the additional UI and engine fixes that are required.
Next steps
t_gstate
table