Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Expressions Engine using ExprTk #1354

Merged
merged 18 commits into from
May 12, 2021
Merged

Add Expressions Engine using ExprTk #1354

merged 18 commits into from
May 12, 2021

Conversation

sc1f
Copy link
Contributor

@sc1f sc1f commented Mar 24, 2021

This PR removes the old computed_columns API and replaces it with an expressions API that uses ExprTk for expression parsing and evaluation. This is the first step towards massively enhancing Perspective's functionality for cleaning, transforming, and augmenting data passed into it, and allows for a greater feature set than what was allowed for before.

The use of ExprTk, specifically, enables a massive amount of new features, including:

  • Scalars in expressions, which were previously column-to-column only: "Sales" + (10 + 20 / sqrt(144))
  • Conditionals: if ("State" == "Texas) 1; else 0;
  • Loops: var x := 0; for (var i := 0; i < 10; i += 1) { x += i };
  • (Almost) the full range of mathematical and trigonometry functions offered by ExprTk
  • Polymorphic functions such as bucket(), which combines the numeric and date bucket functions offered in the old API
  • An order(column, a, b, c...) function, which allows custom sort orders on string columns

This fully overhauls the functionality of the previous API, which used a custom-written parser in Javascript and a clunky, object-based API, with a simple API that takes a list of expression strings, which is now unified across Javascript:

const view = await table.view({
  expressions: ['"Sales" + "Profit" / 100', "if (\"State\" == 'Texas') 'okay!'; else 'hmm...'"]
})

And Python:

view = table.view({
  expressions: ['"Sales" + "Profit" / 100', "if (\"Order Date\" == \"Ship Date\") 'shipped same day'; else 'not same day'"]
})

Examples of the new UI and expressions API in action:

sine_exprtk

errors and help msg

There is still some work left to do before this is ready for a new Perspective release, but merging this large change before the next release allows work to be more effectively parallelized between the additional UI and engine fixes that are required.

Next steps

  • Update engine semantics to fully store expression columns per-context instead of sharing the underlying t_gstate table
  • Improve expression editor UI and move it out of the side panel
  • Add various new, useful functions
  • Rewrite documentation for new API

@sc1f sc1f marked this pull request as draft March 24, 2021 00:27
@sc1f
Copy link
Contributor Author

sc1f commented Mar 24, 2021

@ArashPartow ExprTk is an awesome library—I just wanted to say thank you for all the hard work you've put/continue to put into it. It's been awesome using it and integrating it with our API, and I'm excited to be bringing its breadth of functionality into our product.

@texodus texodus linked an issue Mar 24, 2021 that may be closed by this pull request
@texodus texodus added this to the 0.7.0 milestone Mar 24, 2021
@nmichaud
Copy link
Contributor

nmichaud commented Mar 25, 2021

Interesting approach using t_scalar as the template arg for exprtk. I didn't think of that when I implemented this in my original fork of perspective, instead typing directly as t_float64 and upcasting when calculating the computed columns in gnode->process(). Looks like your approach supports string-returning computations too.

Are you still thinking of implementing autocomplete on the javascript side?

@nmichaud
Copy link
Contributor

nmichaud commented Mar 25, 2021

It would also be interesting to think about how to use exprtk for custom aggregates - could use the vector processing (section 14 of exprtk docs - https://github.com/ArashPartow/exprtk/blob/master/readme.txt#L1712) and hook it up here - https://github.com/finos/perspective/blob/master/cpp/perspective/src/cpp/sparse_tree.cpp#L1144-L1147 using median as an prototype (which processes a vector of values representing all the values in that sparse tree node). Given you've already hooked up exprtk evaluation to t_scalars should be pretty straightforward.

@texodus
Copy link
Member

texodus commented Mar 25, 2021

@nmichaud We just started experimenting today with the idea of re-purposing ExprTK work to replace the existing filters. A nice side effect of this is, without any other changes to the engine or UI, this very cleanly allows you to use filters both to reduce an e.g. bar chart to aggregates satisfying a predicate (as is possible in current perspective), but also alternatively paint the same predicate and chart using the boolean-returning ExprTK column as a column-pivot (not currently possible through the UI or a single table).

We need to look into this further for aggregates - vector processing makes it easy to represent getDepencencies() itself via symbol injection, but only if its a numeric type since I do not believe we can replicate the t_tscalar template with primitives. Current aggregate functions, at least, are free to change type between pivot depth levels e.g. count().
In addition to polluting the DSL's API with type-dispatch, I think this may make type-checking somewhat bizarre as we'll need to recursively apply it to each level (e.g. something error messages that apply to specific recursion depths like Type Error - Cannot use aggregate "int_to_string()" at pivot depth level 2, as pivot depth level 1 converted this depth to a string).

Regarding vector processing specifically, something I'm very excited about that we can likely do much sooner is implement hard-coded vector-returning functions within the ExprTK DSL itself, such as window(), which will allow trivial implementation of pseudo-stateful transforms like moving averages directly within the GUI.

@nmichaud
Copy link
Contributor

nmichaud commented Mar 25, 2021

but also alternatively paint the same predicate and chart using the boolean-returning ExprTK column as a column-pivot (not currently possible through the UI or a single table).

Not sure I follow what you mean by "paint[ing] the same predicate and chart". Another benefit to using exprtk for filtering is to support compound statements across multiple columns and drop the whole single-level combiner mess.

We need to look into this further for aggregates - vector processing makes it easy to represent getDepencencies() itself via symbol injection, but only if its a numeric type since I do not believe we can replicate the t_tscalar template with primitives. Current aggregate functions, at least, are free to change type between pivot depth levels e.g. count().
In addition to polluting the DSL's API with type-dispatch, I think this may make type-checking somewhat bizarre as we'll need to recursively apply it to each level (e.g. something error messages that apply to specific recursion depths like Type Error - Cannot use aggregate "int_to_string()" at pivot depth level 2, as pivot depth level 1 converted this depth to a string).

That's true - the exprtk function would be a bit contorted to support returning multiple types - https://github.com/ArashPartow/exprtk/blob/master/readme.txt#L3580-L3591, but it is possible.

Regarding vector processing specifically, something I'm very excited about that we can likely do much sooner is implement hard-coded vector-returning functions within the ExprTK DSL itself, such as window(), which will allow trivial implementation of pseudo-stateful transforms like moving averages directly within the GUI.

Ah yeah, that's a great use case.

@sc1f
Copy link
Contributor Author

sc1f commented Mar 25, 2021

@nmichaud this is a good example of "paint[ing] the same predicate and chart":

Screen Shot 2021-03-24 at 10 45 27 PM

Applying a discount > 0 filter will remove the rows for which the filter returns false, but using the expression here allows us to see discount > 0 over the totality of the data, which will be useful in a lot of cases.

In terms of autocomplete - column name autocompletion will be easy to implement, but since this expression language is far more complicated than our previous one, I think we'll need to look at Monaco or some other more managed text editor solution instead of trying to hand-roll our own. A giant chunk of this PR is just ripping out the old autocomplete + JS recursive descent parser from the first iteration.

@nmichaud
Copy link
Contributor

nmichaud commented Mar 25, 2021

@sc1f Got it, thanks. I misunderstood what @texodus was saying about using it for filtering. Agreed that it is very useful! It's even more powerful if you can use it to 'project' selections across different pivots of the same table (ie brushing and linking): https://twitter.com/naveenma/status/804137377186856963

@texodus texodus modified the milestones: 0.8.0, 0.9.0 Apr 28, 2021
@sc1f
Copy link
Contributor Author

sc1f commented May 7, 2021

expressions

Added a few new features to the editor - the help panel displays all operators, functions, and control flow operators so that the syntax is clear and discoverable, and error messages are now forwarded from the engine.

@nmichaud
Copy link
Contributor

nmichaud commented May 7, 2021

One thing I noticed - the implementation builds the symbol table and reparses the computed expressions on each gnode update cycle (https://github.com/finos/perspective/pull/1354/files#diff-ecb1d4be36447785a2e972405912b5c20743a0502a2527f23bd9f8b5613f6a38R218-R266). While parsing is fast, it's duplicate work which can be done once on definition of the expression and stored on the t_computed_expression. Also once the expression is parsed, it's possible to track all dependent expression variables (see section 16 - https://github.com/ArashPartow/exprtk/blob/master/readme.txt#L2493) and only recompute if any dependent columns are in the flattened update table.

@sc1f sc1f force-pushed the exprtk branch 2 times, most recently from 60a5246 to 926868b Compare May 10, 2021 06:55
@sc1f sc1f marked this pull request as ready for review May 10, 2021 07:00
sc1f and others added 18 commits May 12, 2021 00:07
WIP: exprtk

WIP: col() fn

WIP: working

WIP

WIP: UI

fix clean script

WIP: exprtk takes scalars

WIP: more scalars

WIP reorg and add super basic type checking

WIP cleanup type checking and string parser

WIP

WIP

WIP: quick math

implement recompute

reorganize
add get dtype, type checker

Real type checking

start on UI

WIP
Rewrite tests for expressions

Fixes and remove old files post rebase

WIP: use sc1f/emsdk, fix tests after rebase, fix regular-table to use expr schema

recompute uses the right ridxs
Add dbkt() fn

Add upper()

WIP: store intermediate strings in str fns

refactors

add intern(), concat()

Multiline editing
impl all remaining fns, use mknone() for unimpld

start implementing python API
remove old computed_columns API

fix date_bucket()

use double quotes for colname
start fixing JS tests

Fix update tests

WIP

WIP

rewrite type-checking to use STATUS_CLEAR
… exprtk compatibility

Comparisons return 1 or 0

WIP
add min, max, null, notnull, start testing for/if/while etc.

more tests

fix string handling
Add order(), length(), tests

Add remaining datetime functions

add percent a of b

Add invariant tests
Add alias handling in the engine

fix alias handling

WIP: alias generated by viewer

Alias tests and fix behavior
Fix python tests, rebase on master

Fix python tests

Fix tests and compile with bigobj

fix tests post-rebase

WIP: fix column add remove, add benchmarks
bin()

fixed tests

bucket() and test
Copy link
Member

@texodus texodus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot more work to be done on this feature, but it's already a massive improvement over what's in v0.8.x and quite stable; what remains is mostly UX. ExprTK is the sole focus of v0.9.0, and the next few PRs will refine the integration before this release.

Thanks @nmichaud and @ArashPartow for your feedback, and thanks @sc1f for this monumental PR (nearly 30k lines)!

@texodus texodus merged commit c97fb0a into master May 12, 2021
@texodus texodus deleted the exprtk branch May 12, 2021 05:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment