Add Expressions Engine using ExprTk #1354

sc1f · 2021-03-24T00:27:38Z

This PR removes the old computed_columns API and replaces it with an expressions API that uses ExprTk for expression parsing and evaluation. This is the first step towards massively enhancing Perspective's functionality for cleaning, transforming, and augmenting data passed into it, and allows for a greater feature set than what was allowed for before.

The use of ExprTk, specifically, enables a massive amount of new features, including:

Scalars in expressions, which were previously column-to-column only: "Sales" + (10 + 20 / sqrt(144))
Conditionals: if ("State" == "Texas) 1; else 0;
Loops: var x := 0; for (var i := 0; i < 10; i += 1) { x += i };
(Almost) the full range of mathematical and trigonometry functions offered by ExprTk
Polymorphic functions such as bucket(), which combines the numeric and date bucket functions offered in the old API
An order(column, a, b, c...) function, which allows custom sort orders on string columns

This fully overhauls the functionality of the previous API, which used a custom-written parser in Javascript and a clunky, object-based API, with a simple API that takes a list of expression strings, which is now unified across Javascript:

const view = await table.view({
  expressions: ['"Sales" + "Profit" / 100', "if (\"State\" == 'Texas') 'okay!'; else 'hmm...'"]
})

And Python:

view = table.view({
  expressions: ['"Sales" + "Profit" / 100', "if (\"Order Date\" == \"Ship Date\") 'shipped same day'; else 'not same day'"]
})

Examples of the new UI and expressions API in action:

There is still some work left to do before this is ready for a new Perspective release, but merging this large change before the next release allows work to be more effectively parallelized between the additional UI and engine fixes that are required.

Next steps

Update engine semantics to fully store expression columns per-context instead of sharing the underlying t_gstate table
Improve expression editor UI and move it out of the side panel
Add various new, useful functions
Rewrite documentation for new API

sc1f · 2021-03-24T00:49:14Z

@ArashPartow ExprTk is an awesome library—I just wanted to say thank you for all the hard work you've put/continue to put into it. It's been awesome using it and integrating it with our API, and I'm excited to be bringing its breadth of functionality into our product.

nmichaud · 2021-03-25T00:36:13Z

Interesting approach using t_scalar as the template arg for exprtk. I didn't think of that when I implemented this in my original fork of perspective, instead typing directly as t_float64 and upcasting when calculating the computed columns in gnode->process(). Looks like your approach supports string-returning computations too.

Are you still thinking of implementing autocomplete on the javascript side?

nmichaud · 2021-03-25T01:21:59Z

It would also be interesting to think about how to use exprtk for custom aggregates - could use the vector processing (section 14 of exprtk docs - https://github.com/ArashPartow/exprtk/blob/master/readme.txt#L1712) and hook it up here - https://github.com/finos/perspective/blob/master/cpp/perspective/src/cpp/sparse_tree.cpp#L1144-L1147 using median as an prototype (which processes a vector of values representing all the values in that sparse tree node). Given you've already hooked up exprtk evaluation to t_scalars should be pretty straightforward.

texodus · 2021-03-25T02:00:27Z

@nmichaud We just started experimenting today with the idea of re-purposing ExprTK work to replace the existing filters. A nice side effect of this is, without any other changes to the engine or UI, this very cleanly allows you to use filters both to reduce an e.g. bar chart to aggregates satisfying a predicate (as is possible in current perspective), but also alternatively paint the same predicate and chart using the boolean-returning ExprTK column as a column-pivot (not currently possible through the UI or a single table).

We need to look into this further for aggregates - vector processing makes it easy to represent getDepencencies() itself via symbol injection, but only if its a numeric type since I do not believe we can replicate the t_tscalar template with primitives. Current aggregate functions, at least, are free to change type between pivot depth levels e.g. count().
In addition to polluting the DSL's API with type-dispatch, I think this may make type-checking somewhat bizarre as we'll need to recursively apply it to each level (e.g. something error messages that apply to specific recursion depths like Type Error - Cannot use aggregate "int_to_string()" at pivot depth level 2, as pivot depth level 1 converted this depth to a string).

Regarding vector processing specifically, something I'm very excited about that we can likely do much sooner is implement hard-coded vector-returning functions within the ExprTK DSL itself, such as window(), which will allow trivial implementation of pseudo-stateful transforms like moving averages directly within the GUI.

nmichaud · 2021-03-25T02:19:06Z

but also alternatively paint the same predicate and chart using the boolean-returning ExprTK column as a column-pivot (not currently possible through the UI or a single table).

Not sure I follow what you mean by "paint[ing] the same predicate and chart". Another benefit to using exprtk for filtering is to support compound statements across multiple columns and drop the whole single-level combiner mess.

We need to look into this further for aggregates - vector processing makes it easy to represent getDepencencies() itself via symbol injection, but only if its a numeric type since I do not believe we can replicate the t_tscalar template with primitives. Current aggregate functions, at least, are free to change type between pivot depth levels e.g. count().
In addition to polluting the DSL's API with type-dispatch, I think this may make type-checking somewhat bizarre as we'll need to recursively apply it to each level (e.g. something error messages that apply to specific recursion depths like Type Error - Cannot use aggregate "int_to_string()" at pivot depth level 2, as pivot depth level 1 converted this depth to a string).

That's true - the exprtk function would be a bit contorted to support returning multiple types - https://github.com/ArashPartow/exprtk/blob/master/readme.txt#L3580-L3591, but it is possible.

Regarding vector processing specifically, something I'm very excited about that we can likely do much sooner is implement hard-coded vector-returning functions within the ExprTK DSL itself, such as window(), which will allow trivial implementation of pseudo-stateful transforms like moving averages directly within the GUI.

Ah yeah, that's a great use case.

sc1f · 2021-03-25T02:49:28Z

@nmichaud this is a good example of "paint[ing] the same predicate and chart":

Applying a discount > 0 filter will remove the rows for which the filter returns false, but using the expression here allows us to see discount > 0 over the totality of the data, which will be useful in a lot of cases.

In terms of autocomplete - column name autocompletion will be easy to implement, but since this expression language is far more complicated than our previous one, I think we'll need to look at Monaco or some other more managed text editor solution instead of trying to hand-roll our own. A giant chunk of this PR is just ripping out the old autocomplete + JS recursive descent parser from the first iteration.

nmichaud · 2021-03-25T03:07:48Z

@sc1f Got it, thanks. I misunderstood what @texodus was saying about using it for filtering. Agreed that it is very useful! It's even more powerful if you can use it to 'project' selections across different pivots of the same table (ie brushing and linking): https://twitter.com/naveenma/status/804137377186856963

sc1f · 2021-05-07T14:28:53Z

Added a few new features to the editor - the help panel displays all operators, functions, and control flow operators so that the syntax is clear and discoverable, and error messages are now forwarded from the engine.

nmichaud · 2021-05-07T14:57:26Z

One thing I noticed - the implementation builds the symbol table and reparses the computed expressions on each gnode update cycle (https://github.com/finos/perspective/pull/1354/files#diff-ecb1d4be36447785a2e972405912b5c20743a0502a2527f23bd9f8b5613f6a38R218-R266). While parsing is fast, it's duplicate work which can be done once on definition of the expression and stored on the t_computed_expression. Also once the expression is parsed, it's possible to track all dependent expression variables (see section 16 - https://github.com/ArashPartow/exprtk/blob/master/readme.txt#L2493) and only recompute if any dependent columns are in the flattened update table.

WIP: exprtk WIP: col() fn WIP: working WIP WIP: UI fix clean script WIP: exprtk takes scalars WIP: more scalars WIP reorg and add super basic type checking WIP cleanup type checking and string parser WIP WIP WIP: quick math implement recompute reorganize

add get dtype, type checker Real type checking start on UI WIP

Rewrite tests for expressions Fixes and remove old files post rebase WIP: use sc1f/emsdk, fix tests after rebase, fix regular-table to use expr schema recompute uses the right ridxs

Add dbkt() fn Add upper() WIP: store intermediate strings in str fns refactors add intern(), concat() Multiline editing

impl all remaining fns, use mknone() for unimpld start implementing python API

remove old computed_columns API fix date_bucket() use double quotes for colname

start fixing JS tests Fix update tests WIP WIP rewrite type-checking to use STATUS_CLEAR

… exprtk compatibility Comparisons return 1 or 0 WIP

add min, max, null, notnull, start testing for/if/while etc. more tests fix string handling

Add order(), length(), tests Add remaining datetime functions add percent a of b Add invariant tests

Add alias handling in the engine fix alias handling WIP: alias generated by viewer Alias tests and fix behavior

Fix python tests, rebase on master Fix python tests Fix tests and compile with bigobj fix tests post-rebase WIP: fix column add remove, add benchmarks

bin() fixed tests bucket() and test

texodus

There is a lot more work to be done on this feature, but it's already a massive improvement over what's in v0.8.x and quite stable; what remains is mostly UX. ExprTK is the sole focus of v0.9.0, and the next few PRs will refine the integration before this release.

Thanks @nmichaud and @ArashPartow for your feedback, and thanks @sc1f for this monumental PR (nearly 30k lines)!

finos-cla-bot bot added the cla-present label Mar 24, 2021

sc1f marked this pull request as draft March 24, 2021 00:27

sc1f added breaking C++ JS Python labels Mar 24, 2021

texodus linked an issue Mar 24, 2021 that may be closed by this pull request

Computed columns should accept scalars #1279

Closed

texodus added this to the 0.7.0 milestone Mar 24, 2021

sc1f mentioned this pull request Mar 27, 2021

New Column text box not editable in PerspectiveWidget #1370

Closed

sc1f force-pushed the exprtk branch 6 times, most recently from 116411c to c052437 Compare April 5, 2021 21:10

sc1f force-pushed the exprtk branch 3 times, most recently from 4cca820 to 2c964b8 Compare April 6, 2021 13:55

texodus modified the milestones: 0.7.0, 0.8.0 Apr 14, 2021

sc1f force-pushed the exprtk branch from 953c8d6 to 074ffea Compare April 22, 2021 14:18

texodus modified the milestones: 0.8.0, 0.9.0 Apr 28, 2021

sc1f force-pushed the exprtk branch 2 times, most recently from 60a5246 to 926868b Compare May 10, 2021 06:55

sc1f marked this pull request as ready for review May 10, 2021 07:00

sc1f and others added 18 commits May 12, 2021 00:07

Add type checking, fix UI

5b880d6

add get dtype, type checker Real type checking start on UI WIP

Remove old APIs on viewer, jlab, d3fc

edb886a

Fix tests, start removing old API

2d440b8

Rewrite tests for expressions Fixes and remove old files post rebase WIP: use sc1f/emsdk, fix tests after rebase, fix regular-table to use expr schema recompute uses the right ridxs

Add string functions, date bucket function

ee9dba6

Add dbkt() fn Add upper() WIP: store intermediate strings in str fns refactors add intern(), concat() Multiline editing

Implement exprtk default functions

c04809d

impl all remaining fns, use mknone() for unimpld start implementing python API

Remove old computed columns API entirely

52490c8

remove old computed_columns API fix date_bucket() use double quotes for colname

Fix JS tests, rewrite type-checking to use STATUS_CLEAR

8fbd718

start fixing JS tests Fix update tests WIP WIP rewrite type-checking to use STATUS_CLEAR

Rewrite comparisons to return 1 or 0 instead of DTYPE_BOOL for better…

1fbe733

… exprtk compatibility Comparisons return 1 or 0 WIP

Add min, max, null, notnull, start tests for conditionals

375f973

add min, max, null, notnull, start testing for/if/while etc. more tests fix string handling

Add order, length, percent, more tests

c69f290

Add order(), length(), tests Add remaining datetime functions add percent a of b Add invariant tests

Allow aliasing of columns using // comment

e12d6b4

Add alias handling in the engine fix alias handling WIP: alias generated by viewer Alias tests and fix behavior

Fix expression column add remove on viewer, add benchmarks

3f25d7d

Fix python tests, rebase on master Fix python tests Fix tests and compile with bigobj fix tests post-rebase WIP: fix column add remove, add benchmarks

Add bucket() function

7259946

bin() fixed tests bucket() and test

Add scaffold for error reporting

f6fb313

Add help menu

c40814c

bucket() num and datetime

4cf305a

Generate tests

0a1e9b8

texodus force-pushed the exprtk branch from 926868b to 0a1e9b8 Compare May 12, 2021 04:29

texodus approved these changes May 12, 2021

View reviewed changes

texodus merged commit c97fb0a into master May 12, 2021

texodus deleted the exprtk branch May 12, 2021 05:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Expressions Engine using ExprTk #1354

Add Expressions Engine using ExprTk #1354

sc1f commented Mar 24, 2021 •

edited

Loading

sc1f commented Mar 24, 2021

nmichaud commented Mar 25, 2021 •

edited

Loading

nmichaud commented Mar 25, 2021 •

edited

Loading

texodus commented Mar 25, 2021 •

edited

Loading

nmichaud commented Mar 25, 2021 •

edited

Loading

sc1f commented Mar 25, 2021 •

edited

Loading

nmichaud commented Mar 25, 2021 •

edited

Loading

sc1f commented May 7, 2021

nmichaud commented May 7, 2021

texodus left a comment

Add Expressions Engine using ExprTk #1354

Add Expressions Engine using ExprTk #1354

Conversation

sc1f commented Mar 24, 2021 • edited Loading

Next steps

sc1f commented Mar 24, 2021

nmichaud commented Mar 25, 2021 • edited Loading

nmichaud commented Mar 25, 2021 • edited Loading

texodus commented Mar 25, 2021 • edited Loading

nmichaud commented Mar 25, 2021 • edited Loading

sc1f commented Mar 25, 2021 • edited Loading

nmichaud commented Mar 25, 2021 • edited Loading

sc1f commented May 7, 2021

nmichaud commented May 7, 2021

texodus left a comment

Choose a reason for hiding this comment

sc1f commented Mar 24, 2021 •

edited

Loading

nmichaud commented Mar 25, 2021 •

edited

Loading

nmichaud commented Mar 25, 2021 •

edited

Loading

texodus commented Mar 25, 2021 •

edited

Loading

nmichaud commented Mar 25, 2021 •

edited

Loading

sc1f commented Mar 25, 2021 •

edited

Loading

nmichaud commented Mar 25, 2021 •

edited

Loading