Open
Labels
EPIC ⭐ (Big task that may encompass many smaller ones), refactor (Internal code changes, clean-ups or reorganizations that are not externally visible)
Description
Our current approach for creating new `Expr`s is overly complicated, as outlined here: https://github.com/h2oai/datatable/blob/master/src/core/expr/!readme.md. This complexity stems mostly from the fact that the `Expr` class, which performs arithmetic on f-expressions, is defined in pure Python and then needs to be bridged into the C++ core.
A saner approach would be to define everything in C++, eliminating most of the "middle-man" code. In particular, the following architecture is proposed:
- C++ class `py::FExpr` to replace the current Python `Expr` class;
- C++ class `py::ColumnNamespace` to replace the current Python `FrameProxy` class;
- C++ class `dt::expr::FExpr` is a merged version of the current `dt::expr::Expr` and `dt::expr::Head`. The class is virtual, with the hierarchy following that of the `Head` class;
- Each `py::FExpr` contains a `shared_ptr<dt::expr::FExpr>`;
- The `dt::expr::FExpr` class defines virtual methods for evaluation and for producing the repr;
- The `Op` enum is removed.
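The shape of this proposal can be illustrated with a minimal Python sketch. This is not the datatable implementation: `CoreFExpr` stands in for the proposed `dt::expr::FExpr`, `PyFExpr` stands in for `py::FExpr`, and the node classes, the method names `evaluate` and `repr`, and the use of a plain dict of lists as a "frame" are all assumptions made for illustration. The point is only that each node knows how to evaluate and print itself, so no central `Op` enum is needed.

```python
from abc import ABC, abstractmethod


class CoreFExpr(ABC):
    """Hypothetical stand-in for the virtual class dt::expr::FExpr."""

    @abstractmethod
    def evaluate(self, frame):
        """Return a list of values computed from `frame` (a dict of columns)."""

    @abstractmethod
    def repr(self):
        """Return the textual form of this expression node."""


class ColumnSelector(CoreFExpr):
    def __init__(self, name):
        self.name = name

    def evaluate(self, frame):
        return list(frame[self.name])

    def repr(self):
        return f"f.{self.name}"


class Literal(CoreFExpr):
    def __init__(self, value):
        self.value = value

    def evaluate(self, frame):
        # Broadcast the scalar to the number of rows in the frame.
        nrows = len(next(iter(frame.values())))
        return [self.value] * nrows

    def repr(self):
        return repr(self.value)


class BinaryPlus(CoreFExpr):
    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

    def evaluate(self, frame):
        left = self.lhs.evaluate(frame)
        right = self.rhs.evaluate(frame)
        return [a + b for a, b in zip(left, right)]

    def repr(self):
        return f"{self.lhs.repr()} + {self.rhs.repr()}"


class PyFExpr:
    """Hypothetical stand-in for py::FExpr: a thin wrapper around a core node.

    The Python reference in `self.core` plays the role of the shared_ptr.
    """

    def __init__(self, core):
        self.core = core

    @staticmethod
    def _as_core(x):
        # Plain Python values become literal nodes.
        return x.core if isinstance(x, PyFExpr) else Literal(x)

    def __add__(self, other):
        return PyFExpr(BinaryPlus(self.core, PyFExpr._as_core(other)))

    def __radd__(self, other):
        return PyFExpr(BinaryPlus(PyFExpr._as_core(other), self.core))

    def __repr__(self):
        return f"FExpr<{self.core.repr()}>"


frame = {"A": [1, 2, 3], "B": [10, 20, 30]}
expr = PyFExpr(ColumnSelector("A")) + PyFExpr(ColumnSelector("B")) + 1
print(expr)                       # FExpr<f.A + f.B + 1>
print(expr.core.evaluate(frame))  # [12, 23, 34]
```

In this sketch, adding a new operation means adding one subclass with its own `evaluate` and `repr`, which is the property the virtual hierarchy is meant to provide.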
subtasks
- Add support for numeric and comparison methods in `py::XObject<C>`;
- Create class `py::FExpr` (which will eventually replace the pure-Python `datatable.expr.Expr`);
- Create class `dt::expr::FExpr`, which is a backend for `py::FExpr`;
- Arrange so that new `FExpr`s can be used alongside old pure-Python `Expr`s;
- Create class `py::Namespace` to replace pure-Python `datatable.expr.FrameProxy`;
- Convert existing `OldExpr`-based functionality into FExprs:
  - Frame-expr;
  - List-expr;
  - Dict-expr;
  - Literal exprs:
    - None;
    - bool;
    - int;
    - float;
    - str;
    - type;
    - range;
    - slice (all);
    - slice (numeric);
    - slice (string);
  - Column selectors `f.A` / `f[0]`;
  - `f.extend()`;
  - `f.remove()`;
  - Cast functions;
  - `shift()`;
  - `ifelse()`;
  - `cut()`;
  - `qcut()`;
  - Arithmetic binary operators
    - `+`;
    - `-`;
    - `*`;
    - `/`;
    - `//`;
    - `%`;
    - `**`;
  - Bitwise binary operators
    - `&`;
    - `|`;
    - `^`;
    - `<<`;
    - `>>`;
  - Unary operations
    - `+`;
    - `-`;
    - `~`;
  - Comparison operators
    - `<`;
    - `>`;
    - `<=`;
    - `>=`;
    - `==`;
    - `!=`;
  - String methods
    - `len()`;
    - `re_match()`;
  - Reducers
    - `mean`,
    - `min`,
    - `max`,
    - `stdev`,
    - `first`,
    - `last`,
    - `sum`,
    - `count`,
    - `count0`,
    - `median`,
    - `cov`,
    - `corr`;
  - Math functions:
    - Trigonometric
      - `sin`,
      - `cos`,
      - `tan`,
      - `arcsin`,
      - `arccos`,
      - `arctan`,
      - `arctan2`,
      - `hypot`,
      - `deg2rad`,
      - `rad2deg`;
    - Hyperbolic
      - `sinh`,
      - `cosh`,
      - `tanh`,
      - `arsinh`,
      - `arcosh`,
      - `artanh`;
    - Exponential
      - `cbrt`,
      - `exp`,
      - `exp2`,
      - `expm1`,
      - `log`,
      - `log10`,
      - `log1p`,
      - `log2`,
      - `logaddexp`,
      - `logaddexp2`,
      - `pow`,
      - `sqrt`,
      - `square`;
    - Special
      - `erf`,
      - `erfc`,
      - `gamma`,
      - `lgamma`;
    - Floating
      - `abs`,
      - `ceil`,
      - `copysign`,
      - `fabs`,
      - `floor`,
      - `frexp`,
      - `isclose`,
      - `isfinite`,
      - `isinf`,
      - `isna`,
      - `ldexp`,
      - `modf`,
      - `rint`,
      - `sign`,
      - `signbit`,
      - `trunc`;
    - Miscellaneous
      - `clip`,
      - `divmod`,
      - `fmod`,
      - `maximum`,
      - `minimum`;
  - Row-functions:
    - `rowall`,
    - `rowany`,
    - `rowcount`,
    - `rowfirst`,
    - `rowlast`,
    - `rowmin`,
    - `rowmax`,
    - `rowmean`,
    - `rowsum`,
    - `rowsd`;
- Documentation:
  - Update documentation on how to work with new FExpr infrastructure ("expr/!readme.md");
  - Add API documentation for the `py::Namespace` class;
  - Add API documentation for the `py::FExpr` class;
- Final cleanup:
  - Remove Python class `datatable.expr.FrameProxy`;
  - Remove Python class `datatable.expr.Expr`;
  - Remove Python enum `datatable.expr.OpCodes`;
  - Remove the `dt::expr::Op` enum;
  - Remove the `dt::expr::OldExpr` class;
  - Remove `args_registry`.
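As a rough illustration of what the column-selector, unary, comparison and bitwise items in the checklist above amount to on the Python side, here is a small self-contained sketch. It does not use datatable: the `Expr` and `Namespace` classes below are hypothetical stand-ins written for this example, and the printed form is an assumption rather than datatable's actual repr. It shows a namespace object turning `f.A` and `f[0]` into selector nodes, and operators building a tree instead of computing a value.

```python
class Expr:
    """Hypothetical expression node: an operator symbol plus its arguments."""

    def __init__(self, op, *args):
        self.op = op
        self.args = args

    # Comparison operators return new nodes rather than booleans.
    def __lt__(self, other): return Expr("<", self, other)
    def __gt__(self, other): return Expr(">", self, other)
    def __le__(self, other): return Expr("<=", self, other)
    def __ge__(self, other): return Expr(">=", self, other)
    def __eq__(self, other): return Expr("==", self, other)
    def __ne__(self, other): return Expr("!=", self, other)

    # Bitwise binary operators.
    def __and__(self, other): return Expr("&", self, other)
    def __or__(self, other): return Expr("|", self, other)

    # Unary operators.
    def __neg__(self): return Expr("-", self)
    def __invert__(self): return Expr("~", self)

    def __repr__(self):
        if self.op == "col":
            (sel,) = self.args
            return f"f.{sel}" if isinstance(sel, str) else f"f[{sel}]"
        if len(self.args) == 1:
            return f"{self.op}{self.args[0]!r}"
        lhs, rhs = self.args
        return f"({lhs!r} {self.op} {rhs!r})"


class Namespace:
    """Hypothetical stand-in for the object bound to `f`."""

    def __getattr__(self, name):
        # Leave private and dunder lookups alone so the object behaves normally.
        if name.startswith("_"):
            raise AttributeError(name)
        return Expr("col", name)

    def __getitem__(self, selector):
        if not isinstance(selector, (int, str)):
            raise TypeError("column selector must be an int or a str")
        return Expr("col", selector)


f = Namespace()
cond = (f.A >= 0) & ~(f[1] == "x")
print(cond)   # ((f.A >= 0) & ~(f[1] == 'x'))
print(-f.B)   # -f.B
```

Because `__eq__` is overridden to build a node, instances of this sketch class are unhashable, which is one of the details a real implementation has to decide on.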