Refactor Expr architecture #2562

@st-pasha

Our current approach for creating new Exprs is overly complicated, as outlined here: https://github.com/h2oai/datatable/blob/master/src/core/expr/!readme.md. Most of this complexity comes from the fact that the Expr class, which performs arithmetic on f-expressions, is defined in pure Python and then has to be bridged into the C++ core.

A saner approach would be to define everything in C++, eliminating most of the "middle-man" code. In particular, the following architecture is proposed:

  • C++ class py::FExpr to replace the current Python Expr class;
  • C++ class py::Namespace to replace the current Python FrameProxy class;
  • C++ class dt::expr::FExpr is a merged version of current dt::expr::Expr and dt::expr::Head. The class is virtual, with the hierarchy following that of the Head class;
  • Each py::FExpr contains a shared_ptr<dt::expr::FExpr>;
  • The dt::expr::FExpr class defines virtual methods for evaluation and for producing the repr;
  • The Op enum is removed.
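
The proposed layering can be illustrated with a minimal sketch. This is not the datatable implementation: the names `PyFExpr`, `FExpr_ColumnRef`, `FExpr_Literal`, `FExpr_Add`, `column()` and `literal()` are hypothetical, a "frame" is reduced to a single `std::vector<double>`, and all Python-binding code is omitted. It only shows the shape of the idea: a front-end handle that owns a `shared_ptr` to a virtual backend node, with evaluation and repr dispatched through virtual methods instead of an Op enum.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Assumption: a "frame" is a single column of doubles.
using Column = std::vector<double>;

// Backend node (the role of dt::expr::FExpr): every kind of expression
// is its own subclass, so no Op enum is needed for dispatch.
class FExpr {
  public:
    virtual ~FExpr() = default;
    virtual Column evaluate(const Column& input) const = 0;
    virtual std::string repr() const = 0;
};
using ptrExpr = std::shared_ptr<FExpr>;

// Column selector such as f.A; here it simply returns the only column.
class FExpr_ColumnRef : public FExpr {
    std::string name_;
  public:
    explicit FExpr_ColumnRef(std::string name) : name_(std::move(name)) {}
    Column evaluate(const Column& input) const override { return input; }
    std::string repr() const override { return "f." + name_; }
};

// Integer literal, broadcast to the length of the input column.
class FExpr_Literal : public FExpr {
    long value_;
  public:
    explicit FExpr_Literal(long value) : value_(value) {}
    Column evaluate(const Column& input) const override {
      return Column(input.size(), static_cast<double>(value_));
    }
    std::string repr() const override { return std::to_string(value_); }
};

// Binary "+" as a node holding shared pointers to its two operands.
class FExpr_Add : public FExpr {
    ptrExpr lhs_, rhs_;
  public:
    FExpr_Add(ptrExpr lhs, ptrExpr rhs)
      : lhs_(std::move(lhs)), rhs_(std::move(rhs)) {}
    Column evaluate(const Column& input) const override {
      Column a = lhs_->evaluate(input);
      Column b = rhs_->evaluate(input);
      for (std::size_t i = 0; i < a.size(); ++i) a[i] += b[i];
      return a;
    }
    std::string repr() const override {
      return "(" + lhs_->repr() + " + " + rhs_->repr() + ")";
    }
};

// Front-end handle (the role of py::FExpr): owns a shared_ptr to the
// backend node and builds new nodes from overloaded operators.
class PyFExpr {
    ptrExpr expr_;
  public:
    explicit PyFExpr(ptrExpr expr) : expr_(std::move(expr)) {}
    PyFExpr operator+(const PyFExpr& other) const {
      return PyFExpr(std::make_shared<FExpr_Add>(expr_, other.expr_));
    }
    Column evaluate(const Column& input) const {
      return expr_->evaluate(input);
    }
    std::string repr() const { return "FExpr<" + expr_->repr() + ">"; }
};

inline PyFExpr column(const std::string& name) {
  return PyFExpr(std::make_shared<FExpr_ColumnRef>(name));
}
inline PyFExpr literal(long value) {
  return PyFExpr(std::make_shared<FExpr_Literal>(value));
}

}  // namespace sketch
```

Under these assumptions, `sketch::column("A") + sketch::literal(1)` builds a small expression tree whose `repr()` returns `FExpr<(f.A + 1)>`, and evaluating it on `{1.0, 2.5}` yields `{2.0, 3.5}`. Adding a new operation then means adding one subclass, with no enum entry or Python-side registration to keep in sync.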

Subtasks

  • Add support for numeric and comparison methods in py::XObject<C>;
  • Create class py::FExpr (which will eventually replace the pure-Python datatable.expr.Expr);
  • Create class dt::expr::FExpr, which is the backend for py::FExpr;
  • Make it possible to use new FExprs alongside the old pure-Python Exprs;
  • Create class py::Namespace to replace the pure-Python datatable.expr.FrameProxy;
  • Convert existing OldExpr-based functionality into FExprs:
    • Frame-expr;
    • List-expr;
    • Dict-expr;
    • Literal exprs:
      • None;
      • bool;
      • int;
      • float;
      • str;
      • type;
      • range;
      • slice (all);
      • slice (numeric);
      • slice (string);
    • Column selectors f.A / f[0];
    • f.extend();
    • f.remove();
    • Cast functions;
    • shift();
    • ifelse();
    • cut();
    • qcut();
    • Arithmetic binary operators:
      • +;
      • -;
      • *;
      • /;
      • //;
      • %;
      • **;
    • Bitwise binary operators:
      • &;
      • |;
      • ^;
      • <<;
      • >>;
    • Unary operations:
      • +;
      • -;
      • ~;
    • Comparison operators:
      • <;
      • >;
      • <=;
      • >=;
      • ==;
      • !=;
    • String methods:
      • len();
      • re_match();
    • Reducers:
      • mean,
      • min,
      • max,
      • stdev,
      • first,
      • last,
      • sum,
      • count,
      • count0,
      • median,
      • cov,
      • corr;
    • Math functions:
      • Trigonometric
        • sin,
        • cos,
        • tan,
        • arcsin,
        • arccos,
        • arctan,
        • arctan2,
        • hypot,
        • deg2rad,
        • rad2deg;
      • Hyperbolic:
        • sinh,
        • cosh,
        • tanh,
        • arsinh,
        • arcosh,
        • artanh;
      • Exponential
        • cbrt,
        • exp,
        • exp2,
        • expm1,
        • log,
        • log10,
        • log1p,
        • log2,
        • logaddexp,
        • logaddexp2,
        • pow,
        • sqrt,
        • square;
      • Special
        • erf,
        • erfc,
        • gamma,
        • lgamma;
      • Floating
        • abs,
        • ceil,
        • copysign,
        • fabs,
        • floor,
        • frexp,
        • isclose,
        • isfinite,
        • isinf,
        • isna,
        • ldexp,
        • modf,
        • rint,
        • sign,
        • signbit,
        • trunc;
      • Miscellaneous
        • clip,
        • divmod,
        • fmod,
        • maximum,
        • minimum;
    • Row-functions:
      • rowall,
      • rowany,
      • rowcount,
      • rowfirst,
      • rowlast,
      • rowmin,
      • rowmax,
      • rowmean,
      • rowsum,
      • rowsd;
  • Documentation:
    • Update documentation on how to work with new FExpr infrastructure ("expr/!readme.md");
    • Add API documentation for the py::Namespace class;
    • Add API documentation for the py::FExpr class;
  • Final cleanup:
    • Remove Python class datatable.expr.FrameProxy;
    • Remove Python class datatable.expr.Expr;
    • Remove Python enum datatable.expr.OpCodes;
    • Remove the dt::expr::Op enum;
    • Remove the dt::expr::OldExpr class;
    • Remove args_registry.

Labels: EPIC, refactor