Painless Compiler Extensibility #53702

Closed
@stu-elastic

Description

Reworking the internal structure of the Painless compiler will let it produce more performant code and integrate better with other query languages.

Changing the structure allows us to:

  • Substantially increase runtime performance
  • Integrate other frontends

After this work, we will be able to:

  • Increase Painless runtime performance to be comparable to expressions (~20% improvement)
  • Allow SQL to use the Painless compiler without awkward workarounds

The monolithic implementation of the existing compiler complicates both of these goals.

We will change the implementation of the compiler to a modern structure:

Frontend -> Intermediate Representation -> Compiler Phases -> Backend

This will allow us to incrementally add performance enhancements and provide an obvious integration point for other languages (they provide the frontend & initial IR).
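
To make the proposed shape concrete, here is a minimal sketch of such a pipeline. It is an illustration only, not the Painless implementation: the class name `CompilerPipeline`, its type parameters, and `addPhase` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of a Frontend -> IR -> Compiler Phases -> Backend pipeline. */
final class CompilerPipeline<S, IR, O> {
    private final Function<S, IR> frontend;                        // source -> initial IR
    private final List<UnaryOperator<IR>> phases = new ArrayList<>(); // IR -> IR rewrites
    private final Function<IR, O> backend;                         // final IR -> output

    CompilerPipeline(Function<S, IR> frontend, Function<IR, O> backend) {
        this.frontend = frontend;
        this.backend = backend;
    }

    /** Registers a phase; phases run in the order they were added. */
    CompilerPipeline<S, IR, O> addPhase(UnaryOperator<IR> phase) {
        phases.add(phase);
        return this;
    }

    /** Runs the frontend, then every phase in order, then the backend. */
    O compile(S source) {
        IR ir = frontend.apply(source);
        for (UnaryOperator<IR> phase : phases) {
            ir = phase.apply(ir);
        }
        return backend.apply(ir);
    }
}
```

In a layout like this, another language would supply only its own `frontend` function and reuse the phases and backend, and a new optimization would be one more `addPhase` call rather than a change to every node.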

Background:

Painless uses a single tree with several embedded phases for doing:

  • semantic checking
  • performance improvements
  • bytecode generation
  • collecting the information needed to determine availability and cacheability of certain input parameters

Each of these could happen in the same phase or across multiple phases.

This design has reached its limits and has several problems:

  • The nodes contained a significant amount of mutable state that changed between phases.
  • The tree itself was mutable: nodes were removed for constant folding, and nodes were added to inject class-scope functions and fields and to insert casts.
  • Tree mutability could leave portions of the tree in different states during a single phase traversal.
  • Together, these issues made it very difficult to add performance improvements or to extend the compiler for use in other areas such as SQL.
  • Small changes bled throughout the implementation, and the very large state space complicated maintainability.

In progress:

The tree is currently split into a "user" tree and an "IR" tree.

User Tree:
  • Representative of direct input from the generation source, the script author.
  • Nearly immutable at this point, with some work left to make it fully immutable.
  • Must be checked for semantic validity.
  • Used to generate an IR tree directly. In future work, we will explore adding an extensibility point to produce other types of serialization such as JSON for additional debugging features.

IR Tree:
  • A mutable intermediate representation used by compiler phases to optimize runtime performance.
  • Immutability may make sense, but this must be investigated to avoid GC issues.
  • Is semantically valid, which allows for easier modification.
  • Is used to generate the bytecode for a Java class.
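
The split between the two trees can be sketched as follows. This is a toy model under stated assumptions, not the actual Painless node classes: `TreeSketch`, `UNumber`, `UAdd`, `IRConstant`, `IRAdd`, and `toIR` are hypothetical names, and the only semantics modeled are numeric literals and addition with int-to-double promotion.

```java
/** Hypothetical sketch: an immutable user tree lowered into a typed, mutable IR tree. */
final class TreeSketch {
    /** User tree: mirrors the script text; fields are final and carry no type information. */
    interface UserNode {}

    static final class UNumber implements UserNode {
        final String text;
        UNumber(String text) { this.text = text; }
    }

    static final class UAdd implements UserNode {
        final UserNode left;
        final UserNode right;
        UAdd(UserNode left, UserNode right) { this.left = left; this.right = right; }
    }

    /** IR tree: mutable so later phases can rewrite it; every node already has a type. */
    abstract static class IRNode {
        Class<?> type;
    }

    static final class IRConstant extends IRNode {
        Object value;
    }

    static final class IRAdd extends IRNode {
        IRNode left;
        IRNode right;
    }

    /** Checks the user tree and builds the IR in one pass; throws on invalid input. */
    static IRNode toIR(UserNode node) {
        if (node instanceof UNumber) {
            String text = ((UNumber) node).text;
            IRConstant constant = new IRConstant();
            try {
                constant.value = Integer.parseInt(text);
                constant.type = int.class;
            } catch (NumberFormatException notAnInt) {
                // Falls back to double; this still throws NumberFormatException if invalid.
                constant.value = Double.parseDouble(text);
                constant.type = double.class;
            }
            return constant;
        }
        if (node instanceof UAdd) {
            UAdd add = (UAdd) node;
            IRAdd ir = new IRAdd();
            ir.left = toIR(add.left);
            ir.right = toIR(add.right);
            // Numeric promotion: the result is double if either operand is double.
            boolean isDouble = ir.left.type == double.class || ir.right.type == double.class;
            ir.type = isDouble ? double.class : int.class;
            return ir;
        }
        throw new IllegalArgumentException("unsupported user node: " + node);
    }
}
```

Because the user nodes are never modified, the same user tree could also be handed to a different consumer, such as a serializer, while optimization phases work only on the IR nodes.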

Outcomes:

  • An immutable user tree which is fully representative of the original script.
  • An API that allows this tree to be transformed into any type of serialization.
  • New external performance phases for the IR tree, such as script context-specific optimizations.
    • If a doc is read-only, we can propagate it as a constant and avoid unnecessary map lookups.
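
One possible reading of the read-only doc optimization, sketched as a toy IR phase. Everything here is an assumption made for illustration: the node types `MapLoad`, `Constant`, and `Concat`, the `propagate` method, and the idea that the script context supplies the set of read-only names are hypothetical and are not taken from the Painless code base.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: replace map lookups of read-only variables with constants. */
final class ReadOnlyPropagation {
    interface Node {
        Object eval(Map<String, Object> vars);
    }

    /** Looks the variable up in the map on every evaluation. */
    static final class MapLoad implements Node {
        final String name;
        MapLoad(String name) { this.name = name; }
        public Object eval(Map<String, Object> vars) { return vars.get(name); }
    }

    /** Holds a value that was resolved once; evaluation never touches the map. */
    static final class Constant implements Node {
        final Object value;
        Constant(Object value) { this.value = value; }
        public Object eval(Map<String, Object> vars) { return value; }
    }

    /** Toy binary operation so the tree has more than one level. */
    static final class Concat implements Node {
        final Node left;
        final Node right;
        Concat(Node left, Node right) { this.left = left; this.right = right; }
        public Object eval(Map<String, Object> vars) {
            return String.valueOf(left.eval(vars)) + right.eval(vars);
        }
    }

    /** Returns a rewritten tree; loads of read-only names become constants. */
    static Node propagate(Node node, Set<String> readOnly, Map<String, Object> vars) {
        if (node instanceof MapLoad && readOnly.contains(((MapLoad) node).name)) {
            return new Constant(vars.get(((MapLoad) node).name)); // one lookup, done here
        }
        if (node instanceof Concat) {
            Concat concat = (Concat) node;
            return new Concat(
                propagate(concat.left, readOnly, vars),
                propagate(concat.right, readOnly, vars));
        }
        return node; // writable variables and constants are left untouched
    }
}
```

After the phase runs, evaluating the rewritten tree performs a map lookup only for names outside the read-only set. The rewrite is valid only while the read-only guarantee holds, which is why it would belong to a context-specific phase rather than to the general compiler.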

Related PRs:
#51278
#51452
#51690
#51776
#51954
#52612
#52783
#52915
#53075
#53348
#53685

Related Issues:
#49870
#49869

Labels:

  • :Core/Infra/Scripting (Scripting abstractions, Painless, and Mustache)
  • >refactoring
  • Team:Core/Infra (Meta label for core/infra team)
  • triaged (Issue has been looked at, and is being left open)
