[C++][Gandiva] Constructing LLVM module with only necessary functions for better performance #40024

Open
niyue opened this issue Feb 10, 2024 · 2 comments · May be fixed by #40031

Comments

@niyue
Contributor

niyue commented Feb 10, 2024

Description

This enhancement request proposes speeding up construction of the LLVM module by examining which functions the Gandiva expressions actually use and skipping the work that is not needed for them.

When constructing an LLVM module for the given expressions, Gandiva performs the following tasks:

  1. Instantiate a new Engine, which internally constructs a new LLVM module
  2. Add many C functions and their pointers that may be called by the expression to the LLVM module.
    • Most of the C functions are user-facing, and will be used in Gandiva expressions by users, such as the random function, and the gdv_fn_base64_decode_utf8 function (which is used by unbase64)
    • Some of the C functions are used only internally, such as gdv_fn_populate_varlen_vector and gdv_fn_context_arena_malloc. They are not called directly in Gandiva expressions composed by users, but by the LLVM IR generated for the LLVM module
  3. Load LLVM bitcode, which contains many functions implemented in LLVM IR, into the LLVM module
    • Most of the IR functions are user-facing, and will be used in Gandiva expressions by users, such as the negative function and the log10 function
    • Some of the IR functions are internally used only, such as the bitMapGetBit and bitMapValidityGetBit functions

Some of the operations in the above process are non-trivial, and they make it slower than it needs to be:

  1. Each C function added to the LLVM module ultimately has its pointer defined in the module's JITDylib via jit_dylib.define(llvm::orc::absoluteSymbols({{mangle(name), symbol}})). This is not a cheap operation, and every LLVM module adds many C functions (143 such usages so far in the codebase), so constructing the LLVM module is slow when the cache is not hit.
  2. Loading LLVM bitcode will call llvm::Linker::linkModules to copy the bitcode's module into the Engine's LLVM module, and this is an expensive operation.

Proposal

To speed up the above process, the key observations are:

  1. typically, besides the internally used C functions, only a very small number of C functions are used in most expressions, so we don't have to map all 143 functions every time (it is very rare for users to write expressions that call 100+ functions at the same time)
  2. typically, besides the internally used IR functions, only a very small number of IR functions are used in most expressions, so we could avoid loading the LLVM bitcode and linking it into the LLVM module when none of its functions are used (for example, when all the functions used in the expressions are C functions)

The proposal to improve this part is:

  1. parse the expressions and keep track of the functions used in the expressions
  2. when adding/mapping a C function, add it unconditionally if it is used internally; otherwise, check it against the used functions collected above and define it in the LLVM module only if it is actually needed
  3. Split LLVM bitcode into two parts:
    • one part for storing internal IR functions, more specifically, bitMapGetBit/bitMapSetBit/bitMapValidityGetBit/bitMapClearBitIfFalse. This part of the bitcode will always be loaded and added to the LLVM module
    • the other part for storing all user-facing IR functions. When loading LLVM bitcode, check whether all the functions used in the expressions are C functions; if so, there is no need to load the user-facing IR bitcode at all.
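
Steps 1 and 2 could be sketched roughly as follows. This is only an illustration and does not depend on LLVM or Gandiva: ExprNode, CFunctionEntry, CollectUsedFunctions and SelectFunctionsToMap are hypothetical names, and the sketch assumes that function names in the expression tree have already been resolved to the C symbol names that would be mapped.

```cpp
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified expression node; not Gandiva's actual node classes.
struct ExprNode {
  std::string function_name;  // empty for leaves such as fields and literals
  std::vector<std::shared_ptr<ExprNode>> children;
};

// Hypothetical registry entry describing one C function that could be mapped.
struct CFunctionEntry {
  std::string name;
  bool internal_only;  // true for helpers such as gdv_fn_context_arena_malloc
};

// Step 1: walk the expression tree and record every function name it calls.
void CollectUsedFunctions(const ExprNode& node, std::set<std::string>* used) {
  if (!node.function_name.empty()) {
    used->insert(node.function_name);
  }
  for (const auto& child : node.children) {
    if (child) {
      CollectUsedFunctions(*child, used);
    }
  }
}

// Step 2: internal functions are always selected; user-facing functions are
// selected only when some expression actually uses them.
std::vector<std::string> SelectFunctionsToMap(
    const std::vector<CFunctionEntry>& registry,
    const std::set<std::string>& used) {
  std::vector<std::string> selected;
  for (const auto& entry : registry) {
    if (entry.internal_only || used.count(entry.name) > 0) {
      selected.push_back(entry.name);
    }
  }
  return selected;
}
```

With a selection like this, only the returned names would be passed on to the JITDylib definition step, so an expression that calls a single C function would define that function plus the internal helpers, not the full set.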

This kind of processing will avoid the expensive operations mentioned above, hence achieving better performance in some cases.
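
For the bitcode side, the decision could look something like the sketch below. BitcodeLoadPlan and PlanBitcodeLoading are hypothetical names introduced for illustration, and the set of IR function names is assumed to be available from whatever registry knows which functions are implemented in IR. No LLVM calls are made here; the sketch only shows the decision, not the loading itself.

```cpp
#include <set>
#include <string>

// Hypothetical plan describing which bitcode parts to link into the module.
struct BitcodeLoadPlan {
  bool load_internal_ir = true;      // bitMapGetBit and similar: always needed
  bool load_user_facing_ir = false;  // needed only if an IR function is used
};

// Decide whether the user-facing IR bitcode must be loaded and linked.
// used_functions: names collected from the expressions.
// ir_functions: names of the user-facing functions implemented in LLVM IR.
BitcodeLoadPlan PlanBitcodeLoading(const std::set<std::string>& used_functions,
                                   const std::set<std::string>& ir_functions) {
  BitcodeLoadPlan plan;
  for (const auto& name : used_functions) {
    if (ir_functions.count(name) > 0) {
      plan.load_user_facing_ir = true;
      break;
    }
  }
  return plan;
}
```

When every function used in the expressions is a C function, load_user_facing_ir stays false and the expensive linking of the user-facing bitcode can be skipped.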

Component(s)

C++ - Gandiva

@kou
Member

kou commented Feb 11, 2024

We register functions lazy, right?
It makes sense.

@niyue
Contributor Author

niyue commented Feb 11, 2024

We register functions lazy, right?

Not exactly. The idea is to avoid defining functions that are not used in the expressions at all.
