Skip to content

Conversation

@natashasehgal
Copy link
Contributor

Summary:
Use SimpleFunction instead of VectorFunction in merge_hll utilizing the new MemoryPool infrastructure in Simple Function

The HashStringAllocator is created from MemoryPool, to get performance of HSA with serialial processing, and use query's memory from MemoryPool for proper memory tracking

Simple function is cleaner and more maintainable code with auto null handling

Differential Revision: D87932002

@netlify
Copy link

netlify bot commented Nov 26, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 4298ce7
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/692778ac32c198000839b450

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 26, 2025
@meta-codesync
Copy link

meta-codesync bot commented Nov 26, 2025

@natashasehgal has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87932002.

…ator#15641)

Summary:

Add MemoryPool support to SimpleFunction `initialize()` method to enable memory tracking for scalar functions (e.g merge_hll) that use HashStringAllocator.

Some types that require this include Hyperloglog, KHyperloglog, SetDigest, SfmSketch etc. 

**Main Change** 
- Modified SimpleFunctionAdapter to pass MemoryPool from ExprCompiler to UDF initialize()
- Two signatures in UDFHolder::initialize() for backward compatibility - 
  - New signature: initialize(vector<TypePtr>, QueryConfig, MemoryPool*, constantArgs...)
  - Old signature: initialize(vector<TypePtr>, QueryConfig, constantArgs...)

**Issue:** The initial straightforward implementation failed because some UDFs (e.g., Spark's https://www.internalfb.com/code/fbsource/fbcode/velox/functions/sparksql/Rand.h?lines=30 function) use template initialize() methods, that was matching fourth MemoryPool argument:

```cpp
template <typename TInput>
void initialize(vector<TypePtr>&, QueryConfig&, const TInput* param);

V2: Contains Template Detection via SFINAE
Compile-time template detection that probes initialize() with a DummyProbeType. If the method accepts this unrelated type, it's a template and is excluded from receiving MemoryPool.

Differential Revision: D87932003
…#15642)

Summary:

Use SimpleFunction instead of VectorFunction in `merge_hll` utilizing the new MemoryPool infrastructure in Simple Function

The HashStringAllocator is created from MemoryPool, to get performance of HSA with serialial processing, and use query's memory from MemoryPool for proper memory tracking

Simple function is cleaner and more maintainable code with auto null handling

Differential Revision: D87932002
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant