Expecting improved performance by keeping vectors across chunks

For each chunk, vectors are allocated. Example: https://github.com/jamessewell/pgingester/blob/main/src/main.rs#L608

These vectors could be allocated outside the chunk loop and re-used in each iteration. 

They would be cleared at the end of each iteration, which removes their items but keeps their capacity (the allocation).
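A minimal sketch of the idea, not the code from `main.rs`: `Row`, its fields, and `process_in_chunks` are hypothetical names made up for illustration, and the database call is left as a comment. The function records the buffer length and capacity after each chunk so the reuse can be observed.

```rust
/// Hypothetical row type; the real project has its own columns.
struct Row {
    id: i64,
    value: f64,
}

/// Splits rows into per-column vectors one chunk at a time, reusing the
/// same buffers for every chunk. Returns (items, capacity) of the `ids`
/// buffer as observed at the end of each chunk.
/// Note: `chunks` panics if `batch_size` is 0.
fn process_in_chunks(rows: &[Row], batch_size: usize) -> Vec<(usize, usize)> {
    // Allocate once, outside the chunk loop.
    let mut ids: Vec<i64> = Vec::with_capacity(batch_size);
    let mut values: Vec<f64> = Vec::with_capacity(batch_size);
    let mut observed = Vec::new();

    for chunk in rows.chunks(batch_size) {
        for row in chunk {
            ids.push(row.id);
            values.push(row.value);
        }

        // The batch would be sent to the database here.
        observed.push((ids.len(), ids.capacity()));

        // `clear` drops the items but leaves the allocation in place.
        ids.clear();
        values.clear();
    }
    observed
}
```

Since a chunk never holds more than `batch_size` items, the buffers never need to grow, so the reported capacity should stay the same from the first chunk to the last.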

The current implementation deallocates completely at the end of each iteration and reallocates at the start of the next iteration. 

To avoid over-allocating, the initial capacity could be the smaller of batch size and total count, so that only space for total count items is allocated when total count is less than batch size.
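The capacity choice could look roughly like this. `initial_capacity` and its parameter names are hypothetical and do not come from the repository.

```rust
/// Capacity to reserve for the reusable per-chunk vectors: never more than
/// one batch, and never more than the total number of rows to ingest.
fn initial_capacity(batch_size: usize, total_count: usize) -> usize {
    batch_size.min(total_count)
}
```

With a batch size of 1000 and only 250 rows in total, this reserves space for 250 items. With 5000 rows it reserves space for 1000.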

The improvements are going to depend on the allocator used. Since the default allocator is used (rather than, e.g. `mimalloc` or `jemalloc`), I expect the reduced allocations to be noticeable, especially for unnest strategies with small batch sizes.

It might be incorrect to assume that the database alone is responsible for the slightly surprising results observed at small batch sizes.
