Expecting improved performance by keeping vectors across chunks #1

@LeoniePhiline

For each chunk, new vectors are allocated. Example: https://github.com/jamessewell/pgingester/blob/main/src/main.rs#L608

These vectors could be allocated outside the chunk loop and re-used in each iteration.

They would be cleared at the end of each iteration, which removes their items but keeps their capacity (allocation).

The current implementation deallocates completely at the end of each iteration and reallocates at the start of the next iteration.
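A minimal sketch of the proposed pattern, assuming the per-chunk columns are plain `Vec`s. The names `Row`, `ingest_chunks`, `ids`, `names`, and `flush` are hypothetical and do not come from pgingester. The vectors are allocated once before the loop, and `Vec::clear` drops the items while keeping the buffer, so every chunk writes into the same allocation.

```rust
/// Hypothetical row type standing in for whatever a chunk contains.
struct Row {
    id: i64,
    name: String,
}

/// Processes `rows` in chunks of `batch_size` (must be > 0, as `chunks`
/// panics on 0), reusing the column vectors across chunks.
/// Returns the total number of rows handed to `flush`.
fn ingest_chunks(
    rows: &[Row],
    batch_size: usize,
    mut flush: impl FnMut(&[i64], &[&str]),
) -> usize {
    // Allocated once, outside the chunk loop.
    let mut ids: Vec<i64> = Vec::with_capacity(batch_size);
    let mut names: Vec<&str> = Vec::with_capacity(batch_size);
    let mut flushed = 0;

    for chunk in rows.chunks(batch_size) {
        for row in chunk {
            ids.push(row.id);
            names.push(&row.name);
        }
        // Stand-in for building and sending the insert for this chunk.
        flush(&ids, &names);
        flushed += ids.len();
        // Drop the items but keep the capacity for the next chunk.
        ids.clear();
        names.clear();
    }
    flushed
}
```

Recording `ids.as_ptr()` inside `flush` shows the same buffer address for every chunk, since no chunk exceeds the reserved capacity.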

To avoid over-allocating, the capacity could be clamped to the smaller of batch size and total count. That way, if total count is less than batch size, space is allocated only for total count items.
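The clamp amounts to taking the minimum of the two values. In this sketch, `batch_size` and `total_count` are the quantities named above, while `chunk_capacity` and `allocate_columns` are hypothetical helpers, not pgingester functions.

```rust
/// Capacity to reserve for the per-chunk vectors: a full batch, unless the
/// whole input is smaller than one batch.
fn chunk_capacity(batch_size: usize, total_count: usize) -> usize {
    batch_size.min(total_count)
}

/// Allocates the (hypothetical) column vectors once, with clamped capacity.
/// `Vec::with_capacity(0)` does not allocate, so an empty input costs nothing.
fn allocate_columns(batch_size: usize, total_count: usize) -> (Vec<i64>, Vec<String>) {
    let cap = chunk_capacity(batch_size, total_count);
    (Vec::with_capacity(cap), Vec::with_capacity(cap))
}
```

For example, with a batch size of 1000 and only 250 rows, `chunk_capacity(1000, 250)` is 250, so only room for 250 items is reserved.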

The size of the improvement will depend on the allocator. Since the default allocator is used (rather than, e.g., mimalloc or jemalloc), I expect the reduced number of allocations to be noticeable, especially for unnest strategies with small batch sizes.
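One way to check the allocation side of this claim is to wrap the default `System` allocator in a counting `GlobalAlloc` and compare both strategies. This is a sketch under stated assumptions. It counts allocations rather than measuring time, it does not involve mimalloc or jemalloc, and `CountingAlloc`, `fresh_per_chunk`, and `reused_across_chunks` are hypothetical names. Because `GlobalAlloc` allows the compiler to elide allocations, `black_box` is used to keep each vector observable.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system (default) allocator and counts allocation requests.
/// Growth goes through the default `GlobalAlloc::realloc`, which calls
/// `alloc`, so reallocations are counted as well.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Returns how many allocations were recorded while `f` ran.
fn count_allocations(f: impl FnOnce()) -> usize {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    f();
    ALLOCATIONS.load(Ordering::Relaxed) - before
}

/// Current behaviour: a fresh vector for every chunk.
fn fresh_per_chunk(chunks: usize, batch_size: usize) {
    for chunk in 0..chunks {
        let mut ids: Vec<u64> = Vec::with_capacity(batch_size);
        ids.extend((0..batch_size as u64).map(|i| i + chunk as u64));
        black_box(&ids); // keep the optimizer from eliding the allocation
    }
}

/// Proposed behaviour: one vector, cleared after each chunk.
fn reused_across_chunks(chunks: usize, batch_size: usize) {
    let mut ids: Vec<u64> = Vec::with_capacity(batch_size);
    for chunk in 0..chunks {
        ids.extend((0..batch_size as u64).map(|i| i + chunk as u64));
        black_box(&ids);
        ids.clear();
    }
}
```

Run single-threaded, `fresh_per_chunk(100, 16)` should record 100 allocations and `reused_across_chunks(100, 16)` only one. Whether that difference shows up in wall-clock time is what a benchmark against the database would need to confirm.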

It might be incorrect to assume that the database alone is responsible for the slightly surprising results observed at small batch sizes; per-chunk allocation overhead may also contribute.
