Allow chunks that are larger than the maximum chunk size in index construction #303

jorisdral · 2024-07-21T12:11:54Z

When incrementally constructing a compact/ordinary index, we currently might split the output into multiple chunks once the max chunk size is exceeded, namely if the serialised form of the index has a size that is a multiple of the max chunk size. It might be worth just returning a larger chunk in that case (maybe still a multiple of the max chunk size), since we don't rely on chunks having a specific size anywhere in the RunBuilder/RunAcc code.


jeltsch · 2024-07-30T20:03:05Z

As we agreed in our project meeting, this indeed seems to be the way to go. Concretely, we concluded in the meeting that, whenever the size of the buffered serialized data exceeds a certain threshold, all available data should be output in the form of a single chunk. In particular, this means the following:

Chunk sizes don’t have to be multiples of the threshold.
The threshold is not a maximum chunk size anymore but rather a minimum chunk size.
There is no maximum chunk size.

The last point is justified because, in practice, chunks will not become so large that the work of writing the index is no longer appropriately spread over time. This is mainly because serialized keys aren’t expected to be large.

With this new approach, the output of appending to an index shouldn’t have type [Chunk] anymore but rather type Maybe Chunk, as each append can yield at most one chunk.
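To illustrate the approach described above, here is a minimal sketch of threshold-based chunk generation. It is not the code from #296 or #299: the names `Acc`, `newAcc`, `append` and `finalise` and this definition of `Chunk` are hypothetical, and a strict `ByteString` buffer is assumed purely for simplicity.

```haskell
import           Data.ByteString (ByteString)
import qualified Data.ByteString as BS

-- | Hypothetical stand-in for the real chunk type.
newtype Chunk = Chunk ByteString
  deriving (Eq, Show)

-- | Hypothetical accumulator: the threshold (now a minimum chunk size)
-- together with the serialized data buffered so far.
data Acc = Acc
  { minChunkSize :: !Int
  , buffered     :: !ByteString
  }

-- | Start with an empty buffer and the given threshold.
newAcc :: Int -> Acc
newAcc threshold = Acc threshold BS.empty

-- | Append serialized data. As soon as the buffer reaches the threshold,
-- all buffered data is emitted as a single chunk, however large it is.
-- Treating "exactly at the threshold" as sufficient is an assumption here.
append :: ByteString -> Acc -> (Maybe Chunk, Acc)
append bytes acc
  | BS.length buf >= minChunkSize acc =
      (Just (Chunk buf), acc { buffered = BS.empty })
  | otherwise =
      (Nothing, acc { buffered = buf })
  where
    buf = buffered acc <> bytes

-- | Flush the remaining data at the end of construction. Only this final
-- chunk may be smaller than the threshold.
finalise :: Acc -> Maybe Chunk
finalise acc
  | BS.null (buffered acc) = Nothing
  | otherwise              = Just (Chunk (buffered acc))
```

Because `append` always drains the whole buffer, it never needs to return more than one chunk, which is what makes `Maybe Chunk` sufficient as a result type in this sketch.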

I will implement this new approach of chunk generation already as part of #296 and #299. For the compact index, it remains to be implemented (potentially by using the general-purpose chunk handling to be added by #296).

jorisdral changed the title ~~Do not split up incremental serialisation output into multiple Chunks~~ Allow Chunks that are large than the max chunk size in index construction Jul 21, 2024

jorisdral mentioned this issue Jul 21, 2024

Add general-purpose chunk handling #296

Merged

jeltsch changed the title ~~Allow Chunks that are large than the max chunk size in index construction~~ Allow chunks that are larger than the maximum chunk size in index construction Jul 30, 2024

jeltsch mentioned this issue Aug 2, 2024

Add incremental functionality for the ordinary index #299

Merged
