Skip to content

Conversation

ballard26
Copy link
Contributor

Changes iobuf::append(iobuf&&) to reduce fragment allocations by re-using fragments from the moved iobuf. Also reduces the number of duplicate checks. See commit messages for benchmarking results.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.1.x
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • none

@ballard26 ballard26 changed the title iobuf::append improvements p2 iobuf::append improvements part 2 Mar 25, 2025
}

append(std::move(f).unoptimized_release());
append(f);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that since we're not trimming f here test_next_chunk_allocation_append_iobuf will fail as is since the appended data will now map to 322 fragments instead of 323. If we trim f the test will pass and the behavior will be identical to what it is today. However, I really don't think that behavior is optimal since if we are taking ownership of the fragments we may as well use any free space rather than just trimming it away. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think yes we should change the test instead of realloc here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brandon do you know why we were trimming before here? Seems weird.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe the trimming was ever intentional. However, before this PR appending an existing io_fragment to an iobuf would entail;

  • The underlying temporary_buffer in the io_fragment to be appended was shared up to the used bytes. This is basically what is acting as a trim.
  • A new io_fragment was allocated and given the shared temporary_buffer.
  • This new fragment was appended to the iobuf

@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Mar 25, 2025

CI test results

test results on build#63618
test_id test_kind job_url test_status passed
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.preparing.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/63618#0195cb19-3e37-4a61-b714-3ba701e233cb FLAKY 1/5
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.preparing.use_alias.True ducktape https://buildkite.com/redpanda/redpanda/builds/63618#0195cb19-3e37-4833-9a76-ec8374af5b37 FLAKY 1/2
rptest.tests.datalake.cluster_restore_test.DatalakeClusterRestoreTest.test_restore_partition_spec.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.NESSIE ducktape https://buildkite.com/redpanda/redpanda/builds/63618#0195cb19-3e37-4a61-b714-3ba701e233cb FLAKY 1/2
rptest.tests.datalake.schema_evolution_test.SchemaEvolutionE2ETests.test_dropped_column_no_collision.cloud_storage_type=CloudStorageType.S3.query_engine=QueryEngineType.SPARK.produce_mode=ProducerType.AVRO.catalog_type=CatalogType.NESSIE ducktape https://buildkite.com/redpanda/redpanda/builds/63618#0195cb19-3e36-432d-8143-60ae359f93d2 FLAKY 1/3
rptest.tests.datalake.schema_evolution_test.SchemaEvolutionE2ETests.test_dropped_column_no_collision.cloud_storage_type=CloudStorageType.S3.query_engine=QueryEngineType.SPARK.produce_mode=ProducerType.PROTO2.catalog_type=CatalogType.NESSIE ducktape https://buildkite.com/redpanda/redpanda/builds/63618#0195cb19-3e38-4a2c-8dd2-ba661d9cd05a FLAKY 1/2
rptest.tests.datalake.schema_evolution_test.SchemaEvolutionE2ETests.test_dropped_column_no_collision.cloud_storage_type=CloudStorageType.S3.query_engine=QueryEngineType.SPARK.produce_mode=ProducerType.PROTO2.catalog_type=CatalogType.REST_JDBC ducktape https://buildkite.com/redpanda/redpanda/builds/63618#0195cb19-3e37-4a61-b714-3ba701e233cb FLAKY 1/3
rptest.tests.datalake.schema_evolution_test.SchemaEvolutionE2ETests.test_dropped_column_no_collision.cloud_storage_type=CloudStorageType.S3.query_engine=QueryEngineType.SPARK.produce_mode=ProducerType.PROTO3.catalog_type=CatalogType.REST_JDBC ducktape https://buildkite.com/redpanda/redpanda/builds/63618#0195cb19-3e36-432d-8143-60ae359f93d2 FLAKY 1/11
rptest.tests.datalake.schema_evolution_test.SchemaEvolutionE2ETests.test_dropped_column_no_collision.cloud_storage_type=CloudStorageType.S3.query_engine=QueryEngineType.TRINO.produce_mode=ProducerType.AVRO.catalog_type=CatalogType.REST_JDBC ducktape https://buildkite.com/redpanda/redpanda/builds/63618#0195cb19-3e38-4a2c-8dd2-ba661d9cd05a FLAKY 1/3
rptest.tests.datalake.schema_evolution_test.SchemaEvolutionE2ETests.test_dropped_column_no_collision.cloud_storage_type=CloudStorageType.S3.query_engine=QueryEngineType.TRINO.produce_mode=ProducerType.PROTO2.catalog_type=CatalogType.NESSIE ducktape https://buildkite.com/redpanda/redpanda/builds/63618#0195cb19-3e36-432d-8143-60ae359f93d2 FLAKY 1/19
rptest.tests.datalake.schema_evolution_test.SchemaEvolutionE2ETests.test_dropped_column_no_collision.cloud_storage_type=CloudStorageType.S3.query_engine=QueryEngineType.TRINO.produce_mode=ProducerType.PROTO2.catalog_type=CatalogType.REST_JDBC ducktape https://buildkite.com/redpanda/redpanda/builds/63618#0195cb19-3e37-4833-9a76-ec8374af5b37 FLAKY 1/2
rptest.tests.datalake.schema_evolution_test.SchemaEvolutionE2ETests.test_dropped_column_no_collision.cloud_storage_type=CloudStorageType.S3.query_engine=QueryEngineType.TRINO.produce_mode=ProducerType.PROTO3.catalog_type=CatalogType.NESSIE ducktape https://buildkite.com/redpanda/redpanda/builds/63618#0195cb19-3e38-4a2c-8dd2-ba661d9cd05a FLAKY 1/13
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=False.with_tiered_storage=True.with_iceberg=True.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/63618#0195cb05-b82f-49c9-b63a-2a29317b32c4 FLAKY 1/2
rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic ducktape https://buildkite.com/redpanda/redpanda/builds/63618#0195cb05-b82d-4274-9e6c-e6ab42988505 FLAKY 1/4
test_bytes_rpunit.test_bytes_rpunit unit https://buildkite.com/redpanda/redpanda/builds/63618#0195cabf-e8e9-4f54-8664-84aac46366d5 FAIL 0/2
test_bytes_rpunit.test_bytes_rpunit unit https://buildkite.com/redpanda/redpanda/builds/63618#0195cabf-e8ea-44c5-aaf6-c7e7d81f5a0a FAIL 0/2
test results on build#64009
test_id test_kind job_url test_status passed
rptest.tests.cloud_storage_chunk_read_path_test.CloudStorageChunkReadTest.test_read_when_segment_size_smaller_than_chunk_size ducktape https://buildkite.com/redpanda/redpanda/builds/64009#0195efc8-4e6e-4071-a21f-5486fc9b1c90 FLAKY 1/2
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.preparing.use_alias.True ducktape https://buildkite.com/redpanda/redpanda/builds/64009#0195efc8-4e6c-413b-8c03-14906f5afb69 FLAKY 1/2
storage_e2e_rpfixture.storage_e2e_rpfixture unit https://buildkite.com/redpanda/redpanda/builds/64009#0195ef85-c1c1-4fea-9567-446fb6c6d6dc FLAKY 1/2
test results on build#64059
test_id test_kind job_url test_status passed
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.preparing.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/64059#0195f333-2d21-43e2-b6f3-08a945db92e5 FLAKY 1/5
rptest.tests.datalake.simple_connect_test.RedpandaConnectIcebergTest.test_translating_avro_serialized_records.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/64059#0195f333-2d20-4539-8710-365aefc1845d FLAKY 1/2

@StephanDollberg
Copy link
Member

It seems to me that the iobuf microbench space is too small for all the various cases that exist to append things to an iobuf. Feels like we are potentially missing out on cases.

@travisdowns
Copy link
Member

This is selected to backport to 25.1.x? Is that right?

@@ -251,6 +251,7 @@ class iobuf {
size_t last_allocation_size() const;

bool try_copy_append(const char* ptr, size_t size);
void append(fragment*);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing comment, especially necessary here as there is a raw pointer involved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will add one.

@ballard26
Copy link
Contributor Author

ballard26 commented Apr 1, 2025

This is selected to backport to 25.1.x? Is that right?

No, I believe we decide against backporting it in our last meeting.

rockwotj
rockwotj previously approved these changes Apr 1, 2025
Copy link
Contributor

@rockwotj rockwotj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM outside of the comment about docs. Not sure if it was discussed out of band about adding more benchmark patterns per Stephan's comment

}
o.clear();
o._size = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This drops the the update_generation() call which we shouldn't do (I guess there is no meta test for this since one didn't fail here).

Is this noticeably faster than clear()? If so maybe we just add the update_generation() call back in explicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check if it's any faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The effects appear to be negligible, will switch back to using iobuf::clear()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this seems more robust to future changes also.

Instead of allocating a new fragment this PR re-uses the allocated
fragments from the iobuf being appended. This reduces the iobuf::append
benchmark from;
| test                      |  iterations |      median |         mad |         min |         max |      allocs |       tasks |        inst |
| -                         |           - |           - |           - |           - |           - |           - |           - |           - |
| iobuf.append_bench_small  |    33010000 |     8.845ns |     0.009ns |     8.834ns |     8.879ns |       0.008 |       0.000 |       180.1 |
| iobuf.append_bench_medium |        9000 |     2.321us |    15.924ns |     2.305us |     2.595us |       0.627 |       0.000 |      2257.0 |
| iobuf.append_bench_large  |        1000 |     7.376us |    32.411ns |     7.344us |     7.454us |       4.000 |       0.000 |      6345.5 |
To;
| test                      |  iterations |      median |         mad |         min |         max |      allocs |       tasks |        inst |
| -                         |           - |           - |           - |           - |           - |           - |           - |           - |
| iobuf.append_bench_small  |    32347000 |     8.838ns |     0.005ns |     8.829ns |     8.854ns |       0.008 |       0.000 |       174.1 |
| iobuf.append_bench_medium |        9000 |     2.151us |     3.486ns |     2.132us |     2.269us |       0.626 |       0.000 |      2251.5 |
| iobuf.append_bench_large  |        1000 |     7.313us |    13.044ns |     7.240us |     7.368us |       1.750 |       0.000 |      6039.5 |
iobuf::append(fragment) already handles trimming the last fragment
before appending. Therefore there is no need to repeat this work in
iobuf::append(iobuf&&) or iobuf::append(temporary_buffer). This reduces
the append microbench from;
| test                      |  iterations |      median |         mad |         min |         max |      allocs |       tasks |        inst |
| -                         |           - |           - |           - |           - |           - |           - |           - |           - |
| iobuf.append_bench_small  |    32347000 |     8.838ns |     0.005ns |     8.829ns |     8.854ns |       0.008 |       0.000 |       174.1 |
| iobuf.append_bench_medium |        9000 |     2.151us |     3.486ns |     2.132us |     2.269us |       0.626 |       0.000 |      2251.5 |
| iobuf.append_bench_large  |        1000 |     7.313us |    13.044ns |     7.240us |     7.368us |       1.750 |       0.000 |      6039.5 |
To;
| test                      |  iterations |      median |         mad |         min |         max |      allocs |       tasks |        inst |
| -                         |           - |           - |           - |           - |           - |           - |           - |           - |
| iobuf.append_bench_small  |    31733000 |     8.226ns |     0.003ns |     8.217ns |     8.253ns |       0.008 |       0.000 |       160.1 |
| iobuf.append_bench_medium |        9000 |     2.146us |    18.402ns |     2.124us |     2.247us |       0.626 |       0.000 |      2237.9 |
| iobuf.append_bench_large  |        1000 |     7.346us |    14.547ns |     7.320us |     7.362us |       1.750 |       0.000 |      6017.9 |
@travisdowns
Copy link
Member

@StephanDollberg

It seems to me that the iobuf microbench space is too small for all the various cases that exist to append things to an iobuf. Feels like we are potentially missing out on cases.

This is true (that we lack test cases), but is it a problem for this change? It looks to me like this isn't really a "tradeoff" type change, but just strictly better. Are you thinking about the change from trim to no-trim in append(iobuf&&)? It looks to me like the old trim was spurious.

@StephanDollberg
Copy link
Member

it looks to me like this isn't really a "tradeoff" type change, but just strictly better

I mean I am no expert in this code but that's immediately obvious from the change? Especially given that with all the iobuf stuff we have run into a lot of unexpected things already.

Also more fundamentally I think we should write the benchmarks/tests first before we start optimizing things.

@ballard26
Copy link
Contributor Author

It seems to me that the iobuf microbench space is too small for all the various cases that exist to append things to an iobuf. Feels like we are potentially missing out on cases.

Also more fundamentally I think we should write the benchmarks/tests first before we start optimizing things.

Was chatting with Stephan about these points in DMs. My understanding is as follows;

  • I implemented microbenchmarks for the main code paths(copy and zero-copy) foriobuf::append(iobuf&&) in a previous PR. These microbenchmarks should be sufficient to gauge the changes made in this PR since the same function is being optimized.
  • In general there is a lack of microbenchmark coverage for iobuf; only 6 for a type that is fundamental to RP. As an example there are at least 3 other iobuf::append functions that have no microbenchmarks. Hence as a follow-up PR it may be worthwhile to expand the coverage.

@ballard26 ballard26 merged commit 580f901 into redpanda-data:dev Apr 2, 2025
19 checks passed
@vbotbuildovich
Copy link
Collaborator

/backport v25.1.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants