pandaproxy: add max memory check for incoming requests #24537

IoannisRP · 2024-12-11T20:08:56Z

In the pandaproxy server, a request larger than the total available memory can never acquire the memory it needs, so it blocks every other request queued behind it. This PR adds a check for that case, plus a test verifying that a request bigger than the available memory returns an error.
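A minimal, single-threaded sketch of the idea. It assumes the proxy guards in-flight request bodies with a byte-counting semaphore (the later commits mention semaphore metrics). `memory_limiter`, `admit`, and `admit_result` are hypothetical names, not the PR's code, and "queued" here only reports that the caller would have to wait.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the proxy's memory semaphore; not pandaproxy code.
enum class admit_result { admitted, queued, rejected_too_large };

class memory_limiter {
public:
    explicit memory_limiter(std::size_t capacity)
      : _capacity(capacity), _available(capacity) {}

    // Decide what to do with a request that needs `bytes` of memory.
    admit_result admit(std::size_t bytes) {
        // The added check: a request larger than the whole budget can never
        // be satisfied, so waiting for it would stall everything behind it.
        // Fail it immediately instead.
        if (bytes > _capacity) {
            return admit_result::rejected_too_large;
        }
        if (bytes > _available) {
            // Fits in principle; the caller would wait for in-flight
            // requests to release memory.
            return admit_result::queued;
        }
        _available -= bytes;
        return admit_result::admitted;
    }

    // Return memory held by a finished request.
    void release(std::size_t bytes) {
        assert(_available + bytes <= _capacity);
        _available += bytes;
    }

    std::size_t capacity() const { return _capacity; }
    std::size_t available() const { return _available; }

private:
    std::size_t _capacity;
    std::size_t _available;
};
```

For example, with a 256-byte budget, `admit(200)` is admitted, a following `admit(100)` has to wait, and `admit(1024)` is rejected outright instead of blocking the queue forever.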

For better monitoring, the following metrics have been added, prefixed with `[vectorized|redpanda]_[rest_proxy|schema_registry]_`:

Metric	Type	Description	Labels
`inflight_requests_usage_ratio`	gauge	Usage ratio of in-flight requests in the rest_proxy or schema_registry	`shard`
`inflight_requests_memory_usage_ratio`	gauge	Memory usage ratio of in-flight requests in the rest_proxy or schema_registry	`shard`
`queued_requests_memory_blocked`	gauge	Number of requests queued in the rest_proxy or schema_registry due to memory limitations	`shard`
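The two ratio gauges are normalized values. A sketch of how such a ratio could be derived from a semaphore's capacity and its currently available units, assuming `ratio = (capacity - available) / capacity`; the helper name `usage_ratio` is hypothetical and the PR's exact formula may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: fraction of a semaphore's budget currently held by
// in-flight requests. Works for a request-count or a byte-count semaphore.
double usage_ratio(std::size_t capacity, std::size_t available) {
    assert(available <= capacity);
    if (capacity == 0) {
        return 0.0; // avoid dividing by zero for an empty budget
    }
    return static_cast<double>(capacity - available)
           / static_cast<double>(capacity);
}
```

For example, `usage_ratio(100, 25)` is 0.75: three quarters of the budget is held by in-flight requests. A ratio stays comparable across shards and nodes with different memory sizes, which a raw byte count would not.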

Backports Required

Release Notes

Improvements

Added metrics for pandaproxy resource usage.

IoannisRP · 2024-12-11T20:09:16Z

/dt

vbotbuildovich · 2024-12-11T23:47:32Z

Non-flaky failures in https://buildkite.com/redpanda/redpanda/builds/59616#0193b7b4-8c87-4047-b52a-c85d1f6a6b2f:

"rptest.tests.datalake.compaction_gaps_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3"

vbotbuildovich · 2024-12-11T23:47:44Z

Retry command for Build#59616

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/datalake/compaction_gaps_test.py::CompactionGapsTest.test_translation_no_gaps@{"cloud_storage_type":1}

src/v/pandaproxy/reply.h

src/v/pandaproxy/server.cc

src/v/pandaproxy/server.h

vbotbuildovich · 2024-12-13T14:28:20Z

Retry command for Build#59721

please wait until all jobs are finished before running the slash command



/ci-repeat 1
tests/rptest/tests/pandaproxy_test.py::PandaProxyInvalidInputsTest.test_topic_produce_request_too_big
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":false,"mixed_versions":false,"with_iceberg":true,"with_tiered_storage":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":false,"mixed_versions":false,"with_iceberg":true,"with_tiered_storage":true}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":true,"mixed_versions":false,"with_iceberg":true,"with_tiered_storage":true}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":true,"mixed_versions":false,"with_iceberg":true,"with_tiered_storage":false}
tests/rptest/tests/cloud_retention_test.py::CloudRetentionTest.test_cloud_retention@{"cloud_storage_type":2,"max_consume_rate_mb":20}
tests/rptest/tests/cloud_retention_test.py::CloudRetentionTest.test_cloud_retention@{"cloud_storage_type":2,"max_consume_rate_mb":null}

vbotbuildovich · 2024-12-13T14:52:12Z

CI test results

test results on build#59721

test_id	test_kind	job_url	test_status	passed
rptest.tests.cloud_retention_test.CloudRetentionTest.test_cloud_retention.max_consume_rate_mb=20.cloud_storage_type=CloudStorageType.ABS	ducktape	https://buildkite.com/redpanda/redpanda/builds/59721#0193bffd-6f30-4082-9f96-97ee012061d2	FAIL	0/6
rptest.tests.cloud_retention_test.CloudRetentionTest.test_cloud_retention.max_consume_rate_mb=None.cloud_storage_type=CloudStorageType.ABS	ducktape	https://buildkite.com/redpanda/redpanda/builds/59721#0193bffd-6f2e-4cf0-b5d5-732e2009ef7b	FAIL	0/6
rptest.tests.datalake.partition_movement_test.PartitionMovementTest.test_cross_core_movements.cloud_storage_type=CloudStorageType.S3	ducktape	https://buildkite.com/redpanda/redpanda/builds/59721#0193bffd-6f2f-429e-ba83-7880fd23e05e	FLAKY	2/6
rptest.tests.pandaproxy_test.PandaProxyInvalidInputsTest.test_topic_produce_request_too_big	ducktape	https://buildkite.com/redpanda/redpanda/builds/59721#0193bffd-6f2d-4362-99bd-847c422c8215	FAIL	0/1
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=False.with_tiered_storage=False.with_iceberg=True.cloud_storage_type=CloudStorageType.ABS	ducktape	https://buildkite.com/redpanda/redpanda/builds/59721#0193bffd-6f30-4082-9f96-97ee012061d2	FAIL	0/1
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=False.with_tiered_storage=True.with_iceberg=True.cloud_storage_type=CloudStorageType.ABS	ducktape	https://buildkite.com/redpanda/redpanda/builds/59721#0193bffd-6f30-4082-9f96-97ee012061d2	FAIL	0/1
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=False.with_tiered_storage=False.with_iceberg=True.cloud_storage_type=CloudStorageType.ABS	ducktape	https://buildkite.com/redpanda/redpanda/builds/59721#0193bffd-6f30-4082-9f96-97ee012061d2	FAIL	0/1
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=False.with_tiered_storage=True.with_iceberg=True.cloud_storage_type=CloudStorageType.ABS	ducktape	https://buildkite.com/redpanda/redpanda/builds/59721#0193bffd-6f30-4082-9f96-97ee012061d2	FAIL	0/1

test results on build#59729

test_id	test_kind	job_url	test_status	passed
rptest.tests.cloud_retention_test.CloudRetentionTest.test_cloud_retention.max_consume_rate_mb=20.cloud_storage_type=CloudStorageType.ABS	ducktape	https://buildkite.com/redpanda/redpanda/builds/59729#0193c11a-18ba-4ad7-8a29-7aea1a0d6f63	FAIL	0/6
rptest.tests.cloud_retention_test.CloudRetentionTest.test_cloud_retention.max_consume_rate_mb=None.cloud_storage_type=CloudStorageType.ABS	ducktape	https://buildkite.com/redpanda/redpanda/builds/59729#0193c11a-18b8-44a2-8b92-c1ad1f800d14	FAIL	0/6
rptest.tests.cloud_storage_scrubber_test.CloudStorageScrubberTest.test_scrubber.cloud_storage_type=CloudStorageType.ABS	ducktape	https://buildkite.com/redpanda/redpanda/builds/59729#0193c11a-18b7-4d6e-a3f1-0a6aea86eae2	FLAKY	5/6
rptest.tests.datalake.compaction_gaps_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3	ducktape	https://buildkite.com/redpanda/redpanda/builds/59729#0193c11a-18ba-4ad7-8a29-7aea1a0d6f63	FAIL	0/1
rptest.tests.maintenance_test.MaintenanceTest.test_maintenance_sticky.use_rpk=False	ducktape	https://buildkite.com/redpanda/redpanda/builds/59729#0193c0fd-d2e3-4a01-8254-bfd0e464fce8	FAIL	0/1
rptest.tests.pandaproxy_test.PandaProxyInvalidInputsTest.test_topic_produce_request_too_big	ducktape	https://buildkite.com/redpanda/redpanda/builds/59729#0193c11a-18b7-4d6e-a3f1-0a6aea86eae2	FAIL	0/1
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=False.with_tiered_storage=False.with_iceberg=True.cloud_storage_type=CloudStorageType.ABS	ducktape	https://buildkite.com/redpanda/redpanda/builds/59729#0193c11a-18ba-4ad7-8a29-7aea1a0d6f63	FAIL	0/1
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=False.with_tiered_storage=True.with_iceberg=True.cloud_storage_type=CloudStorageType.ABS	ducktape	https://buildkite.com/redpanda/redpanda/builds/59729#0193c11a-18ba-4ad7-8a29-7aea1a0d6f63	FAIL	0/1
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=False.with_tiered_storage=False.with_iceberg=True.cloud_storage_type=CloudStorageType.ABS	ducktape	https://buildkite.com/redpanda/redpanda/builds/59729#0193c11a-18ba-4ad7-8a29-7aea1a0d6f63	FAIL	0/1
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=False.with_tiered_storage=True.with_iceberg=True.cloud_storage_type=CloudStorageType.ABS	ducktape	https://buildkite.com/redpanda/redpanda/builds/59729#0193c11a-18ba-4ad7-8a29-7aea1a0d6f63	FAIL	0/1

test results on build#59811

test_id	test_kind	job_url	test_status	passed
rptest.tests.e2e_shadow_indexing_test.ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy.short_retention=True.cloud_storage_type=CloudStorageType.ABS	ducktape	https://buildkite.com/redpanda/redpanda/builds/59811#0193d07f-3cce-433d-8e7b-22c2ea6e16fa	FAIL	0/6

test results on build#59848

test_id	test_kind	job_url	test_status	passed
rptest.tests.cloud_retention_test.CloudRetentionTest.test_cloud_retention.max_consume_rate_mb=None.cloud_storage_type=CloudStorageType.ABS	ducktape	https://buildkite.com/redpanda/redpanda/builds/59848#0193d486-cbb6-491c-9fde-409d02d26ddd	FAIL	0/6

IoannisRP · 2024-12-13T15:56:57Z

changes in force-push:

move probe out of server
change metrics to report normalized values
removed header in response

vbotbuildovich · 2024-12-13T18:29:03Z

Retry command for Build#59729

please wait until all jobs are finished before running the slash command



/ci-repeat 1
tests/rptest/tests/maintenance_test.py::MaintenanceTest.test_maintenance_sticky@{"use_rpk":false}
tests/rptest/tests/pandaproxy_test.py::PandaProxyInvalidInputsTest.test_topic_produce_request_too_big
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":false,"mixed_versions":false,"with_iceberg":true,"with_tiered_storage":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":false,"mixed_versions":false,"with_iceberg":true,"with_tiered_storage":true}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":true,"mixed_versions":false,"with_iceberg":true,"with_tiered_storage":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":true,"mixed_versions":false,"with_iceberg":true,"with_tiered_storage":true}
tests/rptest/tests/cloud_retention_test.py::CloudRetentionTest.test_cloud_retention@{"cloud_storage_type":2,"max_consume_rate_mb":20}
tests/rptest/tests/datalake/compaction_gaps_test.py::CompactionGapsTest.test_translation_no_gaps@{"cloud_storage_type":1}
tests/rptest/tests/cloud_retention_test.py::CloudRetentionTest.test_cloud_retention@{"cloud_storage_type":2,"max_consume_rate_mb":null}

src/v/pandaproxy/probe.cc

src/v/pandaproxy/probe.h

src/v/pandaproxy/server.cc

src/v/pandaproxy/server.h

IoannisRP · 2024-12-16T12:24:07Z

Changes in force-push:

Probe gets created before listeners.
Probe gets reset after requests have finished.
Renamed ratio metrics.
Fixed the dt test to follow the updated Kafka service memory limits. Also changed the test to operate on 256 MB instead of 128 MB so that it is less constrained.
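The ordering described above (probe created before the listeners, reset only after in-flight requests finish) can be sketched as follows. The `probe` and `server` types here are hypothetical stand-ins that log their lifecycle steps; in the real server, draining would be asynchronous. The presumed motivation is that metric callbacks never read server state that has already been torn down.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical probe that records when it is created and destroyed.
struct probe {
    explicit probe(std::vector<std::string>& events) : _events(events) {
        _events.push_back("probe created");
    }
    ~probe() { _events.push_back("probe destroyed"); }
    std::vector<std::string>& _events;
};

class server {
public:
    explicit server(std::vector<std::string>& events) : _events(events) {}

    void start() {
        // Create the probe first, so metrics exist before any request arrives.
        _probe = std::make_unique<probe>(_events);
        _events.push_back("listeners started");
    }

    void stop() {
        assert(_probe && "stop() called before start()");
        _events.push_back("listeners stopped");
        // Placeholder for awaiting in-flight requests (e.g. closing a gate).
        _events.push_back("in-flight requests drained");
        // Only now is it safe to drop the probe that reads server state.
        _probe.reset();
    }

private:
    std::vector<std::string>& _events;
    std::unique_ptr<probe> _probe;
};
```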

IoannisRP · 2024-12-16T12:30:13Z

Changes in force-push:

removed a few headers that sneaked in

src/v/pandaproxy/probe.cc

src/v/pandaproxy/probe.h

Deflaimun · 2024-12-16T14:34:38Z

Hi guys. I asked other doc writers to take a look.

While we're here, I see the labels are defined as `shard`. How does that work? Is it because they have shard_local_cfg() attached to them, or are labels something else entirely?

Feel free to DM me if you want to talk about it.

BenPope · 2024-12-16T14:38:54Z

While we're here, I see the labels are defined as `shard`. How does that work? Is it because they have shard_local_cfg() attached to them, or are labels something else entirely?

It's a number identifying the logical core that a Redpanda reactor runs on. When Redpanda is given 4 CPU cores (e.g. `--smp=4` or `-c4`), there are 4 reactors, one per core, numbered shard 0 to shard 3.
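To illustrate what the label looks like to a scraper: each shard exports its own sample of a gauge, distinguished only by the `shard` label. The sketch below renders per-shard values in Prometheus text format; `render_per_shard` is a hypothetical helper, and a metric name such as `redpanda_rest_proxy_inflight_requests_usage_ratio` is taken from the cover-letter table, so the shipped name may differ after the later renames.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical: one gauge sample per shard. With --smp=4 the vector would
// hold four values, for shards 0 through 3.
std::string render_per_shard(const std::string& metric,
                             const std::vector<double>& per_shard_values) {
    assert(!metric.empty());
    std::ostringstream out;
    for (std::size_t shard = 0; shard < per_shard_values.size(); ++shard) {
        out << metric << "{shard=\"" << shard << "\"} "
            << per_shard_values[shard] << '\n';
    }
    return out.str();
}
```

A node-wide view therefore has to aggregate across the `shard` label in the monitoring system.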

JakeSCahill · 2024-12-16T15:35:35Z

I'll add these metrics to our docs for 24.3 and backport to 24.2: https://redpandadata.atlassian.net/browse/DOC-869

I assume they will go out in the next patch release for each.

IoannisRP · 2024-12-16T15:43:04Z

Changes in force-push:

changed name and description for memory metric

vbotbuildovich · 2024-12-16T19:51:32Z

Retry command for Build#59811

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/e2e_shadow_indexing_test.py::ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy@{"cloud_storage_type":2,"short_retention":true}

src/v/pandaproxy/probe.cc

IoannisRP · 2024-12-17T10:53:36Z

changes in force-push:

reworded semaphore metric descriptions

BenPope

Please fix the cover letter to reflect the latest changes.

vbotbuildovich · 2024-12-17T14:20:16Z

Retry command for Build#59848

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/cloud_retention_test.py::CloudRetentionTest.test_cloud_retention@{"cloud_storage_type":2,"max_consume_rate_mb":null}

IoannisRP · 2024-12-17T14:32:23Z

CI Failures:

CORE-8573

michael-redpanda

One question about aggregation of metrics

src/v/pandaproxy/probe.cc

vbotbuildovich · 2024-12-18T01:51:26Z

/backport v24.3.x

vbotbuildovich · 2024-12-18T01:51:27Z

/backport v24.2.x

vbotbuildovich · 2024-12-18T01:51:28Z

/backport v24.1.x

dotnwat

👍

github-actions bot added the area/redpanda label Dec 11, 2024

IoannisRP force-pushed the ik-pandaproxy-add-memcheck branch 3 times, most recently from c0293e3 to a8b2628 on December 13, 2024 11:07

IoannisRP marked this pull request as ready for review December 13, 2024 11:10

IoannisRP requested review from a team, oleiman, BenPope and michael-redpanda and removed request for a team December 13, 2024 11:11

BenPope reviewed Dec 13, 2024

src/v/pandaproxy/reply.h (outdated, resolved)

src/v/pandaproxy/server.cc (outdated, resolved)

src/v/pandaproxy/server.h (outdated, resolved)

IoannisRP force-pushed the ik-pandaproxy-add-memcheck branch from a8b2628 to 83e1b8d on December 13, 2024 15:55

IoannisRP requested a review from BenPope December 13, 2024 17:35

BenPope reviewed Dec 16, 2024

pandaproxy: add max memory check for incoming requests

efe9624

IoannisRP force-pushed the ik-pandaproxy-add-memcheck branch from 83e1b8d to 55c3907 on December 16, 2024 12:19

IoannisRP requested a review from BenPope December 16, 2024 12:24

IoannisRP force-pushed the ik-pandaproxy-add-memcheck branch from 55c3907 to 1c5c454 on December 16, 2024 12:29

BenPope reviewed Dec 16, 2024

src/v/pandaproxy/probe.cc (outdated, resolved)

src/v/pandaproxy/probe.h (resolved)

BenPope requested a review from a team December 16, 2024 13:17

IoannisRP force-pushed the ik-pandaproxy-add-memcheck branch from 1c5c454 to 0919453 on December 16, 2024 15:42

IoannisRP requested a review from BenPope December 16, 2024 15:43

BenPope reviewed Dec 17, 2024

src/v/pandaproxy/probe.cc (outdated, resolved)

pandaproxy: add semaphore usage metrics

067fc7b

IoannisRP force-pushed the ik-pandaproxy-add-memcheck branch from 0919453 to 067fc7b on December 17, 2024 10:52

IoannisRP requested a review from BenPope December 17, 2024 10:53

BenPope approved these changes Dec 17, 2024

michael-redpanda reviewed Dec 17, 2024

src/v/pandaproxy/probe.cc (resolved)

michael-redpanda merged commit 151f46e into redpanda-data:dev Dec 18, 2024
16 of 19 checks passed

This was referenced Dec 18, 2024

[v24.3.x] pandaproxy: add max memory check for incoming requests #24603

Merged

[v24.2.x] pandaproxy: add max memory check for incoming requests #24604

Merged

[v24.1.x] pandaproxy: add max memory check for incoming requests #24605

Merged

dotnwat reviewed Jan 3, 2025
