Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pandaproxy: add max memory check for incoming requests #24537

Merged

Conversation

IoannisRP
Copy link
Contributor

@IoannisRP IoannisRP commented Dec 11, 2024

fixes: CORE-8335

In the pandaproxy server, if a request comes in that is larger than the total available memory, every other request is blocked. A test is added to make sure that a request bigger than the available memory returns an error.

For better monitoring, the following metrics have been added,
prefixed with [vectorized|redpanda]_[rest_proxy|schema_registry]_ :

Metric Type Description Labels Aggregation labels
inflight_requests_usage_ratio gauge Usage ratio of in-flight requests in the [rest_proxy|schema_registry] shard
inflight_requests_memory_usage_ratio gauge Memory usage ratio of in-flight requests in the [rest_proxy|schema_registry] in bytes shard
queued_requests_memory_blocked gauge Number of requests queued in [rest_proxy|schema_registry], due to memory limitations shard

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Improvements

  • Added metrics for pandaproxy resource usage.

@IoannisRP
Copy link
Contributor Author

/dt

@vbotbuildovich
Copy link
Collaborator

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/59616#0193b7b4-8c87-4047-b52a-c85d1f6a6b2f:

"rptest.tests.datalake.compaction_gaps_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3"

@vbotbuildovich
Copy link
Collaborator

Retry command for Build#59616

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/datalake/compaction_gaps_test.py::CompactionGapsTest.test_translation_no_gaps@{"cloud_storage_type":1}

@IoannisRP IoannisRP force-pushed the ik-pandaproxy-add-memcheck branch 3 times, most recently from c0293e3 to a8b2628 Compare December 13, 2024 11:07
@IoannisRP IoannisRP marked this pull request as ready for review December 13, 2024 11:10
@IoannisRP IoannisRP requested review from a team, oleiman, BenPope and michael-redpanda and removed request for a team December 13, 2024 11:11
@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Dec 13, 2024

Retry command for Build#59721

please wait until all jobs are finished before running the slash command



/ci-repeat 1
tests/rptest/tests/pandaproxy_test.py::PandaProxyInvalidInputsTest.test_topic_produce_request_too_big
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":false,"mixed_versions":false,"with_iceberg":true,"with_tiered_storage":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":false,"mixed_versions":false,"with_iceberg":true,"with_tiered_storage":true}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":true,"mixed_versions":false,"with_iceberg":true,"with_tiered_storage":true}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":true,"mixed_versions":false,"with_iceberg":true,"with_tiered_storage":false}
tests/rptest/tests/cloud_retention_test.py::CloudRetentionTest.test_cloud_retention@{"cloud_storage_type":2,"max_consume_rate_mb":20}
tests/rptest/tests/cloud_retention_test.py::CloudRetentionTest.test_cloud_retention@{"cloud_storage_type":2,"max_consume_rate_mb":null}

@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Dec 13, 2024

CI test results

test results on build#59721
test_id test_kind job_url test_status passed
rptest.tests.cloud_retention_test.CloudRetentionTest.test_cloud_retention.max_consume_rate_mb=20.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/59721#0193bffd-6f30-4082-9f96-97ee012061d2 FAIL 0/6
rptest.tests.cloud_retention_test.CloudRetentionTest.test_cloud_retention.max_consume_rate_mb=None.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/59721#0193bffd-6f2e-4cf0-b5d5-732e2009ef7b FAIL 0/6
rptest.tests.datalake.partition_movement_test.PartitionMovementTest.test_cross_core_movements.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/59721#0193bffd-6f2f-429e-ba83-7880fd23e05e FLAKY 2/6
rptest.tests.pandaproxy_test.PandaProxyInvalidInputsTest.test_topic_produce_request_too_big ducktape https://buildkite.com/redpanda/redpanda/builds/59721#0193bffd-6f2d-4362-99bd-847c422c8215 FAIL 0/1
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=False.with_tiered_storage=False.with_iceberg=True.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/59721#0193bffd-6f30-4082-9f96-97ee012061d2 FAIL 0/1
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=False.with_tiered_storage=True.with_iceberg=True.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/59721#0193bffd-6f30-4082-9f96-97ee012061d2 FAIL 0/1
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=False.with_tiered_storage=False.with_iceberg=True.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/59721#0193bffd-6f30-4082-9f96-97ee012061d2 FAIL 0/1
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=False.with_tiered_storage=True.with_iceberg=True.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/59721#0193bffd-6f30-4082-9f96-97ee012061d2 FAIL 0/1
test results on build#59729
test_id test_kind job_url test_status passed
rptest.tests.cloud_retention_test.CloudRetentionTest.test_cloud_retention.max_consume_rate_mb=20.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/59729#0193c11a-18ba-4ad7-8a29-7aea1a0d6f63 FAIL 0/6
rptest.tests.cloud_retention_test.CloudRetentionTest.test_cloud_retention.max_consume_rate_mb=None.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/59729#0193c11a-18b8-44a2-8b92-c1ad1f800d14 FAIL 0/6
rptest.tests.cloud_storage_scrubber_test.CloudStorageScrubberTest.test_scrubber.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/59729#0193c11a-18b7-4d6e-a3f1-0a6aea86eae2 FLAKY 5/6
rptest.tests.datalake.compaction_gaps_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/59729#0193c11a-18ba-4ad7-8a29-7aea1a0d6f63 FAIL 0/1
rptest.tests.maintenance_test.MaintenanceTest.test_maintenance_sticky.use_rpk=False ducktape https://buildkite.com/redpanda/redpanda/builds/59729#0193c0fd-d2e3-4a01-8254-bfd0e464fce8 FAIL 0/1
rptest.tests.pandaproxy_test.PandaProxyInvalidInputsTest.test_topic_produce_request_too_big ducktape https://buildkite.com/redpanda/redpanda/builds/59729#0193c11a-18b7-4d6e-a3f1-0a6aea86eae2 FAIL 0/1
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=False.with_tiered_storage=False.with_iceberg=True.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/59729#0193c11a-18ba-4ad7-8a29-7aea1a0d6f63 FAIL 0/1
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=False.with_tiered_storage=True.with_iceberg=True.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/59729#0193c11a-18ba-4ad7-8a29-7aea1a0d6f63 FAIL 0/1
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=False.with_tiered_storage=False.with_iceberg=True.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/59729#0193c11a-18ba-4ad7-8a29-7aea1a0d6f63 FAIL 0/1
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=False.with_tiered_storage=True.with_iceberg=True.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/59729#0193c11a-18ba-4ad7-8a29-7aea1a0d6f63 FAIL 0/1
test results on build#59811
test_id test_kind job_url test_status passed
rptest.tests.e2e_shadow_indexing_test.ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy.short_retention=True.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/59811#0193d07f-3cce-433d-8e7b-22c2ea6e16fa FAIL 0/6
test results on build#59848
test_id test_kind job_url test_status passed
rptest.tests.cloud_retention_test.CloudRetentionTest.test_cloud_retention.max_consume_rate_mb=None.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/59848#0193d486-cbb6-491c-9fde-409d02d26ddd FAIL 0/6

@IoannisRP IoannisRP force-pushed the ik-pandaproxy-add-memcheck branch from a8b2628 to 83e1b8d Compare December 13, 2024 15:55
@IoannisRP
Copy link
Contributor Author

changes in force-push:

  • move probe out of server
  • change metrics to report normalized values
  • removed header in response

@IoannisRP IoannisRP requested a review from BenPope December 13, 2024 17:35
@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Dec 13, 2024

Retry command for Build#59729

please wait until all jobs are finished before running the slash command



/ci-repeat 1
tests/rptest/tests/maintenance_test.py::MaintenanceTest.test_maintenance_sticky@{"use_rpk":false}
tests/rptest/tests/pandaproxy_test.py::PandaProxyInvalidInputsTest.test_topic_produce_request_too_big
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":false,"mixed_versions":false,"with_iceberg":true,"with_tiered_storage":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":false,"mixed_versions":false,"with_iceberg":true,"with_tiered_storage":true}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":true,"mixed_versions":false,"with_iceberg":true,"with_tiered_storage":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":true,"mixed_versions":false,"with_iceberg":true,"with_tiered_storage":true}
tests/rptest/tests/cloud_retention_test.py::CloudRetentionTest.test_cloud_retention@{"cloud_storage_type":2,"max_consume_rate_mb":20}
tests/rptest/tests/datalake/compaction_gaps_test.py::CompactionGapsTest.test_translation_no_gaps@{"cloud_storage_type":1}
tests/rptest/tests/cloud_retention_test.py::CloudRetentionTest.test_cloud_retention@{"cloud_storage_type":2,"max_consume_rate_mb":null}

@IoannisRP IoannisRP force-pushed the ik-pandaproxy-add-memcheck branch from 83e1b8d to 55c3907 Compare December 16, 2024 12:19
@IoannisRP
Copy link
Contributor Author

Changes in force-push:

  • Probe gets created before listeners.
  • Probe gets reset after requests have finished.
  • Renamed ratio metrics.
  • Fixed dt test to follow updated kafka service memory limits. Also changed test to operate on 256Mb instead of 128Mb to be less restricted.

@IoannisRP IoannisRP requested a review from BenPope December 16, 2024 12:24
@IoannisRP IoannisRP force-pushed the ik-pandaproxy-add-memcheck branch from 55c3907 to 1c5c454 Compare December 16, 2024 12:29
@IoannisRP
Copy link
Contributor Author

Changes in force-push:

  • removed a few headers that sneaked in

@BenPope BenPope requested a review from a team December 16, 2024 13:17
@Deflaimun
Copy link
Contributor

Deflaimun commented Dec 16, 2024

Hi guys. I asked other doc writers to take a look.

While we're here, I see the labels are defined as "shards". How does that work? It's because they have shard_local_cfg() attached to them? Or labels are something else entirely?

Feel free to DM me if you want to talk about it.

@BenPope
Copy link
Member

BenPope commented Dec 16, 2024

While we're here, I see the labels are defined as "shards". How does that work? It's because they have shard_local_cfg() attached to them? Or labels are something else entirely?

It's a number representing the logical core that the Redpanda reactor is running on. When Redpanda is given 4 CPU Cores there will be 4 reactors, one on each of the CPU cores, e.g.: --smp=4 or -c4, there will be shard 0 to shard 3.

@JakeSCahill
Copy link
Contributor

I'll add these metrics to our docs for 24.3 and backport to 24.2: https://redpandadata.atlassian.net/browse/DOC-869

I assume they will go out in the next patch release for each.

@IoannisRP IoannisRP force-pushed the ik-pandaproxy-add-memcheck branch from 1c5c454 to 0919453 Compare December 16, 2024 15:42
@IoannisRP
Copy link
Contributor Author

Changes in force-push:

  • changed name and description for memory metric

@IoannisRP IoannisRP requested a review from BenPope December 16, 2024 15:43
@vbotbuildovich
Copy link
Collaborator

Retry command for Build#59811

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/e2e_shadow_indexing_test.py::ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy@{"cloud_storage_type":2,"short_retention":true}

@IoannisRP IoannisRP force-pushed the ik-pandaproxy-add-memcheck branch from 0919453 to 067fc7b Compare December 17, 2024 10:52
@IoannisRP
Copy link
Contributor Author

changes in force-push:

  • reworded semaphore metric descriptions

@IoannisRP IoannisRP requested a review from BenPope December 17, 2024 10:53
Copy link
Member

@BenPope BenPope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the cover letter to reflect the latest changes.

@vbotbuildovich
Copy link
Collaborator

Retry command for Build#59848

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/cloud_retention_test.py::CloudRetentionTest.test_cloud_retention@{"cloud_storage_type":2,"max_consume_rate_mb":null}

@IoannisRP
Copy link
Contributor Author

IoannisRP commented Dec 17, 2024

CI Failures:

Copy link
Contributor

@michael-redpanda michael-redpanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question about aggregation of metrics

@michael-redpanda michael-redpanda merged commit 151f46e into redpanda-data:dev Dec 18, 2024
16 of 19 checks passed
@vbotbuildovich
Copy link
Collaborator

/backport v24.3.x

@vbotbuildovich
Copy link
Collaborator

/backport v24.2.x

@vbotbuildovich
Copy link
Collaborator

/backport v24.1.x

Copy link
Member

@dotnwat dotnwat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants