@benkilimnik
Member

@benkilimnik benkilimnik commented Nov 29, 2023

Summary: Dynamically raise the BPF loop and chunk limits by 21x on newer kernels, which have a higher verifier instruction limit (1 million for kernels > 5.1), to reduce data loss and increase ingest. More details in #1755.

One open question is whether we want to add a Vizier flag to toggle this behavior in case there are unforeseen performance bottlenecks on certain clusters.

Type of change: /kind feature

Test Plan: Existing targets + perf/demo tests outlined in #1755.

Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
@benkilimnik benkilimnik changed the title Raise loop and chunk limit for kernels >5.1 by 21x Raise loop and chunk limit for kernels >5.1 Dec 6, 2023
@benkilimnik benkilimnik changed the title Raise loop and chunk limit for kernels >5.1 Raise loop and chunk limit for newer kernels Dec 6, 2023
@benkilimnik benkilimnik requested a review from a team December 7, 2023 18:30
@benkilimnik benkilimnik marked this pull request as ready for review December 7, 2023 18:30
Status SocketTraceConnector::InitBPF() {
// set BPF loop limit and chunk limit based on kernel version
auto kernel = system::GetCachedKernelVersion();
int loop_limit = 42;
Member

These should be gflags.

Member Author

Changed to gflags

@benkilimnik benkilimnik requested a review from a team January 18, 2024 19:34
// Kernels >= 5.1 have higher BPF instruction limits (1 million for verifier).
// This enables a 21x increase to our loop and chunk limits
FLAGS_stirling_bpf_loop_limit = 882;
FLAGS_stirling_bpf_chunk_limit = 84;
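
For readers skimming the diff, the selection logic can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `KernelVersion`, `BpfLimits`, `AtLeast`, and `SelectBpfLimits` are hypothetical names, the real change writes to `FLAGS_stirling_bpf_loop_limit` and `FLAGS_stirling_bpf_chunk_limit`, and the base chunk limit of 4 is inferred from 84 / 21 rather than stated in the conversation. The `>= 5.1` threshold follows the code comment in the diff.

```cpp
#include <cstdint>

// Hypothetical stand-in for the value returned by
// system::GetCachedKernelVersion(); the field names are assumptions.
struct KernelVersion {
  uint32_t major_version;
  uint32_t minor_version;
};

// Hypothetical holder for the two limits that the PR stores in gflags.
struct BpfLimits {
  int loop_limit;
  int chunk_limit;
};

constexpr int kBaseLoopLimit = 42;        // default shown in the diff
constexpr int kBaseChunkLimit = 4;        // assumption: inferred from 84 / 21
constexpr int kNewKernelMultiplier = 21;  // multiplier quoted in the PR

// True if `k` is at least version want_major.want_minor.
inline bool AtLeast(const KernelVersion& k, uint32_t want_major,
                    uint32_t want_minor) {
  return k.major_version > want_major ||
         (k.major_version == want_major && k.minor_version >= want_minor);
}

// Picks the limits for a kernel: the base values on older kernels,
// 21x the base values on kernels >= 5.1.
inline BpfLimits SelectBpfLimits(const KernelVersion& kernel) {
  BpfLimits limits{kBaseLoopLimit, kBaseChunkLimit};
  if (AtLeast(kernel, 5, 1)) {
    limits.loop_limit *= kNewKernelMultiplier;   // 42 * 21 = 882
    limits.chunk_limit *= kNewKernelMultiplier;  // 4 * 21 = 84
  }
  return limits;
}
```

Under these assumptions, `SelectBpfLimits({5, 4})` yields 882 and 84, while `SelectBpfLimits({4, 14})` keeps 42 and 4.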
Member

How did you pick these numbers? How close are we to the instruction limit with these numbers?

Member Author

I manually raised the loop/chunk limits until the verifier threw an error, then chose values just below that upper bound, which is around 22x our previous limits.

bpf: Argument list too long. Program too large (... insns), at most 4096 insns

Confusingly, the bpf(2) syscall returns the same error whether the program's size or its complexity exceeds the limits. See Stack Overflow.
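
The manual search described above amounts to finding the largest multiplier at which the program still loads. A minimal sketch of that search follows, assuming loading is monotonic (every multiplier up to some bound passes and every larger one fails). `FindMaxMultiplier` and the `loads` callback are hypothetical names. In practice the callback would rebuild the BPF program with the scaled limits and attempt to load it, which this sketch does not do.

```cpp
#include <functional>

// Returns the largest multiplier in [lo, hi] for which `loads` returns true,
// or lo - 1 if no multiplier in the range passes.
// Assumption: `loads` is monotonic (true up to some bound, false beyond it).
inline int FindMaxMultiplier(int lo, int hi,
                             const std::function<bool(int)>& loads) {
  int best = lo - 1;
  while (lo <= hi) {
    const int mid = lo + (hi - lo) / 2;
    if (loads(mid)) {
      best = mid;    // mid passes; look for a larger passing value
      lo = mid + 1;
    } else {
      hi = mid - 1;  // mid fails; the bound is below mid
    }
  }
  return best;
}
```

With a stand-in callback that accepts multipliers up to 21, `FindMaxMultiplier(1, 64, [](int m) { return m <= 21; })` returns 21. The threshold in that callback is only for illustration.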
