DRAFT: tracing: Classify probe misses #855

dwindsor · 2024-05-03T14:37:53Z

Currently, the kernel accounts for missed probes via a counter nmissed in struct kprobe and struct kretprobe. To better profile eBPF performance regarding missed probes, it would be useful to know why each miss occurred.

Add members to struct kprobe and struct kretprobe that tally misses by cause (e.g. nmissed_ss for drops due to single-stepping).

There will be a companion patch to bpftool to extract these statistics in userspace.

Signed-off-by: David Windsor <dwindsor@gmail.com>

admsev · 2024-05-25T14:38:57Z

@gooroodev please review

gooroodev · 2024-05-25T14:40:28Z

1. Summary of Changes:

The pull request introduces the classification of missed probes in the kprobes subsystem. The key changes include:

- Modifying the kprobes_inc_nmissed_count function to accept a new flags parameter that indicates the type of miss.
- Adding new constants for different types of misses (KPROBE_MISSED_SS, KPROBE_MISSED_OBJPOOL, KPROBE_MISSED_RETHOOK, KPROBE_MISSED_CALLBACK, KPROBE_MISSED_FTRACE_CALLBACK).
- Updating the kprobe and kretprobe structures to include counters for each type of miss.
- Adjusting various functions to pass the appropriate miss type flag when calling kprobes_inc_nmissed_count.
- Adding debug print statements to log the counts of different types of misses.

2. Issues, Bugs, or Typos:

Issue 1:

The flags check in kprobes_inc_nmissed_count uses else if statements, which means only one type of miss will be counted even if multiple flags are set. This might not be the intended behavior if multiple miss types can be flagged simultaneously.

Improvement:
Replace else if with if to ensure all relevant miss types are counted.

if (flags & KPROBE_MISSED_SS) {
    p->nmissed_ss++;
}
if (flags & KPROBE_MISSED_OBJPOOL) {
    p->nmissed_objpool++;
}
if (flags & KPROBE_MISSED_RETHOOK) {
    p->nmissed_rethook++;
}
if (flags & KPROBE_MISSED_CALLBACK) {
    p->nmissed_callback++;
}
if (flags & KPROBE_MISSED_FTRACE_CALLBACK) {
    p->nmissed_ftrace_callback++;
}
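
As a rough illustration of the suggestion above, the following userspace sketch counts every flag that is set instead of stopping at the first match. The bit values, the struct miss_counters type, and the count_misses name are assumptions made for this example; the actual patch may define the constants and lay out struct kprobe differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values; the real patch may assign them differently. */
#define KPROBE_MISSED_SS              (1UL << 0)
#define KPROBE_MISSED_OBJPOOL         (1UL << 1)
#define KPROBE_MISSED_RETHOOK         (1UL << 2)
#define KPROBE_MISSED_CALLBACK        (1UL << 3)
#define KPROBE_MISSED_FTRACE_CALLBACK (1UL << 4)

/* Userspace stand-in for the per-cause counters (not the kernel struct). */
struct miss_counters {
    unsigned long nmissed;                 /* total misses */
    unsigned long nmissed_ss;
    unsigned long nmissed_objpool;
    unsigned long nmissed_rethook;
    unsigned long nmissed_callback;
    unsigned long nmissed_ftrace_callback;
};

/*
 * Record one miss. Each flag is tested independently, so a miss that
 * carries several cause bits increments every matching counter, while
 * the total is incremented exactly once.
 */
void count_misses(struct miss_counters *p, unsigned long flags)
{
    assert(p != NULL);

    p->nmissed++;
    if (flags & KPROBE_MISSED_SS)
        p->nmissed_ss++;
    if (flags & KPROBE_MISSED_OBJPOOL)
        p->nmissed_objpool++;
    if (flags & KPROBE_MISSED_RETHOOK)
        p->nmissed_rethook++;
    if (flags & KPROBE_MISSED_CALLBACK)
        p->nmissed_callback++;
    if (flags & KPROBE_MISSED_FTRACE_CALLBACK)
        p->nmissed_ftrace_callback++;
}
```

Calling count_misses(&c, KPROBE_MISSED_SS | KPROBE_MISSED_RETHOOK) on a zeroed struct leaves nmissed, nmissed_ss, and nmissed_rethook at 1 and the other counters at 0. If callers only ever pass a single flag, the else if form and this form behave the same, so the choice matters only when flags can be combined.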

Issue 2:

The debug print statement in kprobes_inc_nmissed_count might be too verbose for production code.

Improvement:
Wrap the debug print statement with a conditional compilation flag to enable it only in debug builds.

#ifdef DEBUG
printk(KERN_DEBUG "ss=%lu;objpool=%lu;rethook=%lu;callback=%lu;ftrace=%lu\n",
       p->nmissed_ss, p->nmissed_objpool, p->nmissed_rethook,
       p->nmissed_callback, p->nmissed_ftrace_callback);
#endif
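
The same idea can be sketched in userspace with a logging macro that expands to nothing unless DEBUG is defined, which avoids scattering #ifdef blocks through the code. The names miss_debug, format_miss_counts, and report_misses are hypothetical and exist only for this example; in kernel code the existing pr_debug() helper serves a similar purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical logging macro: emits output only when built with -DDEBUG. */
#ifdef DEBUG
#define miss_debug(...) fprintf(stderr, __VA_ARGS__)
#else
#define miss_debug(...) do { } while (0)
#endif

/*
 * Format the per-cause counters into buf.
 * Returns snprintf()'s result: the length the full string requires.
 */
int format_miss_counts(char *buf, size_t len,
                       unsigned long ss, unsigned long objpool,
                       unsigned long rethook, unsigned long callback,
                       unsigned long ftrace)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len,
                    "ss=%lu;objpool=%lu;rethook=%lu;callback=%lu;ftrace=%lu",
                    ss, objpool, rethook, callback, ftrace);
}

/*
 * Report the counters. The line is always formatted, but it is only
 * written to stderr in DEBUG builds; otherwise miss_debug() is empty.
 */
void report_misses(unsigned long ss, unsigned long objpool,
                   unsigned long rethook, unsigned long callback,
                   unsigned long ftrace)
{
    char line[128];

    format_miss_counts(line, sizeof(line), ss, objpool, rethook,
                       callback, ftrace);
    miss_debug("%s\n", line);
}
```

Building with -DDEBUG makes report_misses() print one line per call; without it, the call produces no output.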

3. General Code Quality and Style Review:

- Code Readability: The code is generally readable, but the addition of comments explaining the purpose of each miss type and the conditions under which they are incremented would improve understandability.
- Consistency: The code maintains consistent formatting and naming conventions, adhering to common kernel coding practices.
- Functionality: The changes seem to be logically correct, assuming the intended behavior is to classify and count different types of probe misses.
- Performance: The added counters and conditional checks should have minimal performance impact, but the debug print statement could slow down execution if left in production.

Final Remarks:

The pull request effectively introduces a useful feature for classifying and counting different types of missed probes. With the proposed improvements, the code will be more robust and maintainable.

Yours, Gooroo.dev.

KernelPRBot · 2024-06-06T11:45:25Z

Hi @dwindsor!

Thanks for your contribution to the Linux kernel!

Linux kernel development happens on mailing lists, rather than on GitHub - this GitHub repository is a read-only mirror that isn't used for accepting contributions. So that your change can become part of Linux, please email it to us as a patch.

Sending patches isn't quite as simple as sending a pull request, but fortunately it is a well documented process.

Here's what to do:

1. Format your contribution according to kernel requirements
2. Decide who to send your contribution to
3. Set up your system to send your contribution as an email
4. Send your contribution and wait for feedback

How do I format my contribution?

The Linux kernel community is notoriously picky about how contributions are formatted and sent. Fortunately, they have documented their expectations.

Firstly, all contributions need to be formatted as patches. A patch is a plain text document showing the change you want to make to the code, and documenting why it is a good idea.

You can create patches with git format-patch.

Secondly, patches need 'commit messages', which is the human-friendly documentation explaining what the change is and why it's necessary.

Thirdly, changes have some technical requirements. There is a Linux kernel coding style, and there are licensing requirements you need to comply with.

Both of these are documented in the Submitting Patches documentation that is part of the kernel.

Note that you will almost certainly have to modify your existing git commits to satisfy these requirements. Don't worry: there are many guides on the internet for doing this.

Where do I send my contribution?

The Linux kernel is composed of a number of subsystems. These subsystems are maintained by different people, and have different mailing lists where they discuss proposed changes.

If you don't already know what subsystem your change belongs to, the get_maintainer.pl script in the kernel source can help you.

get_maintainer.pl will take the patch or patches you created in the previous step, and tell you who is responsible for them, and what mailing lists are used. You can also take a look at the MAINTAINERS file by hand.

Make sure that your list of recipients includes a mailing list. If you can't find a more specific mailing list, then LKML - the Linux Kernel Mailing List - is the place to send your patches.

It's not usually necessary to subscribe to the mailing list before you send the patches, but if you're interested in kernel development, subscribing to a subsystem mailing list is a good idea. (At this point, you probably don't need to subscribe to LKML - it is a very high traffic list with about a thousand messages per day, which is often not useful for beginners.)

How do I send my contribution?

Use git send-email, which will ensure that your patches are formatted in the standard manner. In order to use git send-email, you'll need to configure git to use your SMTP email server.

For more information about using git send-email, look at the Git documentation or type git help send-email. There are a number of useful guides and tutorials about git send-email that can be found on the internet.

How do I get help if I'm stuck?

Firstly, don't get discouraged! There are an enormous number of resources on the internet, and many kernel developers who would like to see you succeed.

Many issues - especially about how to use certain tools - can be resolved by using your favourite internet search engine.

If you can't find an answer, there are a few places you can turn:

- Kernel Newbies - this website contains a lot of useful resources for new kernel developers.
- The kernel documentation - see also the Documentation directory in the kernel tree.

If you get really, really stuck, you could try the owners of this bot, @daxtens and @ajdlinux. Please be aware that we do have full-time jobs, so we are almost certainly the slowest way to get answers!

I sent my patch - now what?

You wait.

You can check that your email has been received by checking the mailing list archives for the mailing list you sent your patch to. Messages may not be received instantly, so be patient. Kernel developers are generally very busy people, so it may take a few weeks before your patch is looked at.

Then, you keep waiting. Three things may happen:

- You might get a response to your email. Often these will be comments, which may require you to make changes to your patch, or explain why your way is the best way. You should respond to these comments, and you may need to submit another revision of your patch to address the issues raised.
- Your patch might be merged into the subsystem tree. Code that becomes part of Linux isn't merged into the main repository straight away - it first goes into the subsystem tree, which is managed by the subsystem maintainer. It is then batched up with a number of other changes sent to Linus for inclusion. (This process is described in some detail in the kernel development process guide.)
- Your patch might be ignored completely. This happens sometimes - don't take it personally. Here's what to do:
  - Wait a bit more - patches often take several weeks to get a response; more if they were sent at a busy time.
  - Kernel developers often silently ignore patches that break the rules. Check for obvious violations of the Submitting Patches guidelines, the style guidelines, and any other documentation you can find about your subsystem. Check that you're sending your patch to the right place.
  - Try again later. When you resend it, don't add angry commentary, as that will get your patch ignored. It might also get you silently blacklisted.

Further information

Working with the kernel development community - the official documentation for new kernel contributors

Happy hacking!

This message was posted by a bot - if you have any questions or suggestions, please talk to my owners, @ajdlinux and @daxtens, or raise an issue at https://github.com/ajdlinux/KernelPRBot.

commit 72a6e22 upstream.

The fscache_cookie_lru_timer is initialized when the fscache module is inserted, but is not deleted when the fscache module is removed. If timer_reduce() is called before removing the fscache module, the fscache_cookie_lru_timer will be added to the timer list of the current cpu. Afterwards, a use-after-free will be triggered in the softIRQ after removing the fscache module, as follows:

==================================================================
BUG: unable to handle page fault for address: fffffbfff803c9e9
PF: supervisor read access in kernel mode
PF: error_code(0x0000) - not-present page
PGD 21ffea067 P4D 21ffea067 PUD 21ffe6067 PMD 110a7c067 PTE 0
Oops: Oops: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Tainted: G W 6.11.0-rc3 #855
Tainted: [W]=WARN
RIP: 0010:__run_timer_base.part.0+0x254/0x8a0
Call Trace:
 <IRQ>
 tmigr_handle_remote_up+0x627/0x810
 __walk_groups.isra.0+0x47/0x140
 tmigr_handle_remote+0x1fa/0x2f0
 handle_softirqs+0x180/0x590
 irq_exit_rcu+0x84/0xb0
 sysvec_apic_timer_interrupt+0x6e/0x90
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x1a/0x20
 RIP: 0010:default_idle+0xf/0x20
 default_idle_call+0x38/0x60
 do_idle+0x2b5/0x300
 cpu_startup_entry+0x54/0x60
 start_secondary+0x20d/0x280
 common_startup_64+0x13e/0x148
 </TASK>
Modules linked in: [last unloaded: netfs]
==================================================================

Therefore delete fscache_cookie_lru_timer when removing the fscache module.

Fixes: 12bb21a ("fscache: Implement cookie user counting and resource pinning")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240826112056.2458299-1-libaokun@huaweicloud.com
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dwindsor and others added 3 commits May 2, 2024 13:44

tracing: classify kprobe misses

eb0981a

Signed-off-by: David Windsor <dwindsor@gmail.com>

Merge branch 'torvalds:master' into tracing-classify-drops

8ae3647

tracing: classify ftrace drops

179395a

Signed-off-by: David Windsor <dwindsor@gmail.com>

dwindsor changed the title ~~tracing: Classify probe misses~~ DRAFT: tracing: Classify probe misses May 3, 2024

kprobes: log drops via printk

2d38bc3

Signed-off-by: David Windsor <dwindsor@gmail.com>
