UML #21
Closed
mturquette pushed a commit to mturquette/linux that referenced this pull request on Aug 25, 2012
…d reasons
We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:
PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14"
#0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
#1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
#2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
#3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
#4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
#5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
#6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
#7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
#8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
#9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
#24 [ffff8810343bfee8] kthread at ffffffff8108dd96
#25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca
rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.
Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
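The flag-based fix described above can be modelled in plain userspace C. This is only a sketch under stated assumptions: `struct task`, `may_commit_on_release()` and `commit_allowed_during_socket_setup()` are hypothetical stand-ins rather than the kernel's code, the value given to `PF_FSTRANS` is illustrative, and restoring the flag after the allocation is an assumption the commit message does not spell out.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bit; NOT the kernel's actual PF_FSTRANS value. */
#define PF_FSTRANS 0x1u

/* Hypothetical stand-in for the per-task flag word. */
struct task {
    unsigned int flags;
};

/*
 * Model of the decision made in the release-page path: a blocking
 * commit is only attempted when the calling task is NOT marked as
 * being inside a filesystem transaction.
 */
static bool may_commit_on_release(const struct task *t)
{
    return (t->flags & PF_FSTRANS) == 0;
}

/*
 * Model of the worker that creates the socket. The flag is set
 * before the allocation, so any reclaim triggered by that allocation
 * sees it and skips the blocking commit. Restoring the previous flag
 * state afterwards is an assumption of this sketch.
 *
 * Returns whether a reclaim running during the allocation would have
 * been allowed to issue a blocking commit.
 */
static bool commit_allowed_during_socket_setup(struct task *t)
{
    unsigned int saved = t->flags & PF_FSTRANS;
    bool allowed;

    t->flags |= PF_FSTRANS;
    /* The socket allocation (and any reclaim it triggers) happens here. */
    allowed = may_commit_on_release(t);
    t->flags = (t->flags & ~PF_FSTRANS) | saved;

    return allowed;
}
```

In this model a task with no flags set may commit from the release path, but while the socket is being set up the same check returns false, which is what breaks the cycle shown in the stack trace.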
bootc pushed a commit to bootc/linux that referenced this pull request on Aug 25, 2012
…d reasons
commit 5cf02d0 upstream.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
liubogithub pushed a commit to liubogithub/btrfs-work that referenced this pull request on Aug 29, 2012
When hot-adding a CPU, the system outputs the following messages because
node_to_cpumask_map[2] was not allocated memory:

Booting Node 2 Processor 32 APIC 0xc0
node_to_cpumask_map[2] NULL
Pid: 0, comm: swapper/32 Tainted: G A 3.3.5-acd #21
Call Trace:
[<ffffffff81048845>] debug_cpumask_set_cpu+0x155/0x160
[<ffffffff8105e28a>] ? add_timer_on+0xaa/0x120
[<ffffffff8150665f>] numa_add_cpu+0x1e/0x22
[<ffffffff815020bb>] identify_cpu+0x1df/0x1e4
[<ffffffff815020d6>] identify_secondary_cpu+0x16/0x1d
[<ffffffff81504614>] smp_store_cpu_info+0x3c/0x3e
[<ffffffff81505263>] smp_callin+0x139/0x1be
[<ffffffff815052fb>] start_secondary+0x13/0xeb

The reason is that the bit for node 2 was not set in numa_nodes_parsed.
numa_nodes_parsed is set only by acpi_numa_processor_affinity_init /
acpi_numa_x2apic_affinity_init. Thus, even if hot-added memory with the
same PXM as the hot-added CPU is described in the ACPI SRAT table,
numa_nodes_parsed is not set when the hot-added CPU itself is not
described in the table.

ACPI Spec Rev 5.0 describes the SRAT table as follows:

  This optional table provides information that allows OSPM to associate
  processors and memory ranges, including ranges of memory provided by
  hot-added memory devices, with system localities / proximity domains
  and clock domains.

This means that the ACPI SRAT table only provides information for CPUs
present at boot time and for memory, including hot-added memory. So
hot-added memory is described in the ACPI SRAT table, but a hot-added CPU
is not. Thus numa_nodes_parsed should be set not only by
acpi_numa_processor_affinity_init / acpi_numa_x2apic_affinity_init but
also by acpi_numa_memory_affinity_init for this case.

Additionally, if the system has a cpuless memory node,
acpi_numa_processor_affinity_init / acpi_numa_x2apic_affinity_init cannot
set numa_nodes_parsed, since these functions cannot find a CPU
description for the node. In this case, numa_nodes_parsed needs to be set
by acpi_numa_memory_affinity_init.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: liuj97@gmail.com
Cc: kosaki.motohiro@gmail.com
Link: http://lkml.kernel.org/r/4FCC2098.4030007@jp.fujitsu.com
[ merged it ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
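The idea of marking a node as parsed from the memory-affinity path as well as the CPU-affinity path can be sketched in userspace C. Everything here is a simplified assumption for illustration: `nodes_parsed` is a plain 64-bit mask standing in for numa_nodes_parsed, and `cpu_affinity_init()` / `memory_affinity_init()` are hypothetical helpers, not the kernel's ACPI parsing functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 64

/* Hypothetical stand-in for numa_nodes_parsed: one bit per node. */
static uint64_t nodes_parsed;

static void node_set_parsed(int node)
{
    if (node >= 0 && node < MAX_NODES)
        nodes_parsed |= (uint64_t)1 << node;
}

static bool node_is_parsed(int node)
{
    if (node < 0 || node >= MAX_NODES)
        return false;
    return (nodes_parsed >> node) & 1;
}

/* Called for each CPU entry found in the (modelled) SRAT table. */
static void cpu_affinity_init(int node)
{
    node_set_parsed(node);
}

/*
 * Called for each memory entry found in the (modelled) SRAT table.
 * Setting the bit here as well covers nodes whose CPUs are hot-added
 * later, and nodes that have memory but no CPUs at all.
 */
static void memory_affinity_init(int node)
{
    node_set_parsed(node);
}
```

With only a CPU entry for node 0 and only a memory entry for node 2, both nodes end up marked as parsed, so a CPU hot-added to node 2 later finds its node already known.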
liubogithub pushed a commit to liubogithub/btrfs-work that referenced this pull request on Aug 29, 2012
The warning below triggers on AMD MCM packages because physical package
IDs on the cores of a _physical_ socket are the same. I.e., this field
says which CPUs belong to the same physical package.

However, the same two CPUs belong to two different internal, i.e.
"logical" nodes in the same physical socket which is reflected in the
CPU-to-node map on x86 with NUMA.

Which makes this check wrong on the above topologies so circumvent it.

[ 0.444413] Booting Node 0, Processors #1 #2 #3 #4 #5 Ok.
[ 0.461388] ------------[ cut here ]------------
[ 0.465997] WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81()
[ 0.473960] Hardware name: Dinar
[ 0.477170] sched: CPU #6's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[ 0.486860] Booting Node 1, Processors #6
[ 0.491104] Modules linked in:
[ 0.494141] Pid: 0, comm: swapper/6 Not tainted 3.4.0+ #1
[ 0.499510] Call Trace:
[ 0.501946] [<ffffffff8144bf92>] ? topology_sane.clone.1+0x6e/0x81
[ 0.508185] [<ffffffff8102f1fc>] warn_slowpath_common+0x85/0x9d
[ 0.514163] [<ffffffff8102f2b7>] warn_slowpath_fmt+0x46/0x48
[ 0.519881] [<ffffffff8144bf92>] topology_sane.clone.1+0x6e/0x81
[ 0.525943] [<ffffffff8144c234>] set_cpu_sibling_map+0x251/0x371
[ 0.532004] [<ffffffff8144c4ee>] start_secondary+0x19a/0x218
[ 0.537729] ---[ end trace 4eaa2a86a8e2da22 ]---
[ 0.628197] #7 #8 #9 #10 #11 Ok.
[ 0.807108] Booting Node 3, Processors #12 #13 #14 #15 #16 #17 Ok.
[ 0.897587] Booting Node 2, Processors #18 #19 #20 #21 #22 #23 Ok.
[ 0.917443] Brought up 24 CPUs

We ran a topology sanity check test we have here on it and it all looks
ok... hopefully :).

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120529135442.GE29157@aftab.osrc.amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
jbrandeb pushed a commit to jbrandeb/linux that referenced this pull request on Aug 29, 2012
…d reasons
commit 5cf02d0 upstream.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
shr-project pushed a commit to shr-distribution/linux that referenced this pull request on Aug 30, 2012
…d reasons
commit 5cf02d0 upstream.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RobertCNelson pushed a commit to RobertCNelson/linux that referenced this pull request on Aug 30, 2012
…d reasons
commit 5cf02d0 upstream.
Signed-off-by: Jeff Layton <jlayton at redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust at netapp.com>
Signed-off-by: Ben Hutchings <ben at decadent.org.uk>
Quarx2k pushed a commit to Quarx2k/linux-allwinner that referenced this pull request on Sep 9, 2012
…d reasons
commit 5cf02d0 upstream.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request on Sep 11, 2012
…d reasons
commit 5cf02d0 upstream.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
baerwolf pushed a commit to baerwolf/linux-stephan that referenced this pull request on Sep 12, 2012
commit a3f83ab upstream.

At boot time I observed the following bug:

BUG: unable to handle kernel paging request at ffff8800a4244000
IP: [<ffffffff81275b5b>] memcpy+0xb/0x120
PGD 1816063 PUD 1fe7d067 PMD 1ff9f067 PTE 80000000a4244160
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
CPU 0
Modules linked in: btusb bluetooth brcmsmac brcmutil crc8 cordic b43 radeon(+) mac80211 cfg80211 ttm ohci_hcd drm_kms_helper rfkill drm ssb agpgart mmc_core sp5100_tco video battery ac thermal processor rtc_cmos thermal_sys snd_hda_codec_hdmi joydev snd_hda_codec_conexant button bcma pcmcia snd_hda_intel snd_hda_codec snd_hwdep snd_pcm shpchp pcmcia_core k8temp snd_timer atl1c snd psmouse hwmon i2c_piix4 i2c_algo_bit soundcore evdev i2c_core ehci_hcd sg serio_raw snd_page_alloc loop btrfs

Pid: 1008, comm: modprobe Not tainted 3.3.0-rc1 #21 LENOVO 20046 /AMD CRB
RIP: 0010:[<ffffffff81275b5b>] [<ffffffff81275b5b>] memcpy+0xb/0x120
RSP: 0018:ffff8800aa72db00 EFLAGS: 00010246
RAX: ffff8800a4150000 RBX: 0000000000001000 RCX: 0000000000000087
RDX: 0000000000000000 RSI: ffff8800a4244000 RDI: ffff8800a4150bc8
RBP: ffff8800aa72db78 R08: 0000000000000010 R09: ffffffff8174bbec
R10: ffffffff812ee010 R11: 0000000000000001 R12: 0000000000001000
R13: 0000000000010000 R14: ffff8800a4140000 R15: ffff8800aaba1800
FS: 00007ff9a3bd4720(0000) GS:ffff8800afa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff8800a4244000 CR3: 00000000a9c18000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 1008, threadinfo ffff8800aa72c000, task ffff8800aa0e4000)
Stack:
 ffffffffa04e7c7b 0000000000000001 0000000000010000 ffff8800aa72db28
 ffffffff00000001 0000000000001000 ffffffff8113cbef 0000000000000020
 ffff8800a4243420 ffff880000000002 ffff8800aa72db08 ffff8800a9d42000
Call Trace:
 [<ffffffffa04e7c7b>] ? radeon_atrm_get_bios_chunk+0x8b/0xd0 [radeon]
 [<ffffffff8113cbef>] ? kmalloc_order_trace+0x3f/0xb0
 [<ffffffffa04a9298>] radeon_get_bios+0x68/0x2f0 [radeon]
 [<ffffffffa04c7a30>] rv770_init+0x40/0x280 [radeon]
 [<ffffffffa047d740>] radeon_device_init+0x560/0x600 [radeon]
 [<ffffffffa047ef4f>] radeon_driver_load_kms+0xaf/0x170 [radeon]
 [<ffffffffa043cdde>] drm_get_pci_dev+0x18e/0x2c0 [drm]
 [<ffffffffa04e7e95>] radeon_pci_probe+0xad/0xb5 [radeon]
 [<ffffffff81296c5f>] local_pci_probe+0x5f/0xd0
 [<ffffffff81297418>] pci_device_probe+0x88/0xb0
 [<ffffffff813417aa>] ? driver_sysfs_add+0x7a/0xb0
 [<ffffffff813418d8>] really_probe+0x68/0x180
 [<ffffffff81341be5>] driver_probe_device+0x45/0x70
 [<ffffffff81341cb3>] __driver_attach+0xa3/0xb0
 [<ffffffff81341c10>] ? driver_probe_device+0x70/0x70
 [<ffffffff813400ce>] bus_for_each_dev+0x5e/0x90
 [<ffffffff8134172e>] driver_attach+0x1e/0x20
 [<ffffffff81341298>] bus_add_driver+0xc8/0x280
 [<ffffffff813422c6>] driver_register+0x76/0x140
 [<ffffffff812976d6>] __pci_register_driver+0x66/0xe0
 [<ffffffffa043d021>] drm_pci_init+0x111/0x120 [drm]
 [<ffffffff8133c67a>] ? vga_switcheroo_register_handler+0x3a/0x60
 [<ffffffffa0229000>] ? 0xffffffffa0228fff
 [<ffffffffa02290ec>] radeon_init+0xec/0xee [radeon]
 [<ffffffff810002f2>] do_one_initcall+0x42/0x180
 [<ffffffff8109d8d2>] sys_init_module+0x92/0x1e0
 [<ffffffff815407a9>] system_call_fastpath+0x16/0x1b
Code: 58 2a 43 50 88 43 4e 48 83 c4 08 5b c9 c3 66 90 e8 cb fd ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c
RIP [<ffffffff81275b5b>] memcpy+0xb/0x120
 RSP <ffff8800aa72db00>
CR2: ffff8800a4244000
---[ end trace fcffa1599cf56382 ]---

A call to acpi_evaluate_object() does not always return 4096-byte
chunks; on my system it can return a 2048-byte chunk. So pass the length
of the retrieved chunk to memcpy(), not the length of the receiving
buffer.

Signed-off-by: Igor Murzov <e-mail@date.by>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
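The length fix described above amounts to copying only as many bytes as the source actually returned. A minimal userspace sketch, assuming a hypothetical helper `copy_chunk()` and a `CHUNK_SIZE` constant that stand in for the driver's real code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size of one slot in the receiving buffer (4096 bytes in the report). */
#define CHUNK_SIZE 4096

/*
 * Copy one retrieved chunk into the receiving buffer at `offset`.
 *
 * `chunk_len` is the number of bytes the source really returned, which
 * may be smaller than CHUNK_SIZE (2048 in the report). Copying
 * CHUNK_SIZE bytes unconditionally would read past the end of a short
 * chunk, which is the out-of-bounds read seen in the oops.
 *
 * Returns the number of bytes copied.
 */
static size_t copy_chunk(unsigned char *dst, size_t offset,
                         const unsigned char *chunk, size_t chunk_len)
{
    /* Never copy more than one slot, and never more than was returned. */
    size_t len = chunk_len < CHUNK_SIZE ? chunk_len : CHUNK_SIZE;

    memcpy(dst + offset, chunk, len);
    return len;
}
```

A caller would pass the length reported for the retrieved chunk and use the return value to know how much of the slot was filled.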
RobertCNelson pushed a commit to RobertCNelson/linux that referenced this pull request on Sep 12, 2012
commit a3f83ab upstream.
At boot time I observed the following bug:
BUG: unable to handle kernel paging request at ffff8800a4244000
IP: [<ffffffff81275b5b>] memcpy+0xb/0x120
PGD 1816063 PUD 1fe7d067 PMD 1ff9f067 PTE 80000000a4244160
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
CPU 0
Modules linked in: btusb bluetooth brcmsmac brcmutil crc8 cordic b43 radeon(+) mac80211 cfg80211 ttm ohci_hcd drm_kms_helper rfkill drm ssb agpgart mmc_core sp5100_tco video battery ac thermal processor rtc_cmos thermal_sys snd_hda_codec_hdmi joydev snd_hda_codec_conexant button bcma pcmcia snd_hda_intel snd_hda_codec snd_hwdep snd_pcm shpchp pcmcia_core k8temp snd_timer atl1c snd psmouse hwmon i2c_piix4 i2c_algo_bit soundcore evdev i2c_core ehci_hcd sg serio_raw snd_page_alloc loop btrfs
Pid: 1008, comm: modprobe Not tainted 3.3.0-rc1 #21 LENOVO 20046 /AMD CRB
RIP: 0010:[<ffffffff81275b5b>] [<ffffffff81275b5b>] memcpy+0xb/0x120
RSP: 0018:ffff8800aa72db00 EFLAGS: 00010246
RAX: ffff8800a4150000 RBX: 0000000000001000 RCX: 0000000000000087
RDX: 0000000000000000 RSI: ffff8800a4244000 RDI: ffff8800a4150bc8
RBP: ffff8800aa72db78 R08: 0000000000000010 R09: ffffffff8174bbec
R10: ffffffff812ee010 R11: 0000000000000001 R12: 0000000000001000
R13: 0000000000010000 R14: ffff8800a4140000 R15: ffff8800aaba1800
FS: 00007ff9a3bd4720(0000) GS:ffff8800afa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff8800a4244000 CR3: 00000000a9c18000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 1008, threadinfo ffff8800aa72c000, task ffff8800aa0e4000)
Stack:
ffffffffa04e7c7b 0000000000000001 0000000000010000 ffff8800aa72db28
ffffffff00000001 0000000000001000 ffffffff8113cbef 0000000000000020
ffff8800a4243420 ffff880000000002 ffff8800aa72db08 ffff8800a9d42000
Call Trace:
[<ffffffffa04e7c7b>] ? radeon_atrm_get_bios_chunk+0x8b/0xd0 [radeon]
[<ffffffff8113cbef>] ? kmalloc_order_trace+0x3f/0xb0
[<ffffffffa04a9298>] radeon_get_bios+0x68/0x2f0 [radeon]
[<ffffffffa04c7a30>] rv770_init+0x40/0x280 [radeon]
[<ffffffffa047d740>] radeon_device_init+0x560/0x600 [radeon]
[<ffffffffa047ef4f>] radeon_driver_load_kms+0xaf/0x170 [radeon]
[<ffffffffa043cdde>] drm_get_pci_dev+0x18e/0x2c0 [drm]
[<ffffffffa04e7e95>] radeon_pci_probe+0xad/0xb5 [radeon]
[<ffffffff81296c5f>] local_pci_probe+0x5f/0xd0
[<ffffffff81297418>] pci_device_probe+0x88/0xb0
[<ffffffff813417aa>] ? driver_sysfs_add+0x7a/0xb0
[<ffffffff813418d8>] really_probe+0x68/0x180
[<ffffffff81341be5>] driver_probe_device+0x45/0x70
[<ffffffff81341cb3>] __driver_attach+0xa3/0xb0
[<ffffffff81341c10>] ? driver_probe_device+0x70/0x70
[<ffffffff813400ce>] bus_for_each_dev+0x5e/0x90
[<ffffffff8134172e>] driver_attach+0x1e/0x20
[<ffffffff81341298>] bus_add_driver+0xc8/0x280
[<ffffffff813422c6>] driver_register+0x76/0x140
[<ffffffff812976d6>] __pci_register_driver+0x66/0xe0
[<ffffffffa043d021>] drm_pci_init+0x111/0x120 [drm]
[<ffffffff8133c67a>] ? vga_switcheroo_register_handler+0x3a/0x60
[<ffffffffa0229000>] ? 0xffffffffa0228fff
[<ffffffffa02290ec>] radeon_init+0xec/0xee [radeon]
[<ffffffff810002f2>] do_one_initcall+0x42/0x180
[<ffffffff8109d8d2>] sys_init_module+0x92/0x1e0
[<ffffffff815407a9>] system_call_fastpath+0x16/0x1b
Code: 58 2a 43 50 88 43 4e 48 83 c4 08 5b c9 c3 66 90 e8 cb fd ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c
RIP [<ffffffff81275b5b>] memcpy+0xb/0x120
RSP <ffff8800aa72db00>
CR2: ffff8800a4244000
---[ end trace fcffa1599cf56382 ]---
A call to acpi_evaluate_object() does not always return 4096-byte
chunks; on my system it can return a 2048-byte chunk. So pass the length
of the retrieved chunk to memcpy(), not the length of the receiving
buffer.
Signed-off-by: Igor Murzov <e-mail@date.by>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
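The fix described in this commit message comes down to sizing the copy by what the source actually holds. The following is a minimal userspace sketch of that idea, not the radeon driver code: struct retrieved_chunk, copy_chunk() and CHUNK_SIZE are hypothetical names made up for illustration, and the extra clamp against the destination size is an added safety assumption rather than something the commit mentions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical size of one slot in the receiving buffer. */
#define CHUNK_SIZE 4096

/*
 * Hypothetical stand-in for what a firmware call hands back:
 * a pointer plus the number of bytes that are really valid.
 */
struct retrieved_chunk {
    const unsigned char *data;
    size_t len;                 /* may be smaller than CHUNK_SIZE */
};

/*
 * Copy one retrieved chunk into dst at the given offset.
 * The copy length is taken from the chunk itself, never from the
 * slot size, so a short chunk cannot cause a read past its end.
 * Returns the number of bytes copied.
 */
static size_t copy_chunk(unsigned char *dst, size_t dst_len,
                         size_t offset, const struct retrieved_chunk *chunk)
{
    size_t n = chunk->len;

    if (offset >= dst_len)
        return 0;
    if (n > dst_len - offset)   /* also never write past dst */
        n = dst_len - offset;

    memcpy(dst + offset, chunk->data, n);
    return n;
}
```

With a 2048-byte chunk and a CHUNK_SIZE slot, copy_chunk() copies 2048 bytes and leaves the rest of the slot untouched, whereas copying a fixed CHUNK_SIZE would read 2048 bytes past the end of the source.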
hno
pushed a commit
to hno/linux
that referenced
this pull request
Sep 12, 2012
Fixes issue #21 on amery/linux-allwinner
hno
pushed a commit
to hno/linux
that referenced
this pull request
Sep 12, 2012
Fix build using O= (issue #21) and inline build on CM9
koenkooi
pushed a commit
to koenkooi/linux
that referenced
this pull request
Oct 2, 2012
…d reasons
commit 5cf02d0 upstream.
We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:
PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14"
#0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
#1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
#2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
#3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
#4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
#5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
#6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
#7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
#8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
#9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
#24 [ffff8810343bfee8] kthread at ffffffff8108dd96
#25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca
rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.
Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
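The pattern this commit message describes (mark the task before an allocation that may trigger reclaim, and have the release-page path skip the blocking commit when the mark is set) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the NFS or sunrpc code: struct task, TASK_FSTRANS, may_block_on_commit(), with_fstrans() and setup_socket() are all hypothetical names, and TASK_FSTRANS merely stands in for PF_FSTRANS.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the PF_FSTRANS task flag. */
#define TASK_FSTRANS 0x1u

/* Hypothetical, stripped-down task with only a flags word. */
struct task {
    unsigned int flags;
};

/*
 * Models the decision made in the release-page path: a blocking
 * commit is only allowed when the task is not marked.
 */
static bool may_block_on_commit(const struct task *t)
{
    return (t->flags & TASK_FSTRANS) == 0;
}

/*
 * Run fn with the flag set, then restore the previous state of the
 * flag so that nested or pre-marked callers are not disturbed.
 */
static int with_fstrans(struct task *t, int (*fn)(struct task *))
{
    unsigned int saved = t->flags & TASK_FSTRANS;
    int rc;

    t->flags |= TASK_FSTRANS;
    rc = fn(t);
    t->flags = (t->flags & ~TASK_FSTRANS) | saved;
    return rc;
}

/*
 * Toy "socket setup" step: returns 1 if reclaim triggered from here
 * would be allowed to block on a commit, 0 otherwise.
 */
static int setup_socket(struct task *t)
{
    return may_block_on_commit(t) ? 1 : 0;
}
```

Calling with_fstrans(&t, setup_socket) returns 0 in this model, meaning the reclaim path would not attempt the blocking commit while the socket is being set up, and the task's flags are back to their earlier value afterwards.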
koenkooi
pushed a commit
to koenkooi/linux
that referenced
this pull request
Oct 2, 2012
commit a3f83ab upstream.
koenkooi
pushed a commit
to koenkooi/linux
that referenced
this pull request
Oct 4, 2012
…d reasons commit 5cf02d0 upstream.
koenkooi
pushed a commit
to koenkooi/linux
that referenced
this pull request
Oct 4, 2012
commit a3f83ab upstream.
noamc
referenced
this pull request
in Mellanox/linux
Oct 16, 2012
…rq() workaround for clockevent Timer
request_irq() for TIMER0 failing on CPU1
Ideally we want to use the request_percpu_irq( ) / enable_percpu_irq()
calls from GENERIC_IRQ framework, however that seems to be faltering
even on the boot cpu at the time of first interrupt.
Until that is resolved (with Thomas G), we need to pretend that
TIMER0 is IRQF_SHARED. This also requires yet another hack of explicitly
unmasking the IRQ on that CPU.
Query sent to Thomas Gleixner
======================>8====================================
In an SMP setup, each ARC700 CPU has an in-core TIMER, hooked up to
private IRQ 3 of the respective CPU, which would serve as the local
clock_event_device.
The request_irq( ) call for my first CPU, which succeeds, looks roughly
as follows:
void __cpuinit arc_clockevent_init(void)
{
    int rc;
    unsigned int cpu = smp_processor_id();
    struct clock_event_device *evt = &per_cpu(arc_clockevent_device,
                                              cpu);
    ....
    rc = request_irq(TIMER0_INT, timer_irq_handler,
                     IRQF_TIMER | IRQF_DISABLED | IRQF_PERCPU,
                     "Timer0 (clock-evt-dev)", evt);
    ....
The exact same call, when done from 2nd CPU fails, as it wants to see
IRQF_SHARED which is semantically not correct, since IRQ is not really
shared, it is a private instance (albeit same value), per cpu.
I figured that the right APIs for our case are the pair
(request|enable)_percpu_irq, to be called for both CPUs, with a prior
one-time call to irq_set_percpu_devid(). Is that correct?
Assuming it is, the trouble now is that, even on the first CPU,
handle_level_irq( ) is bailing out w/o calling handle_irq_event()
because irqd_irq_disabled( ) is true. This in turn happens because,
irq_set_percpu_devid(), our much needed init routine, sets IRQ_NOAUTOEN
causing __setup_irq( ) to skip calling irq_startup() => irq_enable()
which would have cleared IRQD_IRQ_DISABLED.
While enable_percpu_irq( ), could have fixed this, it only seems to be
unmasking IRQ at device level, it is not clearing the above flag.
I tried calling enable_irq( ) right after, but that doesn't seem to help
either.
What API am I missing here to enable the irqd machinery, or am I seeing
a bug where the enable_percpu_irq( ) call-chain should somehow be doing it?
======================>8====================================
This needs to be reverted and replaced with right calls once ThomasG
responds to my query.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
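The chain of events reported in the query above can be followed more easily with a toy state model. The sketch below is not kernel code and makes no claim about how the genirq core actually behaves or how the issue was resolved; it only encodes the behaviour as the query describes it. struct irq_model and the model_*() functions are hypothetical names, with noautoen standing in for IRQ_NOAUTOEN and disabled for IRQD_IRQ_DISABLED.

```c
#include <stdbool.h>

/* Hypothetical model of the per-IRQ state discussed in the query. */
struct irq_model {
    bool noautoen;          /* stands in for IRQ_NOAUTOEN */
    bool disabled;          /* stands in for IRQD_IRQ_DISABLED */
    unsigned int handled;   /* how many times the handler ran */
};

/* A freshly described IRQ starts out disabled until it is set up. */
static void model_init(struct irq_model *d)
{
    d->noautoen = false;
    d->disabled = true;
    d->handled = 0;
}

/* As reported: the per-cpu devid init marks the IRQ "no auto-enable". */
static void model_set_percpu_devid(struct irq_model *d)
{
    d->noautoen = true;
}

/*
 * As reported: setup skips the startup step (which would clear the
 * disabled state) when "no auto-enable" is set.
 */
static void model_setup(struct irq_model *d)
{
    if (!d->noautoen)
        d->disabled = false;
}

/*
 * As reported: the level flow handler bails out without running the
 * handler while the disabled state is still set.
 */
static bool model_handle(struct irq_model *d)
{
    if (d->disabled)
        return false;
    d->handled++;
    return true;
}
```

In this model an IRQ that goes through model_setup() alone gets handled, while one that first goes through model_set_percpu_devid() stays disabled and model_handle() returns false, which matches the symptom described for the first CPU.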
noamc
referenced
this pull request
in Mellanox/linux
Oct 16, 2012
…equest_irq() workaround for clockevent Timer"
This reverts commit 2985184.
Next commit uses the correct APIs, so we no longer need this hack
koenkooi
pushed a commit
to koenkooi/linux
that referenced
this pull request
Oct 17, 2012
…d reasons commit 5cf02d0 upstream.
koenkooi
pushed a commit
to koenkooi/linux
that referenced
this pull request
Oct 17, 2012
commit a3f83ab upstream.
hknkkn
pushed a commit
to hknkkn/linux-dynticks
that referenced
this pull request
Oct 29, 2012
Printing the "start_ip" for every secondary cpu is very noisy on a large
system - and doesn't add any value. Drop this message.
Console log before:
Booting Node 0, Processors #1
smpboot cpu 1: start_ip = 96000
#2
smpboot cpu 2: start_ip = 96000
#3
smpboot cpu 3: start_ip = 96000
#4
smpboot cpu 4: start_ip = 96000
...
#31
smpboot cpu 31: start_ip = 96000
Brought up 32 CPUs
Console log after:
Booting Node 0, Processors #1 #2 #3 #4 #5 #6 #7 Ok.
Booting Node 1, Processors #8 #9 #10 #11 #12 #13 #14 #15 Ok.
Booting Node 0, Processors #16 #17 #18 #19 #20 #21 #22 #23 Ok.
Booting Node 1, Processors #24 #25 #26 #27 #28 #29 #30 #31
Brought up 32 CPUs
Acked-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/4f452eb42507460426@agluck-desktop.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
koenkooi
pushed a commit
to koenkooi/linux
that referenced
this pull request
Oct 31, 2012
…d reasons commit 5cf02d0 upstream.
koenkooi
pushed a commit
to koenkooi/linux
that referenced
this pull request
Oct 31, 2012
commit a3f83ab upstream.
vineetgarc
referenced
this pull request
in foss-for-synopsys-dwc-arc-processors/linux
Oct 31, 2012
request_irq() for TIMER0 failing on CPU1
Ideally we want to use the request_percpu_irq( ) / enable_percpu_irq()
calls from GENERIC_IRQ framework, however that seems to be faltering
even on the boot cpu at the time of first interrupt.
Until that is resolved (with Thomas G), we need to pretend that
TIMER0 is IRQF_SHARED. This also requires yet another hack of explicitly
unmasking the IRQ on that CPU.
Query sent to Thomas Gleixner
======================>8====================================
In an SMP setup, each ARC700 CPU has an in-core TIMER, hooked up to
private IRQ 3 of respective CPU and would serve as the local
clock_event_device.
request_irq( ) for my first CPU which succeeds, looks roughly as
follows:
void __cpuinit arc_clockevent_init(void)
{
    int rc;
    unsigned int cpu = smp_processor_id();
    struct clock_event_device *evt = &per_cpu(arc_clockevent_device,
                                              cpu);
    ....
    rc = request_irq(TIMER0_INT, timer_irq_handler,
                     IRQF_TIMER | IRQF_DISABLED | IRQF_PERCPU,
                     "Timer0 (clock-evt-dev)", evt);
    ....
The exact same call, when done from 2nd CPU fails, as it wants to see
IRQF_SHARED which is semantically not correct, since IRQ is not really
shared, it is a private instance (albeit same value), per cpu.
I figured that the right APIs for our case are the pair:
(request|enable)_percpu_irq to be called for both CPUs, with a prior one
time call to irq_set_percpu_devid(). Is that correct?
Assuming it is, the trouble now is that, even on the first CPU,
handle_level_irq( ) is bailing out w/o calling handle_irq_event()
because irqd_irq_disabled( ) is true. This in turn happens because,
irq_set_percpu_devid(), our much needed init routine, sets IRQ_NOAUTOEN
causing __setup_irq( ) to skip calling irq_startup() => irq_enable()
which would have cleared IRQD_IRQ_DISABLED.
While enable_percpu_irq( ), could have fixed this, it only seems to be
unmasking IRQ at device level, it is not clearing the above flag.
I tried calling enable_irq( ) right after, but that doesn't seem to help
either.
What API am I missing here, to enable the irqd machinery, or am I seeing
a bug where enable_percpu_irq( ) call-chain should somehow be doing it?
======================>8====================================
This needs to be reverted and replaced with right calls once ThomasG
responds to my query.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
vineetgarc
referenced
this pull request
in foss-for-synopsys-dwc-arc-processors/linux
Oct 31, 2012
…nt Timer"
This reverts commit 2985184.
Next commit uses the correct APIs, so we no longer need this hack
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
jadonk
pushed a commit
to jadonk/linux
that referenced
this pull request
Nov 13, 2012
At boot time I observed the following bug:
BUG: unable to handle kernel paging request at ffff8800a4244000
IP: [<ffffffff81275b5b>] memcpy+0xb/0x120
PGD 1816063 PUD 1fe7d067 PMD 1ff9f067 PTE 80000000a4244160
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
CPU 0
Modules linked in: btusb bluetooth brcmsmac brcmutil crc8 cordic b43 radeon(+) mac80211 cfg80211 ttm ohci_hcd drm_kms_helper rfkill drm ssb agpgart mmc_core sp5100_tco video battery ac thermal processor rtc_cmos thermal_sys snd_hda_codec_hdmi joydev snd_hda_codec_conexant button bcma pcmcia snd_hda_intel snd_hda_codec snd_hwdep snd_pcm shpchp pcmcia_core k8temp snd_timer atl1c snd psmouse hwmon i2c_piix4 i2c_algo_bit soundcore evdev i2c_core ehci_hcd sg serio_raw snd_page_alloc loop btrfs
Pid: 1008, comm: modprobe Not tainted 3.3.0-rc1 #21 LENOVO 20046 /AMD CRB
RIP: 0010:[<ffffffff81275b5b>] [<ffffffff81275b5b>] memcpy+0xb/0x120
RSP: 0018:ffff8800aa72db00 EFLAGS: 00010246
RAX: ffff8800a4150000 RBX: 0000000000001000 RCX: 0000000000000087
RDX: 0000000000000000 RSI: ffff8800a4244000 RDI: ffff8800a4150bc8
RBP: ffff8800aa72db78 R08: 0000000000000010 R09: ffffffff8174bbec
R10: ffffffff812ee010 R11: 0000000000000001 R12: 0000000000001000
R13: 0000000000010000 R14: ffff8800a4140000 R15: ffff8800aaba1800
FS: 00007ff9a3bd4720(0000) GS:ffff8800afa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff8800a4244000 CR3: 00000000a9c18000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 1008, threadinfo ffff8800aa72c000, task ffff8800aa0e4000)
Stack:
ffffffffa04e7c7b 0000000000000001 0000000000010000 ffff8800aa72db28
ffffffff00000001 0000000000001000 ffffffff8113cbef 0000000000000020
ffff8800a4243420 ffff880000000002 ffff8800aa72db08 ffff8800a9d42000
Call Trace:
[<ffffffffa04e7c7b>] ? radeon_atrm_get_bios_chunk+0x8b/0xd0 [radeon]
[<ffffffff8113cbef>] ? kmalloc_order_trace+0x3f/0xb0
[<ffffffffa04a9298>] radeon_get_bios+0x68/0x2f0 [radeon]
[<ffffffffa04c7a30>] rv770_init+0x40/0x280 [radeon]
[<ffffffffa047d740>] radeon_device_init+0x560/0x600 [radeon]
[<ffffffffa047ef4f>] radeon_driver_load_kms+0xaf/0x170 [radeon]
[<ffffffffa043cdde>] drm_get_pci_dev+0x18e/0x2c0 [drm]
[<ffffffffa04e7e95>] radeon_pci_probe+0xad/0xb5 [radeon]
[<ffffffff81296c5f>] local_pci_probe+0x5f/0xd0
[<ffffffff81297418>] pci_device_probe+0x88/0xb0
[<ffffffff813417aa>] ? driver_sysfs_add+0x7a/0xb0
[<ffffffff813418d8>] really_probe+0x68/0x180
[<ffffffff81341be5>] driver_probe_device+0x45/0x70
[<ffffffff81341cb3>] __driver_attach+0xa3/0xb0
[<ffffffff81341c10>] ? driver_probe_device+0x70/0x70
[<ffffffff813400ce>] bus_for_each_dev+0x5e/0x90
[<ffffffff8134172e>] driver_attach+0x1e/0x20
[<ffffffff81341298>] bus_add_driver+0xc8/0x280
[<ffffffff813422c6>] driver_register+0x76/0x140
[<ffffffff812976d6>] __pci_register_driver+0x66/0xe0
[<ffffffffa043d021>] drm_pci_init+0x111/0x120 [drm]
[<ffffffff8133c67a>] ? vga_switcheroo_register_handler+0x3a/0x60
[<ffffffffa0229000>] ? 0xffffffffa0228fff
[<ffffffffa02290ec>] radeon_init+0xec/0xee [radeon]
[<ffffffff810002f2>] do_one_initcall+0x42/0x180
[<ffffffff8109d8d2>] sys_init_module+0x92/0x1e0
[<ffffffff815407a9>] system_call_fastpath+0x16/0x1b
Code: 58 2a 43 50 88 43 4e 48 83 c4 08 5b c9 c3 66 90 e8 cb fd ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c
RIP [<ffffffff81275b5b>] memcpy+0xb/0x120
RSP <ffff8800aa72db00>
CR2: ffff8800a4244000
---[ end trace fcffa1599cf56382 ]---
A call to acpi_evaluate_object() does not always return 4096-byte chunks;
on my system it can return a 2048-byte chunk. So pass the length of the
retrieved chunk to memcpy(), not the length of the receiving buffer.
Signed-off-by: Igor Murzov <e-mail@date.by>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
jadonk
pushed a commit
to jadonk/linux
that referenced
this pull request
Nov 13, 2012
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not
update the real num tx queues. netdev_queue_update_kobjects() is already
called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when an
upper-layer driver, e.g., the FCoE protocol stack, is monitoring the
NETDEV_UNREGISTER netdev event and calls back into the LLD's
ndo_fcoe_disable() to remove extra queues allocated for FCoE, the associated
txq sysfs kobjects are already removed, and trying to update the real num
queues would cause something like below:
...
PID: 25138 TASK: ffff88021e64c440 CPU: 3 COMMAND: "kworker/3:3"
#0 [ffff88021f007760] machine_kexec at ffffffff810226d9
#1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d
#2 [ffff88021f0078a0] oops_end at ffffffff813bca78
#3 [ffff88021f0078d0] no_context at ffffffff81029e72
#4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155
#5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e
#6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e
#7 [ffff88021f007b10] page_fault at ffffffff813bc045
[exception RIP: sysfs_find_dirent+17]
RIP: ffffffff81178611 RSP: ffff88021f007bc0 RFLAGS: 00010246
RAX: ffff88021e64c440 RBX: ffffffff8156cc63 RCX: 0000000000000004
RDX: ffffffff8156cc63 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88021f007be0 R8: 0000000000000004 R9: 0000000000000008
R10: ffffffff816fed00 R11: 0000000000000004 R12: 0000000000000000
R13: ffffffff8156cc63 R14: 0000000000000000 R15: ffff8802222a0000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07
#9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27
#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9
#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38
#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe]
#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe]
#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe]
#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q]
#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe]
#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe]
#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca
#19 [ffff88021f007e68] worker_thread at ffffffff81060513
#20 [ffff88021f007ee8] kthread at ffffffff810648b6
#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4
Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
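The guard this commit message describes can be modelled in a few lines. This is a toy userspace model under stated assumptions, not the kernel's netif_set_real_num_tx_queues(): `struct fake_netdev`, the `REG_*` enum values, and the `kobject_updates` counter are all hypothetical stand-ins for the netdev, its registration state, and the sysfs update.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the registration states named in the commit message. */
enum reg_state {
    REG_UNINITIALIZED,
    REG_REGISTERED,
    REG_UNREGISTERING,
    REG_UNREGISTERED
};

struct fake_netdev {
    enum reg_state reg_state;
    unsigned int real_num_tx_queues;
    unsigned int kobject_updates; /* counts simulated sysfs kobject updates */
};

/* Returns true if the queue count was changed, false if the request was skipped. */
static bool set_real_num_tx_queues(struct fake_netdev *dev, unsigned int txq)
{
    assert(dev);
    /* Once unregistration has started, the txq kobjects are already gone. */
    if (dev->reg_state == REG_UNREGISTERING ||
        dev->reg_state == REG_UNREGISTERED)
        return false;
    if (dev->reg_state == REG_REGISTERED)
        dev->kobject_updates++; /* stands in for the sysfs kobject update */
    dev->real_num_tx_queues = txq;
    return true;
}
```

In this model, a late call arriving from an unregister notifier is ignored instead of touching sysfs state that has already been torn down.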
guidosarducci
added a commit
to guidosarducci/linux
that referenced
this pull request
Oct 21, 2025
Sequence of ctx checks and rewrites in real world:
1. Possible CO-RE rewrites of access size & offset, maybe breaking #2
2. Verifier env->ops->is_valid_access(), testing access size == sizeof(u64)
3. Verifier env->ops->convert_ctx_access(), rewrite size & offset
Position of access check above is strange, really only works on 64-bit and
likely unnoticed for lack of systematic 32-bit testing.
Test changing *is_valid_access() to always check size != sizeof(u64), but on
32-bit systems also check size != sizeof(u32)
On 32-bit armhf, test_progs hits ~100 instances of failures such as:
(NOTE: this results from CO-RE relocation patching to u32 load size)
libbpf: prog 'change_tcp_cc': -- BEGIN PROG LOAD LOG --
0: R1=ctx() R10=fp0
; int change_tcp_cc(struct bpf_iter__tcp *ctx) @ bpf_iter_setsockopt.c:40
0: (b4) w2 = 0 ; R2_w=0
; if (!bpf_tcp_sk(ctx->sk_common)) @ bpf_iter_setsockopt.c:46
1: (61) r1 = *(u32 *)(r1 +8)
func 'bpf_iter_tcp' size 4 must be 8
invalid bpf_context access off=8 size=4
is_valid_access=tracing_prog_is_valid_access
processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
-- END PROG LOAD LOG --
libbpf: prog 'change_tcp_cc': failed to load: -EACCES
libbpf: failed to load object 'bpf_iter_setsockopt'
libbpf: failed to load BPF skeleton 'bpf_iter_setsockopt': -EACCES
serial_test_bpf_iter_setsockopt:FAIL:iter_skel unexpected error: -13
#21 bpf_iter_setsockopt:FAIL
(NOTE: no error for tests without CO-RE patching which have u64 load size)
This means *_is_valid_access() can't check ctx pointers simply using:
    if (size != sizeof(__u64))
        return false;
or
    if (size != sizeof(void *))
        return false;
And what's required instead is a combo like:
    if (size != sizeof(__u64) && size != sizeof(long))
        return false;
Implement above as convenience function and use in:
- btf_ctx_access()
- cg_sockopt_is_valid_access()
- bpf_skb_is_valid_access()
- sock_addr_is_valid_access()
- sock_ops_is_valid_access()
- sk_msg_is_valid_access()
- flow_dissector_is_valid_access()
This eliminates all 'invalid bpf_context access' errors on 32-bit armhf
except one with nf_is_valid_access() which is fixed in the next patch.
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
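The combined size check quoted in this commit message can be written as a small helper. This is a sketch only: the commit message does not name its convenience function, so `ctx_ptr_size_ok()` and the self-test around it are hypothetical, and the code is ordinary userspace C rather than verifier code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: accept a ctx pointer load that is either a full
 * 64-bit access or a native-word access (4 bytes on a 32-bit target,
 * where CO-RE relocation may have narrowed the load).
 */
static bool ctx_ptr_size_ok(size_t size)
{
    return size == sizeof(uint64_t) || size == sizeof(long);
}

/* Sanity checks that hold on both 32-bit and 64-bit builds. */
static void ctx_ptr_size_selftest(void)
{
    assert(ctx_ptr_size_ok(sizeof(uint64_t)));
    assert(ctx_ptr_size_ok(sizeof(long)));
    assert(!ctx_ptr_size_ok(2));
}
```

On a 64-bit build both accepted sizes are 8, so the helper behaves like the original `sizeof(__u64)` test; on 32-bit armhf it additionally accepts the 4-byte load seen in the failing log.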
intel-lab-lkp
pushed a commit
to intel-lab-lkp/linux
that referenced
this pull request
Oct 22, 2025
… 'T'
When perf report is run with annotation for a symbol, and the user presses
's' and 'T' and then exits the annotate browser, annotating the same symbol
again crashes the annotate browser.
browser.arch must be correctly set when the data type feature is enabled by
'T'. It is normally initialized by the symbol__annotate2 function. If a
symbol has already been annotated successfully the first time,
symbol__annotate2 is not called again, so browser.arch is not initialized.
The second time the annotate browser is shown, the data type needs to be
displayed but browser.arch is empty.
Stack trace as below:
Perf: Segmentation fault
-------- backtrace --------
#0 0x55d365 in ui__signal_backtrace setup.c:0
#1 0x7f5ff1a3e930 in __restore_rt libc.so.6[3e930]
#2 0x570f08 in arch__is perf[570f08]
#3 0x562186 in annotate_get_insn_location perf[562186]
#4 0x562626 in __hist_entry__get_data_type annotate.c:0
#5 0x56476d in annotation_line__write perf[56476d]
#6 0x54e2db in annotate_browser__write annotate.c:0
#7 0x54d061 in ui_browser__list_head_refresh perf[54d061]
#8 0x54dc9e in annotate_browser__refresh annotate.c:0
#9 0x54c03d in __ui_browser__refresh browser.c:0
#10 0x54ccf8 in ui_browser__run perf[54ccf8]
#11 0x54eb92 in __hist_entry__tui_annotate perf[54eb92]
#12 0x552293 in do_annotate hists.c:0
#13 0x55941c in evsel__hists_browse hists.c:0
#14 0x55b00f in evlist__tui_browse_hists perf[55b00f]
#15 0x42ff02 in cmd_report perf[42ff02]
#16 0x494008 in run_builtin perf.c:0
#17 0x494305 in handle_internal_command perf.c:0
#18 0x410547 in main perf[410547]
#19 0x7f5ff1a295d0 in __libc_start_call_main libc.so.6[295d0]
#20 0x7f5ff1a29680 in __libc_start_main@@GLIBC_2.34 libc.so.6[29680]
#21 0x410b75 in _start perf[410b75]
Fixes: 1d4374a ("perf annotate: Add 'T' hot key to toggle data type display")
Reviewed-by: James Clark <james.clark@linaro.org>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
intel-lab-lkp
pushed a commit
to intel-lab-lkp/linux
that referenced
this pull request
Oct 22, 2025
The following command will hang consistently when the GPU is being used due
to non regular files (e.g. /dev/dri/renderD129, /dev/dri/card2) being opened
to read build IDs:
$ perf record -asdg -e cpu-clock -- true
Change to non blocking reads to avoid the hang here:
#0 __libc_pread64 (offset=<optimised out>, count=0, buf=0x7fffffffa4a0, fd=237) at ../sysdeps/unix/sysv/linux/pread64.c:25
#1 __libc_pread64 (fd=237, buf=0x7fffffffa4a0, count=0, offset=0) at ../sysdeps/unix/sysv/linux/pread64.c:23
#2 ?? () from /lib/x86_64-linux-gnu/libelf.so.1
#3 read_build_id (filename=0x5555562df333 "/dev/dri/card2", bid=0x7fffffffb680, block=true) at util/symbol-elf.c:880
#4 filename__read_build_id (filename=0x5555562df333 "/dev/dri/card2", bid=0x7fffffffb680, block=true) at util/symbol-elf.c:924
#5 dsos__read_build_ids_cb (dso=0x5555562df1d0, data=0x7fffffffb750) at util/dsos.c:84
#6 __dsos__for_each_dso (dsos=0x55555623de68, cb=0x5555557e7030 <dsos__read_build_ids_cb>, data=0x7fffffffb750) at util/dsos.c:59
#7 dsos__for_each_dso (dsos=0x55555623de68, cb=0x5555557e7030 <dsos__read_build_ids_cb>, data=0x7fffffffb750) at util/dsos.c:503
#8 dsos__read_build_ids (dsos=0x55555623de68, with_hits=true) at util/dsos.c:107
#9 machine__read_build_ids (machine=0x55555623da58, with_hits=true) at util/build-id.c:950
#10 perf_session__read_build_ids (session=0x55555623d840, with_hits=true) at util/build-id.c:956
#11 write_build_id (ff=0x7fffffffb958, evlist=0x5555562323d0) at util/header.c:327
#12 do_write_feat (ff=0x7fffffffb958, type=2, p=0x7fffffffb950, evlist=0x5555562323d0, fc=0x0) at util/header.c:3588
#13 perf_header__adds_write (header=0x55555623d840, evlist=0x5555562323d0, fd=3, fc=0x0) at util/header.c:3632
#14 perf_session__do_write_header (session=0x55555623d840, evlist=0x5555562323d0, fd=3, at_exit=true, fc=0x0, write_attrs_after_data=false) at util/header.c:3756
#15 perf_session__write_header (session=0x55555623d840, evlist=0x5555562323d0, fd=3, at_exit=true) at util/header.c:3796
#16 record__finish_output (rec=0x5555561838d8 <record>) at builtin-record.c:1899
#17 __cmd_record (rec=0x5555561838d8 <record>, argc=2, argv=0x7fffffffe320) at builtin-record.c:2967
#18 cmd_record (argc=2, argv=0x7fffffffe320) at builtin-record.c:4453
#19 run_builtin (p=0x55555618cbb0 <commands+288>, argc=9, argv=0x7fffffffe320) at perf.c:349
#20 handle_internal_command (argc=9, argv=0x7fffffffe320) at perf.c:401
#21 run_argv (argcp=0x7fffffffe16c, argv=0x7fffffffe160) at perf.c:445
#22 main (argc=9, argv=0x7fffffffe320) at perf.c:553
Fixes: 53b00ff ("perf record: Make --buildid-mmap the default")
Signed-off-by: James Clark <james.clark@linaro.org>
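One way to avoid blocking on device nodes when opening arbitrary paths is sketched below. It is not the perf patch itself: `open_regular_nonblock()` is a hypothetical helper, and rejecting non-regular files via fstat() is an assumption added here for illustration; the commit message only states that reads were changed to non-blocking. The calls used (open() with O_NONBLOCK, fstat(), S_ISREG()) are standard POSIX.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Open path read-only without blocking, and hand back the descriptor only
 * if it refers to a regular file. Returns -1 for device nodes, FIFOs,
 * missing files, and any other failure.
 */
static int open_regular_nonblock(const char *path)
{
    struct stat st;
    int fd;

    assert(path);
    fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode)) {
        close(fd); /* not something a build ID can be read from */
        return -1;
    }
    return fd;
}
```

A caller would treat -1 as "no build ID available" and move on to the next DSO; a path such as /dev/dri/card2 would be rejected before any read is attempted.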
guidosarducci
added a commit
to guidosarducci/linux
that referenced
this pull request
Oct 27, 2025
guidosarducci
added a commit
to guidosarducci/linux
that referenced
this pull request
Oct 28, 2025
Sequence of ctx checks and rewrites in the real world:

  1. Possible CO-RE rewrites of access size & offset, maybe breaking #2
  2. Verifier env->ops->is_valid_access(), testing access size == sizeof(u64)
  3. Verifier env->ops->convert_ctx_access(), rewrite size & offset

The position of the access check above is strange: it really only works on
64-bit, and likely went unnoticed for lack of systematic 32-bit testing.

Test changing *is_valid_access() to always check size != sizeof(u64), but
on 32-bit systems also check size != sizeof(u32).

On 32-bit armhf, test_progs hits ~100 instances of failures such as:
(NOTE: this results from CO-RE relocation patching to u32 load size)

  libbpf: prog 'change_tcp_cc': -- BEGIN PROG LOAD LOG --
  0: R1=ctx() R10=fp0
  ; int change_tcp_cc(struct bpf_iter__tcp *ctx) @ bpf_iter_setsockopt.c:40
  0: (b4) w2 = 0 ; R2_w=0
  ; if (!bpf_tcp_sk(ctx->sk_common)) @ bpf_iter_setsockopt.c:46
  1: (61) r1 = *(u32 *)(r1 +8)
  func 'bpf_iter_tcp' size 4 must be 8
  invalid bpf_context access off=8 size=4 is_valid_access=tracing_prog_is_valid_access
  processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
  -- END PROG LOAD LOG --
  libbpf: prog 'change_tcp_cc': failed to load: -EACCES
  libbpf: failed to load object 'bpf_iter_setsockopt'
  libbpf: failed to load BPF skeleton 'bpf_iter_setsockopt': -EACCES
  serial_test_bpf_iter_setsockopt:FAIL:iter_skel unexpected error: -13
  #21 bpf_iter_setsockopt:FAIL

(NOTE: no error for tests without CO-RE patching which have u64 load size)

This means *_is_valid_access() can't check ctx pointers simply using:

  if (size != sizeof(__u64))
          return false;
or
  if (size != sizeof(void *))
          return false;

And what's required instead is a combo like:

  if (size != sizeof(__u64) && size != sizeof(long))
          return false;

Implement the above as a convenience function and use it in:

  - btf_ctx_access()
  - cg_sockopt_is_valid_access()
  - bpf_skb_is_valid_access()
  - sock_addr_is_valid_access()
  - sock_ops_is_valid_access()
  - sk_msg_is_valid_access()
  - flow_dissector_is_valid_access()

This eliminates all 'invalid bpf_context access' errors on 32-bit armhf
except one with nf_is_valid_access(), which is fixed in the next patch.

Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
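The combined size check described in the commit message above can be sketched as a small userspace C helper. This is only an illustration under stated assumptions: the names ctx_ptr_size_ok() and ctx_ptr_size_ok_for() are hypothetical and are not taken from the patch, and the second variant takes the native word size as a parameter purely so that a 32-bit target can be modelled on any host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (name is an assumption, not from the patch):
 * accept a ctx pointer load that is either 8 bytes wide (the size the
 * program was compiled with) or native-long wide (the size a CO-RE
 * relocation may rewrite it to). On 64-bit builds both sizes are 8,
 * so the check behaves exactly like the old u64-only test there.
 */
static bool ctx_ptr_size_ok(size_t size)
{
	return size == sizeof(uint64_t) || size == sizeof(long);
}

/*
 * Same rule with the native word size passed in explicitly, so the
 * 32-bit case can be exercised on a 64-bit host. Illustration only.
 */
static bool ctx_ptr_size_ok_for(size_t size, size_t native_long)
{
	assert(native_long == 4 || native_long == 8);
	return size == 8 || size == native_long;
}
```

With a native word size of 4, both the original 8-byte load and the CO-RE-patched 4-byte load pass; with a native word size of 8, a 4-byte load is still rejected, which matches the "size 4 must be 8" behaviour seen on 64-bit.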
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Oct 30, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 3, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 5, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 5, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 7, 2025
mwilczy pushed a commit to mwilczy/linux that referenced this pull request Nov 10, 2025
This series enables the display subsystem on the StarFive JH7110 SoC.
This hardware has a complex set of dependencies that this series aims to
solve.
The dom_vout (Video Output) block is a wrapper containing the display
controller (dc8200), the clock generator (voutcrg), and the HDMI IP, all
of which are managed by a single power domain (PD_VOUT).
More importantly, the HDMI IP is a monolithic block (controller and PHY
in one register space) that has a circular dependency with voutcrg:
1. The HDMI Controller needs clocks (like sysclk, mclk) from voutcrg to
function.
2. The voutcrg (for its pixel MUXes) needs the variable pixel clock,
which is generated by the HDMI PHY.
This series breaks this dependency loop by modeling the hardware
correctly:
1. A new vout-subsystem wrapper driver is added. It manages the shared
PD_VOUT power domain and top level bus clocks. It uses
of_platform_populate() to ensure its children (hdmi_mfd, voutcrg,
dc8200) are probed only after power is on.
2. The monolithic hdmi node is refactored into an MFD. A new hdmi-mfd
parent driver is added, which maps the shared register space and
creates a regmap.
3. The MFD populates two children:
- hdmi-phy: A new PHY driver that binds to the MFD. Its only
dependency is the xin24m reference clock. It acts as the clock
provider for the variable pixel clock (hdmi_pclk).
- hdmi-controller: A new DRM bridge driver. It consumes clocks from
voutcrg and the hdmi_pclk/PHY from its sibling hdmi-phy driver.
4. The generic inno-hdmi bridge library is refactored to accept a regmap
from a parent MFD, making this model possible.
This MFD split breaks the circular dependency, as the kernel's deferred
probe can now find a correct, linear probe order: hdmi-phy (probes
first) -> voutcrg (probes second) -> hdmi-controller (probes third).
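Why the split yields a workable probe order can be sketched with a toy userspace model. This is not kernel code and makes several assumptions: struct toy_dev, toy_deferred_probe() and the two scenario helpers are hypothetical names, and a simple "retry until nothing changes" sweep stands in for the kernel's real deferred-probe mechanism (drivers returning -EPROBE_DEFER and being retried later). The xin24m reference clock is assumed to be always available and is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_DEPS 2

/* Toy device: a name plus the indices of devices it must wait for. */
struct toy_dev {
	const char *name;
	int deps[TOY_MAX_DEPS];
	int ndeps;
	bool probed;
};

/*
 * Sweep the device list repeatedly, probing every device whose
 * dependencies are already probed and deferring the rest to the next
 * sweep. Stores the probe order in order[] and returns the number of
 * devices probed; a result smaller than n means a dependency cycle
 * prevented any further progress.
 */
static int toy_deferred_probe(struct toy_dev *devs, int n, int *order)
{
	int done = 0;
	bool progress = true;

	assert(devs && order && n >= 0);
	for (int i = 0; i < n; i++)
		devs[i].probed = false;

	while (done < n && progress) {
		progress = false;
		for (int i = 0; i < n; i++) {
			bool ready = !devs[i].probed;

			for (int d = 0; ready && d < devs[i].ndeps; d++)
				ready = devs[devs[i].deps[d]].probed;
			if (ready) {
				devs[i].probed = true;
				order[done++] = i;
				progress = true;
			}
		}
	}
	return done;
}

enum { TOY_CTRL, TOY_PHY, TOY_CRG };

/* Split model: listed controller-first on purpose to force deferral. */
static int toy_probe_split(int *order)
{
	struct toy_dev devs[] = {
		[TOY_CTRL] = { "hdmi-controller", { TOY_PHY, TOY_CRG }, 2, false },
		[TOY_PHY]  = { "hdmi-phy",        { 0, 0 },             0, false },
		[TOY_CRG]  = { "voutcrg",         { TOY_PHY, 0 },       1, false },
	};
	return toy_deferred_probe(devs, 3, order);
}

/* Monolithic model: hdmi and voutcrg each wait for the other. */
static int toy_probe_monolithic(int *order)
{
	struct toy_dev devs[] = {
		{ "hdmi",    { 1, 0 }, 1, false },
		{ "voutcrg", { 0, 0 }, 1, false },
	};
	return toy_deferred_probe(devs, 2, order);
}
```

In this model the split layout resolves to hdmi-phy, then voutcrg, then hdmi-controller, regardless of the order in which the devices are listed, while the monolithic layout never probes anything because each side waits on the other.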
This series provides all the necessary dt-bindings, the new drivers, the
modification to inno-hdmi, and the final device tree changes to enable
the display.
This series depends on patch sets that are not merged yet:
- dc8200 driver [1]
- th1520 reset (dependency of dc8200 series) [2]
- inno-hdmi bridge [3]
Testing:
I've tested on my monitor using `modetest` for the following modes:
#0 2560x1440 59.95 2560 2608 2640 2720 1440 1443 1448 1481 241500
flags: phsync, nvsync; type: preferred, driver [DOESN'T WORK]
#1 2048x1080 60.00 2048 2096 2128 2208 1080 1083 1093 1111 147180
flags: phsync, nvsync; type: driver [DOESN'T WORK]
#2 2048x1080 24.00 2048 2096 2128 2208 1080 1083 1093 1099 58230
flags: phsync, nvsync; type: driver [DOESN'T WORK]
#3 1920x1080 60.00 1920 2008 2052 2200 1080 1084 1089 1125 148500
flags: phsync, pvsync; type: driver [WORKS]
#4 1920x1080 59.94 1920 2008 2052 2200 1080 1084 1089 1125 148352
flags: phsync, pvsync; type: driver [WORKS]
#5 1920x1080 50.00 1920 2448 2492 2640 1080 1084 1089 1125 148500
flags: phsync, pvsync; type: driver [WORKS]
#6 1600x1200 60.00 1600 1664 1856 2160 1200 1201 1204 1250 162000
flags: phsync, pvsync; type: driver [WORKS]
#7 1280x1024 75.02 1280 1296 1440 1688 1024 1025 1028 1066 135000
flags: phsync, pvsync; type: driver [WORKS]
#8 1280x1024 60.02 1280 1328 1440 1688 1024 1025 1028 1066 108000
flags: phsync, pvsync; type: driver [WORKS]
#9 1152x864 75.00 1152 1216 1344 1600 864 865 868 900 108000
flags: phsync, pvsync; type: driver [WORKS]
#10 1280x720 60.00 1280 1390 1430 1650 720 725 730 750 74250
flags: phsync, pvsync; type: driver [WORKS]
#11 1280x720 59.94 1280 1390 1430 1650 720 725 730 750 74176
flags: phsync, pvsync; type: driver [WORKS]
#12 1280x720 50.00 1280 1720 1760 1980 720 725 730 750 74250
flags: phsync, pvsync; type: driver [WORKS]
#13 1024x768 75.03 1024 1040 1136 1312 768 769 772 800 78750
flags: phsync, pvsync; type: driver [WORKS]
#14 1024x768 60.00 1024 1048 1184 1344 768 771 777 806 65000
flags: nhsync, nvsync; type: driver [WORKS]
#15 800x600 75.00 800 816 896 1056 600 601 604 625 49500
flags: phsync, pvsync; type: driver [WORKS]
#16 800x600 60.32 800 840 968 1056 600 601 605 628 40000
flags: phsync, pvsync; type: driver [WORKS]
#17 720x576 50.00 720 732 796 864 576 581 586 625 27000
flags: nhsync, nvsync; type: driver [WORKS]
#18 720x480 60.00 720 736 798 858 480 489 495 525 27027
flags: nhsync, nvsync; type: driver [WORKS]
#19 720x480 59.94 720 736 798 858 480 489 495 525 27000
flags: nhsync, nvsync; type: driver [WORKS]
#20 640x480 75.00 640 656 720 840 480 481 484 500 31500
flags: nhsync, nvsync; type: driver [WORKS]
#21 640x480 60.00 640 656 752 800 480 490 492 525 25200
flags: nhsync, nvsync; type: driver [WORKS]
#22 640x480 59.94 640 656 752 800 480 490 492 525 25175
flags: nhsync, nvsync; type: driver [WORKS]
#23 720x400 70.08 720 738 846 900 400 412 414 449 28320
flags: nhsync, pvsync; type: driver [DOESN'T WORK]
I believe the failing modes are caused by a PHY tuning issue that can be
fixed in the new phy-jh7110-inno-hdmi.c driver without changing the overall
architecture. I plan to continue debugging these modes and will submit
follow-up fixes as needed.
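As a reading aid for the mode list above: in each modetest line the last number is the pixel clock in kHz, the fourth timing value of each group is the horizontal and vertical total, and the refresh rate is the pixel clock divided by their product. The following is a minimal sketch of that arithmetic; mode_refresh_mhz() is a hypothetical helper name, and it returns millihertz only to stay in integer math.

```c
#include <assert.h>

/*
 * Refresh rate in millihertz from the three numbers a modetest mode
 * line provides: pixel clock in kHz, horizontal total, vertical total.
 *
 *   refresh [Hz] = pixel_clock [Hz] / (htotal * vtotal)
 */
static long mode_refresh_mhz(long clock_khz, long htotal, long vtotal)
{
	assert(clock_khz > 0 && htotal > 0 && vtotal > 0);

	/* kHz -> mHz is a factor of 1000 * 1000. */
	long long num = (long long)clock_khz * 1000000LL;
	long long den = (long long)htotal * vtotal;

	/* Round to the nearest millihertz. */
	return (long)((num + den / 2) / den);
}
```

For mode #3 (148500 kHz, totals 2200 x 1125) this gives 60000 mHz, matching the listed 60.00 Hz; for mode #17 (27000 kHz, totals 864 x 625) it gives 50000 mHz.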
The core architectural plumbing is sound and ready for review.
Notes:
- The JH7110 does not have a centralized MAINTAINERS entry like the
TH1520, and driver maintainership seems fragmented. I have therefore
added a MAINTAINERS entry for the display subsystem and am willing to
help with its maintenance.
- I am aware that the new phy-jh7110-inno-hdmi.c driver (patch 12) is a
near duplicate of the existing phy-rockchip-inno-hdmi.c. This
duplication is intentional and temporary for this RFC series. My goal
is to first get feedback on the overall architecture (the vout-subsystem
wrapper, the hdmi-mfd split, and the dual-function PHY/CLK driver).
If this architectural approach is acceptable, I will rework the PHY
driver for a formal v1 submission. This will involve refactoring the
common logic from the Rockchip PHY into a generic core driver that both
the Rockchip and this new StarFive PHY driver will use.
Many thanks to Icenowy Zheng, who developed the dc8200 driver and helped
me understand how the SoC and the display pipeline work.
[1] - https://lore.kernel.org/all/20250921083446.790374-1-uwu@icenowy.me/
[2] - https://lore.kernel.org/all/20251014131032.49616-1-ziyao@disroot.org/
[3] - https://lore.kernel.org/all/20251016083843.76675-1-andyshrk@163.com/
To: Michal Wilczynski <m.wilczynski@samsung.com>
To: Conor Dooley <conor@kernel.org>
To: Rob Herring <robh@kernel.org>
To: Krzysztof Kozlowski <krzk+dt@kernel.org>
To: Emil Renner Berthing <kernel@esmil.dk>
To: Hal Feng <hal.feng@starfivetech.com>
To: Michael Turquette <mturquette@baylibre.com>
To: Stephen Boyd <sboyd@kernel.org>
To: Conor Dooley <conor+dt@kernel.org>
To: Xingyu Wu <xingyu.wu@starfivetech.com>
To: Vinod Koul <vkoul@kernel.org>
To: Kishon Vijay Abraham I <kishon@kernel.org>
To: Andrzej Hajda <andrzej.hajda@intel.com>
To: Neil Armstrong <neil.armstrong@linaro.org>
To: Robert Foss <rfoss@kernel.org>
To: Laurent Pinchart <Laurent.pinchart@ideasonboard.com>
To: Jonas Karlman <jonas@kwiboo.se>
To: Jernej Skrabec <jernej.skrabec@gmail.com>
To: David Airlie <airlied@gmail.com>
To: Simona Vetter <simona@ffwll.ch>
To: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
To: Maxime Ripard <mripard@kernel.org>
To: Thomas Zimmermann <tzimmermann@suse.de>
To: Lee Jones <lee@kernel.org>
To: Philipp Zabel <p.zabel@pengutronix.de>
To: Paul Walmsley <paul.walmsley@sifive.com>
To: Palmer Dabbelt <palmer@dabbelt.com>
To: Albert Ou <aou@eecs.berkeley.edu>
To: Alexandre Ghiti <alex@ghiti.fr>
To: Marek Szyprowski <m.szyprowski@samsung.com>
To: Icenowy Zheng <uwu@icenowy.me>
To: Maud Spierings <maudspierings@gocontroll.com>
To: Andy Yan <andyshrk@163.com>
To: Heiko Stuebner <heiko@sntech.de>
Cc: devicetree@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-clk@vger.kernel.org
Cc: linux-phy@lists.infradead.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-riscv@lists.infradead.org
---
Changes in v2:
- Link to v1: https://lore.kernel.org/r/20251108-jh7110-clean-send-v1-0-06bf43bb76b1@samsung.com
--- b4-submit-tracking ---
# This section is used internally by b4 prep for tracking purposes.
{
"series": {
"revision": 2,
"change-id": "20251031-jh7110-clean-send-7d2242118026",
"prefixes": [
"RFC"
],
"prerequisites": [
"message-id: <20251014131032.49616-1-ziyao@disroot.org>",
"message-id: <20251016083843.76675-1-andyshrk@163.com>",
"message-id: <20250921083446.790374-1-uwu@icenowy.me>",
"base-commit: v6.17-rc6"
],
"history": {
"v1": [
"20251108-jh7110-clean-send-v1-0-06bf43bb76b1@samsung.com"
]
}
}
}
mj22226 pushed a commit to mj22226/linux that referenced this pull request Nov 13, 2025
The CMT device can be used as both a clocksource and a clockevent
provider. The driver tries to be smart and power itself on and off, as
well as enabling and disabling its clock when it's not in operation.
This behavior is slightly altered if the CMT is used as an early
platform device in which case the device is left powered on after probe,
but the clock is still enabled and disabled at runtime.
This has worked for a long time, but recent improvements in PREEMPT_RT
and PROVE_LOCKING have highlighted an issue. Because the CMT registers
itself as a clockevent provider via clockevents_register_device(), it needs
to use raw spinlocks internally, as this is the context in which the
clockevent framework interacts with the CMT driver. However, while holding
a raw spinlock the CMT driver can't really manage its power state or clock
with calls to pm_runtime_*() and clk_*(), as these calls end up in other
platform drivers that use regular spinlocks to control power and clocks.
This mix of spinlock contexts trips a lockdep warning.
=============================
[ BUG: Invalid wait context ]
6.17.0-rc3-arm64-renesas-03071-gb3c4f4122b28-dirty #21 Not tainted
-----------------------------
swapper/1/0 is trying to lock:
ffff00000898d180 (&dev->power.lock){-...}-{3:3}, at: __pm_runtime_resume+0x38/0x88
ccree e6601000.crypto: ARM CryptoCell 630P Driver: HW version 0xAF400001/0xDCC63000, Driver version 5.0
other info that might help us debug this:
ccree e6601000.crypto: ARM ccree device initialized
context-{5:5}
2 locks held by swapper/1/0:
#0: ffff80008173c298 (tick_broadcast_lock){-...}-{2:2}, at: __tick_broadcast_oneshot_control+0xa4/0x3a8
#1: ffff0000089a5858 (&ch->lock){....}-{2:2}
usbcore: registered new interface driver usbhid
, at: sh_cmt_start+0x30/0x364
stack backtrace:
CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.17.0-rc3-arm64-renesas-03071-gb3c4f4122b28-dirty #21 PREEMPT
Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT)
Call trace:
show_stack+0x14/0x1c (C)
dump_stack_lvl+0x6c/0x90
dump_stack+0x14/0x1c
__lock_acquire+0x904/0x1584
lock_acquire+0x220/0x34c
_raw_spin_lock_irqsave+0x58/0x80
__pm_runtime_resume+0x38/0x88
sh_cmt_start+0x54/0x364
sh_cmt_clock_event_set_oneshot+0x64/0xb8
clockevents_switch_state+0xfc/0x13c
tick_broadcast_set_event+0x30/0xa4
__tick_broadcast_oneshot_control+0x1e0/0x3a8
tick_broadcast_oneshot_control+0x30/0x40
cpuidle_enter_state+0x40c/0x680
cpuidle_enter+0x30/0x40
do_idle+0x1f4/0x26c
cpu_startup_entry+0x34/0x40
secondary_start_kernel+0x11c/0x13c
__secondary_switched+0x74/0x78
For non-PREEMPT_RT builds this is not really an issue, but for PREEMPT_RT
builds, where normal spinlocks can sleep, it might be. Be cautious and
always leave the power and clock running after probe.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251016182022.1837417-1-niklas.soderlund+renesas@ragnatech.se
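The rule behind the "Invalid wait context" report above can be sketched as a toy userspace check. This is a simplified model, not lockdep's actual implementation: the enum values are chosen to mirror the {2:2} and {3:3} annotations and the context-{5:5} line in the report, and toy_wait_context_ok() is a hypothetical name. The idea it illustrates is that a lock which may sleep on PREEMPT_RT (a regular spinlock) must not be taken while a lock that never sleeps (a raw spinlock) is held.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified wait types. The numbering follows the annotations in the
 * report and is an assumption of this sketch.
 */
enum toy_wait_type {
	TOY_WAIT_SPIN   = 2, /* raw spinlock: never sleeps */
	TOY_WAIT_CONFIG = 3, /* regular spinlock: may sleep on PREEMPT_RT */
	TOY_WAIT_SLEEP  = 4, /* mutex and similar: may always sleep */
	TOY_WAIT_MAX    = 5, /* nothing held: no restriction */
};

/*
 * held[] lists the wait types of the locks already held. Acquiring
 * `next` is valid only if it is no more permissive than the strictest
 * lock currently held.
 */
static bool toy_wait_context_ok(const enum toy_wait_type *held, int nheld,
				enum toy_wait_type next)
{
	enum toy_wait_type limit = TOY_WAIT_MAX;

	assert(nheld >= 0 && (held || nheld == 0));
	for (int i = 0; i < nheld; i++)
		if (held[i] < limit)
			limit = held[i];
	return next <= limit;
}
```

In the report, tick_broadcast_lock and &ch->lock are both {2:2} and the lock being acquired, &dev->power.lock, is {3:3}; in this model that combination is rejected, whereas taking a raw spinlock while holding a regular one is accepted.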
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 15, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 16, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 20, 2025
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Nov 22, 2025
Sequence of ctx checks and rewrites in real world:
  1. Possible CO-RE rewrites of access size & offset, maybe breaking #2
  2. Verifier env->ops->is_valid_access(), testing access size == sizeof(u64)
  3. Verifier env->ops->convert_ctx_access(), rewrite size & offset

Position of access check above is strange, really only works on 64-bit and
likely unnoticed for lack of systematic 32-bit testing.

Test changing *is_valid_access() to always check size != sizeof(u64), but on
32-bit systems also check size != sizeof(u32).

On 32-bit armhf, test_progs hits ~100 instances of failures such as:
(NOTE: this results from CO-RE relocation patching to u32 load size)

  libbpf: prog 'change_tcp_cc': -- BEGIN PROG LOAD LOG --
  0: R1=ctx() R10=fp0
  ; int change_tcp_cc(struct bpf_iter__tcp *ctx) @ bpf_iter_setsockopt.c:40
  0: (b4) w2 = 0 ; R2_w=0
  ; if (!bpf_tcp_sk(ctx->sk_common)) @ bpf_iter_setsockopt.c:46
  1: (61) r1 = *(u32 *)(r1 +8)
  func 'bpf_iter_tcp' size 4 must be 8
  invalid bpf_context access off=8 size=4
  is_valid_access=tracing_prog_is_valid_access
  processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
  -- END PROG LOAD LOG --
  libbpf: prog 'change_tcp_cc': failed to load: -EACCES
  libbpf: failed to load object 'bpf_iter_setsockopt'
  libbpf: failed to load BPF skeleton 'bpf_iter_setsockopt': -EACCES
  serial_test_bpf_iter_setsockopt:FAIL:iter_skel unexpected error: -13
  #21 bpf_iter_setsockopt:FAIL

(NOTE: no error for tests without CO-RE patching which have u64 load size)

This means *_is_valid_access() can't check ctx pointers simply using:
  if (size != sizeof(__u64)) return false;
or
  if (size != sizeof(void *)) return false;

And what's required instead is a combo like:
  if (size != sizeof(__u64) && size != sizeof(long)) return false;

Implement above as convenience function and use in:
  - btf_ctx_access()
  - cg_sockopt_is_valid_access()
  - bpf_skb_is_valid_access()
  - sock_addr_is_valid_access()
  - sock_ops_is_valid_access()
  - sk_msg_is_valid_access()
  - flow_dissector_is_valid_access()

This eliminates all 'invalid bpf_context access' errors on 32-bit armhf
except one with nf_is_valid_access() which is fixed in the next patch.

Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
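The combined size check described in this commit message could be wrapped roughly as follows. This is a minimal user-space sketch, not the patch itself: the helper name `ctx_ptr_access_size_ok` is hypothetical (the message does not name the convenience function), and `uint64_t` stands in for the kernel's `__u64`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (name not taken from the patch): accept a ctx
 * pointer load if it is either the full 64-bit BPF size or the native
 * word size. On 64-bit targets both sizes are 8, so only 8 passes; on
 * 32-bit targets such as armhf, sizeof(long) is 4, so a CO-RE-patched
 * u32 load passes as well.
 */
static inline bool ctx_ptr_access_size_ok(size_t size)
{
	return size == sizeof(uint64_t) || size == sizeof(long);
}
```

An `*_is_valid_access()` callback would then reject the access with `if (!ctx_ptr_access_size_ok(size)) return false;` in place of the single-size comparisons quoted above.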
guidosarducci
added a commit
to guidosarducci/linux
that referenced
this pull request
Nov 23, 2025
guidosarducci
added a commit
to guidosarducci/linux
that referenced
this pull request
Nov 23, 2025
guidosarducci
added a commit
to guidosarducci/linux
that referenced
this pull request
Nov 23, 2025
guidosarducci
added a commit
to guidosarducci/linux
that referenced
this pull request
Nov 25, 2025
guidosarducci
added a commit
to guidosarducci/linux
that referenced
this pull request
Nov 26, 2025
mj22226
pushed a commit
to mj22226/linux
that referenced
this pull request
Nov 27, 2025
The CMT device can be used as both a clocksource and a clockevent
provider. The driver tries to be smart: it powers itself on and off and
enables and disables its clock when it is not in operation.
This behavior is slightly altered if the CMT is used as an early
platform device, in which case the device is left powered on after probe,
but the clock is still enabled and disabled at runtime.
This has worked for a long time, but recent improvements in PREEMPT_RT
and PROVE_LOCKING have highlighted an issue. As the CMT registers itself
as a clockevent provider via clockevents_register_device(), it needs to
use raw spinlocks internally, as this is the context in which the
clockevent framework interacts with the CMT driver. However, while
holding a raw spinlock the CMT driver can't really manage its power
state or clock with calls to pm_runtime_*() and clk_*(), as these calls
end up in other platform drivers that use regular spinlocks to control
power and clocks.
This mix of spinlock contexts trips a lockdep warning.
=============================
[ BUG: Invalid wait context ]
6.17.0-rc3-arm64-renesas-03071-gb3c4f4122b28-dirty #21 Not tainted
-----------------------------
swapper/1/0 is trying to lock:
ffff00000898d180 (&dev->power.lock){-...}-{3:3}, at: __pm_runtime_resume+0x38/0x88
ccree e6601000.crypto: ARM CryptoCell 630P Driver: HW version 0xAF400001/0xDCC63000, Driver version 5.0
other info that might help us debug this:
ccree e6601000.crypto: ARM ccree device initialized
context-{5:5}
2 locks held by swapper/1/0:
#0: ffff80008173c298 (tick_broadcast_lock){-...}-{2:2}, at: __tick_broadcast_oneshot_control+0xa4/0x3a8
#1: ffff0000089a5858 (&ch->lock){....}-{2:2}, at: sh_cmt_start+0x30/0x364
usbcore: registered new interface driver usbhid
stack backtrace:
CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.17.0-rc3-arm64-renesas-03071-gb3c4f4122b28-dirty #21 PREEMPT
Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT)
Call trace:
show_stack+0x14/0x1c (C)
dump_stack_lvl+0x6c/0x90
dump_stack+0x14/0x1c
__lock_acquire+0x904/0x1584
lock_acquire+0x220/0x34c
_raw_spin_lock_irqsave+0x58/0x80
__pm_runtime_resume+0x38/0x88
sh_cmt_start+0x54/0x364
sh_cmt_clock_event_set_oneshot+0x64/0xb8
clockevents_switch_state+0xfc/0x13c
tick_broadcast_set_event+0x30/0xa4
__tick_broadcast_oneshot_control+0x1e0/0x3a8
tick_broadcast_oneshot_control+0x30/0x40
cpuidle_enter_state+0x40c/0x680
cpuidle_enter+0x30/0x40
do_idle+0x1f4/0x26c
cpu_startup_entry+0x34/0x40
secondary_start_kernel+0x11c/0x13c
__secondary_switched+0x74/0x78
For non-PREEMPT_RT builds this is not really an issue, but for
PREEMPT_RT builds where normal spinlocks can sleep this might be an
issue. Be cautious and always leave the power and clock running after
probe.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251016182022.1837417-1-niklas.soderlund+renesas@ragnatech.se
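The "Invalid wait context" report above can be read with a small toy model. The sketch below is a user-space simplification and not lockdep's actual code: it assumes the `{outer:inner}` numbers printed in the trace are ordered wait types, and that taking a lock is acceptable only when its outer type does not exceed the smallest inner type among the locks already held. All type and function names here are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Wait-type values copied from the trace; the names are illustrative. */
enum toy_wait_type {
	TOY_WAIT_RAW = 2,     /* {2:2}: tick_broadcast_lock, &ch->lock */
	TOY_WAIT_REGULAR = 3, /* {3:3}: &dev->power.lock */
	TOY_WAIT_ANY = 5,     /* context-{5:5}: no restriction from the context */
};

struct toy_lock {
	const char *name;
	enum toy_wait_type outer;
	enum toy_wait_type inner;
};

/*
 * Simplified rule (an assumption of this sketch): start from the
 * context's inner wait type, lower it to the smallest inner type of
 * any held lock, then require the next lock's outer type to fit.
 */
static bool toy_wait_context_ok(enum toy_wait_type ctx_inner,
				const struct toy_lock *held, size_t n_held,
				const struct toy_lock *next)
{
	enum toy_wait_type curr_inner = ctx_inner;

	for (size_t i = 0; i < n_held; i++)
		if (held[i].inner < curr_inner)
			curr_inner = held[i].inner;

	return next->outer <= curr_inner;
}
```

Under this model, holding the two `{2:2}` locks and then taking the `{3:3}` `&dev->power.lock` fails the check, which matches the splat; taking the same lock with nothing held passes.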
ZXlieC
pushed a commit
to Xlie-Electronic-Customs/linux
that referenced
this pull request
Dec 7, 2025
… 'T'
When running perf report with annotation for a symbol, press 's' and 'T',
then exit the annotate browser. Annotating the same symbol again crashes
the annotate browser.
browser.arch must be correctly updated when the data type feature is
enabled by 'T'. It is usually initialized by the symbol__annotate2
function. If a symbol was already annotated correctly the first time,
symbol__annotate2 is not called again, so browser.arch does not get
initialized. The second time the annotate browser is shown, the data
type needs to be displayed but browser.arch is empty.
Stack trace as below:
Perf: Segmentation fault
-------- backtrace --------
#0 0x55d365 in ui__signal_backtrace setup.c:0
#1 0x7f5ff1a3e930 in __restore_rt libc.so.6[3e930]
#2 0x570f08 in arch__is perf[570f08]
#3 0x562186 in annotate_get_insn_location perf[562186]
#4 0x562626 in __hist_entry__get_data_type annotate.c:0
#5 0x56476d in annotation_line__write perf[56476d]
#6 0x54e2db in annotate_browser__write annotate.c:0
#7 0x54d061 in ui_browser__list_head_refresh perf[54d061]
#8 0x54dc9e in annotate_browser__refresh annotate.c:0
#9 0x54c03d in __ui_browser__refresh browser.c:0
#10 0x54ccf8 in ui_browser__run perf[54ccf8]
#11 0x54eb92 in __hist_entry__tui_annotate perf[54eb92]
#12 0x552293 in do_annotate hists.c:0
#13 0x55941c in evsel__hists_browse hists.c:0
#14 0x55b00f in evlist__tui_browse_hists perf[55b00f]
#15 0x42ff02 in cmd_report perf[42ff02]
#16 0x494008 in run_builtin perf.c:0
#17 0x494305 in handle_internal_command perf.c:0
#18 0x410547 in main perf[410547]
#19 0x7f5ff1a295d0 in __libc_start_call_main libc.so.6[295d0]
#20 0x7f5ff1a29680 in __libc_start_main@@GLIBC_2.34 libc.so.6[29680]
#21 0x410b75 in _start perf[410b75]
Fixes: 1d4374a ("perf annotate: Add 'T' hot key to toggle data type display")
Reviewed-by: James Clark <james.clark@linaro.org>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
hbirth
pushed a commit
to hbirth/linux
that referenced
this pull request
Dec 10, 2025
fuse: change FUSE DLM_LOCK to request start and end of area
jbrun3t
added a commit
to jbrun3t/linux
that referenced
this pull request
Dec 22, 2025
This relocates register pokes of the HDMI VPU encoder out of the HDMI phy
driver. As far as HDMI is concerned, the sequence in which the setup is
done remains mostly the same.

This was tested with modetest, cycling through the following resolutions:
  #0 3840x2160 60.00
  #1 3840x2160 59.94
  #2 3840x2160 50.00
  #3 3840x2160 30.00
  #4 3840x2160 29.97
  #5 3840x2160 25.00
  #6 3840x2160 24.00
  #7 3840x2160 23.98
  #8 1920x1080 60.00
  #9 1920x1080 60.00
  #10 1920x1080 59.94
  #11 1920x1080i 30.00
  #12 1920x1080i 29.97
  #13 1920x1080 50.00
  #14 1920x1080i 25.00
  #15 1920x1080 30.00
  #16 1920x1080 29.97
  #17 1920x1080 25.00
  #18 1920x1080 24.00
  #19 1920x1080 23.98
  #20 1280x1024 60.02
  #21 1152x864 59.97
  #22 1280x720 60.00
  #23 1280x720 59.94
  #24 1280x720 50.00
  #25 1024x768 60.00
  #26 800x600 60.32
  #27 720x576 50.00
  #28 720x480 59.94

No regression to report.

This is part of an effort to clean up Amlogic HDMI related drivers which
should eventually allow to stop using the component API and HHI syscon.

Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
No description provided.