Describe the bug
Using v3d performance counters, e.g. via GALLIUM_HUD, leads to random kernel panics after a few runs.
Steps to reproduce the behaviour
- Get and build https://gitlab.freedesktop.org/mesa/kmscube/
- Set GALLIUM_HUD to show some performance counters, e.g. like this:
GALLIUM_HUD_VISIBLE=false
GALLIUM_HUD=stdout
GALLIUM_HUD+=,fps
GALLIUM_HUD+=,frametime
GALLIUM_HUD+=,cpu
GALLIUM_HUD+=,samples-passed
GALLIUM_HUD+=,primitives-generated
GALLIUM_HUD+=,PTB-primitives-discarded-outside-viewport
GALLIUM_HUD+=,QPU-total-idle-clk-cycles
GALLIUM_HUD+=,QPU-total-active-clk-cycles-vertex-coord-shading
GALLIUM_HUD+=,QPU-total-active-clk-cycles-fragment-shading
GALLIUM_HUD+=,QPU-total-clk-cycles-executing-valid-instr
GALLIUM_HUD+=,QPU-total-clk-cycles-waiting-TMU
GALLIUM_HUD+=,QPU-total-clk-cycles-waiting-varyings
GALLIUM_HUD+=,QPU-total-instr-cache-hit
GALLIUM_HUD+=,QPU-total-instr-cache-miss
GALLIUM_HUD+=,TMU-total-text-quads-access
GALLIUM_HUD+=,TMU-total-text-cache-miss
GALLIUM_HUD+=,L2T-total-cache-hit
GALLIUM_HUD+=,L2T-total-cache-miss
GALLIUM_HUD+=,QPU-total-clk-cycles-waiting-vertex-coord-shading
GALLIUM_HUD+=,QPU-total-clk-cycles-waiting-fragment-shading
GALLIUM_HUD+=,TLB-partial-quads-written-to-color-buffer
GALLIUM_HUD+=,TMU-active-cycles
GALLIUM_HUD+=,TMU-stalled-cycles
GALLIUM_HUD+=,L2T-TMU-reads
GALLIUM_HUD+=,L2T-TMU-write-miss
GALLIUM_HUD+=,L2T-TMU-read-miss
GALLIUM_HUD+=,TMU-MRU-hits
./build/kmscube
- Run kmscube a few times for a few seconds each.
- After the 3rd or 4th run the kernel panics at some random and often unrelated location. I've seen stacks from drm+v3d, ext4, usb, etc. It's completely arbitrary.
I'm not sure whether a specific counter, or a combination of counters, is causing this. Using just a single counter doesn't seem to crash even after several tries. Enabling 10-20 counters crashes on the second run.
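One way to narrow this down might be to enable only the first N counters from the list above and step N up between runs. The following is a minimal sketch, not a tested reproducer: `counters` and `build_hud` are hypothetical names introduced here, and the sketch assumes the resulting comma-separated string is equivalent to the `GALLIUM_HUD+=` assignments in the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of kmscube or Mesa.
# Counter names are copied verbatim from the reproduction steps.
counters=(
  fps
  frametime
  cpu
  samples-passed
  primitives-generated
  PTB-primitives-discarded-outside-viewport
  QPU-total-idle-clk-cycles
  QPU-total-active-clk-cycles-vertex-coord-shading
  QPU-total-active-clk-cycles-fragment-shading
  QPU-total-clk-cycles-executing-valid-instr
  QPU-total-clk-cycles-waiting-TMU
  QPU-total-clk-cycles-waiting-varyings
  QPU-total-instr-cache-hit
  QPU-total-instr-cache-miss
  TMU-total-text-quads-access
  TMU-total-text-cache-miss
  L2T-total-cache-hit
  L2T-total-cache-miss
  QPU-total-clk-cycles-waiting-vertex-coord-shading
  QPU-total-clk-cycles-waiting-fragment-shading
  TLB-partial-quads-written-to-color-buffer
  TMU-active-cycles
  TMU-stalled-cycles
  L2T-TMU-reads
  L2T-TMU-write-miss
  L2T-TMU-read-miss
  TMU-MRU-hits
)

# Print a GALLIUM_HUD value that enables only the first N counters,
# in the same "stdout,<counter>,<counter>,..." form used above.
build_hud() {
  local n=$1
  local subset=("${counters[@]:0:n}")
  local IFS=,
  printf 'stdout,%s\n' "${subset[*]}"
}
```

For example, `GALLIUM_HUD_VISIBLE=false GALLIUM_HUD="$(build_hud 10)" ./build/kmscube` would run with the first ten counters only.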
Device(s)
Raspberry Pi 4 Mod. B
System
Raspberry Pi reference 2024-07-04
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 0b115f302a8f1e5bd3523614d7f45b9d447434c7, stage2
Aug 30 2024 19:17:39
Copyright (c) 2012 Broadcom
version 2808975b80149bbfe86844655fe45c7de66fc078 (clean) (release) (start)
Linux lgpi00 6.6.47+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.47-1+rpt1 (2024-09-02) aarch64 GNU/Linux
Logs
The panic messages and stacks are usually completely unrelated; I've seen call stacks from usb and ext4.
One that might be relevant:
[ 391.090878] ------------[ cut here ]------------
[ 391.090891] WARNING: CPU: 0 PID: 1188 at mm/slab_common.c:994 free_large_kmalloc+0x78/0xb8
[ 391.090916] Modules linked in: cmac algif_hash aes_arm64 aes_generic algif_skcipher af_alg bnep brcmfmac_wcc brcmfmac vc4 brcmutil cfg80211 binfmt_misc snd_soc_hdmi_codec hci_uart uvcvideo drm_display_helper btbcm cec bluetooth uvc rpivid_hevc(C) bcm2835_codec(C) drm_dma_helper v3d drm_kms_helper bcm2835_isp(C) v4l2_mem2mem bcm2835_v4l2(C) bcm2835_mmal_vchiq(C) raspberrypi_hwmon videobuf2_vmalloc videobuf2_dma_contig gpu_sched snd_soc_core drm_shmem_helper ecdh_generic ecc videobuf2_memops rfkill videobuf2_v4l2 videodev snd_compress snd_pcm_dmaengine libaes raspberrypi_gpiomem snd_bcm2835(C) vc_sm_cma(C) snd_pcm videobuf2_common snd_timer snd mc nvmem_rmem uio_pdrv_genirq uio drm fuse dm_mod drm_panel_orientation_quirks backlight ip_tables x_tables ipv6 i2c_brcmstb
[ 391.091067] CPU: 0 PID: 1188 Comm: kmscube Tainted: G C 6.6.47+rpt-rpi-v8 #1 Debian 1:6.6.47-1+rpt1
[ 391.091075] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
[ 391.091079] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 391.091085] pc : free_large_kmalloc+0x78/0xb8
[ 391.091092] lr : kfree+0x134/0x140
[ 391.091098] sp : ffffffc0800fbb60
[ 391.091101] x29: ffffffc0800fbb60 x28: ffffffe1ace63b48 x27: ffffffc0800fbcf8
[ 391.091111] x26: ffffffc0800fbcf8 x25: ffffff804174b400 x24: 0000000000000049
[ 391.091121] x23: ffffffc0800fbcf8 x22: ffffffe1acf050a0 x21: ffffffe1acf04a84
[ 391.091130] x20: ffffff8044873820 x19: fffffffe010a8c40 x18: 0000000000000000
[ 391.091140] x17: 0000000000000000 x16: ffffffe2126d4760 x15: 00000000ffeaffc0
[ 391.091149] x14: 0000000000000004 x13: ffffff8044873808 x12: 0000000000000000
[ 391.091158] x11: ffffff804b1b1de8 x10: ffffff804b1b1da8 x9 : ffffffe2126d4894
[ 391.091167] x8 : ffffff804b1b1dd0 x7 : 0000000000000000 x6 : 0000000000000228
[ 391.091175] x5 : 0000000000000000 x4 : 0000000000000000 x3 : fffffffe010a8c40
[ 391.091184] x2 : 0000000000000001 x1 : ffffff8042a31498 x0 : 4000000000000000
[ 391.091193] Call trace:
[ 391.091196] free_large_kmalloc+0x78/0xb8
[ 391.091203] kfree+0x134/0x140
[ 391.091209] v3d_perfmon_put.part.0+0x64/0x90 [v3d]
[ 391.091237] v3d_perfmon_destroy_ioctl+0x54/0x80 [v3d]
[ 391.091254] drm_ioctl_kernel+0xd8/0x190 [drm]
[ 391.091378] drm_ioctl+0x220/0x4c0 [drm]
[ 391.091462] drm_compat_ioctl+0x118/0x140 [drm]
[ 391.091546] __arm64_compat_sys_ioctl+0x158/0x180
[ 391.091558] invoke_syscall+0x50/0x128
[ 391.091567] el0_svc_common.constprop.0+0x48/0xf0
[ 391.091574] do_el0_svc_compat+0x24/0x48
[ 391.091581] el0_svc_compat+0x30/0x88
[ 391.091591] el0t_32_sync_handler+0x98/0x140
[ 391.091595] el0t_32_sync+0x194/0x198
[ 391.091601] ---[ end trace 0000000000000000 ]---
[ 391.091950] object pointer: 0x000000007ee5e213
Additional context
No response