Using v3d performance counters causes kernel panic #6389

Closed

Description

@w23

Describe the bug

Using v3d performance counters, e.g. from GALLIUM_HUD, leads to random kernel panics after a few runs.

Steps to reproduce the behaviour

  1. Get and build https://gitlab.freedesktop.org/mesa/kmscube/
  2. Set GALLIUM_HUD to show some performance counters, e.g. like this:
GALLIUM_HUD_VISIBLE=false
GALLIUM_HUD=stdout
GALLIUM_HUD+=,fps
GALLIUM_HUD+=,frametime
GALLIUM_HUD+=,cpu
GALLIUM_HUD+=,samples-passed
GALLIUM_HUD+=,primitives-generated
GALLIUM_HUD+=,PTB-primitives-discarded-outside-viewport
GALLIUM_HUD+=,QPU-total-idle-clk-cycles
GALLIUM_HUD+=,QPU-total-active-clk-cycles-vertex-coord-shading
GALLIUM_HUD+=,QPU-total-active-clk-cycles-fragment-shading
GALLIUM_HUD+=,QPU-total-clk-cycles-executing-valid-instr
GALLIUM_HUD+=,QPU-total-clk-cycles-waiting-TMU
GALLIUM_HUD+=,QPU-total-clk-cycles-waiting-varyings
GALLIUM_HUD+=,QPU-total-instr-cache-hit
GALLIUM_HUD+=,QPU-total-instr-cache-miss
GALLIUM_HUD+=,TMU-total-text-quads-access
GALLIUM_HUD+=,TMU-total-text-cache-miss
GALLIUM_HUD+=,L2T-total-cache-hit
GALLIUM_HUD+=,L2T-total-cache-miss
GALLIUM_HUD+=,QPU-total-clk-cycles-waiting-vertex-coord-shading
GALLIUM_HUD+=,QPU-total-clk-cycles-waiting-fragment-shading
GALLIUM_HUD+=,TLB-partial-quads-written-to-color-buffer
GALLIUM_HUD+=,TMU-active-cycles
GALLIUM_HUD+=,TMU-stalled-cycles
GALLIUM_HUD+=,L2T-TMU-reads
GALLIUM_HUD+=,L2T-TMU-write-miss
GALLIUM_HUD+=,L2T-TMU-read-miss
GALLIUM_HUD+=,TMU-MRU-hits
./build/kmscube
  3. Run kmscube a few times, for a few seconds each.
  4. After the 3rd or 4th run, the kernel panics at some random and often unrelated location. I've seen stack traces from drm+v3d, ext4, usb, etc.; it's completely arbitrary.

I'm not sure whether a specific counter, or some combination of counters, is causing this. Using just a single counter doesn't seem to crash even after several tries; enabling 10-20 counters crashes on the second run.
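
For context, GALLIUM_HUD drives these counters through the v3d kernel driver's perfmon ioctls, which are exactly the calls in the stack trace below. Here is a minimal userspace sketch of that create/destroy path, assuming the UAPI in include/uapi/drm/v3d_drm.h and libdrm's drmIoctl(); this helper is hypothetical (not from the report) and probably won't crash on its own, since no GPU job references the perfmon, but it isolates the kernel surface involved:

/* Hypothetical helper exercising the v3d perfmon create/destroy ioctls.
 * Build: cc perfmon_loop.c -o perfmon_loop $(pkg-config --cflags --libs libdrm) */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <xf86drm.h>  /* drmIoctl(): ioctl() with EINTR retry */
#include <v3d_drm.h>  /* header path may be <libdrm/v3d_drm.h> on some systems */

int main(void)
{
    /* v3d is usually the render node on a Pi 4; adjust the path if needed. */
    int fd = open("/dev/dri/renderD128", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    for (int run = 0; run < 1000; run++) {
        struct drm_v3d_perfmon_create create;
        memset(&create, 0, sizeof(create));
        create.ncounters = 8;            /* several counters, as in the repro */
        for (unsigned i = 0; i < create.ncounters; i++)
            create.counters[i] = i;      /* raw hardware counter indices */

        if (drmIoctl(fd, DRM_IOCTL_V3D_PERFMON_CREATE, &create) != 0) {
            perror("DRM_IOCTL_V3D_PERFMON_CREATE");
            break;
        }

        /* Mesa would attach create.id to submitted jobs here; the crash
         * presumably needs that in-flight use between create and destroy. */

        struct drm_v3d_perfmon_destroy destroy = { .id = create.id };
        if (drmIoctl(fd, DRM_IOCTL_V3D_PERFMON_DESTROY, &destroy) != 0) {
            perror("DRM_IOCTL_V3D_PERFMON_DESTROY");
            break;
        }
    }

    close(fd);
    return 0;
}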

Device(s)

Raspberry Pi 4 Mod. B

System

Raspberry Pi reference 2024-07-04
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 0b115f302a8f1e5bd3523614d7f45b9d447434c7, stage2

Aug 30 2024 19:17:39
Copyright (c) 2012 Broadcom
version 2808975b80149bbfe86844655fe45c7de66fc078 (clean) (release) (start)

Linux lgpi00 6.6.47+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.47-1+rpt1 (2024-09-02) aarch64 GNU/Linux

Logs

Usually the panic messages and stack traces are completely unrelated; I've seen call stacks from usb and ext4. One that might be relevant:

[  391.090878] ------------[ cut here ]------------
[  391.090891] WARNING: CPU: 0 PID: 1188 at mm/slab_common.c:994 free_large_kmalloc+0x78/0xb8
[  391.090916] Modules linked in: cmac algif_hash aes_arm64 aes_generic algif_skcipher af_alg bnep brcmfmac_wcc brcmfmac vc4 brcmutil cfg80211 binfmt_misc snd_soc_hdmi_codec hci_uart uvcvideo drm_display_helper btbcm cec bluetooth uvc rpivid_hevc(C) bcm2835_codec(C) drm_dma_helper v3d drm_kms_helper bcm2835_isp(C) v4l2_mem2mem bcm2835_v4l2(C) bcm2835_mmal_vchiq(C) raspberrypi_hwmon videobuf2_vmalloc videobuf2_dma_contig gpu_sched snd_soc_core drm_shmem_helper ecdh_generic ecc videobuf2_memops rfkill videobuf2_v4l2 videodev snd_compress snd_pcm_dmaengine libaes raspberrypi_gpiomem snd_bcm2835(C) vc_sm_cma(C) snd_pcm videobuf2_common snd_timer snd mc nvmem_rmem uio_pdrv_genirq uio drm fuse dm_mod drm_panel_orientation_quirks backlight ip_tables x_tables ipv6 i2c_brcmstb
[  391.091067] CPU: 0 PID: 1188 Comm: kmscube Tainted: G         C         6.6.47+rpt-rpi-v8 #1  Debian 1:6.6.47-1+rpt1
[  391.091075] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
[  391.091079] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  391.091085] pc : free_large_kmalloc+0x78/0xb8
[  391.091092] lr : kfree+0x134/0x140
[  391.091098] sp : ffffffc0800fbb60
[  391.091101] x29: ffffffc0800fbb60 x28: ffffffe1ace63b48 x27: ffffffc0800fbcf8
[  391.091111] x26: ffffffc0800fbcf8 x25: ffffff804174b400 x24: 0000000000000049
[  391.091121] x23: ffffffc0800fbcf8 x22: ffffffe1acf050a0 x21: ffffffe1acf04a84
[  391.091130] x20: ffffff8044873820 x19: fffffffe010a8c40 x18: 0000000000000000
[  391.091140] x17: 0000000000000000 x16: ffffffe2126d4760 x15: 00000000ffeaffc0
[  391.091149] x14: 0000000000000004 x13: ffffff8044873808 x12: 0000000000000000
[  391.091158] x11: ffffff804b1b1de8 x10: ffffff804b1b1da8 x9 : ffffffe2126d4894
[  391.091167] x8 : ffffff804b1b1dd0 x7 : 0000000000000000 x6 : 0000000000000228
[  391.091175] x5 : 0000000000000000 x4 : 0000000000000000 x3 : fffffffe010a8c40
[  391.091184] x2 : 0000000000000001 x1 : ffffff8042a31498 x0 : 4000000000000000
[  391.091193] Call trace:
[  391.091196]  free_large_kmalloc+0x78/0xb8
[  391.091203]  kfree+0x134/0x140
[  391.091209]  v3d_perfmon_put.part.0+0x64/0x90 [v3d]
[  391.091237]  v3d_perfmon_destroy_ioctl+0x54/0x80 [v3d]
[  391.091254]  drm_ioctl_kernel+0xd8/0x190 [drm]
[  391.091378]  drm_ioctl+0x220/0x4c0 [drm]
[  391.091462]  drm_compat_ioctl+0x118/0x140 [drm]
[  391.091546]  __arm64_compat_sys_ioctl+0x158/0x180
[  391.091558]  invoke_syscall+0x50/0x128
[  391.091567]  el0_svc_common.constprop.0+0x48/0xf0
[  391.091574]  do_el0_svc_compat+0x24/0x48
[  391.091581]  el0_svc_compat+0x30/0x88
[  391.091591]  el0t_32_sync_handler+0x98/0x140
[  391.091595]  el0t_32_sync+0x194/0x198
[  391.091601] ---[ end trace 0000000000000000 ]---
[  391.091950] object pointer: 0x000000007ee5e213
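
The free_large_kmalloc warning, the "object pointer" line, and the otherwise arbitrary follow-on panics look like textbook memory-corruption symptoms: the perfmon allocation is freed on the destroy ioctl, and something (plausibly the interrupt path that stores counter values) later writes through a stale pointer into memory that has since been reallocated. A hypothetical sketch of that lifetime pattern, reconstructed from the trace above and NOT the actual v3d source:

/* Hypothetical reconstruction of the suspected lifetime bug; the names
 * mirror the stack trace above, but this is NOT the real v3d code. */
#include <linux/errno.h>
#include <linux/idr.h>
#include <linux/refcount.h>
#include <linux/slab.h>
#include <linux/types.h>

struct v3d_perfmon {
        refcount_t refcnt;
        u32 ncounters;
        u64 values[];  /* one slot per counter: a large kmalloc when many are enabled */
};

static void v3d_perfmon_put(struct v3d_perfmon *perfmon)
{
        /* Last reference gone: the buffer goes back to the allocator
         * (the free_large_kmalloc call in the trace above). */
        if (perfmon && refcount_dec_and_test(&perfmon->refcnt))
                kfree(perfmon);
}

/* The destroy ioctl removes the perfmon from the file's idr and drops
 * that reference, without first checking whether the GPU side still
 * holds this object as its active perfmon. */
static int v3d_perfmon_destroy(struct idr *idr, u32 id)
{
        struct v3d_perfmon *perfmon = idr_remove(idr, id);

        if (!perfmon)
                return -EINVAL;
        v3d_perfmon_put(perfmon);  /* may be the final put */
        return 0;
}

/* If it was still active, a later job-done interrupt latches counter
 * values through the stale pointer, scribbling over freed and possibly
 * reallocated memory; the corruption then surfaces wherever that memory
 * was reused, which would explain the random ext4/usb/drm panics. */
static void v3d_irq_latch_counters(struct v3d_perfmon *active)
{
        u32 i;

        for (i = 0; i < active->ncounters; i++)
                active->values[i]++;  /* stands in for the HW counter read: a UAF write */
}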

Additional context

No response
