Skip to content

drm/vc4: Fix gpu reset#6239

Open
carlonluca wants to merge 1 commit intoraspberrypi:rpi-6.1.yfrom
carlonluca:tm-6.1.74
Open

drm/vc4: Fix gpu reset#6239
carlonluca wants to merge 1 commit intoraspberrypi:rpi-6.1.yfrom
carlonluca:tm-6.1.74

Conversation

@carlonluca
Copy link

This commit changes the bind procedure by avoiding the call to pm_runtime_resume_and_get. Since commit #71be60f, the reset of the GPU caused incorrect rendering and crashes. With this patch the GPU reset procedure properly resets the GPU, the rendering resumes properly and no crash occurs.

This commit changes the bind procedure by avoiding the call to
pm_runtime_resume_and_get. Since commit #71be60f, the reset of the
GPU caused incorrect rendering and crashes. With this patch the
GPU reset procedure properly resets the GPU, the rendering resumes
properly and no crash occurs.

Signed-off-by: Luca Carlon <carlon.luca@gmail.com>
@6by9
Copy link
Contributor

6by9 commented Jun 25, 2024

The rpi-6.1.y branch is now the legacy branch and isn't taking new downstream features/fixes, only fixes from upstream.

vc4_v3d.c is also almost unchanged from mainline, and those minor diffs should be merged upstream imminently. Fixes should be submitted to dri-devel for review.

The patch also feels dubious as you now have a call to clk_prepare_enable with no clk_disable_unprepare, and no call to power the domain before accessing the registers. That may mask a problem in the reset path that does something wrong in recovery, but it's only by messing up the accounting in pm_runtime and clks.
The original patch commit text describes the issue on unbind and bind afresh that it was fixing. Have you tested that scenario with your fix?

Minor comment that 71be60f is a hash from the rpi-5.15.y branch. It was merged into mainline as torvalds/linux@266cff3 in the 6.1 merge window.

@mairacanal
Copy link
Contributor

Just reenforcing @6by9 comment, I can't see how this patch could fix any GPU reset issue. By not using pm_runtime_resume_and_get(), we ignore the device's usage counter, which means that its value won't be incremented, and this will compromise the whole PM mechanism. Probably, this solution is just hiding a defect in the GPU reset mechanism.

About moving V3D_WRITE(V3D_BPOA, 0); and V3D_WRITE(V3D_BPOS, 0);, I cannot see how this could influence the GPU reset. The out-of-memory interrupt won't be enabled on vc4_irq_enable().

Check the comment on vc4_irq_enable():

	/* Enable the render done interrupts. The out-of-memory interrupt is
	 * enabled as soon as we have a binner BO allocated.
	 */

It is hard for me to see what this patch is trying to achieve. I believe it might fix your problem because the device's usage counter was compromised and it wasn't incremented.

@kiruthikpurpose

This comment was marked as spam.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants