
Commit 1dab456

ickle authored and matt-auld committed
drm/i915/reset: Handle reset timeouts under unrelated kernel hangs
When resuming after hibernate sometimes we see hangs in unrelated kernel subsystems. These hangs often result in the following i915 trace:

    i915 0000:00:02.0: [drm] *ERROR* \
      intel_gt_reset_global timed out, cancelling all in-flight rendering

implying our reset task has been starved by the hanging kernel subsystem, causing us to inappropriately declare the system as wedged beyond recovery.

The trace would be caused by our synchronize_srcu_expedited() taking more than the allowed 5s due to the unrelated kernel hang. But we neither need to perform that synchronisation inside the reset watchdog, nor do we need such a short timeout before declaring the device as unrecoverable.

v2: Restore watchdog timeout to the previous 5 seconds (Ashutosh)

Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/3575
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220630043959.5708-1-ashutosh.dixit@intel.com
1 parent 17cd10a commit 1dab456

1 file changed: +3 -3 lines changed
drivers/gpu/drm/i915/gt/intel_reset.c

Lines changed: 3 additions & 3 deletions
@@ -1281,9 +1281,6 @@ static void intel_gt_reset_global(struct intel_gt *gt,
 	intel_wedge_on_timeout(&w, gt, 5 * HZ) {
 		intel_display_prepare_reset(gt->i915);
 
-		/* Flush everyone using a resource about to be clobbered */
-		synchronize_srcu_expedited(&gt->reset.backoff_srcu);
-
 		intel_gt_reset(gt, engine_mask, reason);
 
 		intel_display_finish_reset(gt->i915);
@@ -1392,6 +1389,9 @@ void intel_gt_handle_error(struct intel_gt *gt,
 		}
 	}
 
+	/* Flush everyone using a resource about to be clobbered */
+	synchronize_srcu_expedited(&gt->reset.backoff_srcu);
+
 	intel_gt_reset_global(gt, engine_mask, msg);
 
 	if (!intel_uc_uses_guc_submission(&gt->uc)) {
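
The effect of moving the flush out of the watchdog window can be illustrated with a small userspace model. This is only a sketch and not i915 code: `struct reset_model`, `wedged_flush_inside()` and `wedged_flush_outside()` are hypothetical names, and durations are plain millisecond counts rather than measured time. It assumes the watchdog wedges the device whenever the work it guards exceeds its budget.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one reset attempt; all times in milliseconds. */
struct reset_model {
	unsigned int flush_ms;   /* time spent waiting in the SRCU-style flush */
	unsigned int reset_ms;   /* time spent in the hardware reset itself */
	unsigned int timeout_ms; /* watchdog budget, e.g. 5000 for 5 * HZ */
};

/*
 * Old ordering: the flush runs inside the watchdog window, so a flush
 * stalled by an unrelated hang counts against the reset budget.
 */
static bool wedged_flush_inside(const struct reset_model *m)
{
	assert(m->timeout_ms > 0);
	return m->flush_ms + m->reset_ms > m->timeout_ms;
}

/*
 * New ordering: the flush completes before the watchdog is armed, so
 * only the reset itself is measured against the budget.
 */
static bool wedged_flush_outside(const struct reset_model *m)
{
	assert(m->timeout_ms > 0);
	return m->reset_ms > m->timeout_ms;
}
```

In this model, a 7000 ms flush with a 200 ms reset and a 5000 ms budget is reported as wedged by the old ordering but not by the new one, while a reset that itself takes 6000 ms is reported as wedged by both.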