Skip to content

Commit

Permalink
drm/xe/guc: Dead CT helper
Browse files Browse the repository at this point in the history
Add a worker function helper for asynchronously dumping state when an
internal/fatal error is detected in CT processing. Being asynchronous
is required to avoid deadlocks and scheduling-while-atomic or
process-stalled-for-too-long issues. Also check for a bunch more error
conditions and improve the handling of some existing checks.

v2: Use compile time CONFIG check for new (but not directly CT_DEAD
related) checks and use unsigned int for a bitmask, rename
CT_DEAD_RESET to CT_DEAD_REARM and add some explaining comments,
rename 'hxg' macro parameter to 'ctb' - review feedback from Michal W.
Drop CT_DEAD_ALIVE as no need for a bitfield define to just set the
entire mask to zero.
v3: Fix kerneldoc
v4: Nullify some floating pointers after free.
v5: Add section headings and device info to make the state dump look
more like a devcoredump to allow parsing by the same tools (eventual
aim is to just call the devcoredump code itself, but that currently
requires an xe_sched_job, which is not available in the CT code).
v6: Fix potential for leaking snapshots with concurrent error
conditions (review feedback from Julia F).
v7: Don't complain about unexpected G2H messages yet because there is
a known issue causing them. Fix bit shift bug with v6 change. Add GT
id to fake coredump headers and use puts instead of printf.
v8: Disable the head mis-match check in g2h_read because it is failing
on various discrete platforms due to unknown reasons.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241003004611.2323493-9-John.C.Harrison@Intel.com
  • Loading branch information
johnharr-intel committed Oct 8, 2024
1 parent 754e707 commit d2c5a5a
Show file tree
Hide file tree
Showing 5 changed files with 335 additions and 29 deletions.
1 change: 1 addition & 0 deletions drivers/gpu/drm/xe/abi/guc_communication_ctb_abi.h
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@ struct guc_ct_buffer_desc {
#define GUC_CTB_STATUS_OVERFLOW (1 << 0)
#define GUC_CTB_STATUS_UNDERFLOW (1 << 1)
#define GUC_CTB_STATUS_MISMATCH (1 << 2)
#define GUC_CTB_STATUS_DISABLED (1 << 3)
u32 reserved[13];
} __packed;
static_assert(sizeof(struct guc_ct_buffer_desc) == 64);
Expand Down
2 changes: 1 addition & 1 deletion drivers/gpu/drm/xe/xe_guc.c
Original file line number Diff line number Diff line change
Expand Up @@ -1180,7 +1180,7 @@ void xe_guc_print_info(struct xe_guc *guc, struct drm_printer *p)

xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);

xe_guc_ct_print(&guc->ct, p, false);
xe_guc_ct_print(&guc->ct, p);
xe_guc_submit_print(guc, p);
}

Expand Down
Loading

0 comments on commit d2c5a5a

Please sign in to comment.