-
Notifications
You must be signed in to change notification settings - Fork 100
[Intel-SIG] [Meteor Lake] Sync 6.6 EDAC driver patches to Deepin 6.6 #79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Intel-SIG] [Meteor Lake] Sync 6.6 EDAC driver patches to Deepin 6.6 #79
Conversation
Errors of some I/O devices can be signaled by MCE and logged in IOMCA bank. Add MCACOD code of generic I/O error and related macros for MCi_MISC to support IOMCA logging. See Intel Software Developers' Manual, version 071, volume 3B, section "IOMCA". Intel-SIG: x86/mce: Add MCACOD code for generic I/O error. intel/linux-intel-lts@1afcf245df31 Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> [ Qingdong Shi: amend commit log ] Signed-off-by: Qingdong Shi <qingdong.shi@intel.com>
Integrated Error Handlers (IEHs) are PCIe devices which aggregate and report error events of different severities (correctable, non-fatal uncorrectable, and fatal uncorrectable) from various I/O devices, e.g., PCIe devices, legacy PCI devices. Each error severity is notified by one of {SMI, NMI, MCE} which is configured by BIOS/platform firmware. The first IEH-supported platform is Intel Tiger Lake-U CPU. The driver reads/prints the error severity and error source (bus/device/function) logged in the IEH(s) and restarts the system on fatal I/O device error. Intel-SIG: EDAC/ieh: Add I/O device EDAC driver for Intel CPUs with IEH. intel/linux-intel-lts@9f721ed88580 Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> [ Qingdong Shi: amend commit log ] Signed-off-by: Qingdong Shi <qingdong.shi@intel.com>
Tiger Lake-H SoC shares the same Integrated Error Handler(IEH) architecture with Tiger Lake-U, so can use the same ieh_edac driver. Add Tiger Lake-H IEH device ID for I/O device EDAC support. Intel-SIG: EDAC/ieh: Add I/O device EDAC support for Intel Tiger Lake-H SoC. intel/linux-intel-lts@39b1cbd8dc70 Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> [ Qingdong Shi: amend commit log ] Signed-off-by: Qingdong Shi <qingdong.shi@intel.com>
The igen6_edac driver is the root to capture the In-Band ECC error event. There are some external modules which want to be notified about the In-Band ECC errors for specific error handling. So add the registration APIs for those external modules for the In-Band ECC errors. Intel-SIG: EDAC/igen6: Add registration APIs for In-Band ECC error notification. intel/linux-intel-lts@da1536bdff36 Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> [ Qingdong Shi: amend commit log ] Signed-off-by: Qingdong Shi <qingdong.shi@intel.com>
Mainline: commit c4a5398("EDAC/igen6: Add Intel Alder Lake-N SoCs support") from: v6.8-rc1 Add several Intel Alder Lake-N SoC compute die IDs for EDAC support. Alder Lake-N (one memory controller) is a cut-down derivative of Alder Lake-P (two memory controllers). Intel-SIG: Upstream commit c4a5398 ("EDAC/igen6: Add Intel Alder Lake-N SoCs support"). Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> [ Qingdong Shi: amend commit log ] Signed-off-by: Qingdong Shi <qingdong.shi@intel.com>
Mainline: commit d23627a("EDAC/igen6: Add Intel Raptor Lake-P SoCs support") from: v6.8-rc1 Add several Intel Raptor Lake-P SoC compute die IDs for EDAC support. These Raptor Lake-P SoCs use similar memory controller and IBECC as Alder Lake-P SoC but extend the most significant bit of error address logged in IBECC from bit 38 to bit 45. Intel-SIG: Upstream commit d23627a ("EDAC/igen6: Add Intel Raptor Lake-P SoCs support"). upstream link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d23627a7688f&dt=2 Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> [ Qingdong Shi: amend commit log ] Signed-off-by: Qingdong Shi <qingdong.shi@intel.com>
Mainline: commit a264f71 ("EDAC/igen6: Make get_mchbar() helper function") from: v6.8-rc1 Make get_mchbar() helper function to retrieve the BAR address of the memory controller. No function changes. Intel-SIG: Upstream commit a264f71 ("EDAC/igen6: Make get_mchbar() helper function"). Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> [ Qingdong Shi: amend commit log ] Signed-off-by: Qingdong Shi <qingdong.shi@intel.com>
Mainline: commit 3c77090 ("EDAC/igen6: Add Intel Meteor Lake-PS SoCs support") from: v6.8-rc1 Add several Intel Meteor Lake-PS SoC compute die IDs for EDAC support. These Meteor Lake-PS SoCs use similar memory controller and IBECC as Alder Lake-P SoC. Intel-SIG: Upstream commit 3c77090 ("EDAC/igen6: Add Intel Meteor Lake-PS SoCs support"). Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> [ Qingdong Shi: amend commit log ] Signed-off-by: Qingdong Shi <qingdong.shi@intel.com>
Mainline: commit 6807434 ("EDAC/igen6: Add Intel Meteor Lake-P SoCs support") from: v6.8-rc1 Add several Intel Meteor Lake-P SoC compute die IDs for EDAC support. These Meteor Lake-P SoCs use similar memory controller and IBECC as Alder Lake-P SoC. Intel-SIG: Upstream commit 6807434 ("EDAC/igen6: Add Intel Meteor Lake-P SoCs support"). Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> [ Qingdong Shi: amend commit log ] Signed-off-by: Qingdong Shi <qingdong.shi@intel.com>
commit 667574e upstream. When tries to demote 1G hugetlb folios, a lockdep warning is observed: ============================================ WARNING: possible recursive locking detected 6.10.0-rc6-00452-ga4d0275fa660-dirty deepin-community#79 Not tainted -------------------------------------------- bash/710 is trying to acquire lock: ffffffff8f0a7850 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0x244/0x460 but task is already holding lock: ffffffff8f0a6f48 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0xae/0x460 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&h->resize_lock); lock(&h->resize_lock); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by bash/710: #0: ffff8f118439c3f0 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x64/0xe0 #1: ffff8f11893b9e88 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf8/0x1d0 #2: ffff8f1183dc4428 (kn->active#98){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x1d0 #3: ffffffff8f0a6f48 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0xae/0x460 stack backtrace: CPU: 3 PID: 710 Comm: bash Not tainted 6.10.0-rc6-00452-ga4d0275fa660-dirty deepin-community#79 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x68/0xa0 __lock_acquire+0x10f2/0x1ca0 lock_acquire+0xbe/0x2d0 __mutex_lock+0x6d/0x400 demote_store+0x244/0x460 kernfs_fop_write_iter+0x12c/0x1d0 vfs_write+0x380/0x540 ksys_write+0x64/0xe0 do_syscall_64+0xb9/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fa61db14887 RSP: 002b:00007ffc56c48358 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fa61db14887 RDX: 0000000000000002 RSI: 000055a030050220 RDI: 0000000000000001 RBP: 000055a030050220 R08: 00007fa61dbd1460 R09: 000000007fffffff R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002 R13: 00007fa61dc1b780 R14: 00007fa61dc17600 R15: 00007fa61dc16a00 </TASK> Lockdep considers this an AA deadlock because the different resize_lock mutexes reside in the same lockdep class, but this is a false positive. Place them in distinct classes to avoid these warnings. Link: https://lkml.kernel.org/r/20240712031314.2570452-1-linmiaohe@huawei.com Fixes: 8531fc6 ("hugetlb: add hugetlb demote page support") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit b10c359e644235bc1c1899981d7203cbf5abf443)
commit 667574e upstream. When tries to demote 1G hugetlb folios, a lockdep warning is observed: ============================================ WARNING: possible recursive locking detected 6.10.0-rc6-00452-ga4d0275fa660-dirty deepin-community#79 Not tainted -------------------------------------------- bash/710 is trying to acquire lock: ffffffff8f0a7850 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0x244/0x460 but task is already holding lock: ffffffff8f0a6f48 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0xae/0x460 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&h->resize_lock); lock(&h->resize_lock); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by bash/710: #0: ffff8f118439c3f0 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x64/0xe0 #1: ffff8f11893b9e88 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf8/0x1d0 #2: ffff8f1183dc4428 (kn->active#98){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x1d0 #3: ffffffff8f0a6f48 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0xae/0x460 stack backtrace: CPU: 3 PID: 710 Comm: bash Not tainted 6.10.0-rc6-00452-ga4d0275fa660-dirty deepin-community#79 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x68/0xa0 __lock_acquire+0x10f2/0x1ca0 lock_acquire+0xbe/0x2d0 __mutex_lock+0x6d/0x400 demote_store+0x244/0x460 kernfs_fop_write_iter+0x12c/0x1d0 vfs_write+0x380/0x540 ksys_write+0x64/0xe0 do_syscall_64+0xb9/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fa61db14887 RSP: 002b:00007ffc56c48358 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fa61db14887 RDX: 0000000000000002 RSI: 000055a030050220 RDI: 0000000000000001 RBP: 000055a030050220 R08: 00007fa61dbd1460 R09: 000000007fffffff R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002 R13: 00007fa61dc1b780 R14: 00007fa61dc17600 R15: 00007fa61dc16a00 </TASK> Lockdep considers this an AA deadlock because the different resize_lock mutexes reside in the same lockdep class, but this is a false positive. Place them in distinct classes to avoid these warnings. Link: https://lkml.kernel.org/r/20240712031314.2570452-1-linmiaohe@huawei.com Fixes: 8531fc6 ("hugetlb: add hugetlb demote page support") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 60c223ed1f6f47ad36da6b4032e3b45cad8fb41f)
commit 667574e upstream. When tries to demote 1G hugetlb folios, a lockdep warning is observed: ============================================ WARNING: possible recursive locking detected 6.10.0-rc6-00452-ga4d0275fa660-dirty deepin-community#79 Not tainted -------------------------------------------- bash/710 is trying to acquire lock: ffffffff8f0a7850 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0x244/0x460 but task is already holding lock: ffffffff8f0a6f48 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0xae/0x460 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&h->resize_lock); lock(&h->resize_lock); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by bash/710: #0: ffff8f118439c3f0 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x64/0xe0 #1: ffff8f11893b9e88 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf8/0x1d0 #2: ffff8f1183dc4428 (kn->active#98){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x1d0 #3: ffffffff8f0a6f48 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0xae/0x460 stack backtrace: CPU: 3 PID: 710 Comm: bash Not tainted 6.10.0-rc6-00452-ga4d0275fa660-dirty deepin-community#79 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x68/0xa0 __lock_acquire+0x10f2/0x1ca0 lock_acquire+0xbe/0x2d0 __mutex_lock+0x6d/0x400 demote_store+0x244/0x460 kernfs_fop_write_iter+0x12c/0x1d0 vfs_write+0x380/0x540 ksys_write+0x64/0xe0 do_syscall_64+0xb9/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fa61db14887 RSP: 002b:00007ffc56c48358 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fa61db14887 RDX: 0000000000000002 RSI: 000055a030050220 RDI: 0000000000000001 RBP: 000055a030050220 R08: 00007fa61dbd1460 R09: 000000007fffffff R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002 R13: 00007fa61dc1b780 R14: 00007fa61dc17600 R15: 00007fa61dc16a00 </TASK> Lockdep considers this an AA deadlock because the different resize_lock mutexes reside in the same lockdep class, but this is a false positive. Place them in distinct classes to avoid these warnings. Link: https://lkml.kernel.org/r/20240712031314.2570452-1-linmiaohe@huawei.com Fixes: 8531fc6 ("hugetlb: add hugetlb demote page support") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 99a49b6)
commit 667574e upstream. When tries to demote 1G hugetlb folios, a lockdep warning is observed: ============================================ WARNING: possible recursive locking detected 6.10.0-rc6-00452-ga4d0275fa660-dirty #79 Not tainted -------------------------------------------- bash/710 is trying to acquire lock: ffffffff8f0a7850 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0x244/0x460 but task is already holding lock: ffffffff8f0a6f48 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0xae/0x460 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&h->resize_lock); lock(&h->resize_lock); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by bash/710: #0: ffff8f118439c3f0 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x64/0xe0 #1: ffff8f11893b9e88 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf8/0x1d0 #2: ffff8f1183dc4428 (kn->active#98){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x1d0 #3: ffffffff8f0a6f48 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0xae/0x460 stack backtrace: CPU: 3 PID: 710 Comm: bash Not tainted 6.10.0-rc6-00452-ga4d0275fa660-dirty #79 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x68/0xa0 __lock_acquire+0x10f2/0x1ca0 lock_acquire+0xbe/0x2d0 __mutex_lock+0x6d/0x400 demote_store+0x244/0x460 kernfs_fop_write_iter+0x12c/0x1d0 vfs_write+0x380/0x540 ksys_write+0x64/0xe0 do_syscall_64+0xb9/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fa61db14887 RSP: 002b:00007ffc56c48358 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fa61db14887 RDX: 0000000000000002 RSI: 000055a030050220 RDI: 0000000000000001 RBP: 000055a030050220 R08: 00007fa61dbd1460 R09: 000000007fffffff R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002 R13: 00007fa61dc1b780 R14: 00007fa61dc17600 R15: 00007fa61dc16a00 </TASK> Lockdep considers this an AA deadlock because the different resize_lock mutexes reside in the same lockdep class, but this is a false positive. Place them in distinct classes to avoid these warnings. Link: https://lkml.kernel.org/r/20240712031314.2570452-1-linmiaohe@huawei.com Fixes: 8531fc6 ("hugetlb: add hugetlb demote page support") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 99a49b6)
@sourcery-ai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR synchronizes EDAC driver updates for Intel Meteor Lake and related SoCs, adding in-band ECC error notification support (IBECC), new SKU configurations, and a new IEH EDAC driver.
- Introduce notifier API for IBECC error handling
- Extend
igen6
driver with Meteor Lake-P/S SKU support and error address extraction - Add a new EDAC driver (
ieh_edac
) for Intel Integrated Error Handler devices, plus build and Kconfig entries - Update MCE definitions to include generic I/O error codes
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
drivers/edac/igen6_edac.h | Declare IBECC notifier API and ibecc_err_info struct |
drivers/edac/igen6_edac.c | Implement IBECC notifier chain, new SKU configs, and helpers |
drivers/edac/ieh_edac.c | New IEH EDAC driver implementation |
drivers/edac/Makefile | Add ieh_edac.o to EDAC build targets |
drivers/edac/Kconfig | Add EDAC_IEH configuration entry |
arch/x86/include/asm/mce.h | Add MCACOD_IOERR and PCI misc bitfield macros |
Comments suppressed due to low confidence (2)
drivers/edac/igen6_edac.c:85
- [nitpick] The macro name 'ECC_ERROR_LOG_ADDR45' is unclear about its purpose; consider renaming to a descriptive name like 'ECC_ERROR_LOG_ADDR_WIDE' to indicate it extracts a wider address field.
#define ECC_ERROR_LOG_ADDR45(v) GET_BITFIELD(v, 5, 45)
drivers/edac/igen6_edac.h:1
- [nitpick] It would be helpful to add brief function comments for
ibecc_err_register_notifier
andibecc_err_unregister_notifier
to explain their usage and the notifier chain semantics.
/* SPDX-License-Identifier: GPL-2.0 */
int ibecc_err_register_notifer(struct notifier_block *nb); | ||
int ibecc_err_unregister_notifer(struct notifier_block *nb); |
Copilot
AI
Jul 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function name 'ibecc_err_register_notifer' contains a typo: 'notifer' should be 'notifier' for consistency and clarity.
int ibecc_err_register_notifer(struct notifier_block *nb); | |
int ibecc_err_unregister_notifer(struct notifier_block *nb); | |
int ibecc_err_register_notifier(struct notifier_block *nb); | |
int ibecc_err_unregister_notifier(struct notifier_block *nb); |
Copilot uses AI. Check for mistakes.
eaddr = ECC_ERROR_LOG_ADDR(node->ecclog) << | ||
ECC_ERROR_LOG_ADDR_SHIFT; | ||
if (res_cfg->err_addr) | ||
eaddr = res_cfg->err_addr(node->ecclog); |
Copilot
AI
Jul 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When using the err_addr
callback you must apply the same left shift (<< ECC_ERROR_LOG_ADDR_SHIFT
) after extracting the bitfield, or the computed physical address will be incorrect.
eaddr = res_cfg->err_addr(node->ecclog); | |
eaddr = res_cfg->err_addr(node->ecclog) << | |
ECC_ERROR_LOG_ADDR_SHIFT; |
Copilot uses AI. Check for mistakes.
#define DID_ADL_SKU4 0x4641 | ||
|
||
/* Compute die IDs for Alder Lake-N with IBECC */ | ||
#define DID_ADL_N_SKU1 0x4614 |
Copilot
AI
Jul 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The many DID_*
SKU definitions result in a long, repetitive block; consider grouping them in an array or using helper macros to simplify future updates.
Copilot uses AI. Check for mistakes.
Reviewer's GuideIntroduces IBECC error extraction and notifier support in the igen6_edac driver for new Intel SoCs (Meteor Lake-P/PS, Raptor Lake-P, Alder Lake-N), refactors MCHBAR access into a helper, extends the x86 MCE interface for generic I/O error decoding, and adds a dedicated IEH EDAC driver with build and Kconfig integration. Sequence diagram for IBECC error notification via notifier chainsequenceDiagram
participant igen6_edac as igen6_edac driver
participant ibecc_err_handler_chain as ibecc_err_handler_chain
participant handler as External Handler (notifier_block)
igen6_edac->>ibecc_err_handler_chain: blocking_notifier_call_chain(&ibecc_err_handler_chain, 0, &e)
ibecc_err_handler_chain-->>handler: handler->notifier_call(&e)
handler-->>ibecc_err_handler_chain: return NOTIFY_OK/NOTIFY_DONE
ibecc_err_handler_chain-->>igen6_edac: return
Class diagram for new IBECC error notification and IEH EDAC driver integrationclassDiagram
class res_config {
+u32 ibecc_base
+u32 ibecc_error_log_offset
+bool (*ibecc_available)(struct pci_dev *pdev)
+u64 (*err_addr)(u64 ecclog)
+u64 (*err_addr_to_sys_addr)(u64 eaddr, int mc)
+u64 (*err_addr_to_imc_addr)(u64 eaddr, int mc)
}
class ibecc_err_info {
+enum hw_event_mc_err_type type
+u64 sys_addr
+u64 ecc_log
}
class notifier_block {
<<external>>
}
class igen6_edac {
+int ibecc_err_register_notifer(struct notifier_block *nb)
+int ibecc_err_unregister_notifer(struct notifier_block *nb)
+static BLOCKING_NOTIFIER_HEAD(ibecc_err_handler_chain)
}
class ieh_dev {
+struct list_head list
+struct pci_dev *pdev
+struct pci_sbdf sbdf
+enum ieh_type type
+u8 ver
+enum evt_map corr_map
+enum evt_map nonfatal_map
+enum evt_map fatal_map
}
class ieh_config {
+u16 did
+enum action_on_fatal_err action
}
class decoded_res {
+enum severity_level sev
+struct pci_sbdf sbdf
}
igen6_edac --> ibecc_err_info : notifies
igen6_edac --> notifier_block : registers
res_config o-- igen6_edac : used by
ieh_dev o-- ieh_config : uses
ieh_dev o-- decoded_res : outputs error info
Class diagram for new IEH EDAC driver core structuresclassDiagram
class ieh_dev {
+struct list_head list
+struct pci_dev *pdev
+struct pci_sbdf sbdf
+enum ieh_type type
+u8 ver
+enum evt_map corr_map
+enum evt_map nonfatal_map
+enum evt_map fatal_map
}
class pci_sbdf {
+u32 seg : 16
+u32 bus : 8
+u32 dev : 5
+u32 fun : 3
}
class decoded_res {
+enum severity_level sev
+struct pci_sbdf sbdf
}
class ieh_config {
+u16 did
+enum action_on_fatal_err action
}
ieh_dev o-- pci_sbdf : has
decoded_res o-- pci_sbdf : has
ieh_dev o-- ieh_config : uses
Class diagram for new and updated PCI device ID configurationclassDiagram
class res_config {
+bool machine_check
+int num_imc
+u32 imc_base
+u32 ibecc_base
+u32 ibecc_error_log_offset
+bool (*ibecc_available)(struct pci_dev *pdev)
+u64 (*err_addr)(u64 ecclog)
+u64 (*err_addr_to_sys_addr)(u64 eaddr, int mc)
+u64 (*err_addr_to_imc_addr)(u64 eaddr, int mc)
}
class pci_device_id {
+PCI_VDEVICE(INTEL, DID_*)
+kernel_ulong_t driver_data (points to res_config)
}
pci_device_id o-- res_config : references
Class diagram for new MCE error code and helpers in x86 mce.hclassDiagram
class mce {
+u64 status
+u64 misc
+u64 addr
+u64 mcgstatus
+u64 mcgcap
+u64 ip
+u32 bank
+u64 kflags
}
class mce_helpers {
+#define MCACOD_IOERR 0x0e0b
+#define MCI_MISC_PCISEG_MASK
+#define MCI_MISC_PCISEG(m)
+#define MCI_MISC_PCIRID_MASK
+#define MCI_MISC_PCIRID(m)
}
mce_helpers <.. mce : used for decoding
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @shiqingd - I've reviewed your changes - here's some feedback:
- A corresponding Kconfig entry for CONFIG_EDAC_IEH is missing, please add it so that ieh_edac.o can be properly configured.
- Duplicate MCHBAR reading code remains in igen6_pci_setup after introducing get_mchbar; consolidate them to avoid code duplication.
- Blocking notifier calls are used in igen6_output_error which may run in atomic context; consider a non-blocking notifier or deferring to workqueue to prevent deadlocks.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- A corresponding Kconfig entry for CONFIG_EDAC_IEH is missing, please add it so that ieh_edac.o can be properly configured.
- Duplicate MCHBAR reading code remains in igen6_pci_setup after introducing get_mchbar; consolidate them to avoid code duplication.
- Blocking notifier calls are used in igen6_output_error which may run in atomic context; consider a non-blocking notifier or deferring to workqueue to prevent deadlocks.
## Individual Comments
### Comment 1
<location> `drivers/edac/igen6_edac.c:604` </location>
<code_context>
+static BLOCKING_NOTIFIER_HEAD(ibecc_err_handler_chain);
+
+int ibecc_err_register_notifer(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_register(&ibecc_err_handler_chain, nb);
+}
+EXPORT_SYMBOL_GPL(ibecc_err_register_notifer);
+
+int ibecc_err_unregister_notifer(struct notifier_block *nb)
</code_context>
<issue_to_address>
Typo in function name 'notifer' should be 'notifier'.
The typo appears in both the function names, the header file, and the EXPORT_SYMBOL_GPL macro. Please update all instances to 'notifier' for consistency.
Suggested implementation:
```c
int ibecc_err_register_notifier(struct notifier_block *nb)
{
return blocking_notifier_chain_register(&ibecc_err_handler_chain, nb);
}
EXPORT_SYMBOL_GPL(ibecc_err_register_notifier);
int ibecc_err_unregister_notifier(struct notifier_block *nb)
{
return blocking_notifier_chain_unregister(&ibecc_err_handler_chain, nb);
}
EXPORT_SYMBOL_GPL(ibecc_err_unregister_notifier);
```
If there is a header file (e.g., `drivers/edac/igen6_edac.h` or similar) that declares these functions, you should also update the function declarations there to use `notifier` instead of `notifer`.
</issue_to_address>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
int ibecc_err_register_notifer(struct notifier_block *nb) | ||
{ | ||
return blocking_notifier_chain_register(&ibecc_err_handler_chain, nb); | ||
} | ||
EXPORT_SYMBOL_GPL(ibecc_err_register_notifer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (typo): Typo in function name 'notifer' should be 'notifier'.
The typo appears in both the function names, the header file, and the EXPORT_SYMBOL_GPL macro. Please update all instances to 'notifier' for consistency.
Suggested implementation:
int ibecc_err_register_notifier(struct notifier_block *nb)
{
return blocking_notifier_chain_register(&ibecc_err_handler_chain, nb);
}
EXPORT_SYMBOL_GPL(ibecc_err_register_notifier);
int ibecc_err_unregister_notifier(struct notifier_block *nb)
{
return blocking_notifier_chain_unregister(&ibecc_err_handler_chain, nb);
}
EXPORT_SYMBOL_GPL(ibecc_err_unregister_notifier);
If there is a header file (e.g., drivers/edac/igen6_edac.h
or similar) that declares these functions, you should also update the function declarations there to use notifier
instead of notifer
.
This PR include the 9 patches which are used for EDAC/igen6 Intel Meteor Lake-P SoCs support.
Five patches have already been upstreamed to Linux kernel community.
Upstream:
commit 7f5b45e
6807434 ("EDAC/igen6: Add Intel Meteor Lake-P SoCs support")
commit 4681a10
3c77090 ("EDAC/igen6: Add Intel Meteor Lake-PS SoCs support").
commit 72597e7
a264f71 ("EDAC/igen6: Make get_mchbar() helper function").
commit cfa3e9a
d23627a ("EDAC/igen6: Add Intel Raptor Lake-P SoCs support").
commit 8f077b8
c4a5398 ("EDAC/igen6: Add Intel Alder Lake-N SoCs support").
Four Patches have already been uploaded to github/linux-intel-lts public repo.
commit be12908
intel/linux-intel-lts@da1536bdff36 ("EDAC/igen6: Add registration APIs for In-Band ECC error notification")
commit 1673f29
intel/linux-intel-lts@39b1cbd8dc70 ("EDAC/ieh: Add I/O device EDAC support for Intel Tiger Lake-H SoC")
commit 15ecf04
intel/linux-intel-lts@9f721ed88580 ("EDAC/ieh: Add I/O device EDAC driver for Intel CPUs with IEH")
commit 22dbb4d
intel/linux-intel-lts@1afcf245df31 ("x86/mce: Add MCACOD code for generic I/O error")
Summary by Sourcery
Sync and integrate upstream EDAC driver improvements by adding support for new Intel SoCs (Meteor Lake-P/PS), introducing the IEH I/O device EDAC driver for Tiger Lake, and enhancing IBECC error handling with a notifier API and broader error address support
New Features:
Enhancements:
Build: