Skip to content

Conversation

@stuarthayes
Copy link

Add a new controller for the recently-added kernel NPEM/_DSM driver (in the kernel at drivers/pci/npem.c) that provides access to NPEM LEDs via sysfs. This allows access even if the kernel doesn't allow direct access to PCI config space (for NPEM), and also allows access to the PCI _DSM method of controlling PCIe SSD LEDs (PCI Firmware Specification, Revision 3.3, section 4.7 "_DSM Definitions for PCIe SSD Status LED").

The code was based on the npem controller code, but uses the sysfs interface rather than directly accessing the NPEM capability in PCI config space.

mtkaczyk
mtkaczyk previously approved these changes Oct 29, 2025
Copy link
Member

@mtkaczyk mtkaczyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, thank you!

Copy link
Collaborator

@bkucman bkucman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Stuart ,

Thank you for your contribution, Nice work!
After the review, I have a few points to clarify and some minor corrections.

It seems that the NPEM kernel still operates per slots, so my question is: did you use the implemented slot tests for the "NPEM kernel"? If not, please run them and verify that everything works correctly.

Also, please rebase the branch.

Thanks!

Add a new controller for the recently-added kernel NPEM/_DSM driver
(in the kernel at drivers/pci/npem.c) that provides access to NPEM LEDs via
sysfs. This allows access even if the kernel doesn't allow direct access to
PCI config space (for NPEM), and also allows access to the PCI _DSM method
of controlling PCIe SSD LEDs (PCI Firmware Specification, Revision 3.3,
section 4.7 "_DSM Definitions for PCIe SSD Status LED").

The code was based on the npem controller code, but uses the sysfs
interface rather than directly accessing the NPEM capability in PCI config
space.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
A few cosmetic changes for kernel_npem.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Include LED_CNTRL_TYPE_KERNEL_NPEM in led_controller_slot_support().

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Add the newly-added KERNEL_NPEM controller type to two places in block.c
that were missed in the initial commit.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Update lib_unit_test.c to allow 7 controller types so it doesn't fail with
KERNEL_NPEM.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
@stuarthayes
Copy link
Author

It seems that the NPEM kernel still operates per slots, so my question is: did you use the implemented slot tests for the "NPEM kernel"? If not, please run them and verify that everything works correctly.

I hadn't run the tests. I found an issue when I ran the tests, which I fixed with another commit (7088fe0).

@stuarthayes
Copy link
Author

Also, please rebase the branch.

Done, thanks for all the feedback!

@tasleson
Copy link
Collaborator

tasleson commented Nov 21, 2025

FYI: I'm in the middle of reviewing, testing etc. I should be done by end of day Tuesday. @stuarthayes Thanks for this PR, greatly appreciated!

I manually tested this on a Dell PowerEdge R750. I'm seeing the following let states when listing the slots:

# ./src/ledctl/ledctl --list-slots -n "Kernel NPEM"
slot: 0000:69:00.0    led state: UNKNOWN         device: /dev/nvme1n1   
slot: 0000:6a:00.0    led state: UNKNOWN         device: /dev/nvme2n1   
slot: 0000:6b:00.0    led state: UNKNOWN         device: /dev/nvme6n1   
slot: 0000:6c:00.0    led state: UNKNOWN         device: /dev/nvme8n1   
slot: 0000:6d:00.0    led state: UNKNOWN         device: /dev/nvme10n1  
slot: 0000:6e:00.0    led state: UNKNOWN         device: /dev/nvme14n1  
slot: 0000:6f:00.0    led state: UNKNOWN         device: /dev/nvme18n1  
slot: 0000:70:00.0    led state: UNKNOWN         device: /dev/nvme19n1  
slot: 0000:e7:00.0    led state: UNKNOWN         device: /dev/nvme0n1   
slot: 0000:e8:00.0    led state: UNKNOWN         device: /dev/nvme3n1   
slot: 0000:e9:00.0    led state: UNKNOWN         device: /dev/nvme4n1   
slot: 0000:ea:00.0    led state: UNKNOWN         device: /dev/nvme5n1   
slot: 0000:eb:00.0    led state: UNKNOWN         device: /dev/nvme7n1   
slot: 0000:ec:00.0    led state: UNKNOWN         device: /dev/nvme9n1   
slot: 0000:ed:00.0    led state: UNKNOWN         device: /dev/nvme11n1  
slot: 0000:ee:00.0    led state: UNKNOWN         device: /dev/nvme12n1  
slot: 0000:f1:00.0    led state: UNKNOWN         device: /dev/nvme13n1  
slot: 0000:f2:00.0    led state: UNKNOWN         device: /dev/nvme15n1  
slot: 0000:f3:00.0    led state: UNKNOWN         device: /dev/nvme16n1  
slot: 0000:f4:00.0    led state: UNKNOWN         device: /dev/nvme17n1 

The hardware only supports locate, and I can both set and read that state:

# ./src/ledctl/ledctl --set-slot --slot 0000:f4:00.0 --controller-type "Kernel NPEM" --state locate
# ./src/ledctl/ledctl --list-slots -n "Kernel NPEM" | grep 0000:f4:00.0
slot: 0000:f4:00.0    led state: LOCATE          device: /dev/nvme17n1

Given that, I think it’s reasonable to report this as a known state (e.g., “OK”) rather than “UNKNOWN” when the hardware only implements a subset of the standard. We still need to be able to distinguish between:

  • a LED path that exists and returns a valid value (state is known), and
  • how we interpret that value semantically (e.g., mapping it to “OK”, “LOCATE”, etc.).

On a related note, I’m a little unsure about exposing yet another controller type to the end user. From their perspective, this feels more like an implementation detail: the same hardware on an older kernel would use a different implementation but should ideally look the same from the tool’s interface. That said, I understand why you introduced a new controller type here; I just want to flag the end user/API difference.

Notes

  • Unit tests fail on this hardware, as expected with what I'm seeing.
  • I'm not sure if this behavior is consistent for other LED transports too, when specifying an unsupported state for the hardware.
# ledctl failure=/dev/nvme16n1 ; echo $?;
/dev/shm/ledmon.conf: does not exist, using global config file
/etc/ledmon.conf: does not exist, using built-in defaults
0
# ledctl --set-slot --slot 0000:f4:00.0 --controller-type "Kernel NPEM" --state failure ; echo $?
/dev/shm/ledmon.conf: does not exist, using global config file
/etc/ledmon.conf: does not exist, using built-in defaults
10

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants