ami_tool device_boot does not remove all PCIe devices, causing them to crash #21

@andrew-bolin

I will explain via an example. Apologies if I misuse some terminology, I am not an expert in PCIe.

We are using AVED + QDMA, with one PCIe function (.0) for the AVED interface and another (.1) for QDMA:

86:00.0 Processing accelerators: Xilinx Corporation Device 50b4
	Subsystem: Xilinx Corporation Device 000e
	[...]
	Kernel driver in use: ami
	Kernel modules: ami

86:00.1 Processing accelerators: Xilinx Corporation Device 50b5
	Subsystem: Xilinx Corporation Device 000e
	[...]
	Kernel driver in use: qdma-pf
	Kernel modules: qdma_pf, ami

Sadly, when we swap our FPGA images using the ami_tool device_boot command

ami_tool device_boot -d 86 -p 0

the QDMA interface stops working, because the QDMA kernel module no longer reads its magic number from the device (it reads 0xFFFF instead).

We eventually found an AMD forum post with a simple fix: remove the QDMA device and rescan the bus.

We can use this fix (it boils down to writing a 1 to /sys/bus/pci/devices/0000:86:00.1/remove before running ami_tool device_boot).
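
For reference, a minimal C sketch of that workaround, assuming the standard Linux sysfs PCI layout. The helper names `pci_remove_path` and `sysfs_write` are made up for illustration and are not part of AMI.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build "/sys/bus/pci/devices/DDDD:BB:DD.F/remove" into buf.
 * Returns 0 on success, -1 if buf is too small. */
int pci_remove_path(char *buf, size_t len, unsigned domain,
                    unsigned bus, unsigned dev, unsigned fn)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "/sys/bus/pci/devices/%04x:%02x:%02x.%x/remove",
                     domain, bus, dev, fn);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a short string to a sysfs control file (needs root).
 * Returns 0 on success, -1 on any error. */
int sysfs_write(const char *path, const char *value)
{
    size_t len = strlen(value);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t written = write(fd, value, len);
    close(fd);
    return (written == (ssize_t)len) ? 0 : -1;
}
```

Calling `pci_remove_path(buf, sizeof buf, 0, 0x86, 0, 1)` and then `sysfs_write(buf, "1")` as root, before running ami_tool device_boot, is equivalent to the manual write described above.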

However, I feel that it would be best if ami_tool called remove for all of the card's PCIe devices prior to rescanning the bus.

ami_dev_hot_reset (in sw/AMI/api/src/ami_device.c) follows these steps:

find the PCI port
read some config
set PMC GPIO
pci_remove(0x8600) // <-- only device 86:00.0
// pci_remove(0x8601)  // <-- this does not happen, but I think it should!
read bridge control
set SBR
reset SBR
small sleep
pci_rescan()

Could it be changed so that pci_remove is called for each BDF produced by the card?

I am guessing that the problem is not specific to QDMA, and that any PCIe device other than the AMI one will be broken when AMI resets the card. Does AMI have knowledge of all the BDFs that a card exposes? Could it discover them?
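
One possible way to discover them, sketched below, is to list /sys/bus/pci/devices and keep every entry that shares the domain, bus and device number of the AMI function. This is only a sketch under that assumption; `pci_same_slot` and `pci_function_mask` are hypothetical names, not existing AMI functions.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>

/* If `name` is a sysfs PCI address such as "0000:86:00.1" on the given
 * domain/bus/device, store its function number in *fn and return 1.
 * Otherwise return 0. */
int pci_same_slot(const char *name, unsigned domain, unsigned bus,
                  unsigned dev, unsigned *fn)
{
    unsigned d, b, s, f;
    int consumed = 0;

    assert(name != NULL && fn != NULL);
    if (sscanf(name, "%4x:%2x:%2x.%1x%n", &d, &b, &s, &f, &consumed) != 4)
        return 0;
    /* Reject trailing characters, other slots, and function numbers > 7. */
    if (name[consumed] != '\0' || d != domain || b != bus || s != dev || f > 7)
        return 0;
    *fn = f;
    return 1;
}

/* List devices_dir (normally "/sys/bus/pci/devices") and return a bitmask
 * with bit N set when function N of the slot exists, or -1 if the
 * directory cannot be opened. */
int pci_function_mask(const char *devices_dir, unsigned domain,
                      unsigned bus, unsigned dev)
{
    DIR *dir = opendir(devices_dir);
    struct dirent *ent;
    unsigned fn;
    int mask = 0;

    if (dir == NULL)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        if (pci_same_slot(ent->d_name, domain, bus, dev, &fn))
            mask |= 1 << fn;
    }
    closedir(dir);
    return mask;
}
```

On the system above, `pci_function_mask("/sys/bus/pci/devices", 0, 0x86, 0)` would be expected to report functions 0 and 1, and the reset path could then remove each function whose bit is set.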

I'm tempted to just hack in a line like pci_remove(bdf | 1), but I am aware that is not a good fix (e.g. it will not be happy in FPGA designs that do not have a '.1' PCIe device...)
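
To make the arithmetic behind that hack explicit, here is a small sketch. It assumes the 16-bit BDF uses the conventional layout (bus in bits 15:8, device in bits 7:3, function in bits 2:0), which matches 0x8600 for 86:00.0 and 0x8601 for 86:00.1 above but is not confirmed against the AMI source. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the 16-bit BDF: bus[15:8], device[7:3], function[2:0]. */
#define BDF_BUS(bdf) (((bdf) >> 8) & 0xffu)
#define BDF_DEV(bdf) (((bdf) >> 3) & 0x1fu)
#define BDF_FN(bdf)  ((bdf) & 0x07u)

/* Return the BDF of function `fn` on the same bus/device as `bdf`.
 * Unlike `bdf | 1`, this clears the function bits first, so it gives
 * the right answer regardless of which function `bdf` refers to. */
uint16_t bdf_sibling(uint16_t bdf, unsigned fn)
{
    assert(fn < 8);
    return (uint16_t)((bdf & ~0x07u) | fn);
}
```

A loop over `fn` from 0 to 7 with `bdf_sibling(bdf, fn)` yields every candidate function of the card, but each candidate would still need an existence check before calling pci_remove on it, since a design may not implement all of them.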
