Description
Currently in torchao/experimental, a ukernel config identifies the function pointers that the linear operator uses.
At runtime, we select which ukernel config to use, but the current selection logic is very simplistic, partly because we currently have only one kind of kernel.
If we wish to support more kernels in the future (e.g., GEMM kernels, kernels from KleidiAI, kernels based on i8mm), we need a better ukernel config selection mechanism.
We'd like to select an appropriate ukernel config based on features such as CPU uarch, activation size, and packing format. The CPU uarch can be obtained from CPU info. The feature request here is to design an efficient infrastructure for dynamic kernel selection. XNNPACK implements a similar feature.
For now, we will select the ukernel config once per CPU, but in the future we might want to extend the design to select a different ukernel config depending on which CPU core the operator runs on.
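
To make the request concrete, here is a minimal sketch of what a selection table could look like. It is not the existing torchao/experimental API: every name in it (`UKernelConfigTable`, `SelectionKey`, `Entry`, the feature bits, the `priority` field) is hypothetical, and CPU uarch is reduced to a feature bitmask purely for illustration. Each registered config declares the CPU features it requires, the packing format it understands, and the range of activation rows it targets; selection returns the highest-priority matching entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// All names below are hypothetical and only illustrate the idea.

// CPU capabilities as a bitmask (a stand-in for "CPU uarch").
constexpr std::uint32_t kNoFeatures = 0;
constexpr std::uint32_t kDotprod = 1u << 0;
constexpr std::uint32_t kI8mm = 1u << 1;

enum class PackingFormat { kUniversal, kKleidiAI };

constexpr std::size_t kAnyM = static_cast<std::size_t>(-1);

// Signature is a placeholder; real ukernels take more arguments.
using KernelFn = void (*)(const void* activations, const void* packed_weights,
                          float* output);

struct UKernelConfig {
  std::string name;
  KernelFn kernel;
};

// Information available when the linear operator is dispatched.
struct SelectionKey {
  std::uint32_t cpu_features;  // what the current CPU supports
  PackingFormat packing;       // how the weights were packed
  std::size_t m;               // number of activation rows
};

struct Entry {
  std::uint32_t required_features;  // must be a subset of cpu_features
  PackingFormat packing;
  std::size_t min_m;  // inclusive
  std::size_t max_m;  // inclusive
  int priority;       // higher wins when several entries match
  UKernelConfig config;
};

class UKernelConfigTable {
 public:
  void register_config(const Entry& entry) {
    assert(entry.min_m <= entry.max_m);
    entries_.push_back(entry);
  }

  // Returns the best matching config, or std::nullopt if none matches.
  std::optional<UKernelConfig> select(const SelectionKey& key) const {
    const Entry* best = nullptr;
    for (const Entry& e : entries_) {
      const bool features_ok =
          (e.required_features & key.cpu_features) == e.required_features;
      const bool matches = features_ok && e.packing == key.packing &&
                           key.m >= e.min_m && key.m <= e.max_m;
      if (matches && (best == nullptr || e.priority > best->priority)) {
        best = &e;
      }
    }
    if (best == nullptr) {
      return std::nullopt;
    }
    return best->config;
  }

 private:
  std::vector<Entry> entries_;
};
```

With a table like this, a generic config can be registered with `kNoFeatures` and a low priority as a fallback, while an i8mm GEMM config is registered with `kI8mm`, `min_m = 2`, and a higher priority. The linear scan is fine for a handful of entries; since CPU features and packing format do not change between calls, the result could be cached per (packing format, `m` bucket) if selection ever shows up in profiles. Supporting per-core selection would mean building the `cpu_features` part of the key from the core the calling thread is running on instead of from a process-wide value.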