Description
This one is really weird. I have an in-progress PR to expose ggml-org/llama.cpp#11997. However, when building with CUDA, I get an unresolved-symbol link error as soon as I actually call model.head_kv, which wraps the C FFI. If I build without CUDA, it links fine. I think it's because I normally link llama.cpp statically, but CUDA forces it to be built as a dynamic library. I've tried nuking the target directory. The only thing I haven't double-checked is whether sccache is enabled and somehow screwing things up.
To be clear, it's not a compilation issue: it only shows up when model.head_kv is actually called and llama.cpp is built as a shared lib. I believe dead code elimination otherwise hides it.
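For context, the binding looks roughly like this (a minimal sketch, not the exact code from my PR; `llama_model_n_head_kv` is the C symbol added by ggml-org/llama.cpp#11997, while `LlamaModel` and its raw-pointer field are placeholder names):

```rust
use std::os::raw::c_int;

// Opaque handle to the C-side model; placeholder for the crate's real type.
#[repr(C)]
pub struct llama_model {
    _private: [u8; 0],
}

extern "C" {
    // Symbol exposed by ggml-org/llama.cpp#11997.
    fn llama_model_n_head_kv(model: *const llama_model) -> c_int;
}

pub struct LlamaModel {
    ptr: *mut llama_model,
}

impl LlamaModel {
    /// Safe wrapper around the FFI call. This is the call that triggers the
    /// unresolved-symbol error at link time when llama.cpp is built as a
    /// shared library with CUDA; if it's never called, the build links fine.
    pub fn head_kv(&self) -> u32 {
        unsafe { llama_model_n_head_kv(self.ptr) as u32 }
    }
}
```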
Has anyone seen anything like this before? I've only tested on Linux so far, so I'm not sure whether this shows up on Windows as well.