From: https://github.com/ggerganov/llama.cpp/issues/4216#issuecomment-1991730224

1. cleaning up the clip/llava libs and improving the API
2. in the old implementation, many internal objects were exposed to the server and the memory management was dubious (see the sketch after this list)
3. there was no obvious path for supporting parallel multimodal slots
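
Points 2 and 3 hint at the shape of the fix: hand the server only an opaque handle, make ownership explicit, and give each slot its own context. Below is a minimal C++ sketch of that idea. The names `mm_context`, `mm_init`, `mm_free`, and `server_slot` are hypothetical illustrations, not the actual clip/llava or server API; only the `--parallel` flag is a real llama.cpp server option.

```cpp
#include <cstdio>
#include <memory>
#include <string>
#include <vector>

// Hypothetical opaque multimodal context. In a real public header only a
// forward declaration would be visible; the definition stays in the library.
struct mm_context {
    // internals (vision encoder, projector, scratch buffers) hidden here
};

// Hypothetical C-style API surface (illustrative names, not clip.h/llava.h).
static mm_context * mm_init(const std::string & mmproj_path) {
    std::printf("loading multimodal projector: %s\n", mmproj_path.c_str());
    return new mm_context(); // placeholder; real code would load the model
}
static void mm_free(mm_context * ctx) {
    delete ctx;
}

// RAII wrapper: the server holds the context without touching its internals
// and cannot leak or double-free it.
struct mm_context_deleter {
    void operator()(mm_context * ctx) const { mm_free(ctx); }
};
using mm_context_ptr = std::unique_ptr<mm_context, mm_context_deleter>;

// One server slot owns its own multimodal context, which is what would make
// parallel multimodal slots straightforward.
struct server_slot {
    int            id;
    mm_context_ptr mctx;
};

int main() {
    const int n_slots = 4; // e.g. a server started with --parallel 4
    std::vector<server_slot> slots;
    slots.reserve(n_slots);
    for (int i = 0; i < n_slots; ++i) {
        slots.push_back({ i, mm_context_ptr(mm_init("mmproj.gguf")) });
    }
    // contexts are freed automatically when `slots` is destroyed
    return 0;
}
```

The design choice being sketched: the server never manipulates library internals directly, ownership is tied to the slot's lifetime, and because contexts are per-slot rather than shared globals, running several multimodal requests in parallel does not require any extra coordination inside the library.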