Closed
Description
I'm not sure whether this is a limitation of the OpenAI library or a server configuration problem on my end. After extensively testing various models with the latest server Docker image (CUDA), I've reached a conclusion: it seems impossible to get usable responses from a model whose chat template differs from ChatML when going through the OpenAI library. All such attempts produced broken responses. This includes the model at https://huggingface.co/mlabonne/AlphaMonarch-7B-GGUF, which I requested some time ago. I apologize if this isn't considered a bug, but I'm at a loss for what to do next. Thank you in advance.
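To illustrate what I mean by a template mismatch, here is a minimal sketch (not the server's actual code; the two rendering functions below are hypothetical examples of common template styles). If the server always renders the OpenAI-style `messages` list as ChatML, but the model was trained on a different template (e.g. a Mistral-instruct style), the model sees a prompt format it was never trained on and the responses come out broken:

```python
def render_chatml(messages):
    """Render messages in ChatML style (<|im_start|> / <|im_end|> markers)."""
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    out += "<|im_start|>assistant\n"  # cue the model to start replying
    return out

def render_mistral_instruct(messages):
    """Render messages in a Mistral-instruct style ([INST] ... [/INST])."""
    out = "<s>"
    for m in messages:
        if m["role"] == "user":
            out += f"[INST] {m['content']} [/INST]"
        elif m["role"] == "assistant":
            out += f"{m['content']}</s>"
    return out

msgs = [{"role": "user", "content": "Hello!"}]
print(render_chatml(msgs))           # what a ChatML-trained model expects
print(render_mistral_instruct(msgs)) # what a Mistral-instruct model expects
```

If the server here is llama.cpp's, I believe recent builds accept a `--chat-template` argument to control which template is applied to `/v1/chat/completions` requests; if that applies to this image, it may be worth trying, but I haven't been able to confirm it fixes this case.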