Description
Hi.. I've spent an entire day on this, so I figured I'd come mention it here.
If I make the following request to dllama-api with curl, it works fine:
$ curl -X POST http://clusterllm1.local:9990/v1/chat/completions -H "Content-Type: application/json" -d '{
"messages": [{"role": "user", "content": "What is 4 * 11?\n"}],
"temperature": 0.7,
"stop": ["<|eot_id|>"],
"max_tokens": 128
}'
{"choices":[{"finish_reason":"","index":-845970800,"message":{"content":"The answer is 44.","role":"assistant"}}],"created":1738050222,"id":"cmpl-j0","model":"Distributed Model","object":"chat.completion","usage":{"completion_tokens":5,"prompt_tokens":44,"total_tokens":49}}%
$
If, however, I run this pared-down, isolated example Java code to make the same request, dllama-api.cpp doesn't see the request body at all, just the headers:
import java.io.*;
import java.net.*;

public class Connect {
    public static void main(String[] args) throws Exception {
        String host = "clusterllm1.local";
        int port = 9990;

        URL endpoint = null;
        try {
            endpoint = new URL("http://" + host + ":" + port + "/v1/chat/completions");
        } catch (MalformedURLException mfe) {
            throw new RuntimeException(mfe);
        }

        System.out.println("About to open the connection...");
        HttpURLConnection connection = (HttpURLConnection) endpoint.openConnection();
        connection.setRequestMethod("POST");
        connection.setRequestProperty("Accept", "application/json");
        connection.setRequestProperty("Content-Type", "application/json; utf-8");
        connection.setDoOutput(true);
        connection.setDoInput(true);

        String requestBody = "{ \"messages\": [{\"role\": \"user\", \"content\": \"What is 4 * 11?\"}], \"temperature\": 0.7, \"stop\": [\"<|eot_id|>\"], \"max_tokens\": 128 }";
        System.out.println("DEBUG: request body is:\n=====\n" + requestBody + "\n=====\n");

        // Write the JSON body; HttpURLConnection buffers it and sets Content-Length itself.
        byte[] requestBodyBytes = requestBody.getBytes("UTF-8");
        OutputStream os = connection.getOutputStream();
        os.write(requestBodyBytes);
        os.flush();
        os.close();

        // getResponseCode() forces the request to be sent and the response line to be read.
        int responseCode = connection.getResponseCode();
        System.out.println("DEBUG: responseCode = " + responseCode);

        BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line);
        }
        System.out.println("DEBUG: responseBody: " + sb.toString());
        connection.disconnect();
    }
}
dllama-api.cpp reads all of the headers fine; it just never gets any request body. I've put debugging statements in dllama-api.cpp's HttpRequest.read() method and its readHttpRequest() method, and all that's ever read is the headers, no body.
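In case it helps anyone reproduce this without HttpURLConnection in the mix, here's a minimal sketch of the same request using java.net.http.HttpClient (same host, port, and JSON body as above; the class name and the forced HTTP/1.1 version are my choices, not anything from dllama-api). I'd expect it to put essentially the same request on the wire:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConnectHttpClient {
    public static void main(String[] args) throws Exception {
        String requestBody = "{ \"messages\": [{\"role\": \"user\", \"content\": \"What is 4 * 11?\"}], \"temperature\": 0.7, \"stop\": [\"<|eot_id|>\"], \"max_tokens\": 128 }";

        // Force HTTP/1.1 so the client doesn't attempt an h2c upgrade on plain HTTP.
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://clusterllm1.local:9990/v1/chat/completions"))
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody))
                .build();

        // Send the request and print the status code and response body.
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("DEBUG: responseCode = " + response.statusCode());
        System.out.println("DEBUG: responseBody: " + response.body());
    }
}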
To my eye those should be the same requests. If I listen on a port with netcat (nc) and point the same Java code at it, I can see that the body is being sent:
$ nc -l -p 9990
POST /v1/chat/completions HTTP/1.1
Accept: application/json
Content-Type: application/json; utf-8
User-Agent: Java/17.0.2
Host: clusterllm1.local:9990
Connection: keep-alive
Content-Length: 127

{ "messages": [{"role": "user", "content": "What is 4 * 11?"}], "temperature": 0.7, "stop": ["<|eot_id|>"], "max_tokens": 128 }
dllama-api.cpp clearly works in some cases, such as that curl request. And I'd love to find out that I've screwed up something in the above code (or the hundred variants of it that I've tried), but I fear that dllama-api.cpp's HTTP request handling might be a bit fragile, and it would be good to make it as robust as possible.
I'd normally never mention this in an issue; instead I'd usually solve it myself and submit a pull request, but I'm stumped. I wanted to make sure to record it here so the ball doesn't get dropped and we don't miss a chance to catch an existing bug in dllama-api.cpp.