Name and Version
./bin/llama-cli --version
register_backend: registered backend Metal (1 devices)
register_device: registered device Metal (Apple M3 Max)
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (Accelerate)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Apple M3 Max)
version: 5615 (f470bc36)
built with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.5.0
Operating systems
Mac
Which llama.cpp modules do you know to be affected?
llama-server
Command line
cd tools/server/tests
./tests.sh
Problem description & steps to reproduce
Description
When running the server tests locally on my M3 Max 64GB, any test that makes a request after server.start() fails unless I inject a manual sleep after server.start() to make it wait a bit longer. It seems that despite the retry loop on /health in server.start() (here), the server does not have the model fully loaded and ready to query when /health returns 200: in the log below, the follow-up GET /props also answers 200, but its body is the web UI's HTML rather than JSON. This seems similar to the liveness vs readiness probe problem for kubernetes pods, where /health currently indicates that the server is "live" but not "ready," whereas the tests assume it means ready.
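To illustrate the distinction, the readiness-style wait the tests effectively assume would look something like the sketch below. This is a minimal illustration only: wait_for_ready is a hypothetical helper (not part of utils.py), and probing /props for a parseable JSON body is just one way to confirm the model routes are actually serving.

```python
import time

import requests


def wait_for_ready(base_url: str, timeout: float = 30.0) -> None:
    # Liveness: a 200 from /health only proves the HTTP listener is up.
    # Readiness: additionally require a JSON endpoint such as /props to
    # return a parseable body, which fails while the server is still
    # answering with the web UI's HTML (as in the failing test below).
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if requests.get(f"{base_url}/health", timeout=1).status_code == 200:
                requests.get(f"{base_url}/props", timeout=1).json()
                return  # live and ready
        except requests.RequestException:  # includes requests' JSONDecodeError
            pass  # not ready yet; keep polling
        time.sleep(0.1)
    raise TimeoutError(f"server at {base_url} never became ready")
```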
Related Discussion
- This came up due to failing tests on my Hybrid Recurrent Cache branch: Hybrid recurrent cache #13979 (comment)
- I was able to make this "work" locally by inserting retry logic into all server.make_request calls on gabe-l-hart@39a93b3 (roughly the shape sketched below), but this fix is likely just masking the issue.
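The retry I patched in was roughly the following. This is a hedged sketch rather than the exact diff: make_request_with_retry and its parameters are made up for illustration, and the real utils.py make_request takes additional arguments.

```python
import time

import requests


def make_request_with_retry(server, method: str, path: str,
                            retries: int = 10, delay: float = 0.5):
    # Hypothetical stand-in for the change on gabe-l-hart@39a93b3: retry
    # whenever the response body is not yet valid JSON, i.e. the server
    # answered the request but the model routes are not serving yet.
    for attempt in range(retries):
        try:
            return server.make_request(method, path)
        except requests.exceptions.JSONDecodeError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)
```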
Python env
python --version
Python 3.11.8
pip freeze
aiohttp==3.9.5
aiosignal==1.3.2
annotated-types==0.7.0
anyio==4.9.0
attrs==25.3.0
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.8
contourpy==1.3.2
cycler==0.12.1
distro==1.9.0
docstring_parser==0.16
einops==0.7.0
filelock==3.13.3
flake8==7.2.0
fonttools==4.58.2
frozenlist==1.6.2
fsspec==2024.3.1
gguf==0.17.0
gitdb==4.0.12
GitPython==3.1.44
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
huggingface-hub==0.23.5
idna==3.6
iniconfig==2.1.0
Jinja2==3.1.3
jiter==0.10.0
joblib==1.4.2
kiwisolver==1.4.8
# Editable install with no version control (llama-cpp-scripts==0.0.0)
-e /Users/ghart/Projects/github/ggerganov/llama.cpp
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.10.3
mccabe==0.7.0
mdurl==0.1.2
mpmath==1.3.0
multidict==6.4.4
networkx==3.3
numpy==1.26.4
openai==1.55.3
packaging==24.0
pandas==2.2.3
pillow==10.2.0
pluggy==1.6.0
prometheus_client==0.20.0
propcache==0.3.1
protobuf==4.25.3
pycodestyle==2.13.0
pycparser==2.22
pydantic==2.6.4
pydantic_core==2.16.3
pyflakes==3.3.2
Pygments==2.19.1
pyparsing==3.2.3
PySide6==6.9.1
PySide6_Addons==6.9.1
PySide6_Essentials==6.9.1
pytest==8.3.5
python-dateutil==2.9.0.post0
pytz==2025.2
PyYAML==6.0.1
regex==2023.12.25
requests==2.32.3
rich==14.0.0
safetensors==0.5.3
scikit-learn==1.6.0
scipy==1.14.1
seaborn==0.13.2
sentence-transformers==3.3.1
sentencepiece==0.2.0
shellingham==1.5.4
shiboken6==6.9.1
six==1.17.0
smmap==5.0.2
sniffio==1.3.1
sympy==1.12
tabulate==0.9.0
threadpoolctl==3.5.0
tokenizers==0.20.3
torch==2.2.2
torchvision==0.17.2
tqdm==4.66.2
transformers==4.46.3
typer==0.15.4
typing_extensions==4.11.0
tzdata==2025.2
urllib3==2.2.1
wget==3.2
yarl==1.20.0
First Bad Commit
No response
Relevant log output
./tests.sh
========================================================= test session starts ==========================================================
platform darwin -- Python 3.11.8, pytest-8.3.5, pluggy-1.6.0 -- /Users/ghart/mambaforge/envs/llama.cpp/bin/python3.11
cachedir: .pytest_cache
rootdir: /Users/ghart/Projects/github/ggml-org/llama.cpp/tools/server/tests
configfile: pytest.ini
plugins: anyio-4.9.0
collected 427 items / 229 deselected / 198 selected
unit/test_basic.py::test_server_start_simple PASSED [ 0%]
unit/test_basic.py::test_server_props FAILED [ 1%]
=============================================================== FAILURES ===============================================================
__________________________________________________________ test_server_props ___________________________________________________________
self = <Response [200]>, kwargs = {}
def json(self, **kwargs):
r"""Returns the json-encoded content of a response, if any.
:param \*\*kwargs: Optional arguments that ``json.loads`` takes.
:raises requests.exceptions.JSONDecodeError: If the response body does not
contain valid json.
"""
if not self.encoding and self.content and len(self.content) > 3:
# No encoding set. JSON RFC 4627 section 3 states we should expect
# UTF-8, -16 or -32. Detect which one to use; If the detection or
# decoding fails, fall back to `self.text` (using charset_normalizer to make
# a best guess).
encoding = guess_json_utf(self.content)
if encoding is not None:
try:
return complexjson.loads(self.content.decode(encoding), **kwargs)
except UnicodeDecodeError:
# Wrong UTF codec detected; usually because it's not UTF-8
# but some other 8-bit codec. This is an RFC violation,
# and the server didn't bother to tell us what codec *was*
# used.
pass
except JSONDecodeError as e:
raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
try:
> return complexjson.loads(self.text, **kwargs)
/Users/ghart/mambaforge/envs/llama.cpp/lib/python3.11/site-packages/requests/models.py:974:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/Users/ghart/mambaforge/envs/llama.cpp/lib/python3.11/json/__init__.py:346: in loads
return _default_decoder.decode(s)
/Users/ghart/mambaforge/envs/llama.cpp/lib/python3.11/json/decoder.py:337: in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <json.decoder.JSONDecoder object at 0x103240dd0>
s = '<!doctype html>\n<html lang="en">\n\t<head>\n\t\t<meta charset="utf-8" />\n\t\t<link rel="icon" type="image/png" href...\t}\n\t}\n\n\t.animate-pulse-fast {\n\t\tanimation: pulse 1.5s cubic-bezier(0.4, 0, 0.6, 1) infinite;\n\t}\n</style>\n'
idx = 0
def raw_decode(self, s, idx=0):
"""Decode a JSON document from ``s`` (a ``str`` beginning with
a JSON document) and return a 2-tuple of the Python
representation and the index in ``s`` where the document ended.
This can be used to decode a JSON document from a string that may
have extraneous data at the end.
"""
try:
obj, end = self.scan_once(s, idx)
except StopIteration as err:
> raise JSONDecodeError("Expecting value", s, err.value) from None
E json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
/Users/ghart/mambaforge/envs/llama.cpp/lib/python3.11/json/decoder.py:355: JSONDecodeError
During handling of the above exception, another exception occurred:
def test_server_props():
global server
server.start()
> res = server.make_request("GET", "/props")
unit/test_basic.py:24:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
utils.py:275: in make_request
result.body = response.json() if parse_body else None
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <Response [200]>, kwargs = {}
def json(self, **kwargs):
r"""Returns the json-encoded content of a response, if any.
:param \*\*kwargs: Optional arguments that ``json.loads`` takes.
:raises requests.exceptions.JSONDecodeError: If the response body does not
contain valid json.
"""
if not self.encoding and self.content and len(self.content) > 3:
# No encoding set. JSON RFC 4627 section 3 states we should expect
# UTF-8, -16 or -32. Detect which one to use; If the detection or
# decoding fails, fall back to `self.text` (using charset_normalizer to make
# a best guess).
encoding = guess_json_utf(self.content)
if encoding is not None:
try:
return complexjson.loads(self.content.decode(encoding), **kwargs)
except UnicodeDecodeError:
# Wrong UTF codec detected; usually because it's not UTF-8
# but some other 8-bit codec. This is an RFC violation,
# and the server didn't bother to tell us what codec *was*
# used.
pass
except JSONDecodeError as e:
raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
try:
return complexjson.loads(self.text, **kwargs)
except JSONDecodeError as e:
# Catch JSON-related errors and raise as requests.JSONDecodeError
# This aliases json.JSONDecodeError and simplejson.JSONDecodeError
> raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
E requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
/Users/ghart/mambaforge/envs/llama.cpp/lib/python3.11/site-packages/requests/models.py:978: JSONDecodeError
--------------------------------------------------------- Captured stdout call ---------------------------------------------------------
tests: starting server with: ../../../build/bin/llama-server --host 127.0.0.1 --port 8080 --temp 0.8 --seed 42 --hf-repo ggml-org/models --hf-file tinyllamas/stories260K.gguf --batch-size 32 --alias tinyllama-2 --ctx-size 512 --parallel 2 --n-predict 64
server pid=66945, pytest pid=66943
Response from server {
"status": true
}
------------------------------------------------------- Captured stdout teardown -------------------------------------------------------
Stopping server with pid=66945
======================================================= short test summary info ========================================================
FAILED unit/test_basic.py::test_server_props - requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
============================================= 1 failed, 1 passed, 229 deselected in 0.87s ==============================================