MLX-native cross-encoder reranker for Apple Silicon. Uses mlx-lm to run Qwen3-Reranker with true cross-encoder scoring (yes/no logit comparison), served via FastAPI on port 8001.
./start_reranker.sh
# or with a different model:
./start_reranker.sh mlx-community/Qwen3-Reranker-0.6B-mxfp8

Test it:
curl http://127.0.0.1:8001/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "query": "fraud audit",
    "documents": [
      "The contractor shall perform annual financial audits for fraud detection",
      "Chocolate chip cookies were invented in 1938",
      "Federal agencies must conduct fraud risk assessments per OMB Circular A-123"
    ]
  }'

Qwen3-Reranker is a generative model that acts as a cross-encoder. For each query-document pair:
- Formats a chat prompt asking "is this document relevant to the query?"
- Runs a single forward pass through the model
- Compares the logits for "yes" vs "no" tokens at the output position
- Returns `sigmoid(yes_logit - no_logit)` as the relevance score
This yields relevance scores that clearly separate relevant from irrelevant documents, unlike embedding cosine similarity, whose scores tend to cluster tightly.
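The scoring step described above can be sketched in plain Python. This is a minimal illustration, not the server's actual code: the function name `relevance_score` and the example logit values are hypothetical, and in the real service the logits would come from the model's forward pass. Note that `sigmoid(yes_logit - no_logit)` is the same quantity as a softmax over just the two tokens, i.e. the probability of "yes" when only "yes" and "no" are considered.

```python
import math


def relevance_score(yes_logit: float, no_logit: float) -> float:
    """Return sigmoid(yes_logit - no_logit), computed in a numerically stable way."""
    diff = yes_logit - no_logit
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    # For negative differences, avoid overflow in exp(-diff).
    e = math.exp(diff)
    return e / (1.0 + e)


# Hypothetical (yes_logit, no_logit) pairs for three documents.
# These are made-up numbers for illustration, not real model output.
example_logits = [(4.0, -1.0), (-3.0, 2.0), (0.0, 0.0)]

scores = [relevance_score(y, n) for y, n in example_logits]

# Document indices ordered from most to least relevant.
ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

if __name__ == "__main__":
    for i in ranked:
        print(f"doc {i}: {scores[i]:.4f}")
```

A logit gap of zero maps to a score of 0.5, and the score approaches 1 or 0 as the model favors "yes" or "no" more strongly, which is where the wide separation between relevant and irrelevant documents comes from.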
| Variable | Default | Description |
|---|---|---|
| `RERANKER_MODEL_ID` | `mlx-community/Qwen3-Reranker-4B-mxfp8` | Cross-encoder reranker model |
| `HOST` | `127.0.0.1` | Server bind address |
| `PORT` | `8001` | Server port |
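As a rough sketch of how these variables and their defaults fit together, the snippet below resolves them from the environment. The helper name `load_settings` and the returned dictionary keys are assumptions made for illustration; the actual server may read its configuration differently.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the table above.
DEFAULTS = {
    "RERANKER_MODEL_ID": "mlx-community/Qwen3-Reranker-4B-mxfp8",
    "HOST": "127.0.0.1",
    "PORT": "8001",
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve settings from `env` (os.environ by default), falling back to DEFAULTS."""
    env = os.environ if env is None else env
    return {
        "model_id": env.get("RERANKER_MODEL_ID", DEFAULTS["RERANKER_MODEL_ID"]),
        "host": env.get("HOST", DEFAULTS["HOST"]),
        # The port arrives as a string from the environment; convert it to an int.
        "port": int(env.get("PORT", DEFAULTS["PORT"])),
    }


if __name__ == "__main__":
    print(load_settings())
```

Under this sketch, exporting `PORT=9000` before starting the process would override only the port and leave the other two values at their defaults.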
To run the server as a launchd agent, see `launchd/com.agrande.mlx-rerank.plist`. Copy it to `~/Library/LaunchAgents/` and customize the paths marked with CUSTOMIZE comments.