aperepel/mlx-rerank

mlx-rerank

MLX-native cross-encoder reranker for Apple Silicon. Uses mlx-lm to run Qwen3-Reranker with true cross-encoder scoring (yes/no logit comparison), served via FastAPI on port 8001.

Quick Start

./start_reranker.sh
# or with a different model:
./start_reranker.sh mlx-community/Qwen3-Reranker-0.6B-mxfp8

Test it:

curl http://127.0.0.1:8001/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "query": "fraud audit",
    "documents": [
      "The contractor shall perform annual financial audits for fraud detection",
      "Chocolate chip cookies were invented in 1938",
      "Federal agencies must conduct fraud risk assessments per OMB Circular A-123"
    ]
  }'
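The same request can also be sent from Python. The sketch below uses only the standard library and mirrors the curl call above. The helper names `build_rerank_request` and `rerank_raw` are hypothetical, and because the response schema is not documented here, `rerank_raw` returns the raw JSON text rather than assuming any field names.

```python
import json
import urllib.request

# Default host and port from this README; adjust if HOST or PORT are overridden.
RERANK_URL = "http://127.0.0.1:8001/v1/rerank"


def build_rerank_request(query, documents, url=RERANK_URL):
    """Build a POST request carrying the same JSON body as the curl example."""
    payload = json.dumps({"query": query, "documents": documents}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rerank_raw(query, documents, url=RERANK_URL, timeout=30):
    """Send the request and return the response body as text (server must be running)."""
    req = build_rerank_request(query, documents, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the server running, `print(rerank_raw("fraud audit", ["first document", "second document"]))` should print whatever JSON the endpoint returns.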

How It Works

Qwen3-Reranker is a generative model that acts as a cross-encoder. For each query-document pair, the service:

  1. Formats a chat prompt asking "is this document relevant to the query?"
  2. Runs a single forward pass through the model
  3. Compares the logits for "yes" vs "no" tokens at the output position
  4. Returns sigmoid(yes_logit - no_logit) as the relevance score

This produces genuine semantic relevance scores with clear separation between relevant and irrelevant documents, unlike cosine similarity between embeddings, where scores tend to cluster in a narrow range.
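The scoring in step 4 can be sketched in a few lines of Python. This is a minimal illustration of sigmoid(yes_logit - no_logit), not the project's actual code: the function name `relevance_score` is hypothetical, and it takes the two logits as plain floats rather than extracting them from model output.

```python
import math


def relevance_score(yes_logit: float, no_logit: float) -> float:
    """Return sigmoid(yes_logit - no_logit) as a score in the range 0 to 1."""
    diff = yes_logit - no_logit
    # Branch on the sign so that math.exp never receives a large positive argument.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


# Equal logits mean the model has no preference between "yes" and "no".
assert relevance_score(1.5, 1.5) == 0.5
```

The result is the same as applying a softmax over just the two logits and taking the "yes" probability, so only the difference between the logits matters, not their absolute values.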

Environment Variables

| Variable | Default | Description |
|---|---|---|
| RERANKER_MODEL_ID | mlx-community/Qwen3-Reranker-4B-mxfp8 | Cross-encoder reranker model |
| HOST | 127.0.0.1 | Server bind address |
| PORT | 8001 | Server port |
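As an illustration of how these variables and their defaults fit together, here is a small Python sketch that resolves them from the environment. How the server itself reads its configuration is not shown in this README, so the function `load_settings` and the keys of the dictionary it returns are assumptions made for the example.

```python
import os

# Defaults copied from the table above.
DEFAULTS = {
    "RERANKER_MODEL_ID": "mlx-community/Qwen3-Reranker-4B-mxfp8",
    "HOST": "127.0.0.1",
    "PORT": "8001",
}


def load_settings(env=None):
    """Resolve settings from a mapping (os.environ by default), falling back to DEFAULTS."""
    env = os.environ if env is None else env
    return {
        "model_id": env.get("RERANKER_MODEL_ID", DEFAULTS["RERANKER_MODEL_ID"]),
        "host": env.get("HOST", DEFAULTS["HOST"]),
        # Environment values are strings, so the port is converted to an integer.
        "port": int(env.get("PORT", DEFAULTS["PORT"])),
    }
```

Passing a plain dictionary, for example `load_settings({"PORT": "9000"})`, makes it easy to check overrides without touching the real environment.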

Auto-Start on Login (launchd)

See launchd/com.agrande.mlx-rerank.plist. Copy it to ~/Library/LaunchAgents/ and customize the paths marked with CUSTOMIZE comments.
