Bias detection and debiasing in text: identify biased segments, classify severity, get reasoning and neutral replacements per segment, and a full neutral rewrite. Structured output (binary label, severity, biased segments with offsets) via CLI, REST API, or Python.
Input text → analysis → validated `BiasResult`: binary label (biased/unbiased), overall severity (1–5), `biased_segments` (original phrase, replacement, severity, bias type, reasoning, character offsets), and full `unbiased_text`. Entry points: CLI (`unbias-plus`), REST API (FastAPI + demo UI), or Python (`UnBiasPlus`).
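The output schema can be sketched roughly as the following Pydantic models. Field names are collected from this README; the actual definitions live in `src/unbias_plus/schema.py` and may differ in detail:

```python
from typing import Literal

from pydantic import BaseModel


class BiasedSegment(BaseModel):
    """One biased span detected in the input text."""

    original: str     # the biased phrase as it appears in the input
    replacement: str  # suggested neutral rewording
    severity: str     # per-segment severity, e.g. "low" | "medium" | "high"
    bias_type: str    # e.g. "loaded language", "framing"
    reasoning: str    # why this span is considered biased
    start: int        # character offset (inclusive) in the input text
    end: int          # character offset (exclusive) in the input text


class BiasResult(BaseModel):
    """Validated result of one analysis."""

    binary_label: Literal["biased", "unbiased"]
    severity: int                         # overall severity, 1-5
    bias_found: bool                      # convenience flag
    biased_segments: list[BiasedSegment]
    unbiased_text: str                    # full neutral rewrite
```

Because the models are Pydantic-validated, a malformed model response (e.g. a non-integer severity) fails loudly at parse time instead of propagating downstream.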
Project structure:

```
unbias-plus/
├── src/unbias_plus/
│   ├── __init__.py     # UnBiasPlus, BiasResult, BiasedSegment, serve
│   ├── cli.py          # unbias-plus entry point (--text, --file, --serve)
│   ├── api.py          # FastAPI app, /health, /analyze, serve()
│   ├── pipeline.py     # UnBiasPlus: prompt → model → parse → result
│   ├── model.py        # UnBiasModel: load LM, generate(), 4-bit optional
│   ├── prompt.py       # build_prompt(text), system prompt
│   ├── parser.py       # parse_llm_output() → BiasResult
│   ├── schema.py       # BiasResult, BiasedSegment (Pydantic)
│   ├── formatter.py    # format_cli, format_dict, format_json
│   └── demo/           # bundled web UI (served at / when using --serve)
│       ├── static/     # script.js, style.css
│       └── templates/  # index.html
├── tests/
│   ├── conftest.py     # fixtures (sample_result, sample_json, …)
│   └── unbias_plus/    # test_api, test_pipeline, test_parser, …
├── pyproject.toml
└── README.md
```
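The flow noted next to `pipeline.py` (prompt → model → parse → result) can be sketched as follows. This is an illustrative simplification with toy stand-in functions, not the actual implementation:

```python
import json


def analyze(text: str, build_prompt, generate, parse_llm_output):
    """Simplified shape of the pipeline: each step maps to one module."""
    prompt = build_prompt(text)           # prompt.py: wrap text in the system prompt
    raw_output = generate(prompt)         # model.py: run the language model
    return parse_llm_output(raw_output)   # parser.py: raw text -> validated result


# Toy stand-ins just to show the wiring:
result = analyze(
    "Some text.",
    build_prompt=lambda t: f"Analyze for bias: {t}",
    generate=lambda p: '{"binary_label": "unbiased"}',
    parse_llm_output=lambda raw: json.loads(raw),
)
print(result)  # {'binary_label': 'unbiased'}
```

Keeping the three stages in separate modules means the parser and prompt builder can be unit-tested without loading a model, which is what the `tests/unbias_plus/` suite exercises.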
- Bias detection: Identifies biased phrases in text and returns them as segments with character-level offsets for highlighting.
- Classification: Binary label (biased/unbiased), per-segment severity (low/medium/high), and bias type (e.g. loaded language, framing).
- Reasoning: Each segment includes an explanation of why it is considered biased.
- Debiasing: Per-segment neutral replacements and a full rewritten `unbiased_text`.
- Structured output: Pydantic-validated `BiasResult` with `binary_label`, `severity` (1–5), `biased_segments`, and `unbiased_text`.
- Demo UI: `--serve` launches a FastAPI server that also serves a visual web interface at `http://localhost:8000`.
- CLI: Analyze from the command line with `--text` or `--file`, or start the API + UI with `--serve`. Optional 4-bit quantization and JSON output.
- REST API: FastAPI server with `/health` and `/analyze` (POST JSON `{"text": "..."}`). Model loaded at startup via lifespan.
- Python API: Use `UnBiasPlus` in code; call `analyze()`, `analyze_to_cli()`, `analyze_to_dict()`, or `analyze_to_json()`.
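Since each segment carries character offsets, highlighting is a simple string operation. A hypothetical helper (not part of the package) to illustrate:

```python
def highlight(text: str, spans: list[tuple[int, int]]) -> str:
    """Wrap each (start, end) span in >>...<< markers.

    Spans are applied right-to-left so earlier offsets stay valid
    as the string grows.
    """
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + ">>" + text[start:end] + "<<" + text[end:]
    return text


# Offsets as a segment's start/end would report them:
print(highlight("Women are too emotional to lead.", [(10, 23)]))
# Women are >>too emotional<< to lead.
```

The bundled demo UI does the equivalent in JavaScript to render highlighted segments in the browser.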
- Python ≥3.10, <3.12
- CUDA 12.4 recommended (PyTorch + CUDA deps in `pyproject.toml`). CPU is supported with `device="cpu"`.
The project uses uv for dependency management. Install uv, then from the project root:
```bash
uv sync
source .venv/bin/activate  # or .venv\Scripts\activate on Windows
```

For development (tests, linting, type checking):

```bash
uv sync --dev
source .venv/bin/activate
```

Optional: flash-attn (GPU only). For training or faster inference with flash attention, install the `train` extra (requires CUDA/nvcc to build):

```bash
uv sync --extra train
# On HPC: load CUDA first, e.g. module load cuda/12.4.0
```

The default `uv sync` does not install flash-attn, so CI and CPU-only setups work without it.
```bash
# Analyze a string
unbias-plus --text "Women are too emotional to lead."

# Analyze a file, output JSON
unbias-plus --file article.txt --json

# Start API server + demo UI (default model, port 8000)
unbias-plus --serve
unbias-plus --serve --model path/to/model --port 8000
unbias-plus --serve --load-in-4bit  # reduce VRAM
```

Options: `--model`, `--load-in-4bit`, `--max-new-tokens`, `--host`, `--port`, `--json`.
After uv sync (and optionally uv sync --extra train on a GPU machine), verify the pipeline with:
```bash
# Default install (no flash-attn); use a small model or --load-in-4bit on GPU
uv run unbias-plus --text "Women are too emotional to lead."

# With your own model path
uv run unbias-plus --text "Some biased sentence." --model path/to/your/model

# JSON output
uv run unbias-plus --text "Test." --json
```

Or in Python (same env):

```bash
uv run python -c "
from unbias_plus import UnBiasPlus
pipe = UnBiasPlus()  # or UnBiasPlus('your-model-id', load_in_4bit=True)
text = 'Women are too emotional to lead.'
print(pipe.analyze_to_cli(text))
"
```

Start the server with `unbias-plus --serve` (or `serve()` in Python). This starts a single FastAPI server that:
- Serves the visual demo UI at `http://localhost:8000/`
- Exposes `GET /health` → `{"status": "ok", "model": "<model_name_or_path>"}`
- Exposes `POST /analyze` → Body: `{"text": "Your text here"}`. Returns JSON matching `BiasResult`.
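The `/analyze` response can be consumed with any HTTP client. A sketch with an illustrative payload (the made-up values and the `requests` call in the comment are assumptions; the field names follow `BiasResult`):

```python
import json

# In practice the response comes from something like:
#   requests.post("http://localhost:8000/analyze",
#                 json={"text": "Women are too emotional to lead."}).json()
# Illustrative payload with made-up model output:
response = json.loads("""
{
  "binary_label": "biased",
  "severity": 4,
  "biased_segments": [
    {"original": "too emotional to lead",
     "replacement": "capable of leading",
     "severity": "high",
     "bias_type": "stereotyping",
     "reasoning": "attributes a trait to an entire group",
     "start": 10,
     "end": 31}
  ],
  "unbiased_text": "Leadership ability varies by individual."
}
""")

for seg in response["biased_segments"]:
    print(f'{seg["bias_type"]}: "{seg["original"]}" -> "{seg["replacement"]}"')
# stereotyping: "too emotional to lead" -> "capable of leading"
```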
Programmatic start:
```python
from unbias_plus import serve

serve("your-hf-model-id", port=8000, load_in_4bit=False)
```

Running on a remote server or HPC node: If the server is running on a remote machine, use SSH port forwarding to access the UI in your browser:

```bash
ssh -L 8000:localhost:8000 user@your-server.com
# or through a login node to a compute node:
ssh -L 8000:gpu-node-hostname:8000 user@login-node.com
```

Then open `http://localhost:8000`. If port 8000 is already in use locally, use a different local port (e.g. `-L 8001:...`) and open `http://localhost:8001`. If you're using VS Code Remote SSH, port forwarding is handled automatically via the Ports tab.
```python
from unbias_plus import UnBiasPlus, BiasResult, BiasedSegment

pipe = UnBiasPlus("your-hf-model-id", load_in_4bit=False)
result = pipe.analyze("Women are too emotional to lead.")

print(result.binary_label)   # "biased" | "unbiased"
print(result.severity)       # 1–5
print(result.bias_found)     # bool

for seg in result.biased_segments:
    print(seg.original, seg.replacement, seg.severity, seg.bias_type, seg.reasoning)
    print(seg.start, seg.end)  # character offsets in original text

print(result.unbiased_text)  # full neutral rewrite

# Formatted outputs
cli_str = pipe.analyze_to_cli("...")    # human-readable colored terminal output
d = pipe.analyze_to_dict("...")         # plain dict
json_str = pipe.analyze_to_json("...")  # pretty-printed JSON string
```

- Tests: `pytest` (see `pyproject.toml` for markers). Run from the repo root: `uv run pytest tests/`.
- Linting / formatting: `ruff` (format + lint), config in `pyproject.toml`.
- Type checking: `mypy` with strict options, `mypy_path = "src"`.
Developed by the AI Engineering team at the Vector Institute: Ahmed Y. Radwan, Sindhuja Chaduvula, Shaina Raza, PhD.
Resources used in preparing this research are provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute.
This research is also supported by the European Union's Horizon Europe research and innovation programme under the AIXPERT project (Grant Agreement No. 101214389).
Licensed under the Apache License 2.0. See LICENSE in the repository.
- Open an issue on GitHub: https://github.com/VectorInstitute/unbias-plus/issues