A pipeline that uses large language models (LLMs) to detect toxic speech.
This project uses uv for dependency management.
- Python 3.12 or higher
- uv package manager
- Install uv if not already installed (see the installer command just after this list).
- Clone the repository:

  ```bash
  git clone https://github.com/debatelab/toxicity-detector.git
  cd toxicity-detector
  ```

- Install dependencies:

  ```bash
  uv sync
  ```

  This will create a virtual environment and install all dependencies specified in pyproject.toml.
- Install development dependencies (optional):

  ```bash
  uv sync --group dev
  ```
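If uv is not yet on your system, the standalone installer published by Astral is the usual route (this is the command from uv's own documentation; `pip install uv` also works):

```bash
# Install uv via the official standalone installer (Linux/macOS)
curl -LsSf https://astral.sh/uv/install.sh | sh
```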
Create a .env file in the project root with the following variables:

```bash
# API keys (using the names specified in the model config files)

# Optional: custom app config file path
TOXICITY_DETECTOR_APP_CONFIG_FILE=./config/app_config.yaml
```
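Purely as an illustration: if one of your model config files expects a key named OPENAI_API_KEY (a hypothetical name here; use whatever names your config files actually specify), the completed .env would look like this:

```bash
# Example .env; the key name is hypothetical and must match your model config files
OPENAI_API_KEY=sk-...
TOXICITY_DETECTOR_APP_CONFIG_FILE=./config/app_config.yaml
```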
See the notebooks in the notebooks/ directory for examples of how to run and test the toxicity detection pipeline.

The project includes a Gradio web interface for interactive toxicity detection.
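For orientation, here is a minimal sketch of what such a Gradio interface looks like. This is illustrative only: the real UI lives in src/toxicity_detector/app.py and delegates to the LLM-based backend rather than the placeholder function used here.

```python
# Illustrative sketch of a Gradio text-classification UI.
# The detect_toxicity placeholder stands in for the real LLM backend.
import gradio as gr

def detect_toxicity(text: str) -> str:
    # Placeholder logic; the actual app calls the LLM-based detection pipeline.
    return "toxic" if "hate" in text.lower() else "non-toxic"

demo = gr.Interface(
    fn=detect_toxicity,
    inputs=gr.Textbox(label="Text to analyze"),
    outputs=gr.Label(label="Verdict"),
    title="Toxicity Detector",
)

if __name__ == "__main__":
    demo.launch(server_port=7860)  # matches the default port noted below
```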
Run the app using uv:
```bash
uv run python src/toxicity_detector/app.py
```

The app will start and be accessible at http://localhost:7860 by default.
You can also activate the virtual environment and run the app directly:

```bash
# Activate the virtual environment
source .venv/bin/activate  # On Linux/Mac
# or
.venv\Scripts\activate     # On Windows

# Run the app
python src/toxicity_detector/app.py
# or (enables live reloading)
gradio src/toxicity_detector/app.py
```

To enable developer mode with additional configuration options, update your config/app_config.yaml:
```yaml
developer_mode: true
```

```
toxicity-detector/
├── config/                          # Configuration files
│   ├── app_config.yaml              # App configuration
│   └── default_model_config_*.yaml  # Model configurations
├── src/
│   └── toxicity_detector/
│       ├── __init__.py
│       ├── app.py                   # Gradio web interface
│       ├── backend.py               # Core detection logic
│       └── chains.py                # LangChain pipelines
├── logs/                            # Application logs
├── notebooks/                       # Jupyter notebooks for testing
├── pyproject.toml                   # Project dependencies
└── README.md                        # This file
```
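Since chains.py holds the LangChain pipelines, here is a minimal sketch of what such a pipeline can look like. The prompt wording, model choice (gpt-4o-mini), and names are assumptions for illustration, not code taken from this repository:

```python
# Illustrative sketch of an LLM-based toxicity-classification chain.
# Requires an OPENAI_API_KEY in the environment (see the .env section above).
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "Classify the following text as 'toxic' or 'non-toxic'. "
    "Answer with a single word.\n\nText: {text}"
)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = prompt | llm | StrOutputParser()  # prompt -> model -> plain string

if __name__ == "__main__":
    print(chain.invoke({"text": "You are all wonderful people."}))
```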
The project follows PEP 8 guidelines with a maximum line length of 88 characters.
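Note that flake8's default limit is 79 characters, so if the repository does not already ship such a config (an assumption; check for an existing setup.cfg or .flake8), the 88-character limit would be pinned in a .flake8 file like this:

```ini
# .flake8 -- hypothetical config; the repo may already set this elsewhere
[flake8]
max-line-length = 88
```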
Run linting checks:
```bash
uv run flake8 src/
```

Run all tests:

```bash
uv run pytest
```

Run tests with verbose output:

```bash
uv run pytest -v
```

Run a specific test file:

```bash
uv run pytest tests/test_config.py
```

Run tests with coverage report:

```bash
uv run pytest --cov=src/toxicity_detector
```

Alternative: Using the activated virtual environment:
```bash
# Activate the virtual environment first
source .venv/bin/activate  # On Linux/Mac
# or
.venv\Scripts\activate     # On Windows

# Then run pytest directly
pytest tests/
pytest tests/test_config.py -v
```
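As a purely hypothetical sketch of the shape such a test might take (the actual contents of tests/test_config.py are in the repo and may differ entirely):

```python
# Hypothetical sketch; not the repository's real test code.
import os

def test_app_config_path_defaults_to_yaml():
    # The app config path can be overridden via the environment variable
    # documented above; either way it should point at a YAML file.
    path = os.environ.get(
        "TOXICITY_DETECTOR_APP_CONFIG_FILE", "./config/app_config.yaml"
    )
    assert path.endswith(".yaml")
```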
To use Jupyter notebooks for development:

```bash
# Install dev dependencies if not already done
uv sync --group dev

# Start Jupyter
uv run jupyter notebook notebooks/
```

See LICENSE file for details.