Thanks for your interest in contributing to ctrlmap. This guide covers the development setup, coding standards, and contribution workflow.
- uv: fast Python package manager
- Python 3.11+: required by the project
- Ollama: local LLM runtime (installed by `make setup`)
- Clone the repository:

      git clone https://github.com/JoshDoesIT/ctrlmap.git
      cd ctrlmap

- Run the setup script (installs Python deps, Ollama, and the `qwen2.5:14b` model):

      make setup

  This single command installs everything needed to develop and test ctrlmap.

- Verify the setup:

      make test

This project strictly follows the Red-Green-Refactor cycle. Every feature and bug fix must begin with a failing test.
- RED: Write a test that describes the expected behavior. Run it and confirm it fails.
- GREEN: Write the minimal code needed to make the test pass.
- REFACTOR: Clean up the code while keeping all tests green.
No production code is merged without a corresponding test that was written first.
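
As an illustration of the cycle, here is a minimal sketch built around a hypothetical helper, `normalize_control_id`, which is not part of ctrlmap. The test is written first and fails (RED) until the function below it exists (GREEN).

```python
# Hypothetical example: normalize_control_id is NOT part of ctrlmap.

# RED: written first. Running it before the function exists fails
# with a NameError, which confirms the test exercises new behavior.
def test_normalize_control_id_strips_and_uppercases():
    assert normalize_control_id("  ac-2 ") == "AC-2"


# GREEN: the minimal implementation that makes the test pass.
def normalize_control_id(raw: str) -> str:
    return raw.strip().upper()
```

The REFACTOR step would then tidy the implementation (naming, duplication) while this test stays green.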
- Linting: Ruff handles both linting and formatting
- Type checking: mypy in strict mode
- Pre-commit hooks: Configured via `.pre-commit-config.yaml`
Run all checks locally before pushing:

    make lint        # Ruff linter
    make format      # Ruff formatter
    make typecheck   # mypy strict type checking
    make check       # Lint + typecheck + format check
    make test        # Unit + integration tests
    make test-eval   # Evaluation tests (requires Ollama)
    make test-all    # All tests including eval
    make docs        # Build API documentation

This project uses Conventional Commits. Every commit message must follow this format:

    <type>(<scope>): <description>

    [optional body]

    [optional footer(s)]

Common types:
| Type | Purpose |
|---|---|
| `feat` | New feature |
| `fix` | Bug fix |
| `test` | Adding or updating tests |
| `docs` | Documentation changes |
| `refactor` | Code restructuring (no behavior change) |
| `chore` | Tooling, CI, dependency updates |
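
To make the header format concrete, the sketch below checks the first line of a commit message against it. It is an illustration only, not a hook shipped with ctrlmap. It assumes that only the types in the table above are accepted and that the scope is optional (as in the Conventional Commits specification); the project may be stricter or looser on both points.

```python
import re

# Types taken from the table above; the project may accept others.
COMMON_TYPES = ("feat", "fix", "test", "docs", "refactor", "chore")

# <type>(<scope>): <description>
# Assumption: the "(<scope>)" part is optional.
HEADER_RE = re.compile(
    "^(?P<type>" + "|".join(COMMON_TYPES) + ")"
    r"(\((?P<scope>[^()\s]+)\))?"
    r": (?P<description>\S.*)$"
)


def is_valid_header(message: str) -> bool:
    """Check only the first line of a commit message."""
    lines = message.splitlines()
    return bool(lines) and HEADER_RE.match(lines[0]) is not None
```

For example, `is_valid_header("feat(parse): add table extraction")` passes, while `is_valid_header("Added stuff")` does not.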
- Create a feature branch from `main` (e.g., `feat/your-feature-name`)
- Make your changes following TDD
- Ensure all checks pass (`pytest`, `ruff`, `mypy`)
- Open a PR using the provided template
- Reference the relevant issue (e.g., `Fixes #XX`)

ctrlmap/
├── src/ctrlmap/ # Application source code
│ ├── cli.py # Typer command routing
│ ├── _defaults.py # Centralized model defaults
│ ├── _console.py # Shared Rich console instances
│ ├── parse/ # PDF ingestion and chunking
│ ├── index/ # Embedding and vector storage
│ ├── map/ # Mapping, harmonization, and clustering
│ ├── llm/ # Ollama client and structured outputs
│ │ └── prompts/ # LLM prompt templates (.txt files)
│ ├── export/ # CSV, Markdown, HTML, OSCAL formatters
│ │ └── templates/ # HTML report CSS/JS assets
│ └── models/ # Pydantic schemas and OSCAL parsing
├── tests/
│ ├── unit/ # Fast, isolated unit tests
│ ├── integration/ # End-to-end integration tests
│ ├── evaluation/ # Non-deterministic eval tests (requires Ollama)
│ └── fixtures/ # Shared test data (eval sets, golden datasets)
└── pyproject.toml
All LLM prompt templates live in `src/ctrlmap/llm/prompts/` as plain `.txt` files with `{placeholder}` format strings. The prompt loader, `load_prompt()`, caches each template for the lifetime of the process.
| Template | Purpose |
|---|---|
| `compliance_rationale.txt` | Generate structured compliance rationale for a control–chunk pair |
| `gap_rationale.txt` | Explain why an unmapped control is non-compliant |
| `meta_classification.txt` | Classify whether a control is a meta-requirement |
| `relevance_check.txt` | Verify if a chunk directly addresses a control |
| `control_extraction.txt` | Extract controls from raw PDF text |
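
The sketch below shows one way a process-level cache over `{placeholder}` templates can work. It is not the project's actual `load_prompt()`: the function name `load_prompt_sketch`, its signature, the `render` helper, and the `{control_text}` placeholder in the usage note are all assumptions made for illustration.

```python
from functools import lru_cache
from pathlib import Path

# Template directory from this guide, relative to the repository root.
PROMPT_DIR = Path("src/ctrlmap/llm/prompts")


@lru_cache(maxsize=None)
def load_prompt_sketch(name: str, prompt_dir: Path = PROMPT_DIR) -> str:
    """Read a template from disk once; later calls return the cached text."""
    return (prompt_dir / f"{name}.txt").read_text(encoding="utf-8")


def render(template: str, **values: str) -> str:
    """Fill {placeholder} fields; raises KeyError if a value is missing."""
    return template.format(**values)
```

With a template containing `Control: {control_text}`, calling `render(template, control_text="...")` substitutes the field. Two practical consequences of this style: with a cache like this one, edits to a `.txt` file are not seen until the process restarts, and any literal braces in a template (for instance a JSON example) must be written as `{{` and `}}` so that `str.format` does not treat them as placeholders.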
- Edit the `.txt` file in `src/ctrlmap/llm/prompts/`
- Run the focused eval suite to check for regressions:

      make test-eval                                                  # All eval suites
      uv run pytest tests/evaluation/test_relevance_accuracy.py -v   # Single suite

- Use the eval runner for quick A/B comparisons:

      uv run python tests/evaluation/eval_runner.py

The evaluation suite tests LLM accuracy against expert-labeled ground truth in `tests/fixtures/`. These tests require a running Ollama instance.

    # Start Ollama (if not already running)
    ollama serve

    # Run all eval tests
    make test-eval

    # Run a specific eval suite
    uv run pytest tests/evaluation/test_compliance_accuracy.py -v
    uv run pytest tests/evaluation/test_relevance_accuracy.py -v
    uv run pytest tests/evaluation/test_meta_classification.py -v
    uv run pytest tests/evaluation/test_faithfulness.py -v
    uv run pytest tests/evaluation/test_retrieval_precision.py -v
    uv run pytest tests/evaluation/test_end_to_end_scenario.py -v

    # Run with model comparison
    uv run python tests/evaluation/model_compare.py

Eval test fixtures are in `tests/fixtures/` as JSON. Each entry has:
- `id`: Unique identifier for the test case
- `control_text`: The security control being evaluated
- `chunk_text`: The policy text excerpt (for relevance/compliance evals)
- `requirement_family`: The control's requirement family (e.g., "Access Control")
- `expected_relevant` / `expected_compliant` / `expected_is_meta`: The ground-truth label
- `rationale`: Explanation of why the expected label is correct

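
When adding fixture entries, a quick structural check can catch missing fields early. The sketch below is not part of the test suite. It assumes that each entry carries exactly one of the three label fields and that `chunk_text` is optional (it is listed above only for relevance/compliance evals). The sample entry's values are made up for illustration.

```python
import json

REQUIRED_FIELDS = {"id", "control_text", "requirement_family", "rationale"}
LABEL_FIELDS = ("expected_relevant", "expected_compliant", "expected_is_meta")


def check_entry(entry: dict) -> str:
    """Return the name of the entry's label field, or raise ValueError."""
    missing = REQUIRED_FIELDS - entry.keys()
    if missing:
        raise ValueError(f"{entry.get('id', '?')}: missing {sorted(missing)}")
    labels = [field for field in LABEL_FIELDS if field in entry]
    if len(labels) != 1:  # assumption: one ground-truth label per entry
        raise ValueError(f"{entry['id']}: expected exactly one label field")
    return labels[0]


# Illustrative entry only; the values are invented, not real fixture data.
sample = json.loads("""
{
  "id": "example-001",
  "control_text": "Accounts are reviewed periodically.",
  "chunk_text": "Managers review user accounts every quarter.",
  "requirement_family": "Access Control",
  "expected_relevant": true,
  "rationale": "The excerpt describes a recurring account review."
}
""")
```

Calling `check_entry(sample)` returns the label field name, so the same helper could route an entry to the matching eval suite.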
By contributing, you agree that your contributions will be licensed under the MIT License.