Merged
127 changes: 78 additions & 49 deletions .github/copilot-instructions.md
@@ -1,29 +1,49 @@
# GitHub Copilot Instructions for msticpy

## Project Overview

**MSTICPy** is a Python library for InfoSec investigation and threat hunting.
It provides:

- Data query providers for Microsoft Sentinel, Azure Monitor, Kusto, Splunk, and more
- Threat intelligence lookups (VirusTotal, OTX, etc.)
- Data enrichment (GeoIP, WhoIs, etc.)
- Security-focused data analysis and visualization tools
- Jupyter notebook integration for interactive investigations

## Package Structure

- **Package name**: `msticpy` (in `msticpy/`)
- **Import pattern**: `import msticpy as mp`
- **Source**: `msticpy/` contains implementation modules
- **Tests**: `tests/` with pytest markers: `unit`, `integration`, `slow`
- **Tools**: `tools/` supplementary tools not core to the package
- **Documentation**: `docs/` - Sphinx source files and notebooks

### Key Subpackages

- `msticpy/data/` - Data providers and query execution
- `msticpy/context/` - Threat intelligence and enrichment providers
- `msticpy/auth/` - Authentication helpers (Azure, etc.)
- `msticpy/vis/` - Visualization components
- `msticpy/transform/` - Data transformation utilities
- `msticpy/init/` - Initialization and pandas accessors

## Code Conventions

### Python Standards (Enforced by Ruff)

- **Line length**: 93 characters
- **Type hints**: Required (enforced by mypy, annotations checked)
  - Always use built-in types like `list` and `dict` for type annotations and avoid
    the equivalent types from `typing`
  - E.g. use `list[str]` instead of `List[str]`, `str | None` instead of `Optional[str]`
- **Docstrings**: Required for public functions (D-series rules) - use numpy style
  - Document parameters, return type and exceptions raised for public
    functions/methods
  - **Single-line**: Keep on same line as triple quotes:
    `"""Return the user name."""`
  - **Multi-line**: Summary starts on new line after opening quotes, blank line before
    Parameters/Returns sections, blank line before closing quotes:

```python
def example(name: str) -> str:
    """
    [docstring body elided by diff hunk @@ -41,82 +61,84 @@]

    """
```

- **Imports and Formatting**: Sorted/grouped automatically (isort)

### General Coding Style

- Avoid using Python built-in `open` function for file operations. Use `pathlib.Path`
  methods instead. Prefer `Path.*` methods over legacy `os.*` methods.
- **Logging**: Create a logger per module: `logger = logging.getLogger(__name__)`
  - Use `%s`, `%d` style variable substitution rather than f-strings in log calls
- Never use inline import statements. Always place imports at the top of the file
  (conditional imports should also be at the top, before main code).
- Be careful with indentation - always replace lines using the same indentation
  unless introducing branches, etc.
- Try to avoid a line length of over 90 characters - applies to code, docstrings,
  comments and suppressions.
- Prefer use of pydantic classes over dataclasses or attrs classes.
- Prefer use of pydantic classes over complex dictionaries for structured data.

## Documentation

### Sphinx API Documentation

If adding/changing/removing any public API, update the Sphinx API documentation:
```bash
cd docs
del /Q source\api\*
sphinx-apidoc --o source/api --force --module-first --separate ../msticpy
del source\api\modules.rst
```
Add any changed files to the commit. The docs build will generate HTML and
report errors/warnings during CI.
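
The `del` commands above are Windows `cmd` syntax. On other platforms, the same steps could be scripted in Python. The following is a sketch under assumptions: the helper names are hypothetical, `docs_dir` is assumed to be the repository's `docs` folder, and `sphinx-apidoc` is assumed to be on the PATH. It uses `-o`, the short form of the output-directory option.

```python
import subprocess
from pathlib import Path


def clear_api_dir(api_dir: Path) -> None:
    """Delete the files directly inside the API docs folder."""
    for item in api_dir.glob("*"):
        if item.is_file():
            item.unlink()


def apidoc_command(api_dir: Path, package_dir: Path) -> list[str]:
    """Return the sphinx-apidoc command line used to regenerate the RST files."""
    return [
        "sphinx-apidoc",
        "-o",
        str(api_dir),
        "--force",
        "--module-first",
        "--separate",
        str(package_dir),
    ]


def regenerate_api_docs(docs_dir: Path) -> None:
    """Regenerate the API RST files (assumes `docs_dir` is the docs folder)."""
    api_dir = docs_dir / "source" / "api"
    clear_api_dir(api_dir)
    command = apidoc_command(api_dir, docs_dir.parent / "msticpy")
    subprocess.run(command, check=True)
    # modules.rst is removed after generation, as in the commands above
    (api_dir / "modules.rst").unlink(missing_ok=True)
```

Calling `regenerate_api_docs(Path("docs"))` from the repository root would then mirror the four commands shown earlier.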

## Testing

### Test Creation

- Always use pytest and generate pytest-style test functions
- Mock httpx requests using the `respx` library
- Test file paths should mirror the source module:
`msticpy/path/module.py` → `tests/path/test_module.py`
- Always add at least a single-line docstring to fixtures and test functions.
  If the context of the parameters is not obvious, explain them in the docstring.
- Unit test coverage should be >= 85% on new code
- For mock secrets (passwords, keys, etc.), always use `"[PLACEHOLDER]"` as the value
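
A minimal sketch of a test that follows these conventions. `build_auth_header` is a hypothetical function standing in for the code under test; in a real test module it would be imported from the source module that the test file mirrors.

```python
def build_auth_header(api_key: str) -> dict[str, str]:
    """Return request headers for the API key (hypothetical code under test)."""
    return {"Authorization": f"Bearer {api_key}"}


def test_build_auth_header_includes_key() -> None:
    """Test that the API key is placed in the Authorization header."""
    # Mock secrets always use the literal "[PLACEHOLDER]" value
    headers = build_auth_header("[PLACEHOLDER]")
    assert headers == {"Authorization": "Bearer [PLACEHOLDER]"}
```

The test is a plain function with bare `assert` statements, so pytest can collect it without any test class or extra imports.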

### Running Tests

```bash
pytest # All tests
pytest --cov=msticpy --cov-report=html # With coverage
```

## Code Quality Tools

**Always** run pre-commit before creating a PR.
**Always** run mypy before creating a PR.

### Pre-commit Hooks
This project uses pre-commit for automated code quality checks. Install and enable:

```bash
pip install pre-commit
pre-commit install
```

```bash
pre-commit run --all-files # Run manually
```

### Running Linters Manually

```bash
ruff check msticpy --fix # Lint and auto-fix
ruff format msticpy # Format code
mypy msticpy # Type checking
```

**Important**: When running mypy, run it completely. It is slow - avoid preliminary
runs to find error counts.

### After Generating New Python Code

**ALWAYS run pre-commit or equivalent checks:**
```bash
pre-commit run --all-files
# Or manually:
ruff check msticpy --fix && ruff format msticpy && mypy msticpy
```

Fix any errors before committing. Do not leave Ruff or mypy errors in generated code.

## Commit Guidelines

- Write clear, descriptive commit messages
- Always run pre-commit hooks before committing

## Key Files

- **`pyproject.toml`**: Package config, dependencies, tool settings (ruff, mypy, pytest)
- **`msticpyconfig.yaml`**: User configuration for data providers, TI providers, etc.
- **`docs/`**: Sphinx documentation source and example notebooks