CVE Harbor is a lightweight tool that automatically retrieves National Vulnerability Database (NVD) data, stores it locally, and processes it into a form ready for analysis. It provides a fast, reliable foundation for cybersecurity operations, vulnerability management, and SIEM integrations.
cve-harbor/
├── src/
│   └── harbor/
│       ├── __init__.py
│       ├── cli.py
│       ├── core/
│       │   ├── __init__.py
│       │   ├── api_client.py
│       │   ├── data_processor.py
│       │   └── checkpoint_manager.py
│       ├── formatters/
│       │   ├── __init__.py
│       │   └── event_writer.py
│       └── utils/
│           ├── __init__.py
│           └── utils.py
├── config/
│   └── config.json
├── data/
│   ├── events.jsonl
│   └── checkpoint.txt
├── logs/
│   └── harbor.log
├── main.py
├── requirements.txt
├── .gitignore
└── README.md
- Multiple Output Formats: Output in JSONL, JSON, and CSV formats
- Checkpoint System: Prevents reprocessing of the same data
- Filtering: Filter by keyword, CVSS score, date range, etc.
- Flexible Configuration: All settings live in a single JSON file
- Logging: Detailed logs of API requests, processed CVE counts, checkpoint updates, and errors
- Modular Structure: Easily extensible and maintainable
- Web Dashboard: Interactive Streamlit-based web interface
- Data Visualization: Charts and statistics with Plotly
- Real-time Analytics: Live CVE data analysis
- Export Capabilities: Download filtered data as CSV
- Python 3.8+
- pip
- Clone the repository:
git clone https://github.com/mkdemir/cve-harbor.git
cd cve-harbor
- Create a virtual environment (recommended):
python3 -m venv venv
source venv/bin/activate # Linux/Mac
# or
venv\Scripts\activate # Windows
- Install dependencies:
pip install -r requirements.txt
- Install developer dependencies (optional):
pip install -e ".[dev]"
You can configure the project by editing the config/config.json file:
{
  "checkpoint_type": "published",
  "checkpoint_start_time_or_index": "2024-01-01T00:00:00.000+00:00",
  "keyword_search": "Windows",
  "keywordexactmatch": false,
  "cve_data_to_include": "descriptions,references,metrics",
  "description_language": "en",
  "misc_parameters": "",
  "results_per_page": 100,
  "checkpoint_file": "",
  "output_file": "",
  "output_format": "jsonl",
  "log_file": "",
  "log_level": "DEBUG"
}
| Parameter | Description | Default |
|---|---|---|
| checkpoint_type | Checkpoint type: published, modified, index | published |
| checkpoint_start_time_or_index | Start time or index | 2024-01-01T00:00:00.000+00:00 |
| keyword_search | Keyword to search for | null |
| keywordexactmatch | Exact word matching | false |
| cve_data_to_include | CVE fields to include | descriptions,references |
| description_language | Description language | en |
| misc_parameters | Additional API parameters | null |
| results_per_page | Results per page | 100 |
| output_format | Output format: jsonl, json, csv | jsonl |
| checkpoint_file | Checkpoint file path | "" (auto) |
| output_file | Output file path | "" (auto) |
| log_file | Log file path | "" (auto) |
| log_level | Log level | DEBUG |
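Most of these keys correspond to NVD CVE API 2.0 query parameters. The following is a minimal sketch, not the project's actual code, of how a loader could fill in the table's defaults and build a query string. load_config and build_nvd_query are hypothetical names, and how CVE Harbor really maps its keys is an assumption. The NVD parameter names (resultsPerPage, keywordSearch, keywordExactMatch, startIndex, pubStartDate/pubEndDate, lastModStartDate/lastModEndDate) and the 120-day date-range limit come from the NVD API documentation.

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Defaults copied from the configuration table above.
DEFAULTS = {
    "checkpoint_type": "published",
    "checkpoint_start_time_or_index": "2024-01-01T00:00:00.000+00:00",
    "keyword_search": None,
    "keywordexactmatch": False,
    "cve_data_to_include": "descriptions,references",
    "description_language": "en",
    "misc_parameters": None,
    "results_per_page": 100,
    "output_format": "jsonl",
    "checkpoint_file": "",
    "output_file": "",
    "log_file": "",
    "log_level": "DEBUG",
}

# The NVD API rejects date ranges longer than 120 consecutive days.
MAX_RANGE = timedelta(days=120)


def load_config(path):
    """Read a JSON config file; keys missing from it fall back to DEFAULTS."""
    with open(path, encoding="utf-8") as fh:
        return {**DEFAULTS, **json.load(fh)}


def build_nvd_query(config):
    """Hypothetical mapping from harbor config keys to an NVD API 2.0 query string."""
    params = [("resultsPerPage", str(config["results_per_page"]))]
    keyword = config["keyword_search"]
    if keyword:
        params.append(("keywordSearch", keyword))

    start = config["checkpoint_start_time_or_index"]
    kind = config["checkpoint_type"]
    if kind == "index":
        params.append(("startIndex", str(start)))
    elif kind in ("published", "modified"):
        # published -> pubStartDate/pubEndDate, modified -> lastModStartDate/lastModEndDate
        prefix = "pub" if kind == "published" else "lastMod"
        begin = datetime.fromisoformat(start)
        if begin.tzinfo is None:
            begin = begin.replace(tzinfo=timezone.utc)
        # Cap the window at 120 days and never ask for dates in the future.
        end = min(begin + MAX_RANGE, datetime.now(timezone.utc))
        params.append((prefix + "StartDate", begin.isoformat(timespec="milliseconds")))
        params.append((prefix + "EndDate", end.isoformat(timespec="milliseconds")))
    else:
        raise ValueError(f"unknown checkpoint_type: {kind!r}")

    query = urlencode(params)
    # keywordExactMatch is a flag without a value and only applies with keywordSearch.
    if keyword and config["keywordexactmatch"]:
        query += "&keywordExactMatch"
    # misc_parameters is appended verbatim, e.g. "cvssV4Severity=HIGH".
    if config["misc_parameters"]:
        query += "&" + config["misc_parameters"]
    return query
```

Appending misc_parameters verbatim is the simplest choice. Parsing it into key/value pairs would allow validation, but it would need care with value-less flags such as keywordExactMatch.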
CVE Harbor supports three ways to configure file paths.
Automatic paths: leave the path fields empty to use automatic absolute paths:
{
  "checkpoint_file": "",
  "output_file": "",
  "log_file": ""
}
Default locations:
- Windows: C:\Users\[username]\cve-harbor-data\
- Linux/Mac: ~/cve-harbor-data/ (for example, /home/[username]/cve-harbor-data/ on Linux)
Directory structure:
~/cve-harbor-data/
├── checkpoints/
│   └── checkpoint.txt
├── output/
│   └── events_2025-10-08.jsonl
└── logs/
    └── harbor.log
Absolute paths: specify full paths for complete control.
Windows example:
{
  "checkpoint_file": "C:/production/cve-data/checkpoints/checkpoint.txt",
  "output_file": "D:/exports/cve/events.jsonl",
  "log_file": "C:/logs/cve-harbor/harbor.log"
}
Linux/Mac example:
{
  "checkpoint_file": "/var/lib/cve-harbor/checkpoints/checkpoint.txt",
  "output_file": "/data/exports/cve/events.jsonl",
  "log_file": "/var/log/cve-harbor/harbor.log"
}
Relative paths: use paths relative to the project directory.
{
  "checkpoint_file": "data/checkpoints/checkpoint.txt",
  "output_file": "data/output/events.jsonl",
  "log_file": "logs/harbor.log"
}
Benefits of automatic and absolute paths:
- ✅ Works with crontab without worrying about working directory
- ✅ Separate data storage from application code
- ✅ Easy to backup and manage data files
- ✅ Multiple instances can use different data directories
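A minimal sketch of how the three path modes could be resolved. resolve_path, default_name, the project_dir argument, and the date-stamped output name (modeled on events_2025-10-08.jsonl in the tree above) are hypothetical illustrations, not CVE Harbor's API.

```python
from datetime import date
from pathlib import Path

# Sub-directories of the automatic data root, mirroring the default tree above.
AUTO_SUBDIRS = {"checkpoint_file": "checkpoints", "output_file": "output", "log_file": "logs"}


def default_name(key, output_format):
    """File name used when a path is left empty (assumed naming scheme)."""
    if key == "checkpoint_file":
        return "checkpoint.txt"
    if key == "log_file":
        return "harbor.log"
    # Date-stamped output file, e.g. events_2025-10-08.jsonl
    return f"events_{date.today().isoformat()}.{output_format}"


def resolve_path(key, value, project_dir, output_format="jsonl", data_root=None):
    """Turn a checkpoint_file/output_file/log_file value into a Path."""
    if not value:
        # Automatic: <home>/cve-harbor-data/<subdir>/<file>
        root = Path(data_root) if data_root else Path.home() / "cve-harbor-data"
        return root / AUTO_SUBDIRS[key] / default_name(key, output_format)
    path = Path(value).expanduser()
    if path.is_absolute():
        # Absolute: used exactly as written.
        return path
    # Relative: anchored at the project directory instead of the current working
    # directory, so a job started from elsewhere still finds the same files.
    return Path(project_dir) / path
```

In a script, project_dir would typically be Path(__file__).resolve().parent. Anchoring there, rather than at the working directory, is what lets cron start the job from any directory.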
# Run with main script
python main.py
# Run with CLI
python -m harbor.cli
# With custom config file
python -m harbor.cli --config custom_config.json
# Run every 6 hours
0 */6 * * * cd /opt/cve-harbor && /usr/bin/python3 main.py >> /var/log/cve-harbor/cron.log 2>&1
Programmatic usage:
from harbor import fetch_cve_data, filter_cve_data, write_events
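# Fetch data from the NVD CVE API 2.0 (keywordSearch and resultsPerPage are NVD parameters)
url = "https://services.nvd.nist.gov/rest/json/cves/2.0"
params = {"keywordSearch": "Windows", "resultsPerPage": 100}
data = fetch_cve_data(url, params)
# Process data (assumes fetch_cve_data returns the parsed NVD JSON response;
# the field and language values below mirror config.json and are illustrative)
vulnerabilities = data["vulnerabilities"]
fields = ["descriptions", "references", "metrics"]
languages = ["en"]
events, timestamp = filter_cve_data(vulnerabilities, fields, languages, "published")
# Write to file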
write_events(events, "output.jsonl", "jsonl")
JSONL: one JSON object per line, for example:
{
  "timestamp": 1704067200.0,
  "cve_id": "CVE-2024-1234",
  "published": "2024-01-01T00:00:00.000+00:00",
  "last_modified": "2024-01-01T00:00:00.000+00:00",
  "vuln_status": "Analyzed",
  "data": { ... }
}
JSON: all CVEs in a single JSON array.
CSV: flattened data structure, suitable for Excel and other analysis tools.
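A minimal sketch of how the three output formats differ, written as a hypothetical stand-in for write_events (render_events, flatten, and write_events_sketch are illustrative names). The CSV flattening scheme, with nested keys joined by dots and lists stored as JSON strings, is an assumption; the real event_writer.py may flatten differently.

```python
import csv
import io
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column names; lists become JSON strings."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value, ensure_ascii=False)
        else:
            flat[name] = value
    return flat


def render_events(events, fmt):
    """Serialize a list of event dicts as JSONL, JSON, or CSV text."""
    if fmt == "jsonl":
        # One JSON object per line.
        return "".join(json.dumps(e, ensure_ascii=False) + "\n" for e in events)
    if fmt == "json":
        # All events in a single JSON array.
        return json.dumps(list(events), ensure_ascii=False, indent=2)
    if fmt == "csv":
        rows = [flatten(e) for e in events]
        # Union of all column names in first-seen order; missing cells stay empty.
        columns = list(dict.fromkeys(col for row in rows for col in row))
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    raise ValueError(f"unsupported output format: {fmt!r}")


def write_events_sketch(events, path, fmt):
    """Render first, then write, so an unsupported format never creates an empty file."""
    text = render_events(events, fmt)
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write(text)
```

JSONL suits append-style SIEM ingestion because each line parses on its own. The JSON array must be read whole, and CSV trades nesting for spreadsheet compatibility.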
- src/harbor/core/: Main business logic
  - api_client.py: NVD API communication
  - data_processor.py: Data processing and filtering
  - checkpoint_manager.py: Checkpoint management
- src/harbor/formatters/: Output formats
  - event_writer.py: JSONL, JSON, CSV formats
- src/harbor/utils/: Utility functions
  - utils.py: Logging, datetime operations
- src/harbor/cli.py: Command line interface
# Code formatting
black src/
# Linting
flake8 src/
# Type checking
mypy src/
Follow these steps to set up a clean development environment with linting, formatting, and type checking.
python3 -m venv .venv
source .venv/bin/activate # Linux/Mac
# or on Windows (PowerShell)
.venv\Scripts\Activate.ps1
Install the package in editable mode. This enables clean imports (src/ layout) and the cve-harbor console script.
pip install -e .
Install and enable the pre-commit hooks:
pip install pre-commit
pre-commit install
# Run hooks on all files once
pre-commit run --all-files
The repository is configured via pyproject.toml for:
- flake8 (max line length 100, ignores E203/W503)
- black (line length 100)
- isort (black profile)
- pyright (type checking)
# Run with CLI (after editable install)
cve-harbor --config config/config.json
# Format + sort imports
black . && isort .
# Lint and types
flake8 . && pyright
# Run tests (if tests/ present)
pytest -q
- Keep changes focused and small when possible.
- Ensure pre-commit hooks pass before pushing.
- Add tests for non-trivial changes.
Windows CVEs with HIGH CVSS v4 severity, exported as CSV:
{
  "keyword_search": "Windows",
  "misc_parameters": "cvssV4Severity=HIGH",
  "output_format": "csv"
}
Start from a specific date, exported as JSON:
{
  "checkpoint_start_time_or_index": "2024-12-25T00:00:00.000+00:00",
  "output_format": "json"
}
Apache CVEs with descriptions, metrics, and references:
{
  "keyword_search": "Apache",
  "keywordexactmatch": false,
  "cve_data_to_include": "descriptions,metrics,references"
}
The project tracks the last state of processed data:
- published: Checkpoint based on publication date
- modified: Checkpoint based on modification date
- index: Checkpoint based on index number
This prevents reprocessing of the same data and lets each run resume where the previous one left off.
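A hypothetical sketch of a file-based checkpoint. read_checkpoint, write_checkpoint, and advance_checkpoint are illustrative names, and the event keys published and last_modified follow the JSONL record shown earlier. The temp-file-and-rename write is a design choice of this sketch, not necessarily what checkpoint_manager.py does.

```python
from datetime import datetime
from pathlib import Path


def read_checkpoint(path, default):
    """Return the stored checkpoint, or default if the file is missing or empty."""
    p = Path(path)
    if p.is_file():
        value = p.read_text(encoding="utf-8").strip()
        if value:
            return value
    return str(default)


def write_checkpoint(path, value):
    """Store the checkpoint via a temp file and rename, so it is never half-written."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    tmp = p.with_name(p.name + ".tmp")
    tmp.write_text(str(value), encoding="utf-8")
    tmp.replace(p)  # os.replace semantics: atomic on POSIX file systems


def advance_checkpoint(kind, current, events):
    """Return the checkpoint to store after processing one batch of events."""
    if kind == "index":
        # Index checkpoints count how many results have been consumed so far.
        return str(int(current) + len(events))
    if kind not in ("published", "modified"):
        raise ValueError(f"unknown checkpoint_type: {kind!r}")
    field = "published" if kind == "published" else "last_modified"
    # Compare as datetimes rather than strings so differing UTC offsets sort correctly.
    return max([current] + [e[field] for e in events], key=datetime.fromisoformat)
```

A run would call read_checkpoint, fetch and write the batch, and only then call write_checkpoint with advance_checkpoint's result, so a failed run restarts from the old checkpoint. If the API treats the start date as inclusive, the boundary CVE may be returned again on the next run; deduplicating on cve_id downstream handles that.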
The project maintains detailed log records:
- API requests
- Number of processed CVEs
- Checkpoint updates
- Error messages
It also handles the following error conditions:
- API connection errors
- File writing errors
- Invalid configuration
- Network timeout
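Transient conditions such as API connection errors and network timeouts are commonly handled with a retry wrapper that logs each failure. This is a minimal sketch under stated assumptions: setup_logging and with_retries are hypothetical names, retrying on OSError (which covers ConnectionError, TimeoutError, and urllib's URLError) is a choice of this sketch, and the 6-second base delay follows NVD's guidance to pause between requests. The sleep function is injectable so the backoff can be tested without waiting.

```python
import logging
import time

log = logging.getLogger("harbor")


def setup_logging(log_file, level="DEBUG"):
    """Write log records to log_file at the configured log_level (e.g. "DEBUG")."""
    handler = logging.FileHandler(log_file, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
    log.addHandler(handler)
    log.setLevel(level)


def with_retries(call, attempts=3, base_delay=6.0, retry_on=(OSError,), sleep=time.sleep):
    """Run call(); on a retryable error, wait with exponential backoff and try again.

    OSError covers ConnectionError, TimeoutError and urllib.error.URLError.
    The last error is logged and re-raised once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on as exc:
            if attempt == attempts:
                log.error("Giving up after %d attempts: %s", attempts, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1)
            log.warning("Attempt %d failed (%s); retrying in %.0f s", attempt, exc, delay)
            sleep(delay)
```

For example, with_retries(lambda: fetch_cve_data(url, params)) would wrap the fetch call from the programmatic usage example. Invalid configuration and file writing errors are better reported immediately than retried, which is why they are left out of retry_on here.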
This project is licensed under the MIT License. See the LICENSE file for details.
- Fork the project
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
