
CVE Harbor is a lightweight tool that automatically retrieves vulnerability data from the National Vulnerability Database (NVD), stores it locally, and processes it into analysis-ready output. It provides a fast, reliable, and scalable foundation for cybersecurity operations, vulnerability management, and SIEM integrations.





CVE Harbor


📁 Project Structure

cve-harbor/
├── src/
│   └── harbor/
│       ├── __init__.py
│       ├── cli.py
│       ├── core/
│       │   ├── __init__.py
│       │   ├── api_client.py
│       │   ├── data_processor.py
│       │   └── checkpoint_manager.py
│       ├── formatters/
│       │   ├── __init__.py
│       │   └── event_writer.py
│       └── utils/
│           ├── __init__.py
│           └── utils.py
├── config/
│   └── config.json
├── data/
│   ├── events.jsonl
│   └── checkpoint.txt
├── logs/
│   └── harbor.log
├── main.py
├── requirements.txt
├── .gitignore
└── README.md

🚀 Features

  • Multiple Output Formats: Output in JSONL, JSON, and CSV formats
  • Checkpoint System: Prevents reprocessing of the same data
  • Filtering: Filter by keyword, CVSS score, date range, etc.
  • Flexible Configuration: Easy configuration with JSON file
  • Logging: Detailed log records
  • Modular Structure: Easily extensible and maintainable
  • Web Dashboard: Interactive Streamlit-based web interface
  • Data Visualization: Charts and statistics with Plotly
  • Real-time Analytics: Live CVE data analysis
  • Export Capabilities: Download filtered data as CSV

📦 Installation

Requirements

  • Python 3.8+
  • pip

Installation Steps

  1. Clone the repository:
git clone https://github.com/mkdemir/cve-harbor.git
cd cve-harbor
  2. Create a virtual environment (recommended):
python3 -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate     # Windows
  3. Install dependencies:
pip install -r requirements.txt
  4. Install developer dependencies (optional):
pip install -e ".[dev]"

⚙️ Configuration

You can configure the project by editing the config/config.json file:

{
    "checkpoint_type": "published",
    "checkpoint_start_time_or_index": "2024-01-01T00:00:00.000+00:00",
    "keyword_search": "Windows",
    "keywordexactmatch": false,
    "cve_data_to_include": "descriptions,references,metrics",
    "description_language": "en",
    "misc_parameters": "",
    "results_per_page": 100,
    "checkpoint_file": "",
    "output_file": "",
    "output_format": "jsonl",
    "log_file": "",
    "log_level": "DEBUG"
}

Configuration Parameters

| Parameter | Description | Default |
|---|---|---|
| checkpoint_type | Checkpoint type: published, modified, or index | published |
| checkpoint_start_time_or_index | Starting point: an ISO 8601 timestamp (published/modified) or a result index (index) | 2024-01-01T00:00:00.000+00:00 |
| keyword_search | Keyword or phrase to search for | null |
| keywordexactmatch | Match keyword_search as an exact phrase | false |
| cve_data_to_include | Comma-separated CVE fields to include | descriptions,references |
| description_language | Language code for descriptions | en |
| misc_parameters | Additional NVD API query parameters (e.g. cvssV4Severity=HIGH) | null |
| results_per_page | Results requested per API page | 100 |
| output_format | Output format: jsonl, json, or csv | jsonl |
| checkpoint_file | Checkpoint file path | "" (auto) |
| output_file | Output file path | "" (auto) |
| log_file | Log file path | "" (auto) |
| log_level | Log level (e.g. DEBUG, INFO) | DEBUG |
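The table above can be turned into a loader that fills in missing keys. The sketch below is hypothetical (load_config, DEFAULTS, and the validation rules are not taken from the CVE Harbor source): it parses config JSON text, applies the documented defaults, and rejects unknown keys so a typo such as keywordExactMatch surfaces early.

```python
import json
from typing import Any, Dict

# Hypothetical loader, not the CVE Harbor implementation.
# Defaults transcribed from the Configuration Parameters table;
# None stands in for "null", "" means "choose a path automatically".
DEFAULTS: Dict[str, Any] = {
    "checkpoint_type": "published",
    "checkpoint_start_time_or_index": "2024-01-01T00:00:00.000+00:00",
    "keyword_search": None,
    "keywordexactmatch": False,
    "cve_data_to_include": "descriptions,references",
    "description_language": "en",
    "misc_parameters": None,
    "results_per_page": 100,
    "output_format": "jsonl",
    "checkpoint_file": "",
    "output_file": "",
    "log_file": "",
    "log_level": "DEBUG",
}

VALID_CHECKPOINT_TYPES = {"published", "modified", "index"}
VALID_OUTPUT_FORMATS = {"jsonl", "json", "csv"}


def load_config(text: str) -> Dict[str, Any]:
    """Parse config JSON text and fill any missing keys from DEFAULTS."""
    user = json.loads(text)
    unknown = set(user) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    config = {**DEFAULTS, **user}
    if config["checkpoint_type"] not in VALID_CHECKPOINT_TYPES:
        raise ValueError(f"invalid checkpoint_type: {config['checkpoint_type']!r}")
    if config["output_format"] not in VALID_OUTPUT_FORMATS:
        raise ValueError(f"invalid output_format: {config['output_format']!r}")
    return config


if __name__ == "__main__":
    cfg = load_config('{"keyword_search": "Windows", "output_format": "csv"}')
    print(cfg["keyword_search"], cfg["output_format"], cfg["results_per_page"])
```

Reading the file is then just load_config(Path("config/config.json").read_text()).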

File Path Configuration

CVE Harbor supports three ways to configure file paths:

1. Empty/Default (Recommended)

Leave the path fields empty to use automatic absolute paths:

{
    "checkpoint_file": "",
    "output_file": "",
    "log_file": ""
}

Default locations:

  • Windows: C:\Users\[username]\cve-harbor-data\
  • Linux: /home/[username]/cve-harbor-data/
  • macOS: /Users/[username]/cve-harbor-data/

Directory structure:

~/cve-harbor-data/
├── checkpoints/
│   └── checkpoint.txt
├── output/
│   └── events_2025-10-08.jsonl
└── logs/
    └── harbor.log
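When the path fields are empty, the locations above can be derived from the home directory. A minimal sketch, assuming the layout shown here; default_paths and ensure_parent_dirs are illustrative names, and naming the output file after the current date and output format is an assumption based on the events_2025-10-08.jsonl example.

```python
from datetime import date
from pathlib import Path
from typing import Dict, Optional


def default_paths(
    home: Optional[Path] = None,
    today: Optional[date] = None,
    output_format: str = "jsonl",
) -> Dict[str, Path]:
    """Return default checkpoint, output, and log paths (hypothetical helper)."""
    # Path.home() is typically C:\Users\<name>, /home/<name>, or /Users/<name>.
    base = (home if home is not None else Path.home()) / "cve-harbor-data"
    day = (today if today is not None else date.today()).isoformat()
    return {
        "checkpoint_file": base / "checkpoints" / "checkpoint.txt",
        "output_file": base / "output" / f"events_{day}.{output_format}",
        "log_file": base / "logs" / "harbor.log",
    }


def ensure_parent_dirs(paths: Dict[str, Path]) -> None:
    """Create each file's parent directory so the first write does not fail."""
    for path in paths.values():
        path.parent.mkdir(parents=True, exist_ok=True)


if __name__ == "__main__":
    for name, path in default_paths(today=date(2025, 10, 8)).items():
        print(f"{name}: {path}")
```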

2. Absolute Paths (For Production/Crontab)

Specify full paths for complete control. Windows example (forward slashes avoid having to escape backslashes in JSON):

{
    "checkpoint_file": "C:/production/cve-data/checkpoints/checkpoint.txt",
    "output_file": "D:/exports/cve/events.jsonl",
    "log_file": "C:/logs/cve-harbor/harbor.log"
}

Linux/Mac example:

{
    "checkpoint_file": "/var/lib/cve-harbor/checkpoints/checkpoint.txt",
    "output_file": "/data/exports/cve/events.jsonl",
    "log_file": "/var/log/cve-harbor/harbor.log"
}

3. Relative Paths (For Development)

Use paths relative to the project directory:

{
    "checkpoint_file": "data/checkpoints/checkpoint.txt",
    "output_file": "data/output/events.jsonl",
    "log_file": "logs/harbor.log"
}

Benefits:

  • ✅ Works with crontab without worrying about working directory
  • ✅ Separate data storage from application code
  • ✅ Easy to backup and manage data files
  • ✅ Multiple instances can use different data directories
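For non-empty values, the difference between options 2 and 3 comes down to how a path is resolved. The sketch below assumes relative paths are anchored to the project directory rather than the current working directory, which keeps a crontab job independent of where it starts; resolve_path and its project_root argument are hypothetical (in main.py, project_root could be Path(__file__).resolve().parent).

```python
from pathlib import Path


def resolve_path(value: str, project_root: Path) -> Path:
    """Resolve a configured file path (hypothetical helper).

    Absolute paths are returned unchanged. Relative paths are joined to the
    project directory instead of the current working directory, so the result
    is the same no matter where cron starts the process.
    """
    if not value:
        # Option 1: empty values should fall back to the default locations.
        raise ValueError("empty path; use the default location instead")
    path = Path(value)
    if path.is_absolute():
        return path
    return (project_root / path).resolve()


if __name__ == "__main__":
    root = Path("/opt/cve-harbor")  # e.g. the checkout used in the crontab example
    print(resolve_path("logs/harbor.log", root))
    print(resolve_path("/var/log/cve-harbor/harbor.log", root))
```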

🎯 Usage

Basic Usage

# Run with main script
python main.py

# Run with CLI (requires the package to be installed, e.g. pip install -e .)
python -m harbor.cli

# With custom config file
python -m harbor.cli --config custom_config.json
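The --config flag above suggests an entry point along these lines. This is a hypothetical argparse sketch, not the contents of cli.py: build_parser and main are illustrative names, only the documented --config option is modeled, and the default path config/config.json is an assumption.

```python
import argparse
from typing import List, Optional

DEFAULT_CONFIG = "config/config.json"  # assumed default, not confirmed by the source


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the documented --config option."""
    parser = argparse.ArgumentParser(
        prog="cve-harbor",
        description="Fetch NVD CVE data and write it as JSONL, JSON, or CSV.",
    )
    parser.add_argument(
        "--config",
        default=DEFAULT_CONFIG,
        help=f"path to the JSON configuration file (default: {DEFAULT_CONFIG})",
    )
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    """Parse arguments and return a process exit code."""
    args = build_parser().parse_args(argv)
    print(f"using config: {args.config}")
    # Loading the config, fetching, and writing events would happen here.
    return 0
```

A console-script entry point (or sys.exit(main()) under a __main__ guard) would call main(); main(["--config", "custom_config.json"]) prints "using config: custom_config.json".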

Crontab Example

# Run every 6 hours
0 */6 * * * cd /opt/cve-harbor && /usr/bin/python3 main.py >> /var/log/cve-harbor/cron.log 2>&1

Usage as Python Module

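from harbor import fetch_cve_data, filter_cve_data, write_events

# NVD CVE API 2.0 endpoint and query parameters
url = "https://services.nvd.nist.gov/rest/json/cves/2.0"
params = {"keywordSearch": "Windows", "resultsPerPage": 100}

# Fetch data from the API (assumed to return the decoded JSON response)
data = fetch_cve_data(url, params)

# Process data; the argument formats below are assumptions,
# check data_processor.py for the exact expected types
vulnerabilities = data.get("vulnerabilities", [])
events, timestamp = filter_cve_data(vulnerabilities, ["descriptions", "references"], ["en"], "published")

# Write to file
write_events(events, "output.jsonl", "jsonl")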
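NVD API 2.0 responses are paginated: each page reports startIndex, resultsPerPage, and totalResults alongside a vulnerabilities array. A minimal sketch of walking every page, assuming fetch_page is any callable that returns one decoded page (for example lambda p: fetch_cve_data(url, p), if fetch_cve_data returns the raw JSON, which this README does not confirm); iter_vulnerabilities is a hypothetical name.

```python
from typing import Any, Callable, Dict, Iterator, List

# Takes a dict of query parameters, returns one decoded NVD API page.
FetchPage = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_vulnerabilities(
    fetch_page: FetchPage,
    params: Dict[str, Any],
    page_size: int = 100,
) -> Iterator[Dict[str, Any]]:
    """Yield every entry of the "vulnerabilities" arrays across all result pages."""
    start = 0
    while True:
        page = fetch_page({**params, "startIndex": start, "resultsPerPage": page_size})
        items: List[Dict[str, Any]] = page.get("vulnerabilities", [])
        yield from items
        start += len(items)
        # Stop on an empty page (defensive) or once every result has been seen.
        if not items or start >= page.get("totalResults", 0):
            break
```

Injecting the fetcher keeps the loop testable without network access. A real fetcher should also respect NVD's request rate limits, which are stricter without an API key.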

📊 Output Formats

JSONL Format (Default)

One JSON object per line; a single record is shown pretty-printed below:

{
  "timestamp": 1704067200.0,
  "cve_id": "CVE-2024-1234",
  "published": "2024-01-01T00:00:00.000+00:00",
  "last_modified": "2024-01-01T00:00:00.000+00:00",
  "vuln_status": "Analyzed",
  "data": { ... }
}
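A record like the one above can be built from one NVD vulnerabilities entry. The sketch assumes timestamp is the epoch time of the field selected by checkpoint_type and that NVD timestamps without an offset are UTC; neither is confirmed by this README, and to_event and to_jsonl_line are hypothetical names.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def to_event(
    vulnerability: Dict[str, Any],
    checkpoint_type: str = "published",
    include: str = "descriptions,references",
) -> Dict[str, Any]:
    """Map one NVD 'vulnerabilities' item to an event dict (hypothetical)."""
    cve = vulnerability["cve"]
    # index checkpoints have no date of their own, so they fall back to published.
    field = "lastModified" if checkpoint_type == "modified" else "published"
    moment = datetime.fromisoformat(cve[field])
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=timezone.utc)  # assumption: naive NVD times are UTC
    wanted = {name.strip() for name in include.split(",") if name.strip()}
    return {
        "timestamp": moment.timestamp(),
        "cve_id": cve["id"],
        "published": cve["published"],
        "last_modified": cve["lastModified"],
        "vuln_status": cve.get("vulnStatus"),
        "data": {key: value for key, value in cve.items() if key in wanted},
    }


def to_jsonl_line(event: Dict[str, Any]) -> str:
    """Serialize one event as a single JSONL line."""
    return json.dumps(event, ensure_ascii=False) + "\n"
```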

JSON Format

All CVEs in a single JSON array.

CSV Format

Flattened data structure, suitable for Excel and other analysis tools.
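The three formats differ only in serialization. A sketch of one way to render them, assuming events are plain dicts like the JSONL record above; render_events and flatten are hypothetical, and encoding nested lists as JSON strings in CSV cells is a design choice of this sketch rather than documented behavior.

```python
import csv
import io
import json
from typing import Any, Dict, List


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted column names; lists become JSON strings."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, f"{name}."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value, ensure_ascii=False)
        else:
            flat[name] = value
    return flat


def render_events(events: List[Dict[str, Any]], output_format: str) -> str:
    """Serialize events as JSONL, a JSON array, or CSV text."""
    if output_format == "jsonl":
        return "".join(json.dumps(event, ensure_ascii=False) + "\n" for event in events)
    if output_format == "json":
        return json.dumps(events, ensure_ascii=False, indent=2)
    if output_format == "csv":
        rows = [flatten(event) for event in events]
        # Column order follows first appearance; missing cells are left empty.
        fieldnames = list(dict.fromkeys(key for row in rows for key in row))
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"unsupported output format: {output_format!r}")
```

When writing the CSV text to disk, open the file with newline="" so the csv module's \r\n line endings are not translated a second time on Windows.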

🔧 Development

Project Structure

  • src/harbor/core/: Main business logic

    • api_client.py: NVD API communication
    • data_processor.py: Data processing and filtering
    • checkpoint_manager.py: Checkpoint management
  • src/harbor/formatters/: Output formats

    • event_writer.py: JSONL, JSON, CSV formats
  • src/harbor/utils/: Utility functions

    • utils.py: Logging, datetime operations
  • src/harbor/cli.py: Command line interface

Code Quality

# Code formatting
black src/

# Linting
flake8 src/

# Type checking
mypy src/

🧰 Contributing / Dev Setup

Follow these steps to set up a clean development environment with linting, formatting, and type-checking.

1) Create and activate a virtualenv

python3 -m venv .venv
source .venv/bin/activate  # Linux/Mac
# or on Windows (PowerShell)
.venv\Scripts\Activate.ps1

2) Install the project in editable mode

This makes the src/ layout importable and installs the cve-harbor console script.

pip install -e .

3) Install dev tools and pre-commit hooks

pip install pre-commit
pre-commit install

# Run hooks on all files once
pre-commit run --all-files

The repository is configured via pyproject.toml for:

  • flake8 (max line length 100, ignores E203/W503)
  • black (line length 100)
  • isort (black profile)
  • pyright (type checking)

4) Useful developer commands

# Run with CLI (after editable install)
cve-harbor --config config/config.json

# Format + sort imports
black . && isort .

# Lint and types
flake8 . && pyright

# Run tests (if tests/ present)
pytest -q

5) Commit guidelines

  • Keep changes focused and small when possible.
  • Ensure pre-commit hooks pass before pushing.
  • Add tests for non-trivial changes.

📝 Example Usage Scenarios

1. Monitoring Windows Security Vulnerabilities

{
    "keyword_search": "Windows",
    "misc_parameters": "cvssV4Severity=HIGH",
    "output_format": "csv"
}
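The misc_parameters string in this scenario reads like a URL query string. A sketch of merging it with the other options into NVD request parameters, assuming that format; build_query is hypothetical, keywordSearch and resultsPerPage are NVD API 2.0 parameter names, and keyword exact matching is left out for brevity.

```python
from typing import Any, Dict
from urllib.parse import parse_qsl, urlencode


def build_query(config: Dict[str, Any]) -> Dict[str, str]:
    """Translate config keys into NVD API query parameters (hypothetical)."""
    params: Dict[str, str] = {"resultsPerPage": str(config.get("results_per_page", 100))}
    if config.get("keyword_search"):
        params["keywordSearch"] = str(config["keyword_search"])
    # "cvssV4Severity=HIGH" -> {"cvssV4Severity": "HIGH"}; later keys override earlier ones.
    params.update(parse_qsl(config.get("misc_parameters") or "", keep_blank_values=True))
    return params


if __name__ == "__main__":
    scenario = {"keyword_search": "Windows", "misc_parameters": "cvssV4Severity=HIGH"}
    print(urlencode(build_query(scenario)))
```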

2. Getting the Last 7 Days of CVEs (set the start time to seven days before today; this example assumes today is 2025-01-01)

{
    "checkpoint_start_time_or_index": "2024-12-25T00:00:00.000+00:00",
    "output_format": "json"
}
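Rather than editing the date by hand, the start time can be computed. A small sketch that produces a value in the same format as checkpoint_start_time_or_index; start_of_window is a hypothetical helper, and truncating to midnight UTC mirrors the example value.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def start_of_window(days: int = 7, now: Optional[datetime] = None) -> str:
    """Return midnight UTC `days` days before `now`, formatted like the config value.

    `now` should be timezone-aware; it defaults to the current UTC time.
    """
    current = (now if now is not None else datetime.now(timezone.utc)).astimezone(timezone.utc)
    start = (current - timedelta(days=days)).replace(hour=0, minute=0, second=0, microsecond=0)
    return start.isoformat(timespec="milliseconds")


if __name__ == "__main__":
    print(start_of_window(now=datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)))
    # 2024-12-25T00:00:00.000+00:00
```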

3. Filtering Specific Software Vulnerabilities

{
    "keyword_search": "Apache",
    "keywordexactmatch": false,
    "cve_data_to_include": "descriptions,metrics,references"
}
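Because this scenario includes metrics, results can also be narrowed by CVSS base score after download. A sketch of such a local filter, assuming the NVD API 2.0 metrics layout (cvssMetricV40, cvssMetricV31, cvssMetricV30, and cvssMetricV2 lists whose entries carry cvssData.baseScore); base_score and filter_by_score are hypothetical, and preferring the newest CVSS version and its "Primary" entry is a choice of this sketch.

```python
from typing import Any, Dict, List, Optional

# Newest CVSS version first.
METRIC_KEYS = ("cvssMetricV40", "cvssMetricV31", "cvssMetricV30", "cvssMetricV2")


def base_score(cve: Dict[str, Any]) -> Optional[float]:
    """Return the base score from the newest CVSS version present, or None."""
    metrics = cve.get("metrics") or {}
    for key in METRIC_KEYS:
        entries = metrics.get(key) or []
        if entries:
            # Prefer the "Primary" entry; otherwise fall back to the first one.
            chosen = next((e for e in entries if e.get("type") == "Primary"), entries[0])
            return float(chosen["cvssData"]["baseScore"])
    return None


def filter_by_score(vulnerabilities: List[Dict[str, Any]], min_score: float) -> List[Dict[str, Any]]:
    """Keep entries whose base score is at least min_score; unscored CVEs are dropped."""
    kept = []
    for item in vulnerabilities:
        score = base_score(item["cve"])
        if score is not None and score >= min_score:
            kept.append(item)
    return kept
```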

🔄 Checkpoint System

CVE Harbor saves a checkpoint recording how far processing has progressed. Three checkpoint types are supported:

  • published: Checkpoint based on publication date
  • modified: Checkpoint based on last-modified date
  • index: Checkpoint based on result index

On the next run, processing resumes from the saved checkpoint, so the same data is not processed twice.
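The checkpoint file only needs to hold one value. A sketch of reading and updating it, assuming checkpoint.txt stores a single timestamp or index as text; read_checkpoint and write_checkpoint are hypothetical, and the write goes through a temporary file plus os.replace so an interrupted run cannot leave a truncated file behind.

```python
import os
import tempfile
from pathlib import Path


def read_checkpoint(path: Path, default: str) -> str:
    """Return the saved checkpoint, or `default` if none has been written yet."""
    try:
        value = path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return default
    return value or default


def write_checkpoint(path: Path, value: str) -> None:
    """Replace the checkpoint file via a temp file so it is never half-written."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=".checkpoint-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(value + "\n")
        os.replace(tmp_name, path)  # single rename over the target (atomic on POSIX)
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)
        raise
```

Writing the checkpoint only after the output file has been written means a crash causes data to be fetched again rather than skipped.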

📋 Logging

The log records:

  • API requests
  • Number of processed CVEs
  • Checkpoint updates
  • Error messages
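These records map naturally onto Python's standard logging module. A minimal sketch, assuming log_file and log_level come from the configuration; setup_logging and the "harbor" logger name are hypothetical, and the message format is a choice of this sketch.

```python
import logging
from pathlib import Path


def setup_logging(log_file: Path, level: str = "DEBUG") -> logging.Logger:
    """Configure a file logger whose level comes from the log_level setting."""
    numeric = getattr(logging, level.upper(), None)
    if not isinstance(numeric, int):
        raise ValueError(f"invalid log_level: {level!r}")
    logger = logging.getLogger("harbor")
    logger.setLevel(numeric)
    # Drop handlers from an earlier call so messages are not written twice.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    log_file.parent.mkdir(parents=True, exist_ok=True)
    handler = logging.FileHandler(log_file, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

A run would then call, for example, log.info("processed %d CVEs", count) after each batch.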

🛠️ Error Handling

The following error conditions are handled:

  • API connection errors
  • File write errors
  • Invalid configuration
  • Network timeouts
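For the connection and timeout cases, a common approach is to retry with exponential backoff. This is a sketch of that technique, not necessarily what CVE Harbor does; with_retries is hypothetical, and note that urllib's HTTPError is a subclass of URLError, so HTTP error responses are retried too.

```python
import socket
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")

# Connection failures and timeouts are worth retrying; other errors are not.
# socket.timeout is listed separately for Python < 3.10, where it is not TimeoutError.
RETRYABLE = (urllib.error.URLError, ConnectionError, TimeoutError, socket.timeout)


def with_retries(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying retryable errors after base_delay * 2**n seconds."""
    for attempt in range(attempts):
        try:
            return call()
        except RETRYABLE:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Passing sleep as a parameter lets tests run without real delays.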

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

🤝 Contributing

  1. Fork the project
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

🙏 Acknowledgments

  • NVD - National Vulnerability Database
  • NIST - National Institute of Standards and Technology
