A comprehensive tool for downloading, parsing, and indexing CPE (Common Platform Enumeration) product data from the NVD (National Vulnerability Database) into Elasticsearch for fast searching.
- 📥 Download NVD CPE 2.0 feed data automatically
- 🔄 Parse JSON chunk files and extract CPE information
- 📚 Index data into Elasticsearch with optimized mapping
- 🔍 Search by tool name, website, CPE pattern, vendor, and more
- 📊 Statistics and aggregations on the indexed data
- 🖥️ Interactive command-line search interface
- 🔄 Update database with latest data and generate diff reports
- 📄 CSV matching for bulk tool analysis
- Python 3.7+
- Elasticsearch 7.x or 8.x running locally
- Internet connection for downloading NVD feed
- Clone or download this project
- Create a virtual environment (recommended):
```
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```
- Install dependencies:
```
pip install -r requirements.txt
```
Edit `config.properties` to match your setup:

```properties
# Elasticsearch connection
es.host=localhost
es.port=9200
es.scheme=http

# Index configuration
es.index=cpe-index

# Feed configuration
nvd.feed.url=https://nvd.nist.gov/feeds/json/cpe/2.0/nvdcpe-2.0.tar.gz
nvd.feed.extract.dir=./data/nvd-cpe
```

General usage:

```
python main.py [-h] [--config CONFIG] [--verbose] {setup,download,parse-and-index,recreate-index,search-demo,search,full-pipeline,match-csv,update} ...
```

Global Options:
- `--config CONFIG`: Configuration file path (default: `config.properties`)
- `--verbose, -v`: Enable verbose logging
Available Commands:
```
python main.py setup
```
Set up the project and create the Elasticsearch index.
```
python main.py download [--force]
```
Download NVD CPE feed data.

Options:
- `--force`: Force re-download even if data exists
```
python main.py parse-and-index
```
Parse JSON files and index them into Elasticsearch.
```
python main.py recreate-index
```
Delete and recreate the index with updated mapping.
```
python main.py search-demo
```
Demonstrate search capabilities with example queries.
```
python main.py search
```
Interactive search mode with a command-line interface.

Available search commands in interactive mode:
- `tool <name>` - Search by tool name
- `website <url>` - Search by website
- `cpe <pattern>` - Search by CPE pattern
- `vendor <name>` - Search by vendor
- `stats` - Show database statistics
- `quit` - Exit interactive mode
```
python main.py full-pipeline [--force-download]
```
Run the complete pipeline: setup + download + index + search demo.

Options:
- `--force-download`: Force re-download of data during the pipeline
Important Note: If an Elasticsearch index already exists, the tool will prompt for confirmation before proceeding, as the operation will delete all existing data and create a new index.
```
python main.py match-csv CSV_FILE [--tool-col COL] [--website-col COL] [--output OUTPUT]
```
Match tools from a CSV file against the CPE database.

Arguments:
- `CSV_FILE`: Path to the CSV file containing tools

Options:
- `--tool-col COL`: Tool name column index (0-based, default: 1)
- `--website-col COL`: Website column index (0-based, default: 2)
- `--output OUTPUT`: Output file path (default: `cpe_matches.csv`)
```
python main.py update [--force-download] [--no-diff]
```
Update the CPE database with the latest data and create diff reports.

Options:
- `--force-download`: Force re-download of the latest data
- `--no-diff`: Skip diff generation for faster updates
Run the complete pipeline in one command:
```
python main.py full-pipeline
```
This will:
- Setup Elasticsearch index
- Download NVD CPE feed
- Parse and index all data
- Show search examples
Match tools from a CSV file against the CPE database:
```
python main.py match-csv your_tools.csv --tool-col 1 --website-col 2 --output results.csv
```
CSV Format Requirements:
- Tool name column (default: column 1, 0-based indexing)
- Website column (default: column 2, 0-based indexing)
Example CSV:
```
rank,tool_name,website,popularity
1,apache,https://httpd.apache.org,1000000
2,nginx,https://nginx.org,800000
```
What the tool does:
- Searches CPE database by tool name and website
- Removes http/https prefixes from URLs
- Filters out deprecated CPE entries
- Groups CPE variants by vendor/product (handles version differences)
- Sets version to `*` when multiple versions exist
- Outputs up to 5 CPE matches per tool with full details
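The URL normalization and vendor/product grouping steps above can be sketched roughly as follows. These helpers are illustrative, not the project's actual implementation:

```python
import re
from collections import defaultdict


def normalize_website(url):
    """Strip the http/https prefix and any trailing slash so URLs compare consistently."""
    return re.sub(r"^https?://", "", url.strip()).rstrip("/")


def group_cpes(cpe_names):
    """Group CPE 2.3 names by (vendor, product).

    In a CPE 2.3 string the colon-separated fields at indexes 3, 4, 5
    are vendor, product, version. When a group contains more than one
    version, the version collapses to '*'.
    """
    groups = defaultdict(set)
    for name in cpe_names:
        parts = name.split(":")
        vendor, product, version = parts[3], parts[4], parts[5]
        groups[(vendor, product)].add(version)
    return {
        key: (versions.pop() if len(versions) == 1 else "*")
        for key, versions in groups.items()
    }
```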
Update your CPE database with the latest data:
```
python main.py update
```
This command will:
- Download the latest NVD CPE feed
- Compare with existing database
- Generate diff reports showing changes
- Update the database with new entries
- Create backup of previous state
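A diff between two snapshots could be computed along these lines, assuming each snapshot is a mapping from `cpeNameId` to `lastModified`. This is an illustration of the idea, not the project's actual diff code:

```python
def diff_entries(old, new):
    """Summarize changes between two {cpeNameId: lastModified} snapshots."""
    added = new.keys() - old.keys()
    removed = old.keys() - new.keys()
    # An entry counts as modified when it exists in both snapshots
    # but its lastModified timestamp changed.
    modified = {k for k in old.keys() & new.keys() if old[k] != new[k]}
    return {"added": added, "removed": removed, "modified": modified}
```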
Update Options:
- Use `--force-download` to force a fresh download
- Use `--no-diff` to skip diff generation for faster updates
Setup:
```
python main.py setup
```
Download:
```
python main.py download
```
Force re-download:
```
python main.py download --force
```
Parse and index:
```
python main.py parse-and-index
```
Demo searches:
```
python main.py search-demo
```
Interactive search:
```
python main.py search
```
The tool supports various search types:
- Tool name search: `tool apache`
- Website search: `website github.com`
- CPE pattern search: `cpe *apache*`
- Vendor search: `vendor microsoft`
- Statistics: `stats`
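Dispatching those interactive commands is a small pattern. The sketch below routes one input line to a search call; where possible it reuses client method names that appear elsewhere in this README, but `search_by_cpe_pattern`, `search_by_vendor`, and `get_statistics` are hypothetical stand-ins, not the project's confirmed API:

```python
def dispatch(line, client):
    """Route one interactive command line to the matching search call.

    Returns None for 'quit', a result object for known commands,
    and an error message string otherwise.
    """
    command, _, arg = line.strip().partition(" ")
    handlers = {
        "tool": client.search_by_tool_name,
        "website": client.search_by_website,
        "cpe": client.search_by_cpe_pattern,      # hypothetical name
        "vendor": client.search_by_vendor,        # hypothetical name
    }
    if command == "quit":
        return None
    if command == "stats":
        return client.get_statistics()            # hypothetical name
    if command in handlers and arg:
        return handlers[command](arg)
    return f"Unknown command: {line.strip()}"
```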
```python
search_client.search_by_tool_name("apache")
search_client.search_by_website("github.com")
search_client.search_by_exact_cpe("cpe:2.3:a:apache:http_server:2.4.41:*:*:*:*:*:*:*")
search_client.search_by_vendor_product(vendor="apache", product="tomcat")
search_client.search_deprecated(deprecated=True)
search_client.search_by_date_range(start_date="2023-01-01", end_date="2023-12-31")
```
Each CPE entry contains:
- `cpeName`: Full CPE identifier
- `cpeNameId`: Unique UUID
- `created`/`lastModified`: Timestamps
- `deprecated`: Boolean status
- `refs`: Array of reference URLs
- `titles`: Array of human-readable titles
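Concretely, one indexed document with those fields might look like the following. All values here are made up for illustration (the UUID is a placeholder, and the exact shape of `refs` and `titles` elements is assumed):

```python
# Illustrative example of a single indexed CPE document.
example_entry = {
    "cpeName": "cpe:2.3:a:apache:http_server:2.4.41:*:*:*:*:*:*:*",
    "cpeNameId": "00000000-0000-0000-0000-000000000000",  # placeholder UUID
    "created": "2020-01-15T12:00:00.000",
    "lastModified": "2023-06-01T09:30:00.000",
    "deprecated": False,
    "refs": [{"ref": "https://httpd.apache.org"}],
    "titles": [{"title": "Apache HTTP Server 2.4.41", "lang": "en"}],
}
```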
```
├── main.py                    # Main application
├── config_parser.py           # Configuration handling
├── elasticsearch_manager.py   # Elasticsearch operations
├── data_downloader.py         # NVD feed download/extraction
├── data_parser.py             # JSON parsing and indexing
├── cpe_search_client.py       # Search functionality
├── config.properties          # Configuration file
├── requirements.txt           # Python dependencies
└── data/                      # Downloaded and extracted data
    └── nvd-cpe/
        └── nvdcpematch-2.0-chunks/
```
- Ensure Elasticsearch is running: `curl http://localhost:9200`
- Check configuration in `config.properties`
- Verify firewall settings
- Check internet connection
- Verify NVD feed URL in configuration
- Try force re-download: `python main.py download --force`
- The tool processes data in batches (default: 1000 documents)
- Adjust batch size in `data_parser.py` if needed
- Ensure sufficient disk space for extracted data
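Batching is a small, self-contained pattern. The helper below shows the idea; it is a sketch, not the actual `data_parser.py` code, and the `actions_for` call in the comment is hypothetical:

```python
def batches(items, size=1000):
    """Yield successive lists of at most `size` items.

    Mirrors how documents can be grouped before bulk indexing.
    """
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

# Each batch could then be sent to Elasticsearch in one bulk request,
# e.g. via elasticsearch.helpers.bulk(es, actions_for(batch)).
```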
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is open source. Please check the repository for license details.