This project uses a BERT-based deep learning model to automatically classify CVE vulnerability descriptions from CVE.org JSON records. It applies keyword-based labeling and achieves high accuracy in vulnerability detection, supporting proactive cybersecurity analysis.

1810suman/CVE-BERT-Vulnerability-Classification

CVE-BERT: Vulnerability Classification using BERT

This project leverages BERT (Bidirectional Encoder Representations from Transformers) to classify CVE descriptions into high-risk or low-risk categories based on vulnerability keywords. It automates parsing of large-scale CVE JSON records from CVE.org, generates labels, and fine-tunes a BERT model for high-accuracy binary classification.

📂 Dataset Source

⚙️ Key Features

  • Parallel parsing of JSON files for scalable processing
  • Automatic binary labeling using critical vulnerability keywords
  • Balanced dataset creation for unbiased training
  • Custom PyTorch dataset integration for BERT
  • Fine-tuning BERT-Base Uncased for text classification
  • Comprehensive evaluation with Accuracy, Precision, Recall, F1, Confusion Matrix, and ROC-AUC
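The keyword labeling and balancing steps listed above could look roughly like the sketch below. The keyword list and the function names `label_description` and `balance` are assumptions made for illustration; this README does not say which keywords or which sampling strategy the project uses.

```python
import random

# Hypothetical keyword list: the README does not state the project's actual keywords.
HIGH_RISK_KEYWORDS = (
    "remote code execution",
    "buffer overflow",
    "sql injection",
    "privilege escalation",
)


def label_description(description, keywords=HIGH_RISK_KEYWORDS):
    """Return 1 (high-risk) if any keyword occurs in the description, else 0."""
    text = description.lower()
    return int(any(keyword in text for keyword in keywords))


def balance(samples, seed=42):
    """Downsample the larger class so both labels appear equally often.

    `samples` is a list of (description, label) pairs with labels 0 or 1.
    """
    by_label = {0: [], 1: []}
    for description, label in samples:
        by_label[label].append((description, label))
    n = min(len(by_label[0]), len(by_label[1]))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    balanced = rng.sample(by_label[0], n) + rng.sample(by_label[1], n)
    rng.shuffle(balanced)
    return balanced
```

Downsampling the majority class is only one way to balance; it is shown here because it needs no extra dependencies.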

🛠️ Technical Stack

  • Language: Python 3.x
  • Libraries: PyTorch, HuggingFace Transformers, Scikit-learn, Pandas, Seaborn, Matplotlib

🚀 Model Performance

  • Train Set: 8,000 Samples (Balanced)
  • Test Set: 2,000 Samples (Balanced)
  • Final Test Accuracy: 99%
  • F1-Score: 0.99
  • ROC-AUC Score: High (See plot below)

📊 Example Output

Classification Report:

              precision    recall  f1-score   support
           0       0.99      0.99      0.99       988
           1       1.00      0.99      0.99      1012
    accuracy                           0.99      2000
   macro avg       0.99      0.99      0.99      2000
weighted avg       0.99      0.99      0.99      2000
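The per-class columns in a report like this follow from counts of true positives, false positives, and false negatives. The sketch below works through those definitions on a small toy input; `binary_metrics` is a hypothetical helper, and the toy labels are not the project's data.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only (2 true positives, 1 false positive, 1 false negative).
toy_true = [1, 1, 1, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 1]
toy_metrics = binary_metrics(toy_true, toy_pred)
```

Calling the helper with `positive=0` gives the row for class 0; the macro average is the unweighted mean of the two rows.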

Confusion Matrix:

[Figure: Confusion Matrix]

ROC-AUC Curve:

[Figure: ROC AUC Curve]
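As a reminder of what the ROC-AUC number summarizes: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition; `roc_auc` is a hypothetical helper, not the project's code, and in practice scikit-learn's `roc_auc_score` is the usual choice.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5  # a tie counts as half a correctly ranked pair
    return wins / (len(positives) * len(negatives))


# Toy scores for illustration only: 3 of the 4 positive/negative pairs are ordered correctly.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a test set of a few thousand descriptions but not for much larger ones.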

📦 Installation (requirements.txt)

torch
transformers
scikit-learn
pandas
seaborn
matplotlib
tqdm

👨‍💻 Usage

  1. Download the CVE JSON dataset from CVEProject
  2. Set the correct cve_json_root path in the script
  3. Run the script for training and evaluation
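To illustrate what pointing `cve_json_root` at the downloaded records involves, here is a minimal sketch of reading descriptions in parallel. It assumes the CVE JSON 5.x record layout (`cveMetadata.cveId` and `containers.cna.descriptions`) and files named `CVE-*.json`; the helpers `extract_description` and `load_descriptions` are hypothetical, and a thread pool stands in for whatever parallelism the project's script actually uses.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def extract_description(path):
    """Return (cve_id, english_description) for one record, or None if unusable."""
    try:
        record = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None  # skip unreadable or malformed files
    if not isinstance(record, dict):
        return None
    cve_id = record.get("cveMetadata", {}).get("cveId")
    descriptions = record.get("containers", {}).get("cna", {}).get("descriptions", [])
    for entry in descriptions:
        if entry.get("lang", "").lower().startswith("en") and entry.get("value"):
            return cve_id, entry["value"]
    return None


def load_descriptions(cve_json_root, max_workers=8):
    """Parse every CVE-*.json file under the root directory using a thread pool."""
    paths = sorted(Path(cve_json_root).rglob("CVE-*.json"))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(extract_description, paths)
        return [result for result in results if result is not None]
```

A call such as `load_descriptions(cve_json_root)` returns a list of `(cve_id, description)` pairs that can then be labeled and split for training.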
