mono: Is Your "Clean" Vulnerability Dataset Really Solvable? Exposing and Trapping Undecidable Patches and Beyond

vul337/mono

This document describes the artifacts accompanying our paper, "mono: Is Your 'Clean' Vulnerability Dataset Really Solvable? Exposing and Trapping Undecidable Patches and Beyond". The name mono stands for Multi-agent Operated Noise Outfilter. The artifacts are organized into the following directories:

mono

This directory contains the source code of our project.

MonoLens

This subfolder contains the final dataset, MonoLens, generated and analyzed by our framework.

The subfolders within MonoLens are organized as follows:

examples

This directory provides a sample of 8 data entries in a CSV file, together with summary statistics for these samples. Each entry includes the original CVE metadata, the root cause analysis performed by our agent, and other relevant information. Each entry also references a corresponding folder within the other_context folder, which holds the complete analysis results and the step-by-step process undertaken by the agent.
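To inspect the sample programmatically, one might load the CSV and resolve each entry's other_context folder. The following is a minimal Python sketch under stated assumptions: the column name `other_context_path`, the helper `load_entries`, and the layout (an other_context folder sitting next to the CSV) are all hypothetical, so check the actual CSV header and directory layout before relying on it.

```python
import csv
from pathlib import Path

# HYPOTHETICAL column name: check the CSV header for the field that
# actually references the other_context folder.
CONTEXT_COLUMN = "other_context_path"


def load_entries(csv_path, context_column=CONTEXT_COLUMN):
    """Read every CSV row and attach the resolved other_context folder.

    Assumes (hypothetically) that the other_context folder lives in the
    same directory as the CSV file.
    """
    csv_path = Path(csv_path)
    context_root = csv_path.parent / "other_context"
    entries = []
    with csv_path.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            ref = (row.get(context_column) or "").strip()
            # Rows with an empty reference get None instead of a path.
            row["_context_dir"] = context_root / ref if ref else None
            entries.append(row)
    return entries
```

Calling `load_entries` on the sample CSV returns one dictionary per row, with the resolved folder stored under the extra key `_context_dir`; checking `entry["_context_dir"].is_dir()` is a quick way to confirm the reference matches a folder on disk.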

conf_0.9

This directory contains the subset of CVEs for which our agent's final confidence score in its analysis was greater than 0.9. The other_context subfolder is omitted due to its large size.
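The conf_0.9 subset is defined by a strict threshold on the agent's final confidence score. A minimal sketch of how such a subset could be re-derived from tabular results is shown below, assuming the results are stored as CSV with one numeric confidence column per entry. The column name `confidence` and the helpers `high_confidence` and `filter_csv` are hypothetical; this is an illustration of the threshold rule, not necessarily how the subset was produced.

```python
import csv
from pathlib import Path

# HYPOTHETICAL column name for the agent's final confidence score.
CONFIDENCE_COLUMN = "confidence"
THRESHOLD = 0.9


def high_confidence(rows, column=CONFIDENCE_COLUMN, threshold=THRESHOLD):
    """Keep rows whose confidence is strictly greater than the threshold."""
    kept = []
    for row in rows:
        try:
            score = float(row[column])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric score
        if score > threshold:
            kept.append(row)
    return kept


def filter_csv(src, dst, column=CONFIDENCE_COLUMN, threshold=THRESHOLD):
    """Write the high-confidence rows of src to dst, keeping the header."""
    with Path(src).open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        kept = high_confidence(reader, column, threshold)
    with Path(dst).open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)
```

The comparison is strict (`>`), matching the "greater than 0.9" wording, so an entry scored exactly 0.9 is excluded.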

all

This directory includes the results for all CVEs that our agent was able to process and analyze. The other_context subfolder is omitted due to its large size.

whole-workflow-examples

This directory showcases the complete analysis process of our mono framework for four specific cases, each accompanied by a ReadMe.md that details the entire pipeline:

  • Stage 1. Patch Pre-filtering and Classification: filtering security-related patches.

  • Stage 2. Data Acquisition and Preprocessing: preprocessing with Joern to generate Code Property Graphs (CPGs). The CPG binary files (cpg.bin) and the complete repositories are excluded due to their large size.

  • Stage 3. Iterative Contextual Analysis, including:

    • The agent's analysis of the CVEs.
    • The contextual information gathered to understand the root cause of the CVE.
    • The context as understood and summarized by the agent.
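Because the cpg.bin files are not shipped, a reader who wants to inspect the Stage 2 output would need to regenerate CPGs from a local checkout of the affected repository. A minimal sketch is shown below, assuming Joern is installed and its `joern-parse` command is on PATH; it uses only the `--output` option. The helper names are hypothetical, and the exact Joern version and options used by mono's own preprocessing are not stated here, so the mono source directory remains the authoritative reference.

```python
import shutil
import subprocess
from pathlib import Path


def build_joern_parse_cmd(source_dir, output_path):
    """Return a joern-parse command that writes the CPG of source_dir to output_path."""
    return ["joern-parse", str(source_dir), "--output", str(output_path)]


def generate_cpg(source_dir, output_dir):
    """Run joern-parse on a checked-out repository and return the cpg.bin path.

    HYPOTHETICAL helper: assumes Joern is installed and joern-parse is on PATH.
    """
    if shutil.which("joern-parse") is None:
        raise FileNotFoundError("joern-parse not found on PATH; install Joern first")
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    cpg_path = output_dir / "cpg.bin"
    # check=True raises CalledProcessError if Joern exits with an error.
    subprocess.run(build_joern_parse_cmd(source_dir, cpg_path), check=True)
    return cpg_path
```

Keeping command construction separate from execution makes it easy to log or adjust the exact Joern invocation before running it on a large repository.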

RQs

This directory is dedicated to the research questions (RQs) addressed in our paper. Each RQ has its own subfolder, which contains:

  • The specific code used for that RQ.
  • The data relevant to that RQ.
  • The final results obtained for that RQ.

Each RQ subfolder also includes its own ReadMe.md file providing more detailed information specific to that research question.

BibTeX

If this work is helpful for your research, please consider citing the following BibTeX entry.

@misc{gao2025monocleanvulnerabilitydataset,
      title={mono: Is Your "Clean" Vulnerability Dataset Really Solvable? Exposing and Trapping Undecidable Patches and Beyond}, 
      author={Zeyu Gao and Junlin Zhou and Bolun Zhang and Yi He and Chao Zhang and Yuxin Cui and Hao Wang},
      year={2025},
      eprint={2506.03651},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2506.03651}, 
}