Bastet is a comprehensive dataset of common smart contract vulnerabilities in DeFi along with an AI-driven automated detection process to enhance vulnerability detection accuracy and optimize security lifecycle management.

OneSavieLabs/Bastet

Bastet

Overview

Bastet covers common vulnerabilities in DeFi, including medium- to high-risk vulnerabilities found on-chain and in audit competitions, along with corresponding secure implementations. It aims to help developers and researchers gain deeper insights into vulnerability patterns and best security practices.

In addition, Bastet integrates an AI-driven automated vulnerability detection process. By designing tailored detection workflows, Bastet enhances AI's accuracy in identifying vulnerabilities, with the goal of optimizing security lifecycle management—from development and auditing to ongoing monitoring.

We strive to improve overall security coverage and warmly welcome contributions of additional vulnerability types, datasets, or improved AI detection methodologies. Please refer here to join and contribute to the Bastet dataset. Together, we can drive the industry's security development forward.

Download the dataset here.

Bastet/
│── cli/                        # Python CLI package
│   │── __init__.py
│   │── main.py                 # CLI entry point
│   │── commands/               # CLI commands
│   │   │── <module>/
│   │   │   │── __init__.py     # CLI routing only; the logic is defined in the files below
│   │   │   │── <function>.py
│   │── models/                 # Interfaces for Python type checking
│   │   │── <SAAS>/
│   │   │   │── __init__.py     # Exports all models in <SAAS>
│   │   │   │── <function>.py
│   │   │── audit_report.py     # Main interface for Bastet output
│── dataset/                    # dataset location
│   │── reports/                # audit reports of the projects (unzipped from the dataset.zip provided on Google Drive)
│   │   │── <reports>/
│   │── repos/                  # codebases of the projects (unzipped from the dataset.zip provided on Google Drive)
│   │   │── <repos>/
│   │── dataset.csv             # dataset sheet with the ground truth (download from Google Drive)
│   │── README.MD               # Basic information of the dataset
│── n8n_workflows/              # n8n workflow files
│   │── <file>.json             # workflow for analyzing the smart contracts
│── docker-compose.yaml
│── README.md
│── poetry.lock
│── pyproject.toml
│── .gitignore

Features

  • Recursive scanning of .sol files in specified directories
  • Automatic database creation and schema setup
  • Integration with n8n workflows via webhooks
  • Detailed processing summary and error reporting
  • Results stored in PostgreSQL for further analysis
  • A dataset for evaluating prompts
  • A CLI to trigger the evaluation workflow
  • Python file formatter: Black

How to install

Local n8n Setup

Prerequisites

  • Python 3.10 or higher
  • Docker installed on your machine
  • Docker Compose installed on your machine
  • Poetry for package management; to follow these instructions, use a version below 2.0.0

Installation Steps

Video tutorial

  1. Set up the Python environment:
# Initialize virtual environment and install dependencies
poetry shell
poetry install
  2. Configure environment variables in .env:
cp .env.example .env

Update the environment variables in the .env file if needed.

  3. Start n8n and the database:
docker-compose -f ./docker-compose.yml up -d
  4. Access the n8n dashboard: open your browser and navigate to http://localhost:5678

  5. (First time only) Set up the owner account and activate the free n8n pro features.

  6. Click the user icon at the bottom left → Settings → n8n API in the sidebar → Create an API key → enter Bastet as the Label → select "No Expiration" for Expiration (or choose an expiration time if you prefer) → copy the API key and paste it into N8N_API_KEY in the .env file → click Done. The API key is not visible after creation; if you lose it, you have to create a new one.

  7. Go back to the homepage (http://localhost:5678/home/workflows).

  8. Click the arrow button next to the Create Workflow button and choose Create Credential → type "OpenAi" in the input → select "OpenAi" and click Continue → enter your OpenAI API key in the API Key field and create the OpenAi credential → copy the value of the ID field and paste it into N8N_OPENAI_CREDENTIAL_ID in the .env file.

  9. Import the workflows by running the following command.

Before running it, make sure N8N_API_KEY and N8N_OPENAI_CREDENTIAL_ID are filled in in the .env file.

poetry run python cli/main.py init

You will then see all the workflows currently provided. They are activated by default; to skip a workflow, deactivate it in n8n (http://localhost:5678/home/workflows).
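Since the init command depends on two values in .env, a quick pre-flight check can save a failed run. The following is a minimal sketch, not part of the Bastet CLI: the helper names parse_env and missing_keys are hypothetical, and it assumes .env contains plain KEY=VALUE lines.

```python
from pathlib import Path

# Keys the setup steps ask you to fill in before running `init`.
REQUIRED_KEYS = ("N8N_API_KEY", "N8N_OPENAI_CREDENTIAL_ID")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; comments and blank lines are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    env = parse_env(env_path.read_text()) if env_path.exists() else {}
    missing = missing_keys(env)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run it from the repository root; if it reports a missing key, revisit the API key and credential steps before running init.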

Usage

Scan Multiple Contracts with Multiple Processor Workflows

The scan command recursively scans all .sol files in the specified directory:

poetry run python cli/main.py scan

The script will scan all contracts in the dataset/scan_queue directory using all workflows that you have activated by turning on their respective switch buttons.
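To illustrate what a recursive scan of this kind involves, here is a minimal sketch using only the standard library. It is not the Bastet implementation: find_contracts and build_payload are hypothetical helpers, and the payload keys (filename, source) are assumptions about what a webhook might receive.

```python
from pathlib import Path


def find_contracts(root: str) -> list[Path]:
    """Return every .sol file under root, sorted for a stable order."""
    return sorted(Path(root).rglob("*.sol"))


def build_payload(contract: Path) -> dict[str, str]:
    """Build a hypothetical webhook payload; the key names are assumptions."""
    return {
        "filename": contract.name,
        "source": contract.read_text(encoding="utf-8"),
    }


if __name__ == "__main__":
    # dataset/scan_queue is the directory named in the text above.
    for contract in find_contracts("dataset/scan_queue"):
        payload = build_payload(contract)
        print(f"{contract}: {len(payload['source'])} characters")
```

Sorting the paths keeps the processing order reproducible between runs, which makes a processing summary easier to compare.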

Use the --help flag for details on the available flags.

Scan Single Contract with Single Processor Workflow

  1. Open the workflow you want to use for the scan.
  2. Click the Chat button at the bottom and paste in the contract content.

Evaluation

  1. Import the workflow you want to evaluate.

The output of the workflow must follow this JSON schema:

{
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "summary": {
        "type": "string",
        "description": "Brief summary of the vulnerability"
      },
      "severity": {
        "type": "string",
        "enum": ["high", "medium", "low"],
        "description": "Severity level of the vulnerability"
      },
      "vulnerability_details": {
        "type": "object",
        "properties": {
          "function_name": {
            "type": "string",
            "description": "Function name where the vulnerability is found"
          },
          "description": {
            "type": "string",
            "description": "Detailed description of the vulnerability"
          }
        },
        "required": ["function_name", "description"]
      },
      "code_snippet": {
        "type": "array",
        "items": {
          "type": "string"
        },
        "description": "Code snippet showing the vulnerability",
        "default": []
      },
      "recommendation": {
        "type": "string",
        "description": "Recommendation to fix the vulnerability"
      }
    },
    "required": [
      "summary",
      "severity",
      "vulnerability_details",
      "code_snippet",
      "recommendation"
    ],
    "additionalProperties": false
  }
}
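As a rough illustration of an output that fits this schema, the sketch below checks one hand-written finding. It is a simplified check, not a full JSON Schema validator: check_finding is a hypothetical helper, the example finding is invented for illustration, and the code assumes severity is meant to be one of high, medium, or low.

```python
REQUIRED_FIELDS = (
    "summary",
    "severity",
    "vulnerability_details",
    "code_snippet",
    "recommendation",
)
SEVERITIES = {"high", "medium", "low"}


def check_finding(finding: dict) -> list[str]:
    """Return a list of problems for one finding; an empty list means it passed."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in finding]
    severity = finding.get("severity")
    if not isinstance(severity, str) or severity not in SEVERITIES:
        problems.append("severity must be high, medium, or low")
    details = finding.get("vulnerability_details")
    if not isinstance(details, dict) or not {"function_name", "description"} <= details.keys():
        problems.append("vulnerability_details needs function_name and description")
    snippet = finding.get("code_snippet")
    if not isinstance(snippet, list) or not all(isinstance(s, str) for s in snippet):
        problems.append("code_snippet must be a list of strings")
    return problems


# Invented example of a workflow output: an array with one finding.
example_output = [
    {
        "summary": "Swap executes without a minimum output amount",
        "severity": "high",
        "vulnerability_details": {
            "function_name": "swap",
            "description": "The minimum output is hard-coded to 0, so the trade can be sandwiched.",
        },
        "code_snippet": ["router.swapExactTokensForTokens(amountIn, 0, path, to, deadline);"],
        "recommendation": "Let the caller pass a minimum output amount and enforce it.",
    }
]

for finding in example_output:
    print(check_finding(finding))
```

For strict validation, a dedicated JSON Schema library would be the more robust choice; this sketch only covers the required fields and basic types.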

The workflow's trigger must be a webhook, and the workflow must be activated (by toggling its switch on the n8n home page).

For an example, refer to n8n_workflows/slippage_min_amount.json.

  2. Download the latest dataset.zip and dataset.csv from here.

  3. Unzip dataset.zip into ./dataset. The folder structure should look like this:

dataset/ # dataset location
│── reports/ # audit reports of the projects (unzipped from the dataset.zip provided on Google Drive)
│  │── <reports>/
│── repos/ # codebases of the projects (unzipped from the dataset.zip provided on Google Drive)
│  │── <repos>/
│── dataset.csv # dataset sheet with the ground truth (download from Google Drive and rename to `dataset.csv`)
│── README.MD # Basic information of the dataset
  4. Run the command:
poetry run python cli/main.py eval

Use the --help flag for details on the available flags.
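Before running eval, it can help to confirm that the dataset was unzipped into the expected place. This is a minimal sketch under the layout described in this section; missing_entries is a hypothetical helper and is not part of the Bastet CLI.

```python
from pathlib import Path

# Entries the documented layout expects inside ./dataset.
EXPECTED = ("reports", "repos", "dataset.csv")


def missing_entries(dataset_dir: str) -> list[str]:
    """Return the expected entries that are not present in dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in EXPECTED if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries("dataset")
    print("Missing:", ", ".join(missing) if missing else "none")
```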

Demo Case Setup

  1. Import slippage_min_amount.json into your n8n service.

  2. Provide the OpenAI credential for the slippage_min_amount workflow you just created.

  3. Activate the workflow.

  4. Download the latest dataset.zip and dataset.csv from here.

  5. Unzip dataset.zip into ./dataset. The folder structure should look like this:

dataset/ # dataset location
│── reports/ # audit reports of the projects (unzipped from the dataset.zip provided on Google Drive)
│  │── <reports>/
│── repos/ # codebases of the projects (unzipped from the dataset.zip provided on Google Drive)
│  │── <repos>/
│── dataset.csv # dataset sheet with the ground truth (download from Google Drive and rename to `dataset.csv`)
│── README.MD # Basic information of the dataset
  6. Run:
poetry run python cli/main.py eval

You should get confusion-matrix metrics like these:

+----------------+---------+
| Metric         |   Value |
+================+=========+
| True Positive  |      16 |
+----------------+---------+
| True Negative  |      27 |
+----------------+---------+
| False Positive |       2 |
+----------------+---------+
| False Negative |      13 |
+----------------+---------+

Note: your numbers may differ because LLM output is not deterministic. The results shown here were produced with gpt-4o-mini.
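The four counts can be turned into the usual summary metrics. The sketch below is not output from the Bastet CLI; summarize is a hypothetical helper that applies the standard definitions of accuracy, precision, recall, and F1 to the sample counts in the table above.

```python
def summarize(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Derive standard classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts taken from the sample table.
for name, value in summarize(tp=16, tn=27, fp=2, fn=13).items():
    print(f"{name}: {value:.3f}")
```

For the sample run this gives an accuracy of 0.741, precision of 0.889, recall of 0.552, and F1 of 0.681: the workflow rarely raises false alarms but misses a fair share of the known vulnerabilities.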

Conference

| Date | Conference Name | Topic | Slide |
|------|-----------------|-------|-------|
| 2025-04-02 | ETH TAIPEI 2025 | Exploring AI’s Role in Smart Contract Security | ETH-TAIPEI-2025 |
| 2025-04-17 | CyberSec 2025 | AI-Driven Smart Contract Vulnerability Detection | CyberSec-2025 |

Disclaimer

Bastet is for research and educational purposes only. Anyone who discovers a vulnerability should adhere to the principles of Responsible Disclosure and ensure compliance with applicable laws and regulations. We do not encourage or support any unauthorized testing, attacks, or abusive behavior, and users assume all associated risks.

License

Apache License 2.0
