AI-powered, browser-based vulnerability scanner that uses UniXcoder embeddings and LLM-backed RAG to detect security flaws across 9 languages.

butlerem/vulnerability-scanner-UniXcoder-RAG

Sylint

RAG-Based Static Vulnerability Scanner with Semantic Code Analysis

License: MIT • Next.js • Python • TypeScript

Live Demo • Features • Installation • Usage • Architecture


Overview

Sylint is a browser-based static vulnerability scanner that combines deep code embeddings with retrieval-augmented generation (RAG) to detect, explain, and remediate security vulnerabilities. Unlike traditional pattern-matching tools, Sylint understands code semantically, enabling it to identify vulnerabilities even in obfuscated or stylistically varied code.

Key Differentiators

  • Semantic Analysis: Uses Microsoft's UniXcoder to understand code logic, not just syntax
  • RAG-Powered Explanations: Grounds vulnerability analysis in real-world CVE/CWE patterns
  • Multi-Language Support: Analyzes 9 programming languages with a single model
  • Developer-Friendly: Provides plain-English explanations and automated fix suggestions

Supported Languages

Python • JavaScript • TypeScript • Java • C • C++ • PHP • Ruby • Go


Features

Core Functionality

  • Semantic Vulnerability Detection - Identifies security issues through code understanding rather than pattern matching
  • Deep Code Embeddings - 768-dimensional vector representations using UniXcoder
  • CVE/CWE Mapping - Automatic classification based on the NVD vulnerability database
  • Automated Fix Suggestions - LLM-generated patches for detected vulnerabilities
  • Compliance Reporting - Maps findings to PCI DSS, HIPAA, NIST SP 800-53, and OWASP ASVS
  • Scan History - Persistent storage and retrieval of previous analyses
  • Export Reports - Generate PDF and Markdown vulnerability reports

Technical Features

  • Monaco Editor Integration - Professional code editing with syntax highlighting
  • Real-time Analysis - Serverless backend for fast vulnerability scanning
  • Vector Similarity Search - Pinecone-powered retrieval of similar vulnerable code
  • Authentication & Authorization - Clerk-based user management with subscription tiers
  • Secure Communication - Full HTTPS encryption for all client-server interactions

Demo

Live Application: https://sylint.app/

Try Sylint with your own code or use the sample vulnerabilities provided in the interface.

Example Analysis

# Input: User authentication function with SQL injection vulnerability
# Output: 
# - Detected: CWE-89 (SQL Injection)
# - Explanation: Unsanitized user input concatenated into SQL query
# - Suggested Fix: Use parameterized queries or prepared statements
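
To make the example concrete, here is a minimal sketch of the kind of input and fix described above. It is illustrative only and is not taken from Sylint's sample set: the `users` table, the function names, and the use of `sqlite3` are all assumptions chosen so the snippet runs on its own.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with one user (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")
    return conn


def login_vulnerable(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # CWE-89: user input is concatenated straight into the SQL text,
    # so a crafted name can rewrite the query.
    query = (
        f"SELECT 1 FROM users WHERE name = '{name}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone() is not None


def login_fixed(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # Parameterized query: the driver passes the values separately from the SQL,
    # so they are always treated as data.
    query = "SELECT 1 FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone() is not None
```

With the input `name = "alice' --"`, the vulnerable version comments out the password check and accepts the login, while the parameterized version rejects it.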

Architecture

System Overview

┌─────────────┐
│   Browser   │
│  (Next.js)  │
└──────┬──────┘
       │
       ├──────────────┐
       │              │
┌──────▼──────┐ ┌─────▼──────┐
│   Convex    │ │  FastAPI   │
│  (Backend)  │ │ (AI Layer) │
└─────────────┘ └─────┬──────┘
                      │
         ┌────────────┼────────────┐
         │            │            │
  ┌──────▼─────┐ ┌────▼────┐ ┌─────▼─────┐
  │ UniXcoder  │ │  Groq   │ │ Pinecone  │
  │(Embeddings)│ │  (LLM)  │ │ (Vector)  │
  └────────────┘ └─────────┘ └───────────┘

RAG Pipeline

  1. Code Submission - User submits source code via the Monaco editor
  2. Embedding Generation - UniXcoder creates a 768-dimensional vector representation of the code
  3. Similarity Search - Query Pinecone for the top-k most similar vulnerable code samples from the CVEfixes dataset
  4. Context Augmentation - Retrieved examples are added to the LLM prompt as context
  5. Vulnerability Analysis - Groq's Mixtral model generates an explanation, CWE tags, and fixes
  6. Result Presentation - Findings displayed with compliance mappings and export options
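
Steps 3 and 4 can be sketched in plain Python. This is not Sylint's implementation: in the real service the search is delegated to Pinecone and the vectors have 768 dimensions, whereas the sketch below uses an in-memory list and toy vectors. The function names, the cosine metric, and the prompt wording are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    corpus: Sequence[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (snippet, score) pairs most similar to the query vector."""
    scored = [(snippet, cosine_similarity(query, vec)) for snippet, vec in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(code: str, language: str, examples: Sequence[str]) -> str:
    """Prepend the retrieved samples to the code under review (hypothetical wording)."""
    context = "\n\n".join(
        f"Example {i}:\n{example}" for i, example in enumerate(examples, start=1)
    )
    return (
        f"Known vulnerable {language} samples:\n{context}\n\n"
        f"Code to review:\n{code}\n\n"
        "Explain any vulnerability, tag it with a CWE ID, and suggest a fix."
    )
```

The point of the augmentation step is that the model sees concrete, already-labeled vulnerable samples next to the submitted code, rather than the submitted code alone.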

Installation

Prerequisites

  • Node.js 18+ and npm
  • Python 3.9+
  • Convex account
  • Clerk account
  • Groq API key
  • Pinecone account

Setup

  1. Clone the repository
git clone https://github.com/yourusername/sylint.git
cd sylint
  2. Install dependencies
npm install
  3. Configure environment variables

Create a .env.local file in the root directory:

# Clerk Authentication
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=your_clerk_publishable_key
CLERK_SECRET_KEY=your_clerk_secret_key

# Convex
CONVEX_DEPLOYMENT=your_convex_deployment
NEXT_PUBLIC_CONVEX_URL=your_convex_url

# Groq API
GROQ_API_KEY=your_groq_api_key

# Pinecone
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_ENVIRONMENT=your_pinecone_environment
  4. Set up the AI service
cd ai-service
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
  5. Start the development servers
# Terminal 1: Next.js frontend
npm run dev

# Terminal 2: Convex backend
npx convex dev

# Terminal 3: FastAPI AI service
cd ai-service
uvicorn main:app --reload --port 8000
  6. Access the application

Open http://localhost:3000 in your browser.
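
If a service fails to start, a common cause is a missing or still-placeholder value in .env.local. The helper below is not part of the repository; it is a small sketch that parses the file format shown above and reports which of the listed variables are unset or still begin with the `your_` placeholder prefix. The function names are hypothetical.

```python
from typing import Dict, List, Mapping

# Variable names copied from the .env.local template in this README.
REQUIRED_VARS = (
    "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY",
    "CLERK_SECRET_KEY",
    "CONVEX_DEPLOYMENT",
    "NEXT_PUBLIC_CONVEX_URL",
    "GROQ_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_ENVIRONMENT",
)


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def unset_vars(env: Mapping[str, str]) -> List[str]:
    """Return required names that are missing, empty, or still placeholders."""
    return [
        name
        for name in REQUIRED_VARS
        if not env.get(name) or env[name].startswith("your_")
    ]
```

For example, `unset_vars(parse_env(open(".env.local").read()))` returns an empty list once every value has been filled in.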


Usage

Basic Vulnerability Scan

  1. Select your programming language from the dropdown
  2. Paste or type your code in the Monaco editor
  3. Click "Scan for Vulnerabilities"
  4. Review detected issues with:
    • Vulnerability explanation
    • CWE classification
    • Affected code location
    • Suggested fix
    • Compliance implications
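
For readers who want to post-process results, the five items above map naturally onto a small record type. The sketch below is an assumption about shape only: Sylint's actual result schema and report layout are not documented here, so the `Finding` fields and the Markdown format are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Finding:
    """One detected issue, mirroring the review items listed above (hypothetical)."""
    cwe: str
    explanation: str
    location: str
    suggested_fix: str
    compliance: List[str] = field(default_factory=list)


def to_markdown(findings: Sequence[Finding]) -> str:
    """Render findings as a simple Markdown report."""
    lines = ["# Vulnerability Report", ""]
    for index, finding in enumerate(findings, start=1):
        lines += [
            f"## {index}. {finding.cwe}",
            "",
            f"- Location: {finding.location}",
            f"- Explanation: {finding.explanation}",
            f"- Suggested fix: {finding.suggested_fix}",
            f"- Compliance: {', '.join(finding.compliance) or 'none listed'}",
            "",
        ]
    return "\n".join(lines)
```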

API Endpoints

Generate Code Embedding

POST /embed
Content-Type: application/json

{
  "code": "string",
  "language": "python"
}

Explain Vulnerability

POST /explain
Content-Type: application/json

{
  "code": "string",
  "similar_vulnerabilities": ["array of similar code samples"],
  "language": "python"
}
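
The two endpoints can be called from any HTTP client. The following sketch builds the requests with Python's standard library only. The base URL assumes the local FastAPI service started with `--port 8000` as in the setup steps, and the response is assumed to be JSON; the response fields are not documented here, so the sketch does not interpret them. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumption: local AI service from the setup steps, endpoints mounted at the root.
BASE_URL = "http://localhost:8000"


def build_request(path: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build a JSON POST request for the given endpoint path."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed_request(code: str, language: str) -> urllib.request.Request:
    """Request body for POST /embed, as documented above."""
    return build_request("/embed", {"code": code, "language": language})


def explain_request(
    code: str, similar_vulnerabilities: List[str], language: str
) -> urllib.request.Request:
    """Request body for POST /explain, as documented above."""
    return build_request(
        "/explain",
        {
            "code": code,
            "similar_vulnerabilities": similar_vulnerabilities,
            "language": language,
        },
    )


def send(request: urllib.request.Request) -> Any:
    """Send the request and decode the body, assuming the service returns JSON."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the AI service running, `send(embed_request("print(1)", "python"))` performs the call; building the request separately from sending it makes the payloads easy to inspect or test offline.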

Tech Stack

Frontend

  • Next.js 14 - React framework with App Router
  • TypeScript - Type-safe development
  • Tailwind CSS - Utility-first styling
  • Monaco Editor - VS Code-based code editor

Backend

  • Convex - Real-time serverless database
  • FastAPI - High-performance Python API framework
  • Clerk - Authentication and user management

AI/ML

  • UniXcoder - Microsoft's code understanding model (768-dim embeddings)
  • Groq API - LLM inference (Mixtral, Llama 3.3 70B)
  • Pinecone - Vector database for similarity search

Dataset

  • CVEfixes - Curated vulnerable code samples from NVD with CVE/CWE mappings

Roadmap

  • Compliance Mode Selection - Filter analysis by specific frameworks (PCI DSS, HIPAA, NIST)
  • Multi-file Project Scanning - Analyze entire codebases with dependency tracking
  • CI/CD Integration - GitHub Actions and GitLab CI plugins
  • Custom Rule Creation - User-defined vulnerability patterns
  • IDE Extensions - VS Code and JetBrains plugin support
  • Real-time Collaboration - Multi-user code review sessions
  • Enhanced Compliance Database - Expanded regulatory framework coverage

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.


Acknowledgments

  • Microsoft for UniXcoder
  • CVEfixes dataset contributors
  • Groq for LLM API access
  • Pinecone for vector database infrastructure
