DeepCSIM is a Python library and tool for analyzing code similarity between Python files using Abstract Syntax Tree (AST) analysis. It can detect structural and semantic similarities.
pip install deepcsimYou can use DeepCSIM in three ways: as a command-line tool, as a local web server, or as a Python library.
Scan a directory for code duplicates directly from your terminal.
# Scan current directory
deepcsim-cli
# Scan a specific directory
deepcsim-cli /path/to/project
# Set similarity threshold (default: 80.0)
deepcsim-cli /path/to/project --threshold 90
# Output results in JSON format
deepcsim-cli /path/to/project --jsonStart the built-in web interface to view results interactively.
deepcsim-serverThen open http://localhost:8000/ in your browser.
Use DeepCSIM programmatically in your Python scripts.
from deepcsim import compare_source
from deepcsim.core.analyzer import CodeAnalyzer # Optional, for advanced usage
# --- Example 1: Compare two code snippets (Easy Way) ---
source1 = """
def hello():
print("Hello world")
"""
source2 = """
def greet():
print("Hello world")
"""
# Compare directly
report = compare_source(source1, source2)
# The report contains full details, similar to scan_directory output
print(f"Similarity: {report['results'][0]['similarity']}%")
# --- Example 2: Scan a directory ---
results = scan_directory("/path/to/project", threshold=80.0)
print(f"Found {results['count']} similar pairs.")If running the server, you have access to the following endpoints:
GET /— Web interface for browsing project files and scanning directoriesPOST /api/file-info/— Get metadata and similar files for a specific filePOST /scan-project— Recursively scan a directory for duplicate/similar files (JSON response)
