A pattern scanning and replacement tool built in Rust. It combines SIMD-accelerated regex file scanning with pattern replacement, and it prioritizes reliability, speed, and efficiency so that large directory trees can be processed quickly.
We currently use it internally to update copyright information, to find and change words or code while preserving syntax, and to automate repetitive tasks. It is also useful for managing large codebases.
- Fast scanning - SIMD-powered regex engine with parallel file processing using Rayon
- Pattern matching - Find complex patterns across entire codebases with YAML configuration
- Safe replacements - Replace or remove patterns with automatic .bak file creation
- Atomic file renaming - Atomically rename files at scale
- Undo support - Restore files from backups with a single command
- Smart cleanup - Automatically removes empty lines after block deletion and collapses consecutive empty lines
- Gitignore aware - Respects .gitignore files automatically via the ignore crate
- Directory exclusions - Configure custom directory exclusions (e.g., node_modules, target) in YAML
- Backup management - Clean up backup files when satisfied with changes
- Atomic writes - Uses tempfile for safe file modifications
- Line-by-line processing - Memory-efficient stream processing for large files
# Clone the repository
git clone https://github.com/yourusername/oober.git
cd oober
# Install via cargo (recommended)
cargo install --path .
# The binary will be installed to ~/.cargo/bin/oober
# You can now use 'oober' from anywhere ('ob' works once the alias below is set)
# Add alias to your shell config (.zshrc, .bashrc, etc.)
echo "alias ob='oober'" >> ~/.zshrc
source ~/.zshrc
# Or build manually
cargo build --release
# Binary will be at ./target/release/oober

# Scan with default patterns.yaml (using alias)
ob scan /path/to/scan
# Scan with custom patterns and output to file
ob scan -p scan_patterns.yaml -o results.txt /path/to/scan
# Scan only specific file types
ob scan -p patterns.yaml -x js,ts,py /path/to/scan

# Replace using YAML configuration
ob replace --dir /path/to/project --config replace_config.yaml
# Preview changes without applying (dry-run)
ob replace --dir /path --config replace_config.yaml --dry-run
# Single pattern replacement
ob replace --dir /path --pattern "TODO" --replacement "[TODO]"
# Replace with specific extensions
ob replace --dir /path --config replace_config.yaml -x js,ts

# Rename files with a dry-run to preview changes
ob rename --dir /path/to/project --pattern "old_name" --replacement "new_name" --dry-run
# Rename files
ob rename --dir /path/to/project --pattern "old_name" --replacement "new_name"

# Restore files from backups
ob undo --dir /path
# Keep backup files after restore
ob undo --dir /path --keep-backups
# Remove backup files after verification
ob clean-backups --dir /path/to/project
# Preview backup files before removal (shows total size)
ob clean-backups --dir /path --dry-run

patterns:
  - name: aws_access_key
    pattern: '\bAKIA[0-9A-Z]{16}\b'
  - name: email_address
    pattern: '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
  - name: password_assignment
    pattern: 'password\s*=\s*[^\s;]+'
# Optional: directories to exclude from scanning
exclusions:
  - node_modules
  - target
  - .git
  - dist
  - build

# Patterns to search for (regex supported)
patterns:
  - "TODO"
  - "console\\.log"
  - "print\\(.*\\)"
# Replacements (use null to delete the matched line)
replacements:
  - "[TODO]"
  - "// console.log"
  - null # null means delete the matched line
# Multi-line blocks to remove
blocks:
  - start: "/* DEBUG START */"
    end: "/* DEBUG END */"
  - start: "### TEMP CODE START ###"
    end: "### TEMP CODE END ###"
# Optional: file extensions to process
extensions: [js, ts, py, rb]
# Optional: directories to exclude
exclude: [node_modules, target, .git]

- Pattern Engine: Uses Rust's regex crate with SIMD optimizations
- File Walker: ignore crate provides gitignore-aware traversal
- Parallel Processing: rayon enables multi-threaded file processing
- Atomic Writes: tempfile ensures safe file modifications
- Backup System: Creates .bak files before modifications
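The backup and atomic-write steps listed above can be sketched with the standard library alone. This is a minimal illustration, not the tool's actual code: it assumes backups are named by appending .bak to the full file name, and it uses a fixed, hypothetical .tmp sibling where the tempfile crate would pick a unique name.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Returns the backup path for `path` by appending ".bak" (assumed naming scheme).
fn backup_path(path: &Path) -> PathBuf {
    let mut name = path.as_os_str().to_os_string();
    name.push(".bak");
    PathBuf::from(name)
}

/// Copies the original to a .bak file, then replaces the original by writing
/// the new content to a temporary sibling file and renaming it into place.
fn write_with_backup(path: &Path, new_content: &str) -> io::Result<()> {
    fs::copy(path, backup_path(path))?;

    // Hypothetical temp-file name; a real implementation would use a unique one.
    let mut tmp_name = path.as_os_str().to_os_string();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);

    let mut file = fs::File::create(&tmp)?;
    file.write_all(new_content.as_bytes())?;
    file.sync_all()?; // flush to disk before the rename makes the content visible
    drop(file);

    // Renaming within one directory swaps the file in as a single step,
    // so readers see either the old content or the new, never a partial write.
    fs::rename(&tmp, path)
}
```

Calling write_with_backup on a file leaves the new content in place and the previous content next to it under the .bak name, which is what an undo step needs.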
The tool uses a subcommand architecture:
- scan: Find patterns in files
- replace: Replace/remove patterns with optional backup
- rename: Rename files in a directory
- undo: Restore files from backups
- clean-backups: Remove backup files without restoring
# remove_copyright.yaml
blocks:
  - start: "# Copyright (c) 2024"
    end: "# DEALINGS IN THE SOFTWARE."

ob replace --dir /project --config remove_copyright.yaml

# cleanup_debug.yaml
patterns:
  - "console\\.log"
  - "debugger;"
  - "print\\(.*debug.*\\)"
replacements:
  - "// console.log"
  - "// debugger;"
  - "# print(debug)"
blocks:
  - start: "// DEBUG START"
    end: "// DEBUG END"
# Exclude common build directories
exclude:
  - node_modules
  - target
  - dist

# secret_patterns.yaml
patterns:
  - name: aws_access_key
    pattern: '\bAKIA[0-9A-Z]{16}\b'
  - name: private_key
    pattern: '-----BEGIN (?:RSA )?PRIVATE KEY-----'
exclusions:
  - node_modules
  - vendor
  - .git
  - target
  - dist

# Scan for potential security issues
ob scan -p patterns/secret_patterns.yaml -o security_audit.txt /project
# Additional pattern collections available
ob scan -p patterns/grammar_patterns.yaml /project # Grammar checks
ob scan -p patterns/log_patterns.yaml /project # Log statements

scan: Find patterns in files and directories.
Options:
- -p, --patterns <FILE> - Path to YAML patterns file (default: patterns.yaml)
- -o, --output <FILE> - Output file (default: stdout)
- -x, --ext <EXTENSIONS> - Comma-separated file extensions to include
- -w, --workers <N> - Number of threads to use (default: logical CPU cores)
- <INPUTS>... - Files or directories to scan
Note: Output order is not guaranteed when using multiple workers. Use -w 1 or sort the output for deterministic ordering.
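To illustrate why ordering varies and how sorting restores it, here is a small sketch that scans in-memory "files" on separate threads and sorts the collected matches. It uses std::thread and a channel rather than Rayon, and literal substring matching rather than regex; the function name scan_sorted is hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Scans each (file name, content) pair on its own thread and reports every
/// line containing `needle` as (file name, 1-based line number).
/// Results arrive in whatever order the threads run, so they are sorted at the end.
fn scan_sorted(files: Vec<(String, String)>, needle: &str) -> Vec<(String, usize)> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();

    for (name, content) in files {
        let tx = tx.clone();
        let needle = needle.to_string();
        handles.push(thread::spawn(move || {
            for (idx, line) in content.lines().enumerate() {
                if line.contains(needle.as_str()) {
                    tx.send((name.clone(), idx + 1)).unwrap();
                }
            }
        }));
    }
    // Drop the original sender so the channel closes when the workers finish.
    drop(tx);

    let mut matches: Vec<(String, usize)> = rx.into_iter().collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // Sorting by file name, then line number, makes the output reproducible.
    matches.sort();
    matches
}
```

Sorting after collection costs one extra pass over the results but leaves the parallel scan itself untouched, which is the same trade-off as piping the tool's output through sort.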
replace: Replace or remove patterns in files.
Options:
- -c, --config <FILE> - YAML configuration file
- -p, --pattern <PATTERN> - Single pattern to match
- -r, --replacement <TEXT> - Replacement text
- -d, --dir <PATH> - Directory to process
- -x, --ext <EXTENSIONS> - File extensions to include
- -e, --exclude <DIRS> - Directories to exclude
- --no-backup - Don't create backup files
- --dry-run - Preview changes without applying
- -w, --workers <N> - Number of threads to use
undo: Restore files from backups.
Options:
- -d, --dir <PATH> - Directory to restore
- --keep-backups - Don't delete backup files after restore
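A restore step along these lines could look like the following sketch. It assumes backups sit next to the original with .bak appended (a.txt.bak for a.txt) and only visits the top level of the directory; the function name restore_backups is hypothetical and the real command may behave differently.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Restores every "*.bak" file directly inside `dir` over its original and
/// returns how many files were restored. Subdirectories are not visited here.
fn restore_backups(dir: &Path, keep_backups: bool) -> io::Result<usize> {
    let mut restored = 0;
    for entry in fs::read_dir(dir)? {
        let backup = entry?.path();
        let is_backup = backup.extension().and_then(|e| e.to_str()) == Some("bak");
        if !is_backup || !backup.is_file() {
            continue;
        }
        // "a.txt.bak" -> "a.txt": dropping the last extension yields the original path.
        let original = backup.with_extension("");
        fs::copy(&backup, &original)?;
        if !keep_backups {
            fs::remove_file(&backup)?;
        }
        restored += 1;
    }
    Ok(restored)
}
```

Passing keep_backups = true mirrors --keep-backups: the originals are restored and the .bak files stay on disk, so the restore can be repeated.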
clean-backups: Remove backup files without restoring.
Options:
- -d, --dir <PATH> - Directory to clean
- --dry-run - Preview files to be removed
rename: Rename files in a directory.
Options:
- -d, --dir <PATH> - Directory to process
- -p, --pattern <PATTERN> - Regex pattern to match filenames
- -r, --replacement <TEXT> - Replacement string
- --dry-run - Preview changes without applying
- -w, --workers <N> - Number of threads to use
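The dry-run behaviour can be pictured as "build the rename plan first, apply it second". The sketch below is a simplified illustration under stated assumptions: it matches a literal substring instead of a regex, handles only the top level of the directory, and runs single-threaded. The function name rename_files is hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Computes (old, new) path pairs for entries directly in `dir` whose names
/// contain `pattern`, and performs the renames unless `dry_run` is set.
fn rename_files(
    dir: &Path,
    pattern: &str,
    replacement: &str,
    dry_run: bool,
) -> io::Result<Vec<(PathBuf, PathBuf)>> {
    let mut plan = Vec::new();
    for entry in fs::read_dir(dir)? {
        let old = entry?.path();
        let name = match old.file_name().and_then(|n| n.to_str()) {
            Some(n) if n.contains(pattern) => n.to_string(),
            _ => continue,
        };
        let new = old.with_file_name(name.replace(pattern, replacement));
        plan.push((old, new));
    }
    plan.sort();

    // The plan is complete before anything changes on disk, so a dry run
    // reports exactly what a real run would do.
    if !dry_run {
        for (old, new) in &plan {
            fs::rename(old, new)?;
        }
    }
    Ok(plan)
}
```

Because the plan is returned in both modes, the same value can drive the preview output and the final report.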
Oober is designed for maximum performance:
- Parallel processing - Uses all available CPU cores via Rayon
- SIMD regex - Hardware-accelerated pattern matching
- Efficient I/O - Streams files line-by-line to minimize memory usage
- Smart filtering - Respects .gitignore and filters by extension early
- Minimal allocations - Optimized for hot paths
Benchmark on a large codebase (100k files):
- Scanning: ~2-5 seconds
- Replacing: ~5-10 seconds (with backup creation)
- Use cargo build --release for production builds
- Profile with cargo flamegraph for optimization
- Filter by extensions to reduce file processing
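Extension filtering pays off because a path can be rejected before the file is ever opened. As a rough sketch of how a comma-separated value such as js,ts,py might be applied, assuming case-insensitive matching and that an empty list means "no filter" (neither is confirmed by the tool itself):

```rust
use std::path::Path;

/// Parses a comma-separated list such as "js,ts,py" into lowercase extensions.
fn parse_extensions(arg: &str) -> Vec<String> {
    arg.split(',')
        .map(|e| e.trim().trim_start_matches('.').to_lowercase())
        .filter(|e| !e.is_empty())
        .collect()
}

/// Returns true when `path` has one of the allowed extensions.
/// An empty list is treated as "allow everything" (an assumption of this sketch).
fn has_allowed_extension(path: &Path, allowed: &[String]) -> bool {
    if allowed.is_empty() {
        return true;
    }
    match path.extension().and_then(|e| e.to_str()) {
        Some(ext) => allowed.iter().any(|a| a.eq_ignore_ascii_case(ext)),
        None => false,
    }
}
```

The check only inspects the path, so it costs no I/O and can run before any file content is read.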
- Automatic backups - Creates .bak files before any modification
- Atomic writes - Uses temporary files to prevent corruption
- Dry-run mode - Preview all changes before applying
- Binary detection - Automatically skips binary files
- Gitignore respect - Won't modify ignored files
- Comprehensive error handling - Graceful failures with clear messages
- Empty line cleanup - Intelligently handles whitespace after deletions
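How binary files are detected is not described here. A common heuristic, used by several command-line tools, is to look for a NUL byte near the start of the file; the sketch below shows that approach as an assumption, not as this tool's actual rule. The 8 KiB window and the name looks_binary are illustrative.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Heuristic: treat a file as binary if a NUL byte appears in the first
/// chunk read from it (up to 8 KiB).
fn looks_binary(path: &Path) -> io::Result<bool> {
    let mut buf = [0u8; 8192];
    let mut file = File::open(path)?;
    let n = file.read(&mut buf)?;
    Ok(buf[..n].contains(&0))
}
```

A caller would skip any path for which looks_binary returns true before attempting a replacement.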
Requirements:
- Rust 2024 edition
- Cargo
Dependencies:
- regex - Pattern matching engine
- rayon - Parallel processing
- ignore - Gitignore-aware file walking
- clap - Command line parsing
- tempfile - Atomic file operations
- chrono - Timestamp handling
# Development build
cargo build
# Run tests
cargo test
# Build optimized binary
cargo build --release

Contributions are welcome! Please feel free to submit a Pull Request.
Design principles (in priority order):
- Reliability - Never corrupt or lose data
  - Atomic file writes
  - Automatic backups before changes
  - Comprehensive error handling
- Speed - Maximize performance
  - Parallel file processing
  - SIMD-powered regex
  - Efficient file traversal
- Efficiency - Optimize resource usage
  - Stream processing (line-by-line)
  - Minimal memory footprint
  - Smart file filtering
- Modularity - Clean architecture
  - Separate scan/replace/undo logic
  - Reusable pattern configurations
  - Extensible command structure
This project is licensed under the MIT License - see the LICENSE file for details.
- The tool respects .gitignore files automatically
- Binary files are automatically skipped
- Backups are created as .bak files before any modifications
- Empty lines left after block removal are automatically cleaned up
- Line-by-line processing means patterns cannot span multiple lines
- Use double backslashes for regex escapes in double-quoted YAML strings (e.g., \\. for a literal dot); single-quoted strings take a single backslash
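The line-by-line model and the null replacement can be illustrated with a short sketch. It is a simplification under stated assumptions: literal substring matching stands in for the regex engine, and the Rule type and apply_rules function are hypothetical names, not part of the tool.

```rust
/// One rule: a literal `pattern` and an optional replacement.
/// `None` mirrors a `null` replacement in the YAML config: the matching line is dropped.
struct Rule<'a> {
    pattern: &'a str,
    replacement: Option<&'a str>,
}

/// Applies the rules to each line independently, so a pattern can never
/// match across a newline.
fn apply_rules(input: &str, rules: &[Rule<'_>]) -> String {
    let mut out = Vec::new();
    'lines: for line in input.lines() {
        let mut current = line.to_string();
        for rule in rules {
            if current.contains(rule.pattern) {
                match rule.replacement {
                    Some(text) => current = current.replace(rule.pattern, text),
                    None => continue 'lines, // delete the whole line
                }
            }
        }
        out.push(current);
    }
    out.join("\n")
}
```

With the rules TODO -> [TODO] and print( -> null, the input "// TODO fix", "print(x)", "let a = 1;" becomes "// [TODO] fix", "let a = 1;". A pattern that contains a newline never matches, since each line is checked in isolation; multi-line content has to be handled through blocks instead.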
After removing multi-line blocks, the tool automatically:
- Removes all empty lines at the beginning of files
- Collapses multiple consecutive empty lines to just one
- Preserves single empty lines between content
- Maintains proper file endings (preserves final newline if present)
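The cleanup rules above can be sketched as a small pipeline: remove the block, collapse empty lines, then restore the final newline. This is an illustration only. It assumes start and end markers sit on separate lines, that whitespace-only lines count as empty, and the function names are hypothetical.

```rust
/// Drops every line from one containing `start` through the next line containing `end`.
fn remove_block<'a>(lines: &[&'a str], start: &str, end: &str) -> Vec<&'a str> {
    let mut inside = false;
    let mut kept = Vec::new();
    for &line in lines {
        if inside {
            // The end marker line is removed too, then normal copying resumes.
            if line.contains(end) {
                inside = false;
            }
        } else if line.contains(start) {
            inside = true;
        } else {
            kept.push(line);
        }
    }
    kept
}

/// Skips empty lines at the top of the file and any empty line that follows another.
fn collapse_empty_lines<'a>(lines: &[&'a str]) -> Vec<&'a str> {
    let mut out: Vec<&'a str> = Vec::new();
    for &line in lines {
        let is_empty = line.trim().is_empty();
        // With nothing emitted yet, behave as if the previous line were empty.
        let prev_empty = out.last().map_or(true, |prev| prev.trim().is_empty());
        if is_empty && prev_empty {
            continue;
        }
        out.push(line);
    }
    out
}

/// Runs both steps and keeps the final newline if the input had one.
fn clean_text(input: &str, start: &str, end: &str) -> String {
    let had_final_newline = input.ends_with('\n');
    let lines: Vec<&str> = input.lines().collect();
    let lines = collapse_empty_lines(&remove_block(&lines, start, end));
    let mut out = lines.join("\n");
    if had_final_newline && !out.is_empty() {
        out.push('\n');
    }
    out
}
```

Tracking only whether the previous emitted line was empty is enough to cover all three list rules: leading empties are skipped, runs collapse to one, and a single empty line between content survives.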
The clean-backups command provides:
- Preview mode with --dry-run to see what would be removed
- Total size calculation of backup files
- Safe removal with error handling
- Detailed reporting of removed files
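A preview-then-remove pass with a size total might look like the sketch below. It assumes backups are identified purely by a .bak extension and walks the directory tree with the standard library; the function name clean_backups is hypothetical, and the real command also honours .gitignore rules, which this sketch does not.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects "*.bak" files under `dir`, sums their sizes, and
/// deletes them unless `dry_run` is set. Returns the files found and the total bytes.
fn clean_backups(dir: &Path, dry_run: bool) -> io::Result<(Vec<PathBuf>, u64)> {
    let mut found = Vec::new();
    let mut total_bytes = 0u64;
    let mut pending = vec![dir.to_path_buf()];

    // Iterative depth-first walk: directories are queued, files are inspected.
    while let Some(current) = pending.pop() {
        for entry in fs::read_dir(&current)? {
            let path = entry?.path();
            if path.is_dir() {
                pending.push(path);
            } else if path.extension().and_then(|e| e.to_str()) == Some("bak") {
                total_bytes += fs::metadata(&path)?.len();
                if !dry_run {
                    fs::remove_file(&path)?;
                }
                found.push(path);
            }
        }
    }
    found.sort();
    Ok((found, total_bytes))
}
```

The size is read before the file is removed, so the dry run and the real run report the same total.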
Built with: