
Fast disk usage analyzer that groups files by extension. Parallel scanning, beautiful terminal output with visual bars, and detailed statistics. Perfect for finding what's taking up space on your drives.



chrstphdm/extstat


extstat


⚡ Fast disk usage analyzer that groups files by extension. Perfect for finding what's taking up space on your drives.

A parallel, high-performance CLI tool that scans directories and displays disk usage statistics grouped by file extension, with beautiful terminal output.

Features

✅ Parallel scanning - Uses all CPU cores for maximum speed
✅ Beautiful table output - Color-coded results with visual bars
✅ Flexible filtering - Minimum file size, top N extensions
✅ File count tracking - See how many files belong to each extension
✅ No runtime dependencies - Ships as a single self-contained binary

Installation

From source (recommended for now)

# Clone the repository (or copy the project directory)
git clone https://github.com/chrstphdm/extstat.git
cd extstat

# Build release version (optimized)
cargo build --release

# Binary will be in target/release/extstat
# Copy to your PATH
sudo cp target/release/extstat /usr/local/bin/

Usage

Basic usage

# Analyze current directory
extstat

# Analyze specific directory
extstat /path/to/directory

# Show file counts
extstat -c

# Skip files smaller than 1 MiB (1048576 bytes)
extstat -s 1048576

# Show only top 20 extensions
extstat -n 20

# Combine options
extstat /data -c -s 1000000 -n 10

Examples

# Analyze your home directory
extstat ~

# Find what's taking space in /var
extstat /var -n 15

# Show detailed stats for current project
extstat . -c

Command Line Options

Options:
  <PATH>              Directory to analyze [default: .]
  -s, --min-size      Minimum file size to include (in bytes) [default: 0]
  -n, --top           Maximum number of extensions to display [default: 50]
  -c, --show-count    Show file count
  -h, --help          Print help
  -V, --version       Print version
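To make the interaction of `-s`/`--min-size` and `-n`/`--top` concrete, here is a minimal std-only Rust sketch, not extstat's actual code. It assumes the scan has already been reduced to hypothetical `(extension, size)` records, that a file is kept when its size is at least `min_size`, and that extensions are ranked by total size; `summarize` is an illustrative name.

```rust
use std::collections::HashMap;

/// Aggregate hypothetical (extension, size) records into
/// (extension, total bytes, file count) rows, applying the two filters:
/// files smaller than `min_size` are skipped (-s / --min-size), and only
/// the `top` largest extensions are kept (-n / --top).
fn summarize(files: &[(&str, u64)], min_size: u64, top: usize) -> Vec<(String, u64, u64)> {
    let mut stats: HashMap<String, (u64, u64)> = HashMap::new();
    for &(ext, size) in files {
        if size < min_size {
            continue; // below the --min-size threshold
        }
        let entry = stats.entry(ext.to_string()).or_insert((0, 0));
        entry.0 += size; // total bytes
        entry.1 += 1; // file count
    }
    let mut rows: Vec<(String, u64, u64)> = stats
        .into_iter()
        .map(|(ext, (size, count))| (ext, size, count))
        .collect();
    // Largest total first; ties broken by extension name for stable output.
    rows.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    rows.truncate(top); // keep at most `top` rows
    rows
}

fn main() {
    let files: [(&str, u64); 4] = [(".bam", 500), (".fastq", 2_000), (".fastq", 900), (".txt", 10)];
    // Roughly what `-s 100 -n 2` would mean in this sketch.
    for (ext, size, count) in summarize(&files, 100, 2) {
        println!("{ext}: {size} bytes in {count} file(s)");
    }
}
```

With these sample records, `.txt` falls below the 100-byte threshold, so only `.fastq` and `.bam` remain.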

Output Explanation

╭────────────┬──────────┬─────────┬──────────────────────────────────╮
│ Extension  │ Size     │ % Total │ Visual                           │
├────────────┼──────────┼─────────┼──────────────────────────────────┤
│ .fastq     │ 2.5 GiB  │ 45.23%  │ ██████████████░░░░░░░░░░░░░░░░░░ │
│ .bam       │ 1.2 GiB  │ 21.67%  │ ███████░░░░░░░░░░░░░░░░░░░░░░░░░ │
│ .fasta     │ 567 MiB  │ 10.11%  │ ███░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │
╰────────────┴──────────┴─────────┴──────────────────────────────────╯
  • Extension: File extension (or [no extension] for files without one)
  • Size: Total size for all files with this extension (human-readable)
  • % Total: Percentage of total scanned space
  • Visual: Bar whose filled length is proportional to % Total
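The Size, % Total, and Visual columns can be reproduced with a short computation. The sketch below is an assumption-based illustration, not extstat's formatter: `human_size` and `percent_and_bar` are hypothetical names, sizes are printed with binary units and one decimal place (the sample table's `567 MiB` suggests the real tool rounds differently), and the bar width of 32 cells is chosen only because it matches the filled lengths in the sample rows (for example, 45.23% of 32 cells rounds to 14).

```rust
/// Format a byte count with binary units, one decimal place above plain bytes.
fn human_size(bytes: u64) -> String {
    const UNITS: [&str; 5] = ["B", "KiB", "MiB", "GiB", "TiB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        format!("{bytes} B")
    } else {
        format!("{value:.1} {}", UNITS[unit])
    }
}

/// Share of `total` as a percentage, plus a `width`-cell bar whose filled
/// part is that share of the width, rounded to the nearest cell.
fn percent_and_bar(size: u64, total: u64, width: usize) -> (f64, String) {
    let pct = if total == 0 { 0.0 } else { size as f64 * 100.0 / total as f64 };
    let filled = ((pct / 100.0 * width as f64).round() as usize).min(width);
    let bar = "█".repeat(filled) + &"░".repeat(width - filled);
    (pct, bar)
}

fn main() {
    let total: u64 = 10 * 1024 * 1024 * 1024; // hypothetical 10 GiB scanned in total
    let size: u64 = 2_684_354_560; // 2.5 GiB for one extension
    let (pct, bar) = percent_and_bar(size, total, 32);
    println!("{} {:.2}% {}", human_size(size), pct, bar);
}
```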

Performance

  • Parallel scanning using Rayon (uses all CPU cores)
  • Typical throughput: ~500k files/second on a modern SSD (actual speed depends on the filesystem, hardware, and OS cache state)
  • Memory efficient: reads only file metadata, never file contents
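Rayon's parallel iterators split work across threads and then combine the per-thread results. The sketch below shows that split, fold, and merge pattern using only `std::thread::scope` in place of Rayon, so it runs without extra crates. It is not extstat's implementation: it assumes the scan has already produced hypothetical `(extension, size)` records, and `fold_chunk` and `parallel_stats` are illustrative names.

```rust
use std::collections::HashMap;
use std::thread;

/// extension -> (total bytes, file count)
type Stats = HashMap<String, (u64, u64)>;

/// Fold one chunk of records into its own map (runs on a worker thread).
fn fold_chunk(chunk: &[(String, u64)]) -> Stats {
    let mut local = Stats::new();
    for (ext, size) in chunk {
        let e = local.entry(ext.clone()).or_insert((0, 0));
        e.0 += *size;
        e.1 += 1;
    }
    local
}

/// Split records into roughly `threads` chunks, fold each chunk in parallel,
/// then merge the partial maps into one.
fn parallel_stats(records: &[(String, u64)], threads: usize) -> Stats {
    let threads = threads.max(1);
    // At least 1, because `chunks(0)` panics (e.g. when `records` is empty).
    let chunk_len = ((records.len() + threads - 1) / threads).max(1);
    let partials: Vec<Stats> = thread::scope(|s| {
        // Spawn every worker first; joining inside this lazy chain would serialize them.
        let handles: Vec<_> = records
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || fold_chunk(chunk)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect::<Vec<Stats>>()
    });
    // Merge step: add up sizes and counts from every partial map.
    let mut total = Stats::new();
    for partial in partials {
        for (ext, (size, count)) in partial {
            let e = total.entry(ext).or_insert((0, 0));
            e.0 += size;
            e.1 += count;
        }
    }
    total
}

fn main() {
    let records: Vec<(String, u64)> = vec![
        (".fastq".to_string(), 1_000),
        (".bam".to_string(), 300),
        (".fastq".to_string(), 500),
        (".bam".to_string(), 200),
    ];
    // One worker per available CPU, falling back to 1 if that can't be determined.
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let mut rows: Vec<_> = parallel_stats(&records, workers).into_iter().collect();
    rows.sort();
    for (ext, (size, count)) in rows {
        println!("{ext}: {size} bytes, {count} files");
    }
}
```

Rayon performs this splitting and merging automatically (for example through its `fold` and `reduce` adaptors) and balances the load with work stealing, which fixed-size chunks do not.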

Development

Project Structure

extstat/
├── Cargo.toml       # Rust dependencies
├── src/
│   └── main.rs      # Main application code
└── README.md        # This file

Building for development

# Build debug version (faster compilation)
cargo build

# Run directly
cargo run -- /path/to/scan

# Run with options
cargo run -- . -c -n 10

Understanding the code

Key Rust concepts used:

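  1. Parallel iteration with Rayon:

    use rayon::prelude::*;  // Brings par_iter() into scope
    files.par_iter()        // Process files in parallel
  2. Early return with .ok()?:

    let metadata = entry.metadata().ok()?;  // .ok() turns the Result into an Option; ? returns None on error
  3. Option chaining with combinators:

    path.extension()
        .and_then(|s| s.to_str())  // None if there is no extension or it isn't valid UTF-8
  4. HashMap aggregation:

    let entry = acc.entry(ext).or_insert((0, 0));  // Get or create the tuple for this extension
    entry.0 += size;  // Add this file's size to the first tuple field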
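The snippets above are fragments of a larger program. As a self-contained illustration (std only, so the Rayon step is left out), the sketch below combines concepts 2 to 4 into one aggregation pass. It is a sketch under assumptions, not the project's `main.rs`: it calls `fs::metadata` on paths rather than `entry.metadata()` on directory entries, treats the tuple as (total bytes, file count), prefixes extensions with a dot and uses the `[no extension]` label to match the output table, and lumps non-UTF-8 extensions in with `[no extension]` for simplicity. `extension_label`, `file_size`, and `aggregate` are hypothetical names.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::Path;

/// ".ext" for a path, or "[no extension]" when there is none
/// (non-UTF-8 extensions also land here, a simplification of this sketch).
fn extension_label(path: &Path) -> String {
    path.extension()
        .and_then(|s| s.to_str()) // OsStr -> &str, None if not valid UTF-8
        .map(|ext| format!(".{ext}"))
        .unwrap_or_else(|| "[no extension]".to_string())
}

/// Size of a regular file, or None if its metadata can't be read or it isn't a file.
fn file_size(path: &Path) -> Option<u64> {
    let metadata = fs::metadata(path).ok()?; // Result -> Option; `?` returns None on error
    metadata.is_file().then(|| metadata.len())
}

/// Fold paths into extension -> (total bytes, file count).
fn aggregate<'a>(paths: impl IntoIterator<Item = &'a Path>) -> HashMap<String, (u64, u64)> {
    let mut acc = HashMap::new();
    for path in paths {
        if let Some(size) = file_size(path) {
            let entry = acc.entry(extension_label(path)).or_insert((0, 0)); // get or create
            entry.0 += size; // total bytes
            entry.1 += 1; // file count
        }
    }
    acc
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("extstat_sketch_demo");
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("reads.fastq"), vec![0u8; 300])?;
    fs::write(dir.join("more.fastq"), vec![0u8; 200])?;
    fs::write(dir.join("Makefile"), b"all:\n")?;
    let paths = [dir.join("reads.fastq"), dir.join("more.fastq"), dir.join("Makefile")];
    for (ext, (size, count)) in &aggregate(paths.iter().map(|p| p.as_path())) {
        println!("{ext}: {size} bytes, {count} files");
    }
    fs::remove_dir_all(&dir)?;
    Ok(())
}
```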

Adding features

Want to add more features? Common additions:

  1. JSON export: Add serde and serde_json dependencies
  2. Interactive TUI: Add ratatui and crossterm
  3. Progress bar: Add indicatif dependency
  4. Date filtering: Use the file's modification time (std::fs::Metadata::modified())
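For the date-filtering idea, the standard library already exposes `Metadata::modified()`, which returns an `io::Result<SystemTime>` (it can fail on platforms that don't record modification times). A minimal sketch of a recency check follows. `modified_within` is a hypothetical helper, and treating unreadable timestamps as non-matching and future timestamps as recent are assumptions of this sketch, not extstat behavior.

```rust
use std::fs;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// True if the file at `path` was last modified no more than `max_age` before `now`.
fn modified_within(path: &Path, max_age: Duration, now: SystemTime) -> bool {
    // Unreadable metadata or an unsupported mtime counts as "not matching".
    let Ok(modified) = fs::metadata(path).and_then(|m| m.modified()) else {
        return false;
    };
    match now.duration_since(modified) {
        Ok(age) => age <= max_age,
        Err(_) => true, // mtime lies after `now`, so treat the file as recent
    }
}

fn main() -> std::io::Result<()> {
    let file = std::env::temp_dir().join("extstat_mtime_demo.txt");
    fs::write(&file, b"fresh")?;
    let one_week = Duration::from_secs(7 * 24 * 60 * 60);
    println!("modified in the last week: {}", modified_within(&file, one_week, SystemTime::now()));
    fs::remove_file(&file)?;
    Ok(())
}
```

Passing `now` as a parameter instead of calling `SystemTime::now()` inside the helper keeps one reference time for a whole scan and makes the check easy to test.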

Troubleshooting

Permission denied errors:

  • Use sudo for system directories
  • Or skip inaccessible files (feature coming soon)
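Skipping inaccessible files is listed as upcoming. One way to do it, sketched here with `std::fs` only and not taken from extstat, is to walk the tree with an explicit stack and count, rather than propagate, errors from `read_dir`, directory entries, and metadata. `walk_files` is a hypothetical name; the sketch does not follow symbolic links, because `DirEntry::file_type()` does not traverse them.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Recursively collect (path, size) for regular files under `root`.
/// Directories or entries that can't be read (e.g. permission denied)
/// are counted as skipped instead of aborting the whole scan.
fn walk_files(root: &Path) -> (Vec<(PathBuf, u64)>, usize) {
    let mut files = Vec::new();
    let mut skipped = 0;
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        let entries = match fs::read_dir(&dir) {
            Ok(entries) => entries,
            Err(_) => {
                skipped += 1; // unreadable directory
                continue;
            }
        };
        for entry in entries {
            let Ok(entry) = entry else {
                skipped += 1;
                continue;
            };
            let Ok(file_type) = entry.file_type() else {
                skipped += 1;
                continue;
            };
            if file_type.is_dir() {
                stack.push(entry.path()); // symlinked directories report as symlinks, not dirs
            } else if file_type.is_file() {
                match entry.metadata() {
                    Ok(meta) => files.push((entry.path(), meta.len())),
                    Err(_) => skipped += 1,
                }
            }
        }
    }
    (files, skipped)
}

fn main() {
    let root = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    let (files, skipped) = walk_files(Path::new(&root));
    let total: u64 = files.iter().map(|(_, size)| size).sum();
    println!("{} files, {total} bytes, {skipped} unreadable entries skipped", files.len());
}
```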

Slow on network drives:

  • Network I/O is the bottleneck, not the tool
  • Consider scanning locally first

Out of memory:

  • Only happens with millions of different extensions
  • Try filtering with -s to reduce file count

Why Rust?

  • Speed: Comparable to C/C++, typically faster than Go or Python
  • Safety: Memory safety and freedom from data races are enforced at compile time (in safe Rust)
  • Modern: Great tooling (cargo), helpful compiler errors
  • Dependencies: Easy to manage, reproducible builds

Next Steps (Version 2)

Planned features:

  • Interactive TUI mode (like ncdu)
  • Drill-down: click extension → see directories
  • Export to JSON/CSV
  • Progress bar during scan
  • Filter by date modified
  • Compare two scans (before/after cleanup)
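Comparing two scans comes down to diffing two extension-to-size maps. Below is a minimal sketch of what that planned feature could compute, assuming each scan is reduced to a hypothetical `Scan` map of extension to total bytes (`compare_scans` is also an illustrative name). Extensions present in only one scan count as 0 bytes in the other, unchanged extensions are dropped, and results are ordered by the size of the change.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical scan result: extension -> total bytes.
type Scan = HashMap<String, u64>;

/// Per-extension change in bytes (after - before), largest absolute change first.
fn compare_scans(before: &Scan, after: &Scan) -> Vec<(String, i128)> {
    // Union of extensions seen in either scan, in sorted order.
    let exts: BTreeSet<&String> = before.keys().chain(after.keys()).collect();
    let mut diffs: Vec<(String, i128)> = exts
        .into_iter()
        .map(|ext| {
            let b = before.get(ext).copied().unwrap_or(0) as i128;
            let a = after.get(ext).copied().unwrap_or(0) as i128;
            (ext.clone(), a - b)
        })
        .filter(|(_, delta)| *delta != 0)
        .collect();
    diffs.sort_by(|x, y| y.1.abs().cmp(&x.1.abs()).then_with(|| x.0.cmp(&y.0)));
    diffs
}

fn main() {
    let before: Scan = HashMap::from([(".log".to_string(), 4_000), (".bam".to_string(), 9_000)]);
    let after: Scan = HashMap::from([(".log".to_string(), 500), (".bam".to_string(), 9_000)]);
    for (ext, delta) in compare_scans(&before, &after) {
        println!("{ext}: {delta:+} bytes");
    }
}
```

Using `i128` for the difference avoids overflow when subtracting two `u64` totals.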

Contributing

Contributions are welcome! Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

Changelog

See CHANGELOG.md for a list of changes in each release.

License

MIT License - Feel free to use, modify, distribute
