Commit ac33a5e

docs: comprehensive README with usage examples and architecture
1 parent 75793d9 commit ac33a5e

1 file changed: +297 -2 lines changed

README.md

Lines changed: 297 additions & 2 deletions
@@ -1,5 +1,300 @@
11
# UFFS - Ultra Fast File Search
22

3-
Ultra-high-performance file search using direct NTFS MFT reading and Polars DataFrames.
3+
[![License: MPL 2.0](https://img.shields.io/badge/License-MPL%202.0-brightgreen.svg)](https://opensource.org/licenses/MPL-2.0)
4+
[![Rust](https://img.shields.io/badge/rust-1.85%2B-orange.svg)](https://www.rust-lang.org)
45

5-
> **Note**: This is the Rust implementation of UFFS, replacing the original C++ version.
6+
**Ultra-high-performance file search for Windows using direct NTFS MFT reading and Polars DataFrames.**
7+
8+
> 🦀 This is the Rust rewrite of UFFS, replacing the original C++ version with modern, safe, and blazing-fast code.
9+
10+
---
11+
12+
## ⚡ Why UFFS is Lightning Fast
13+
14+
Traditional file enumeration (for example Python's `os.walk` or the Win32 `FindFirstFile` API) works, in simplified terms, like this:
15+
16+
1. Ask the OS to find a file
17+
2. OS reads the **entire MFT** (Master File Table) - the "phonebook" of all files
18+
3. Returns info for **one file**
19+
4. **Throws away the MFT**
20+
5. Repeat for the next file 🐌
21+
22+
**UFFS reads the MFT directly** - once - and queries it in memory using Polars DataFrames. This is like reading the entire phonebook once instead of looking up each name individually.
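
The "read once, query in memory" idea can be sketched in a few lines of Python. This is a toy model, not UFFS code: the hard-coded `records` list stands in for a parsed MFT, and `load_records` and `search` are hypothetical names chosen for illustration.

```python
from fnmatch import fnmatch

# Toy stand-in for a parsed MFT: one (path, size_in_bytes) tuple per file.
# A real tool would build this table by reading the MFT once; here it is hard-coded.
def load_records():
    return [
        ("C:/projects/uffs/src/main.rs", 4_096),
        ("C:/projects/uffs/Cargo.toml", 512),
        ("C:/Users/me/notes.txt", 128),
        ("C:/Users/me/project-plan.txt", 2_048),
    ]

def search(records, pattern):
    """Return paths whose file name matches a glob pattern, ignoring case."""
    pattern = pattern.lower()
    return [
        path for path, _size in records
        if fnmatch(path.rsplit("/", 1)[-1].lower(), pattern)
    ]

records = load_records()               # the expensive step happens once
rs_files = search(records, "*.rs")     # every later query is an in-memory scan
txt_files = search(records, "*.txt")
print(rs_files)
print(txt_files)
```

Each additional query reuses the same in-memory table, so its cost no longer depends on going back to the disk.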
23+
24+
### Benchmark Results
25+
26+
| Program | Records | Time | Notes |
27+
|---------|---------|------|-------|
28+
| **UFFS** | **19 Million** | **~120 seconds** | All disks in parallel |
29+
| **UFFS** | **6.5 Million** | **~56 seconds** | Single HDD |
30+
| Everything | 19 Million | 178 seconds | All disks |
31+
| WizFile | 6.5 Million | 299 seconds | Single HDD |
32+
33+
> **Based on these timings, UFFS is roughly 1.5x faster than Everything (~120 s vs. 178 s) and more than 5x faster than WizFile (~56 s vs. 299 s).**
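
The ratios implied by the benchmark table can be checked with a few lines of arithmetic. The timings below are copied from the table; the UFFS figures are approximate there ("~"), so the results are approximate too. The helper names `speedup` and `time_saved` are introduced here for illustration only.

```python
# Timings in seconds, copied from the benchmark table.
uffs_all, everything_all = 120, 178      # 19 million records, all disks
uffs_hdd, wizfile_hdd = 56, 299          # 6.5 million records, single HDD

def speedup(baseline, candidate):
    """How many times faster the candidate is than the baseline."""
    return baseline / candidate

def time_saved(baseline, candidate):
    """Fraction of the baseline's run time that the candidate saves."""
    return 1 - candidate / baseline

vs_everything = speedup(everything_all, uffs_all)
vs_wizfile = speedup(wizfile_hdd, uffs_hdd)
print(f"vs Everything: {vs_everything:.2f}x, {time_saved(everything_all, uffs_all):.0%} less time")
print(f"vs WizFile: {vs_wizfile:.2f}x, {time_saved(wizfile_hdd, uffs_hdd):.0%} less time")
```

With these inputs the speedup works out to about 1.48x against Everything and about 5.34x against WizFile.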
34+
35+
---
36+
37+
## 🚀 Quick Start
38+
39+
### Installation
40+
41+
```bash
42+
# Build from source (requires Rust 1.85+)
43+
cargo build --release
44+
45+
# The binary will be at:
46+
# Windows: target/release/uffs.exe
47+
# Linux/macOS: target/release/uffs
48+
```
49+
50+
### Basic Usage
51+
52+
```bash
53+
# Search for all .rs files on C: drive
54+
uffs "*.rs" --drive C
55+
56+
# Search across multiple drives
57+
uffs "*.txt" --drives C,D,E
58+
59+
# Search all drives (default)
60+
uffs "project*"
61+
62+
# Use a pre-built index for instant searches
63+
uffs index --drive C --output c_drive.parquet
64+
uffs search "*.rs" --index c_drive.parquet
65+
```
66+
67+
---
68+
69+
## 📖 Usage Examples
70+
71+
### Simple Search
72+
73+
| Command | Result |
74+
|---------|--------|
75+
| `uffs "c:/pro*"` | Files & folders starting with "pro" on C: |
76+
| `uffs "*.txt"` | All .txt files on ALL drives |
77+
| `uffs "*.txt" --drives C,D,M` | All .txt files on C:, D:, and M: |
78+
| `uffs "project*" --ext rs,toml` | Names starting with "project" that have a .rs or .toml extension |
79+
80+
### Filter Options
81+
82+
```bash
83+
# Files only (no directories)
84+
uffs "*.log" --files-only
85+
86+
# Directories only
87+
uffs "node_modules" --dirs-only
88+
89+
# Size filters
90+
uffs "*.mp4" --min-size 100MB --max-size 4GB
91+
92+
# Limit results
93+
uffs "*.tmp" --limit 100
94+
95+
# Case-sensitive search
96+
uffs "README" --case
97+
```
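
The size filters take human-readable values such as `100MB` and `4GB`. The README does not say whether these are decimal (1 MB = 1,000,000 bytes) or binary (1 MB = 1,048,576 bytes) units, so the sketch below, which is not UFFS code, assumes binary units. `parse_size` and `within` are hypothetical helper names.

```python
import re

# Assumption: binary multiples. Swap in powers of 1000 if the tool uses decimal units.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

def parse_size(text):
    """Convert a string such as '100MB' or '4GB' to a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)?\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[(unit or "B").upper()])

def within(size, min_size=None, max_size=None):
    """Apply optional lower and upper bounds, both inclusive."""
    if min_size is not None and size < parse_size(min_size):
        return False
    if max_size is not None and size > parse_size(max_size):
        return False
    return True

print(parse_size("100MB"))
print(within(500 * 1024**2, "100MB", "4GB"))
```

Whether the real bounds are inclusive is also an assumption; check the CLI help before relying on edge cases.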
98+
99+
### Output Options
100+
101+
```bash
102+
# Output to CSV file
103+
uffs "*.rs" --out results.csv
104+
105+
# Custom columns
106+
uffs "*" --columns path,size,created --out files.csv
107+
108+
# Custom separator and quotes
109+
uffs "*" --sep ";" --quotes "'" --out data.csv
110+
111+
# Omit the header row (use --header true to include it)
112+
uffs "*" --header false --out raw.csv
113+
114+
# JSON output
115+
uffs "*.rs" --format json
116+
```
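
A file written with a custom separator and quote character has to be read back with the same settings. Below is a minimal sketch using Python's standard `csv` module. It assumes the export starts with a header row of column names, and the sample rows are invented for illustration; the real layout may differ.

```python
import csv
import io

# Invented sample of what `--sep ";" --quotes "'"` output might look like.
sample = io.StringIO(
    "'path';'size'\n"
    "'C:/logs/app;old.log';'2048'\n"
    "'C:/logs/app.log';'1024'\n"
)

# delimiter and quotechar must mirror the --sep and --quotes values.
reader = csv.DictReader(sample, delimiter=";", quotechar="'")
rows = [{"path": r["path"], "size": int(r["size"])} for r in reader]
print(rows)
```

To read a real export, replace `sample` with `open("data.csv", newline="")`. Quoting matters here: the first sample path contains a `;` and still parses as a single field.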
117+
118+
### Available Columns
119+
120+
| Column | Description |
121+
|--------|-------------|
122+
| `path` | Full path including filename |
123+
| `name` | Filename only |
124+
| `pathonly` | Directory path only |
125+
| `size` | File size in bytes |
126+
| `sizeondisk` | Actual disk space used |
127+
| `created` | Creation timestamp |
128+
| `written` | Last modified timestamp |
129+
| `accessed` | Last accessed timestamp |
130+
| `type` | File type |
131+
| `directory` | Is a directory |
132+
| `compressed` | Is compressed |
133+
| `encrypted` | Is encrypted |
134+
| `hidden` | Hidden attribute |
135+
| `system` | System attribute |
136+
| `readonly` | Read-only attribute |
137+
| `all` | All available columns |
138+
139+
---
140+
141+
## 🛠️ Commands
142+
143+
### `uffs search` (default)
144+
Search for files matching a pattern.
145+
146+
```bash
147+
uffs search "*.rs" --drive C --files-only --limit 100
148+
```
149+
150+
### `uffs index`
151+
Build a persistent index for instant future searches.
152+
153+
```bash
154+
# Index a single drive
155+
uffs index --drive C --output c_drive.parquet
156+
157+
# Index multiple drives
158+
uffs index --drives C,D,E --output all_drives.parquet
159+
```
160+
161+
### `uffs info`
162+
Display information about an index file.
163+
164+
```bash
165+
uffs info c_drive.parquet
166+
```
167+
168+
### `uffs stats`
169+
Show statistics about indexed files.
170+
171+
```bash
172+
uffs stats --index c_drive.parquet --top 20
173+
```
174+
175+
### `uffs save-raw`
176+
Save raw MFT bytes for offline analysis.
177+
178+
```bash
179+
uffs save-raw --drive C --output c_mft.raw --compress
180+
```
181+
182+
### `uffs load-raw`
183+
Load and parse a saved raw MFT file.
184+
185+
```bash
186+
uffs load-raw c_mft.raw --output parsed.parquet
187+
```
188+
189+
---
190+
191+
## 🏗️ Architecture
192+
193+
UFFS is built as a modular Rust workspace:
194+
195+
```
196+
crates/
197+
├── uffs-polars # Polars facade (compilation isolation)
198+
├── uffs-mft # Direct MFT reading → Polars DataFrame
199+
├── uffs-core # Query engine using Polars lazy API
200+
├── uffs-cli # Command-line interface
201+
├── uffs-tui # Terminal UI (interactive)
202+
└── uffs-gui # Graphical UI (future)
203+
```
204+
205+
### Key Features
206+
207+
- **Direct MFT Access**: Bypasses Windows file enumeration APIs
208+
- **Polars DataFrames**: Powerful, memory-efficient data manipulation
209+
- **Async I/O**: High-throughput disk reading with Tokio
210+
- **Parquet Persistence**: Compressed, columnar index storage
211+
- **Multi-drive Parallel Search**: Query all drives concurrently
212+
- **SIMD-accelerated Pattern Matching**: Fast glob and regex support
213+
214+
---
215+
216+
## ⚠️ Requirements
217+
218+
### Platform
219+
- **Windows only** for MFT reading (the core functionality)
220+
- Cross-platform for working with saved indexes
221+
222+
### Privileges
223+
- **Administrator privileges required** for direct MFT access
224+
- Windows will show a UAC prompt when running UFFS
225+
226+
### Build Requirements
227+
- Rust 1.85+ (Edition 2024)
228+
- Windows SDK (for MFT reading)
229+
230+
---
231+
232+
## 📊 Output Formats
233+
234+
### Console (default)
235+
Pretty-printed table output for terminal viewing.
236+
237+
### CSV
238+
```bash
239+
uffs "*.rs" --out results.csv --sep "," --header true
240+
```
241+
242+
### JSON
243+
```bash
244+
uffs "*.rs" --format json --out results.json
245+
```
246+
247+
### Parquet
248+
Indexes are stored in Parquet format for efficient storage and fast loading.
249+
250+
---
251+
252+
## 🔧 Advanced Usage
253+
254+
### Using with Polars/Pandas
255+
256+
Export to CSV or Parquet and load the result into your data analysis tools:
257+
258+
```python
259+
import polars as pl
260+
261+
# Load UFFS index
262+
df = pl.read_parquet("c_drive.parquet")
263+
264+
# Analyze file distribution
265+
df.group_by("extension").agg(
266+
    pl.len().alias("count"),  # pl.count() is deprecated in current Polars
267+
    pl.col("size").sum().alias("total_size"),
268+
).sort("total_size", descending=True)
269+
```
270+
271+
### Piping to Other Tools
272+
273+
```bash
274+
# List large log files, keeping only result lines that contain "error"
275+
uffs "*.log" --min-size 100MB --out console | grep "error"
276+
277+
# Export for further processing
278+
uffs "*" --columns path,size --out - | sort -t, -k2 -n -r | head -100
279+
```
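
The `sort -t, -k2` pipeline splits on every comma, so a path that itself contains a comma would shift the size into the wrong field, and a header row would be sorted along with the data. A CSV-aware alternative is sketched below. It assumes the export has a header row with `path` and `size` columns; `largest_files` is a hypothetical helper name and the sample data is invented.

```python
import csv
import heapq
import io

def largest_files(lines, n=100):
    """Return the n largest (size, path) pairs from CSV lines with a header."""
    reader = csv.DictReader(lines)
    pairs = ((int(row["size"]), row["path"]) for row in reader)
    return heapq.nlargest(n, pairs)

# Invented sample data; in a pipeline, pass sys.stdin instead.
sample = io.StringIO(
    "path,size\n"
    '"C:/data/a,b.bin",300\n'
    "C:/data/c.bin,100\n"
    "C:/data/d.bin,200\n"
)
top = largest_files(sample, n=2)
print(top)
```

`heapq.nlargest` keeps only `n` candidates in memory at a time, which suits exports with millions of rows.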
280+
281+
---
282+
283+
## 📜 License
284+
285+
This project is licensed under the **Mozilla Public License 2.0 (MPL-2.0)**.
286+
287+
See [LICENSE](LICENSE) for details.
288+
289+
---
290+
291+
## 🙏 Acknowledgments
292+
293+
This Rust implementation is inspired by the original C++ UFFS, which was based on [SwiftSearch](https://sourceforge.net/projects/swiftsearch/) by wfunction.
294+
295+
---
296+
297+
## 📬 Contact
298+
299+
- **Author**: Robert Nio
300+
- **Repository**: [github.com/githubrobbi/UltraFastFileSearch](https://github.com/githubrobbi/UltraFastFileSearch)
