Skip to content

Commit 82eaaf6

Browse files
committed
chore: development v0.2.38 - comprehensive testing complete [auto-commit]
1 parent ce19bdf commit 82eaaf6

15 files changed

+798
-34
lines changed

Cargo.lock

Lines changed: 30 additions & 30 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

Cargo.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -37,7 +37,7 @@ exclude = [
3737
# Workspace Package Metadata (inherited by all crates)
3838
# ─────────────────────────────────────────────────────────────────────────────
3939
[workspace.package]
40-
version = "0.2.37"
40+
version = "0.2.38"
4141
edition = "2024"
4242
rust-version = "1.85"
4343
license = "MPL-2.0 OR LicenseRef-UFFS-Commercial"
Lines changed: 58 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,58 @@
1+
# Healing log (draft) - MFT parent placeholders / missing parents
2+
3+
## Context
4+
5+
- Symptom: On F: drive, Rust UFFS CLI produced many `<dir:FRS>` / `<unknown:FRS>` placeholder parents and under-reported file counts compared to C++.
6+
- Investigation showed thousands of `parent_frs` values in `f_mft.parquet` without corresponding directory rows, especially in high-FRS ranges.
7+
- Cross-checks against `f_mft_reference.csv` from the vendored `mft-reader-rs` showed that many of these missing parents were in-use, base directory records in the reference output.
8+
- We previously fixed a major `$MFT` extent bug and introduced placeholder parent rows in the Rust reader to match C++ `at()` behavior.
9+
10+
## Changes in this round
11+
12+
1. **Diag: header + full parse inspection for specific FRS**
13+
14+
- Added `crates/uffs-diag/src/bin/inspect_mft_record_flow.rs` to:
15+
- Load `f_mft.raw` via `uffs_mft::raw::load_raw_mft`.
16+
- For selected FRS, dump the local `FileRecordSegmentHeader` fields (magic, USA, flags, base reference).
17+
- On Windows, call into the real `apply_fixup` + `parse_record_full` pipeline to see exactly how the core reader treats the record.
18+
- Added `crates/uffs-diag/src/bin/uffs_mft_helpers_windows.rs` (Windows-only) to host the helper that runs `apply_fixup` and `parse_record_full` on a single FRS.
19+
20+
2. **Diag: magic distribution scanner**
21+
22+
- Implemented `crates/uffs-diag/src/bin/scan_mft_magic.rs` to scan all records in `f_mft.raw` and classify the NTFS magic (`FILE`, `RCRD`, `INDX`, `ZERO`, `OTHER`) by buckets of FRS.
23+
- This showed that in some earlier snapshots, high FRS ranges had few `FILE` records and many `RCRD`/`ZERO` entries, which correlated with missing parents.
24+
25+
3. **Parent placeholder creation in the reader**
26+
27+
- In `crates/uffs-mft/src/io.rs` we now have `create_placeholder_record(frs: u64) -> ParsedRecord` and `add_missing_parent_placeholders_to_vec` / `ParsedColumns::add_missing_parent_placeholders`.
28+
- These functions:
29+
- Detect any `parent_frs` referenced by parsed records that do not have a corresponding row.
30+
- Synthesize a minimal directory record with name `<dir:FRS>`, parent FRS defaulting to 5 (root) and `is_directory = true`.
31+
- Repeat until closure so that chains of missing parents also get placeholders.
32+
- This matches C++ `at()` semantics and dramatically reduces `<unknown:FRS>` failures in `FastPathResolver`.
33+
34+
4. **Path resolver understanding**
35+
36+
- Re‑reviewed `crates/uffs-core/src/path_resolver.rs`:
37+
- `FastPathResolver::build` constructs a Vec-backed FRS→(parent, name) map from the full MFT DataFrame.
38+
- When a parent FRS is missing entirely, `format_partial_path` emits `<unknown:FRS>` with a partial path suffix.
39+
- With parent placeholders present in the DataFrame, these `<unknown:FRS>` cases should become rare and traceable to genuinely unrecoverable parents.
40+
41+
## Validation (so far)
42+
43+
- `cargo check -p uffs-diag --bin inspect_mft_record_flow` passes.
44+
- `cargo run -p uffs-diag --bin inspect_mft_record_flow -- docs/trial_runs/UltraFastFileSearch/f_mft.raw <frs...>`:
45+
- Confirms header sanity for selected FRS and, on Windows, shows `parse_record_full` outcomes (Base/Extension/Skip).
46+
- `cargo run -p uffs-diag --bin scan_mft_magic -- docs/trial_runs/UltraFastFileSearch/f_mft.raw [bucket]`:
47+
- Confirms `FILE` magic distribution and highlights problematic FRS buckets.
48+
- `cargo run -p uffs-diag --release --bin cross_check_mft_reference -- docs/trial_runs/UltraFastFileSearch/f_mft_reference.csv docs/trial_runs/UltraFastFileSearch/f_mft.parquet`:
49+
- For joined FRS, `IsDirectory` (CSV) and `is_directory` (Parquet) agree 100%.
50+
- There remain reference-only parents with many children in Parquet; placeholders in Rust are used to keep paths resolvable while we continue investigating raw header / extent causes for those gaps.
51+
52+
## Next steps
53+
54+
- On Windows, run `inspect_mft_record_flow` for high-impact parent FRS (e.g., 2640657, 2631176, 2628892, 2628924, 2627024) on a freshly captured `f_mft.raw` that matches the CSV snapshot.
55+
- If `apply_fixup` + `parse_record_full` still drop any in-use base directories that the reference reader keeps, adjust `parse_record_full` / merger semantics to accept them (with tests).
56+
- Re‑run `cross_check_mft_reference` and `analyze_mft_parents` with synchronized artifacts to verify that missing parent counts are bounded and explainable.
57+
- Once raw + parse semantics are proven, tighten path resolver behavior and document remaining `<dir:FRS>` / `<unknown:FRS>` cases as genuine on-disk anomalies rather than reader bugs.
58+

README.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -21,7 +21,7 @@ Traditional file search tools (including `os.walk`, `FindFirstFile`, etc.) work
2121

2222
**UFFS reads the MFT directly** - once - and queries it in memory using Polars DataFrames. This is like reading the entire phonebook once instead of looking up each name individually.
2323

24-
### Benchmark Results (v0.2.37)
24+
### Benchmark Results (v0.2.38)
2525

2626
| Drive Type | Records | Time | Throughput |
2727
|------------|---------|------|------------|
@@ -33,7 +33,7 @@ Traditional file search tools (including `os.walk`, `FindFirstFile`, etc.) work
3333

3434
| Comparison | Records | Time | Notes |
3535
|------------|---------|------|-------|
36-
| **UFFS v0.2.37** | **18.7 Million** | **~142 seconds** | All disks, fast mode |
36+
| **UFFS v0.2.38** | **18.7 Million** | **~142 seconds** | All disks, fast mode |
3737
| UFFS v0.1.30 | 18.7 Million | ~315 seconds | Baseline |
3838
| Everything | 19 Million | 178 seconds | All disks |
3939
| WizFile | 6.5 Million | 299 seconds | Single HDD |

crates/uffs-diag/Cargo.toml

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -44,6 +44,10 @@ path = "src/bin/scan_mft_magic.rs"
4444
name = "dump_mft_extents"
4545
path = "src/bin/dump_mft_extents.rs"
4646

47+
[[bin]]
48+
name = "cross_check_mft_reference"
49+
path = "src/bin/cross_check_mft_reference.rs"
50+
4751
# ─────────────────────────────────────────────────────────────────────────────
4852
# Dependencies (minimal set for diagnostic tools)
4953
# ─────────────────────────────────────────────────────────────────────────────

0 commit comments

Comments
 (0)