| 1 | +# Healing log (draft) - MFT parent placeholders / missing parents |
| 2 | + |
| 3 | +## Context |
| 4 | + |
| 5 | +- Symptom: On the F: drive, the Rust UFFS CLI produced many `<dir:FRS>` / `<unknown:FRS>` placeholder parents and under-reported file counts compared to the C++ implementation. |
| 6 | +- Investigation showed thousands of `parent_frs` values in `f_mft.parquet` without corresponding directory rows, especially in high-FRS ranges. |
| 7 | +- Cross-checks against `f_mft_reference.csv` from the vendored `mft-reader-rs` showed that many of these missing parents were in-use, base directory records in the reference output. |
| 8 | +- We previously fixed a major `$MFT` extent bug and introduced placeholder parent rows in the Rust reader to match C++ `at()` behavior. |
| 9 | + |
| 10 | +## Changes in this round |
| 11 | + |
| 12 | +1. **Diag: header + full parse inspection for specific FRS** |
| 13 | + |
| 14 | + - Added `crates/uffs-diag/src/bin/inspect_mft_record_flow.rs` to: |
| 15 | + - Load `f_mft.raw` via `uffs_mft::raw::load_raw_mft`. |
| 16 | + - For selected FRS, dump the local `FileRecordSegmentHeader` fields (magic, USA, flags, base reference). |
| 17 | + - On Windows, call into the real `apply_fixup` + `parse_record_full` pipeline to see exactly how the core reader treats the record. |
| 18 | + - Added `crates/uffs-diag/src/bin/uffs_mft_helpers_windows.rs` (Windows-only) to host the helper that runs `apply_fixup` and `parse_record_full` on a single FRS. |
| 19 | + |
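As a rough illustration of the header dump described in item 1, the sketch below reads the four field groups (magic, USA, flags, base reference) from a raw record buffer. It assumes the standard NTFS file record segment header offsets (USA offset at byte 4, USA count at byte 6, flags at byte 22, base file reference at byte 32). `HeaderSummary` and `summarize_header` are hypothetical names for this note; they are not the crate's `FileRecordSegmentHeader` or any function in `uffs_mft`.

```rust
/// Header fields of interest. Hypothetical type for illustration only;
/// this is not the crate's `FileRecordSegmentHeader`.
#[derive(Debug, PartialEq)]
struct HeaderSummary {
    magic: [u8; 4],
    usa_offset: u16,
    usa_count: u16,
    flags: u16,
    base_frs: u64, // low 48 bits of the base file reference
    base_seq: u16, // high 16 bits of the base file reference
}

const FLAG_IN_USE: u16 = 0x0001;
const FLAG_DIRECTORY: u16 = 0x0002;

/// Read a little-endian u16 at `off`; the caller guarantees the bounds.
fn read_u16(buf: &[u8], off: usize) -> u16 {
    u16::from_le_bytes([buf[off], buf[off + 1]])
}

/// Summarize the header of one raw record, or return None if the buffer
/// is too short to contain the base reference (bytes 32..40).
fn summarize_header(record: &[u8]) -> Option<HeaderSummary> {
    if record.len() < 40 {
        return None;
    }
    let mut magic = [0u8; 4];
    magic.copy_from_slice(&record[0..4]);
    let mut raw_ref = [0u8; 8];
    raw_ref.copy_from_slice(&record[32..40]);
    let base_ref = u64::from_le_bytes(raw_ref);
    Some(HeaderSummary {
        magic,
        usa_offset: read_u16(record, 4),
        usa_count: read_u16(record, 6),
        flags: read_u16(record, 22),
        base_frs: base_ref & 0x0000_FFFF_FFFF_FFFF,
        base_seq: (base_ref >> 48) as u16,
    })
}

impl HeaderSummary {
    fn is_in_use(&self) -> bool {
        self.flags & FLAG_IN_USE != 0
    }
    fn is_directory(&self) -> bool {
        self.flags & FLAG_DIRECTORY != 0
    }
    /// A base record carries a zero base reference; extension records
    /// point back at their base record.
    fn is_base_record(&self) -> bool {
        self.base_frs == 0 && self.base_seq == 0
    }
}
```

A record that is in use, flagged as a directory, and has a zero base reference is the kind of "in-use, base directory record" the reference CSV reports for the missing parents.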
| 20 | +2. **Diag: magic distribution scanner** |
| 21 | + |
| 22 | + - Implemented `crates/uffs-diag/src/bin/scan_mft_magic.rs` to scan all records in `f_mft.raw` and classify the NTFS magic (`FILE`, `RCRD`, `INDX`, `ZERO`, `OTHER`) by buckets of FRS. |
| 23 | + - This showed that in some earlier snapshots, high FRS ranges had few `FILE` records and many `RCRD`/`ZERO` entries, which correlated with missing parents. |
| 24 | + |
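The bucketed classification in item 2 can be sketched as follows. This is a minimal sketch, not the code in `scan_mft_magic.rs`: it assumes fixed-size records laid out back to back in the raw file, treats `ZERO` as four zero bytes, and uses hypothetical names (`Magic`, `classify_magic`, `scan_magic`).

```rust
use std::collections::BTreeMap;

/// Magic classes named in the log. Treating ZERO as four zero bytes is
/// an assumption of this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Magic {
    File,
    Rcrd,
    Indx,
    Zero,
    Other,
}

/// Classify a record by its first four bytes.
fn classify_magic(record: &[u8]) -> Magic {
    let Some(magic) = record.get(..4) else {
        return Magic::Other;
    };
    match magic {
        b"FILE" => Magic::File,
        b"RCRD" => Magic::Rcrd,
        b"INDX" => Magic::Indx,
        [0, 0, 0, 0] => Magic::Zero,
        _ => Magic::Other,
    }
}

/// Count magic classes per FRS bucket. The key of the outer map is the
/// first FRS of each bucket; a trailing partial record is ignored.
fn scan_magic(
    raw: &[u8],
    record_size: usize,
    bucket: u64,
) -> BTreeMap<u64, BTreeMap<Magic, u64>> {
    assert!(record_size > 0 && bucket > 0, "sizes must be non-zero");
    let mut out: BTreeMap<u64, BTreeMap<Magic, u64>> = BTreeMap::new();
    for (frs, record) in raw.chunks_exact(record_size).enumerate() {
        let bucket_start = (frs as u64 / bucket) * bucket;
        *out.entry(bucket_start)
            .or_default()
            .entry(classify_magic(record))
            .or_insert(0) += 1;
    }
    out
}
```

Buckets in which `File` is rare and `Rcrd` or `Zero` dominate are the ones worth comparing against the list of missing parent FRS values.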
| 25 | +3. **Parent placeholder creation in the reader** |
| 26 | + |
| 27 | + - In `crates/uffs-mft/src/io.rs` we now have `create_placeholder_record(frs: u64) -> ParsedRecord` and `add_missing_parent_placeholders_to_vec` / `ParsedColumns::add_missing_parent_placeholders`. |
| 28 | + - These functions: |
| 29 | + - Detect any `parent_frs` that is referenced by parsed records but has no corresponding row. |
| 30 | + - Synthesize a minimal directory record with name `<dir:FRS>`, parent FRS defaulting to 5 (root) and `is_directory = true`. |
| 31 | + - Repeat until closure so that chains of missing parents also get placeholders. |
| 32 | + - This matches C++ `at()` semantics and dramatically reduces `<unknown:FRS>` failures in `FastPathResolver`. |
| 33 | + |
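The "repeat until closure" step in item 3 can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `Row`, `placeholder`, and `add_missing_parents` are hypothetical stand-ins for `ParsedRecord`, `create_placeholder_record`, and `add_missing_parent_placeholders_to_vec`, and the placeholder name is assumed to embed the numeric FRS.

```rust
use std::collections::HashSet;

const ROOT_FRS: u64 = 5;

/// Minimal stand-in for a parsed MFT row (hypothetical, for illustration).
#[derive(Debug, Clone, PartialEq)]
struct Row {
    frs: u64,
    parent_frs: u64,
    name: String,
    is_directory: bool,
}

/// Synthesize a minimal directory row that hangs off the root.
/// Assumption: the `<dir:FRS>` name embeds the numeric FRS.
fn placeholder(frs: u64) -> Row {
    Row {
        frs,
        parent_frs: ROOT_FRS,
        name: format!("<dir:{frs}>"),
        is_directory: true,
    }
}

/// Append placeholder rows until every `parent_frs` has a row.
/// Returns the number of placeholders added.
fn add_missing_parents(rows: &mut Vec<Row>) -> usize {
    let mut known: HashSet<u64> = rows.iter().map(|r| r.frs).collect();
    let mut added = 0;
    let mut start = 0; // rows before `start` were checked in earlier passes
    loop {
        let missing: Vec<u64> = rows[start..]
            .iter()
            .map(|r| r.parent_frs)
            .filter(|p| !known.contains(p))
            .collect();
        if missing.is_empty() {
            break;
        }
        start = rows.len();
        for frs in missing {
            // `insert` returns false for duplicates within this pass.
            if known.insert(frs) {
                rows.push(placeholder(frs));
                added += 1;
            }
        }
    }
    added
}
```

Because each placeholder points at the root, the loop needs at most one extra pass in this model: the pass that adds a placeholder for FRS 5 itself when the root row is absent. A second call on the same vector adds nothing.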
| 34 | +4. **Path resolver understanding** |
| 35 | + |
| 36 | + - Re-reviewed `crates/uffs-core/src/path_resolver.rs`: |
| 37 | + - `FastPathResolver::build` constructs a Vec-backed FRS→(parent, name) map from the full MFT DataFrame. |
| 38 | + - When a parent FRS is missing entirely, `format_partial_path` emits `<unknown:FRS>` with a partial path suffix. |
| 39 | + - With parent placeholders present in the DataFrame, these `<unknown:FRS>` cases should become rare and traceable to genuinely unrecoverable parents. |
| 40 | + |
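To make the effect of placeholders on path resolution concrete, here is a simplified model of a Vec-backed resolver. It is a sketch, not `FastPathResolver`: the `Resolver` type, the backslash separator, and the exact `<unknown:N>` output shape are assumptions of this note.

```rust
const ROOT_FRS: u64 = 5;

/// Vec-backed FRS -> (parent, name) map; the index is the FRS.
/// Hypothetical model for illustration only.
struct Resolver {
    entries: Vec<Option<(u64, String)>>,
}

impl Resolver {
    /// Build from (frs, parent_frs, name) triples.
    fn build(rows: &[(u64, u64, &str)]) -> Self {
        let max = rows.iter().map(|r| r.0).max().unwrap_or(0) as usize;
        let mut entries = vec![None; max + 1];
        for &(frs, parent, name) in rows {
            entries[frs as usize] = Some((parent, name.to_string()));
        }
        Resolver { entries }
    }

    /// Walk parents up to the root. If a parent has no row, emit an
    /// `<unknown:N>` marker followed by the partial path collected so far.
    fn resolve(&self, frs: u64) -> String {
        let mut parts: Vec<String> = Vec::new();
        let mut cur = frs;
        // The iteration bound guards against cycles in corrupt data.
        for _ in 0..=self.entries.len() {
            if cur == ROOT_FRS {
                break;
            }
            match self.entries.get(cur as usize).and_then(|e| e.as_ref()) {
                Some((parent, name)) => {
                    parts.push(name.clone());
                    cur = *parent;
                }
                None => {
                    parts.push(format!("<unknown:{cur}>"));
                    break;
                }
            }
        }
        parts.reverse();
        parts.join("\\")
    }
}
```

In this model, a file whose parent row is missing resolves to `<unknown:999>\orphan.bin`; once a placeholder row for FRS 999 is present, the same file resolves to `<dir:999>\orphan.bin` instead.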
| 41 | +## Validation (so far) |
| 42 | + |
| 43 | +- `cargo check -p uffs-diag --bin inspect_mft_record_flow` passes. |
| 44 | +- `cargo run -p uffs-diag --bin inspect_mft_record_flow -- docs/trial_runs/UltraFastFileSearch/f_mft.raw <frs...>`: |
| 45 | + - Confirms header sanity for selected FRS and, on Windows, shows `parse_record_full` outcomes (Base/Extension/Skip). |
| 46 | +- `cargo run -p uffs-diag --bin scan_mft_magic -- docs/trial_runs/UltraFastFileSearch/f_mft.raw [bucket]`: |
| 47 | + - Confirms `FILE` magic distribution and highlights problematic FRS buckets. |
| 48 | +- `cargo run -p uffs-diag --release --bin cross_check_mft_reference -- docs/trial_runs/UltraFastFileSearch/f_mft_reference.csv docs/trial_runs/UltraFastFileSearch/f_mft.parquet`: |
| 49 | + - For joined FRS, `IsDirectory` (CSV) and `is_directory` (Parquet) agree in 100% of rows. |
| 50 | + - Some parents still appear only in the reference CSV even though many Parquet rows name them as `parent_frs`; placeholders in Rust keep those paths resolvable while we continue investigating raw header / extent causes for the gaps. |
| 51 | + |
| 52 | +## Next steps |
| 53 | + |
| 54 | +- On Windows, run `inspect_mft_record_flow` for high-impact parent FRS (e.g., 2640657, 2631176, 2628892, 2628924, 2627024) on a freshly captured `f_mft.raw` that matches the CSV snapshot. |
| 55 | +- If `apply_fixup` + `parse_record_full` still drop any in-use base directories that the reference reader keeps, adjust `parse_record_full` / merger semantics to accept them (with tests). |
| 56 | +- Re-run `cross_check_mft_reference` and `analyze_mft_parents` with synchronized artifacts to verify that missing parent counts are bounded and explainable. |
| 57 | +- Once raw + parse semantics are proven, tighten path resolver behavior and document remaining `<dir:FRS>` / `<unknown:FRS>` cases as genuine on-disk anomalies rather than reader bugs. |
| 58 | + |