Panic when running ntfs_shell::dir in home directory #10

leofidus · 2022-02-22T21:09:06Z

PS C:\Users\redacted\source\ntfs> .\target\debug\examples\ntfs_shell.exe \\.\C:
**********************************************************************
ntfs-shell - Demonstration of the ntfs Rust crate
by Colin Finck <colin@reactos.org>
**********************************************************************

Opened "\\.\C:" read-only.

ntfs-shell:\> cd Users
ntfs-shell:\Users> cd redacted
ntfs-shell:\Users\redacted> dir
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', C:\Users\redacted\source\ntfs\src\index_record.rs:68:51
stack backtrace:
   0: std::panicking::begin_panic_handler
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b\/library\std\src\panicking.rs:498
   1: core::panicking::panic_fmt
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b\/library\core\src\panicking.rs:107
   2: core::panicking::panic
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b\/library\core\src\panicking.rs:48
   3: enum$<core::option::Option<u64> >::unwrap<u64>
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b\library\core\src\option.rs:746
   4: ntfs::index_record::NtfsIndexRecord::new<std::io::buffered::bufreader::BufReader<ntfs_shell::sector_reader::SectorReader<std::fs::File> > >
             at .\src\index_record.rs:68
   5: ntfs::structured_values::index_allocation::NtfsIndexAllocation::record_from_vcn<std::io::buffered::bufreader::BufReader<ntfs_shell::sector_reader::SectorReader<std::fs::File> > >
             at .\src\structured_values\index_allocation.rs:65
   6: ntfs::index::NtfsIndexEntries<ntfs::indexes::file_name::NtfsFileNameIndex>::next<ntfs::indexes::file_name::NtfsFileNameIndex,std::io::buffered::bufreader::BufReader<ntfs_shell::sector_reader::SectorReader<std::fs::File> > >
             at .\src\index.rs:173
   7: ntfs_shell::dir<std::io::buffered::bufreader::BufReader<ntfs_shell::sector_reader::SectorReader<std::fs::File> > >
             at .\examples\ntfs-shell\main.rs:298
   8: ntfs_shell::main
             at .\examples\ntfs-shell\main.rs:85
   9: core::ops::function::FnOnce::call_once<enum$<core::result::Result<tuple$<>,anyhow::Error>, 1, 18446744073709551615, Err> (*)(),tuple$<> >
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b\library\core\src\ops\function.rs:227
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

I don't know what's causing it or how to narrow it down, but I'm happy to run it with a couple println statements or something if you can give me a clue where to start

vthib · 2022-02-24T10:52:07Z

This is probably the same issue and root-case that i stumbled upon, so i'm hijacking this issue with my analysis :)

I found that this stems from a subnode VCN value of 0 inside a NtfsAttributeListNonResidentAttributeValue.
In index.rs:177, the call to subnode_vcn looks like this:

index_allocation: NtfsIndexAllocation { ntfs: Ntfs { cluster_size: 4096, sector_size: 512, size: 106701438976, mft_position: 3221225472, file_record_size: 1024, serial_number: 9370544425783817375, upcase_table: None }, value: AttributeLis
tNonResident(NtfsAttributeListNonResidentAttributeValue { ntfs: Ntfs { cluster_size: 4096, sector_size: 512, size: 106701438976, mft_position: 3221225472, file_record_size: 1024, serial_number: 9370544425783817375, upcase_table: None }, initial_attribute_
list_entries: NtfsAttributeListEntries { attribute_list: Resident([160, 0, 0, 0, 40, 0, 4, 26, 0, 0, 0, 0, 0, 0, 0, 0, 48, 84, 0, 0, 0, 0, 1, 0, 1, 0, 36, 0, 73, 0, 51, 0, 48, 0, 0, 0, 0, 0, 0, 0, 176, 0, 0, 0, 40, 0, 4, 26, 0, 0, 0, 0, 0, 0, 0, 0, 48, 84
, 0, 0, 0, 0, 1, 0, 2, 0, 36, 0, 73, 0, 51, 0, 48, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 40, 0, 4, 26, 0, 0, 0, 0, 0, 0, 0, 0, 48, 84, 0, 0, 0, 0, 1, 0, 3, 0, 36, 0, 68, 0, 83, 0, 67, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 48, 0, 9, 26, 0, 0, 0, 0, 0, 0, 0, 0, 69, 2
2, 0, 0, 0, 0, 1, 0, 12, 0, 36, 0, 84, 0, 88, 0, 70, 0, 95, 0, 68, 0, 65, 0, 84, 0, 65, 0, 0, 0, 0, 0], 3227063552) }, connected_entries: AttributeListConnectedEntries { attribute_list_entries: Some(NtfsAttributeListEntries { attribute_list: Resident([160
, 0, 0, 0, 40, 0, 4, 26, 0, 0, 0, 0, 0, 0, 0, 0, 48, 84, 0, 0, 0, 0, 1, 0, 1, 0, 36, 0, 73, 0, 51, 0, 48, 0, 0, 0, 0, 0, 0, 0, 176, 0, 0, 0, 40, 0, 4, 26, 0, 0, 0, 0, 0, 0, 0, 0, 48, 84, 0, 0, 0, 0, 1, 0, 2, 0, 36, 0, 73, 0, 51, 0, 48, 0, 0, 0, 0, 0, 0, 0
, 0, 1, 0, 0, 40, 0, 4, 26, 0, 0, 0, 0, 0, 0, 0, 0, 48, 84, 0, 0, 0, 0, 1, 0, 3, 0, 36, 0, 68, 0, 83, 0, 67, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 48, 0, 9, 26, 0, 0, 0, 0, 0, 0, 0, 0, 69, 22, 0, 0, 0, 0, 1, 0, 12, 0, 36, 0, 84, 0, 88, 0, 70, 0, 95, 0, 68, 0,
65, 0, 84, 0, 65, 0, 0, 0, 0, 0], 3227063552) }), instance: 1, ty: IndexAllocation }, data_size: 10747904, attribute_state: None, stream_state: StreamState { stream_data_run: None, stream_position: 0, data_size: 10747904 } }) },
index_record_size: 4096,
vcn: 0

The issue, afaict, is that the seek implementation for NtfsAttributeListNonResidentAttributeValue does not handle a vcn value of 0. It expects data_to_seek to be strictly positive, but do not handle it being 0 and walking inside the next data run of the current attribute, or the first data run of the next connect attribute.
I don't know enough about NTFS whether this is a valid case or if some parsing failed earlier, but it looks legit from what i've seen, and changing the check on data_to_seek into a loop seems to fix the issue.
I have attached a tentative PR for this: #11

leofidus · 2022-02-25T10:40:30Z

You're right, I just tested #11 and it does fix it the issue for me too.

ColinFinck · 2022-03-07T19:15:48Z

@leofidus @vthib Thanks a lot for your bug report and the analysis!
I dug into the related code again and I'm afraid this requires more than a 1-line fix to handle all such cases:

NtfsNonResidentAttributeValue and NtfsAttributeListNonResidentAttributeValue both share code in StreamState to read and seek in Data Runs.
When StreamState is created via StreamState::new, its stream_data_run field is initialized to None. The field is only filled with an actual Data Run when the first non-zero read or seek is performed. That way, StreamState::new remains simple and we only deal with Data Runs when actually required. There are also other cases when stream_data_run may be None:

We seek past the end of all Data Runs
We encounter a non-resident attribute with zero Data Runs (may not be valid as per NTFS spec, but the crate must not panic here)

The design above gets problematic as soon as I want to return the absolute filesystem byte position of the current seek position. I only have a valid value for that if stream_data_run is filled with a Some value.
Additionally, the Data Run itself can only know about its absolute seek position if we haven't seeked past its bounds and it's not a sparse Data Run. This is why all data_position functions return an Option<u64> instead of a simple definite u64.

Now I have multiple other parts of the code, which urgently need to know the absolute filesystem byte position, and can't just deal with an Option.
They all use .data_position().unwrap() and are supposed to previously check the conditions, so that .unwrap() cannot panic. That was thought to be easy: As that code also performs the seeks, it can make sure that it seeks within the bounds. It can also check for sparse Data Runs.
However, I must have missed that the data_position function of NtfsNonResidentAttributeValue and NtfsAttributeListNonResidentAttributeValue may also return None when their stream_data_run is None. The function-level comment doesn't mention that possibility.
Actually, I didn't entirely miss it. NtfsNonResidentAttributeValue::new has extra code to mitigate that:

ntfs/src/attribute_value/non_resident.rs

Lines 48 to 52 in be1fc19

    
           // Get the first Data Run already here to let `data_position` return something meaningful. 
        
           if let Some(stream_data_run) = stream_data_runs.next() { 
        
               let stream_data_run = stream_data_run?; 
        
               stream_state.set_stream_data_run(stream_data_run); 
        
           }

This mitigation fails though when you seek to SeekFrom::Start(0).

NtfsAttributeListNonResidentAttributeValue misses any such workarounds, which is why you're encountering this bug in the first place.
PR #11 mitigates this bug, but only if you seek before calling data_position. It panics the same way if you don't seek at all.
None of the mitigations work when a non-resident attribute has no Data Runs at all.

I need some more time to think this through and come up with a definite solution that works in all cases.

Perhaps I can actually get the callers of data_position to handle an Option instead of unwrapping it. They usually don't work on it themselves, but pass it through to other functions. Could require quite some refactoring.
Maybe some of them don't even need the absolute byte position at all. I mostly use it to provide informative error messages.
I might also be able to just add some missing checks before unwrapping, and can thereby ensure that the Option is a Some value.

This may take some time. In any case, thanks for uncovering this bug and the related #9!

leofidus · 2022-03-07T22:05:28Z

I think #9 is a good example of how the error types should have position as an Option<u64>. Then the code would generate a error message

NtfsError::InvalidStructuredValueSize {
  position: None,
  ty: NtfsAttributeType::VolumeName,
  expected: 1,
  actual: 0,
}

which is completely reasonable (why would there be a data run if there is no data). From a cursory look changing the error types to an option and passing the option along as long as possible would also get rid of a lot of unwraps. It wouldn't solve #9 or #10 on its own, but I think it's a reasonable starting point.

@leofidus

`NtfsPosition` is built around an `Option<NonZeroU64>`. This allows functions to gracefully return that they currently don't have a valid position, without being larger than an `u64` in memory. The change allows us to get rid of lots of possibly panicking `unwrap` calls. Recent commits have shown that there are more valid cases where the `data_position` functions could return `None`. Examples are non-resident attributes without any Data Runs or non-resident attributes beginning with a sparse Data Run. Maybe even zero-length Data Runs are possible. In any of these cases, we mustn't unwrap the return value of `data_position`. Almost always, these positions are only used to give better error messages and assist with debugging anyway. Hence, their calculation should never cause a panic. Thanks to @leofidus for the inspiration in #10

This was referenced Mar 8, 2022

Panic on empty volume name #9

Closed

2MB clusters lead to "The cluster size is 124928 bytes, but the maximum supported one is 2097152" #12

Closed

ColinFinck closed this as completed in 93ce1f4 Mar 13, 2022

ColinFinck mentioned this issue Mar 13, 2022

fix 0-length seek inside a non-resident attribute list #11

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Panic when running ntfs_shell::dir in home directory #10

Panic when running ntfs_shell::dir in home directory #10

leofidus commented Feb 22, 2022

vthib commented Feb 24, 2022

leofidus commented Feb 25, 2022

ColinFinck commented Mar 7, 2022

leofidus commented Mar 7, 2022

Panic when running ntfs_shell::dir in home directory #10

Panic when running ntfs_shell::dir in home directory #10

Comments

leofidus commented Feb 22, 2022

vthib commented Feb 24, 2022

leofidus commented Feb 25, 2022

ColinFinck commented Mar 7, 2022

leofidus commented Mar 7, 2022