Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement DirTraversal #531

Merged
merged 6 commits into from
Jan 1, 2022
Merged

Conversation

pczarn
Copy link
Contributor

@pczarn pczarn commented Dec 29, 2021

Uses a builder pattern to provide a common mechanism for file and directory search.

TODO: migrate other search categories.

@pczarn
Copy link
Contributor Author

pczarn commented Dec 29, 2021

@qarmin what do you think about the code, is this the right direction?

@qarmin
Copy link
Owner

qarmin commented Dec 30, 2021

Removing duplicated code is always good, but I have some concerns about it

  • performance - slightly lower performance at the expense of unification is a small price to pay but I don't benchmarked this yet(I think that checking by name or size may have the biggest impact in performance)
  • cache size - the biggest concern I have about cache size. Entire FileEntry struct is used in Duplicate, Similar Images, Similar Videos and Broken Files to save data to cache, so struct as small as possible is required to be able to quicly load and save data to cache. Adding next fields like e.g. image hash(which is Vec<>) will increase size of cache. I don't have for now any numbers, so it is possible that cache size may increase only in e.g. 5-10% which is acceptable.

@pczarn
Copy link
Contributor Author

pczarn commented Dec 30, 2021

So I suppose after you will check performance, we can merge this PR. I can add memory statistics in another PR according to this post https://stackoverflow.com/questions/30869007/how-to-benchmark-memory-usage-of-a-function

@pczarn
Copy link
Contributor Author

pczarn commented Dec 30, 2021

@qarmin I believe small cache size can be retained with an implementation that is generic over the type of additional file entry information. I can do this in a future PR. How about we divide the work over multiple PRs for easier code review?

@qarmin
Copy link
Owner

qarmin commented Dec 31, 2021

Looks that PR is OK and I could merge it because I didn't find any performance degradation and also cache for now is same as before(only duplicates from changed modes in this PR uses cache system, but it is my custom implementation(similar image use bincode)).

The only problem is with invalid symlinks, because a little after start I have this crash

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', czkawka_core/src/common_dir_traversal.rs:636:45
stack backtrace:
   0: rust_begin_unwind
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:517:5
   1: core::panicking::panic_fmt
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:100:14
   2: core::panicking::panic
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:50:5
   3: czkawka_core::common_dir_traversal::set_as_not_empty_folder
   4: czkawka_core::common_dir_traversal::DirTraversal<F>::run
   5: czkawka_core::invalid_symlinks::InvalidSymlinks::find_invalid_links
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

@pczarn
Copy link
Contributor Author

pczarn commented Jan 1, 2022

@qarmin I could not reproduce the problem on my system. Could you send me test data? I would like to make a regression test case.

Even though I could not reproduce the problem, or figure it out by reading the code, I found bugs in my code. Please try symlinks again with the latest commit -- they should work.

} else if metadata.is_file() {
entry_type = EntryType::File;
} else {
unreachable!();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is reachable and should be ignored

thread '<unnamed>' panicked at 'internal error: entered unreachable code', czkawka_core/src/common_dir_traversal.rs:398:29
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
drwx------  2 rafal rafal 4096 sty  1 10:42 .
drwx------ 56 rafal rafal 4096 sty  1 07:43 ..
-rw-------  1 rafal rafal 1812 gru 11 09:25 Configuration.xml
prw-------  1 rafal rafal    0 sty  1 07:12 .show-request-queue <---------- This cause crash

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qarmin fixed in the most recent commit

@qarmin
Copy link
Owner

qarmin commented Jan 1, 2022

I tried several times to find symlink which cause crash with older version of this PR, but I couldn't find it and looks that only crash happens with specific combination of folders and files.

@qarmin qarmin merged commit 51df63b into qarmin:master Jan 1, 2022
@qarmin
Copy link
Owner

qarmin commented Jan 1, 2022

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants