Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dir::get_size's handling of symlinks #60

Merged
merged 3 commits into from
Sep 20, 2022

Conversation

edmorley
Copy link
Contributor

@edmorley edmorley commented Jul 1, 2022

This makes dir::get_size give a correct calculation of the size of a path of directory, that now matches that returned by eg: du --bytes -s <path>. (Due to changes requested during review, this is no longer the case.)

Before:

Now:

  • Symlinks are no longer followed (fixing the regression from Fix get size (and much more) #34), so that only the size of the symlink itself is included (as expected), and not the size of the symlink's target.
  • If the initial path passed into dir::get_size is a directory, the size of that directory entry itself is now correctly counted (like it was already being counted for subdirectories). Due to changes requested during review, this is no longer the case. The metadata size of the directory entries (entries only, not the contents) is now excluded entirely from the total size.

In addition, the size calculations are performed using DirEntry::metadata, rather than unnecessarily converting to a path and then passing that path to fs::{metadata,symlink_metadata}.

Fixes #25.
Fixes #59.

This makes `dir::get_size` give a correct calculation of the size
of a path of directory, that now matches that returned by eg:
`du --bytes -s <path>`.

Before:
- Symlinks were followed, causing double-counting of symlinked
  files (see webdesus#59).
- Whilst the metadata size of subdirectory entries was counted
  correctly, the metadata size of the parent directory (the initial
  `path` passed into `dir::get_size`) was not counted. See:
   webdesus#25 (comment)

Now:
- Symlinks are no longer followed (fixing the regression from webdesus#34),
   so that only the size of the symlink itself is included (as expected),
   and not the size of the symlink's target.
- If the initial `path` passed into `dir::get_size` is a directory, the
   size of that directory entry itself is now correctly counted (like it
   was already being counted for subdirectories).

Fixes webdesus#25.
Fixes webdesus#59.
@edmorley edmorley changed the title Fix dir::get_size's handling of symlinks Fix dir::get_size's handling of symlinks + directories Jul 1, 2022
@webdesus
Copy link
Owner

webdesus commented Jul 5, 2022

Hi @edmorley! Thank you for your incredible work!!! I check and fix fall tests this week and will accept this PR😊

@edmorley
Copy link
Contributor Author

edmorley commented Jul 5, 2022

@webdesus Thank you! Did some tests fail locally for you? The test I added passed locally for me (macOS).

@webdesus
Copy link
Owner

webdesus commented Jul 11, 2022

@edmorley yes locally, not yours test, others:

    it_details_item_dir
    it_details_item_dir_short
    it_get_file_size
    it_ls

I looked at the reason. I made this decision on purpose not to count the size of the folders.
My point is:

  • We copy only content. Folders are created again.
  • Different file systems have different sizes of folders. And if we copy data to another file system size can be different
  • ext4 folders can be large(3 MB and more), but after copying in the same file system (ext4), the folder again will be 4kb.
  • Many file management programs have the same logic.

I suggest the following edit:

    let mut size_in_bytes = 0; // FIRST  CHANGE. 

    if path_metadata.is_dir() {
        for entry in read_dir(&path)? {
            let entry = entry?;
          
            let entry_metadata = entry.metadata()?;

            if entry_metadata.is_dir() {
                size_in_bytes += get_size(entry.path())?;
            } else {
                size_in_bytes += entry_metadata.len();
            }
        }
    }else{
        size_in_bytes = path_metadata.len(); // SECOND CHANGE
    }

Сonsider only files.

What do you think about it?

@edmorley
Copy link
Contributor Author

@webdesus Hi! Thank you for the reply.

So for me dir::get_size is about the size of a directory right now, not about the size of copying. The PR at the moment matches the result returned by du, which is why I feel it makes the most sense.

However if you prefer for dir::get_size not to count directory sizes, then I'm not going to strongly object (since the most important fix here is the symlink fix, and as long as that is merged, the rest is less important).

However, if dir::get_size doesn't count directory sizes, I feel this would be surprising to a lot of people, so should be mentioned in the rustdocs at the very least.

@edmorley
Copy link
Contributor Author

edmorley commented Aug 9, 2022

@webdesus Hope you are well :-) Any thoughts on the above?

@webdesus
Copy link
Owner

Hi, @edmorley!
Very big sorry for the delayed response. I didn't have time for open source all this time. First of all, I thought about how we could do a different method that would include folder size, but then I decided to use the old way of calculating size without folder size.
I pushed changes, and if you don't mind, I can merge them.

@edmorley
Copy link
Contributor Author

@webdesus No problem about the delay, I know the feeling! :-)

I'm fine with you merging that change, thank you. (I still think the implementation should count directory sizes too, but will just be happy to get the other fixes in, since they are a bigger problem.)

@edmorley edmorley changed the title Fix dir::get_size's handling of symlinks + directories Fix dir::get_size's handling of symlinks Sep 20, 2022
@webdesus
Copy link
Owner

I still think the implementation should count directory sizes too

I think it is sometimes reasonable too. But this API work a long time, and I think we can do it in another feature.

@webdesus webdesus merged commit bb12c0d into webdesus:master Sep 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dir::get_size returns wrong size for symlinks Unexpected behaviour of dir::get_size
2 participants