Description
A lot of people have a tendency to assume files and directories are all there are they have to think about, and that if its not a file, its clearly a directory, and vice versa.
The documentation even suggests this if you don't think it through, by saying that is_file()
and is_dir()
are mutually exclusive.
However, not all things that return false from is_file()
are directories, and plenty of special file types return false for both is_dir()
and is_file()
A lot of the time, when people make the is_file()
check, they're not asking "is this a regular file as opposed to a special one", they're asking "is this a thing that I can read and get bytes from".
A particular situation where one might be calling is_file()
and prematurely rejecting it is when the file in question is a user specified path, and that specified path is a pipe, as constructed by the bash command:
fooprogram <( program that outputs )
fooprogram
gets passed /dev/fd/63
(or similar) as the first parameter, and that is not a regular file, but it is for all general purposes something that can be safely treated as a file.
Subsequently, the better advice here is to check for if !file.is_dir()
in preparation for doing reads from it.
Excerpts from man 7 inode
The following mask values are defined for the file type:
S_IFMT 0170000 bit mask for the file type bit field
S_IFSOCK 0140000 socket
S_IFLNK 0120000 symbolic link
S_IFREG 0100000 regular file
S_IFBLK 0060000 block device
S_IFDIR 0040000 directory
S_IFCHR 0020000 character device
S_IFIFO 0010000 FIFO
Thus, to test for a regular file (for example), one could write:
stat(pathname, &sb);
if ((sb.st_mode & S_IFMT) == S_IFREG) {
/* Handle regular file */
}
strace output of an example of this usage (noting the value of st_mode):
strace -v cat <( echo hello )
execve("/bin/cat", ["cat", "/dev/fd/63"], .... )
...
openat(AT_FDCWD, "/dev/fd/63", O_RDONLY) = 3
fstat(3, {
st_dev=makedev(0, 0xb),
st_ino=63079295,
st_mode=S_IFIFO|0600,
st_nlink=1,
st_uid=1000,
st_gid=1000,
st_blksize=4096,
st_blocks=0,
st_size=0,
st_atime=1567670286 /* 2019-09-05T19:58:06.632258848+1200 */,
st_atime_nsec=632258848,
st_mtime=1567670286 /* 2019-09-05T19:58:06.633258850+1200 */,
st_mtime_nsec=633258850,
st_ctime=1567670286 /* 2019-09-05T19:58:06.633258850+1200 */,
st_ctime_nsec=633258850
}) = 0
...
read(3, "hello\n", 131072) = 6
write(1, "hello\n", 6) = 6
read(3, "", 131072) = 0
close(3) = 0
close(1) = 0
Current rust code for sys::unix::fs::FileType
https://github.com/rust-lang/rust/blob/a24f636e60a5da57ab641d800ac5952bbde98b65/src/libstd/sys/unix/fs.rs#L195-L201
impl FileType {
pub fn is_dir(&self) -> bool { self.is(libc::S_IFDIR) }
pub fn is_file(&self) -> bool { self.is(libc::S_IFREG) }
pub fn is_symlink(&self) -> bool { self.is(libc::S_IFLNK) }
pub fn is(&self, mode: mode_t) -> bool { self.mode & libc::S_IFMT == mode }
}
But it may be worth noting the users assumptions do seem to hold true on Windows, where "is_file()" is just shorthand for "!is_symlink() && !is_directory()". But even then, if you're only asking "Can I read a thing", the "!is_symlink()" check is something you don't want to be doing, so even there, the "!is_dir()" is possibly the right call. Ugh.
impl FileType {
fn new(attrs: c::DWORD, reparse_tag: c::DWORD) -> FileType {
FileType {
attributes: attrs,
reparse_tag: reparse_tag,
}
}
pub fn is_dir(&self) -> bool {
!self.is_symlink() && self.is_directory()
}
pub fn is_file(&self) -> bool {
!self.is_symlink() && !self.is_directory()
}
pub fn is_symlink(&self) -> bool {
self.is_reparse_point() && self.is_reparse_tag_name_surrogate()
}
pub fn is_symlink_dir(&self) -> bool {
self.is_symlink() && self.is_directory()
}
pub fn is_symlink_file(&self) -> bool {
self.is_symlink() && !self.is_directory()
}
fn is_directory(&self) -> bool {
self.attributes & c::FILE_ATTRIBUTE_DIRECTORY != 0
}
fn is_reparse_point(&self) -> bool {
self.attributes & c::FILE_ATTRIBUTE_REPARSE_POINT != 0
}
fn is_reparse_tag_name_surrogate(&self) -> bool {
self.reparse_tag & 0x20000000 != 0
}
}