Skip to content

HADOOP-17060. listStatus and getFileStatus behave inconsistent in the case of ViewFs implementation for isDirectory #2061

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 10, 2020

Conversation

umamaheswararao
Copy link
Contributor

@umamaheswararao
Copy link
Contributor Author

Here is a draft patch to see how the yetus report for this change. I ran locally some tests, most of them passed. Since it's in HADOOP, I don't think HDFS tests will run. I ran few tests, most of them passed.

I have question on following assert:
In FileStatus.java:

// The variables isdir and symlink indicate the type:
 // 1. isdir implies directory, in which case symlink must be null.
 // 2. !isdir implies a file or symlink, symlink != null implies a
 //    symlink, otherwise it's a file.
 assert (isdir && symlink == null) || !isdir;

This assert is telling that, if it's a dir, symlink should be null or it can be a just file. This means, if it's a symlink, it can be only file.

I have tested in my macos about this semantics locally :

I have create symLink with srcLink --> target
lrwxr-xr-x   1 umagangumalla  staff     6 Jun  6 15:54 srcLink -> target

  //Like getFileStatus
System.out.println("===== Like getFileStatus ========");
System.out.println("path:"+java.nio.file.Paths.get("/Users/umagangumalla/Work/repos/PRs/srcLink"));
System.out.println("isDir:"+Files.isDirectory(java.nio.file.Paths.get("/Users/umagangumalla/Work/repos/PRs/srcLink"))); System.out.println("isSymlink:"+Files.isSymbolicLink(java.nio.file.Paths.get("/Users/umagangumalla/Work/repos/PRs/srcLink") ));

The output was:
===== Like getFileStatus ========
path:/Users/umagangumalla/Work/repos/PRs/srcLink
isDir:true
isSymlink:true

This says, isDir and isSymlink can be true. This patch supports this behavior. So, I am not very sure on that assert, why that was added. I have removed it to support this case now, otherwise, that assert will make flow fail in dev envs. If someone knows about it, please comment. Thanks

CC: @shvachko @szetszwo @jnp @arp7

@umamaheswararao
Copy link
Contributor Author

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 33s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 19m 25s trunk passed
+1 💚 compile 16m 59s trunk passed
+1 💚 checkstyle 0m 55s trunk passed
+1 💚 mvnsite 1m 26s trunk passed
+1 💚 shadedclient 16m 19s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 5s trunk passed
+0 🆗 spotbugs 2m 10s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 2m 9s trunk passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 49s the patch passed
+1 💚 compile 16m 19s the patch passed
+1 💚 javac 16m 20s the patch passed
-0 ⚠️ checkstyle 0m 56s hadoop-common-project/hadoop-common: The patch generated 2 new + 216 unchanged - 0 fixed = 218 total (was 216)
+1 💚 mvnsite 1m 27s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 13m 53s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 2s the patch passed
+1 💚 findbugs 2m 14s the patch passed
_ Other Tests _
+1 💚 unit 9m 16s hadoop-common in the patch passed.
+1 💚 asflicense 0m 53s The patch does not generate ASF License warnings.
106m 52s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-2061/1/artifact/out/Dockerfile
GITHUB PR #2061
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 5363f53dccdf 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 3ca1529
Default Java Private Build-1.8.0_252-8u252-b09-1~18.04-b09
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-2061/1/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-2061/1/testReport/
Max. process+thread count 1478 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-2061/1/console
versions git=2.17.1 maven=3.6.0 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

… case of ViewFs implementation for isDirectory
@ayushtkn
Copy link
Member

ayushtkn commented Jun 7, 2020

Thanx @umamaheswararao for the fix.
Technically speaking, I think isDir should be taken from the destination only at least here in case of ViewFs. Some applications relies on checking isDir, I guess for ls -R we too check isDir only.
I tried some stuff, and it gave responses as isDirectory true as well as symlink true.

lrwxrwxrwx  1 ayush ayush    4 Jun  8 04:25 sym_dir -> dir/
ayush@ayushpc:~/test_sym$ ../hadoop/ayush_isDir.bash sym_dir
sym_dir is directory.
sym_dir is not file.
sym_dir is symlink.
ayush@ayushpc:~/test_sym$ cat ../hadoop/ayush_isDir.bash 
if [ -d $1 ] 
then
    echo "$1 is directory." 
else
    echo "$1 is not directory."
fi

if [ -f $1 ] 
then
    echo "$1 is file." 
else
    echo "$1 is not file."it 
fi

if [ -L $1 ] 
then
    echo "$1 is symlink." 
else
    echo "$1 is not symlink."
fi
ayush@ayushpc:~/test_sym$ 

Similar stuff happens if it is file, it tells isDirectory false and File as true.

But the only doubt I have regarding the assertions, this was explicitly done as part of HADOOP-7032 and other one in HADOOP-15289, There should be some reason behind that?

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 31s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 18m 39s trunk passed
+1 💚 compile 16m 57s trunk passed
+1 💚 checkstyle 0m 55s trunk passed
+1 💚 mvnsite 1m 24s trunk passed
+1 💚 shadedclient 16m 24s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 4s trunk passed
+0 🆗 spotbugs 2m 8s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 2m 6s trunk passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 48s the patch passed
+1 💚 compile 16m 20s the patch passed
+1 💚 javac 16m 20s the patch passed
-0 ⚠️ checkstyle 0m 56s hadoop-common-project/hadoop-common: The patch generated 2 new + 216 unchanged - 0 fixed = 218 total (was 216)
+1 💚 mvnsite 1m 25s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 14m 12s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 4s the patch passed
+1 💚 findbugs 2m 17s the patch passed
_ Other Tests _
+1 💚 unit 9m 18s hadoop-common in the patch passed.
+1 💚 asflicense 0m 54s The patch does not generate ASF License warnings.
106m 23s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-2061/3/artifact/out/Dockerfile
GITHUB PR #2061
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 507453267b5c 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / a8610c1
Default Java Private Build-1.8.0_252-8u252-b09-1~18.04-b09
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-2061/3/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-2061/3/testReport/
Max. process+thread count 2100 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-2061/3/console
versions git=2.17.1 maven=3.6.0 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 23m 0s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 23m 55s trunk passed
+1 💚 compile 20m 45s trunk passed
+1 💚 checkstyle 0m 53s trunk passed
+1 💚 mvnsite 1m 31s trunk passed
+1 💚 shadedclient 19m 11s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 58s trunk passed
+0 🆗 spotbugs 2m 36s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 2m 35s trunk passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 58s the patch passed
+1 💚 compile 20m 20s the patch passed
+1 💚 javac 20m 20s the patch passed
-0 ⚠️ checkstyle 0m 56s hadoop-common-project/hadoop-common: The patch generated 2 new + 216 unchanged - 0 fixed = 218 total (was 216)
+1 💚 mvnsite 1m 41s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 16m 39s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 56s the patch passed
+1 💚 findbugs 2m 37s the patch passed
_ Other Tests _
+1 💚 unit 10m 32s hadoop-common in the patch passed.
+1 💚 asflicense 0m 45s The patch does not generate ASF License warnings.
149m 10s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-2061/2/artifact/out/Dockerfile
GITHUB PR #2061
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 79c23889c526 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / a8610c1
Default Java Private Build-1.8.0_252-8u252-b09-1~18.04-b09
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-2061/2/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-2061/2/testReport/
Max. process+thread count 1327 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-2061/2/console
versions git=2.17.1 maven=3.6.0 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@umamaheswararao
Copy link
Contributor Author

Thanks @ayushtkn for spending time on this.
I think these assertions might have derived from HDFS symlink design, where HDFS symlink will not be an InodeDirectory. Instead is will be always
public class INodeSymlink extends INodeWithAdditionalFields {
This will return isDir as always false. That tells, both symlink and isDir cannot be true in HDFS perspective. But as in ViewFS case, its not limited to HDFS. I think its good to support in similar to how other fs supporting. But I am not very sure where that assert checks are going to impact (ex: if downstream projects enable assertion on their runs).

CC: @jojochuang @rakeshadr

status.getModificationTime(), status.getAccessTime(),
status.getPermission(), status.getOwner(), status.getGroup(),
link.getTargetLink(),
new Path(inode.fullPath).makeQualified(myUri, null));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here in the FileStatus we have resolved the mount and giving details of the destination. Any idea where the link.getTargetLink() in the FileStatus can be or is used?
If we are resolving the mount entry, This is actually the FileStatus of the destination, if the destination itself is a symlink, in that case instead of link.getTargetLink() shouldn't status.getSymlink() be done?

Since all the other details like permissions and stuff we are having of the destination, the intent seems to resolve it as the destination directory rather than projecting this as a symlink to the end user. else if we see in linux the symlink doesn't shows the destination permission or other details it shows it's own details while listing :

lrwxrwxrwx  1 ayush ayush    3 Jun  8 04:52 sym_f -> txt
-rw-rw-r--  1 ayush ayush    0 Jun  8 11:05 txt

In RBF also for listing on mount points, we don't denote them as symlinks, instead just pass the destination FileStatus with name resolved...
Is it possible we keep the assertions as it is and change link.getTargetLink() and specify that as per the destination?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ayushtkn , Thanks for the review!
Here ViewFS mount points seems to be designed to show them as symlinks. This can be figured out by looking at ViewFsBaseTest.java#testListOnInternalDirsOfMountTable (which was way old test) and also I had discussion with Sanjay on that.
So, I don't think it's reasonable to change that assumption/behavior now.

Any idea where the link.getTargetLink() in the FileStatus can be or is used?
FileStatus has getSymlink API. Also Filestatus path represents link path. and symlink path represents target path. This seems to be correct.

Coming to permissions bit I just realized symlink showing its own permissions instead of target in macos also. ( but we just changed the behavior in HADOOP-17029 where we display target dir's permissions and group. cc: @abhishekdas99, as per linux/macos the older behavior seems to be ok, but ViewFS perspective it seems to be good to show target permissions instead of showing some bogus permissions. I think in ViewFS case the in-memory InternalViewFSDirFS was created by fs but we can not really cary who created what mount, we will just configure in xml that link and would not have any info who added that, all should be at admins permission control only ). I assume in linux mount link dir will be created with the permission to the current user who created right? Since we don't have that link creation logic dynamically at ViewFS, I don't think we can behave exactly same as Linux here?

My another original question was why we should not have symlink on dir?
is it because HDFS does not support that? If that because of hdfs, should we move that assert under HdfsNamedFileStatus? I don't know what else will break here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the intent behind that assertion is little different.
An entity can be either a directory, file or a symlink. In linux while listing, directory is denoted as d, link by l and file by -, So there are three categories (Directory/File/Link) not just two (Directory/File)

drwxrwxr-x  2 ayush ayush 4096 Jun  8 04:25 dir/
lrwxrwxrwx  1 ayush ayush    4 Jun  8 04:25 sym_dir -> dir//
-rw-rw-r--  1 ayush ayush    0 Jun  8 11:05 txt

A symlink can point to both directory and file, but isn't itself a directory or file. It is link, and has its own identity.

In Hadoop, to distinguish between the three categories, the logic seems like :
isDir==true --> It is a Directory
isDir==false --> Can be a file or Symlink. So to conclude further whether a file or link
isDir==false and link==null --> it is file
isDir==false and link!=null --> it is a symlink

The assertion message "A directory can not be a symlink" also tries to depict this only. Since if isDir is true that means it is a directory so it can't be a symlink, it can be a symlink when it isn't neither directory nor file.

Now regarding changing it here in ViewFS, allowing isDir to be true and as well as having a link.
If someone is having an application code with

if(sDir==false and link!=null) --> it is a symlink
do work considering symlink.

His code will break, Since this check won't return true post this change. Would this change incompatible?

Now regarding inconsistency between getFileStatus and listStatus, if we are considering a mount entry as link, so getFileStatus should follow the similar semantics as listStatus, A same entity can not be treated differently. If we want to treat it as link, we can think of makin getFileStatus adapt..
Just my thoughts, not much idea about the actual discussions what was the intent then..

Copy link
Contributor Author

@umamaheswararao umamaheswararao Jun 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isDir==true --> It is a Directory
isDir==false --> Can be a file or Symlink. So to conclude further whether a file or link
isDir==false and link==null --> it is file
isDir==false and link!=null --> it is a symlink

Here nio Files APIs return true for isDirectory API. But here we cannot make that judgement with this information. I saw that 'l' part along with directory. However, native filesystems seems to be capturing the info about target filesystem and return isDir true based on that. It denotes as symlink along with permission bits.
My original thought was to change GetFileStatus see comment as you thought here:
But after verifying tests on local mac, I realized isDirectory is getting returned tru in that cases, but here we cannot make that decision. in MAC it was showing as folder icon if target is a directory and isDirectory as true.

Yes, they are incompatible changes( either the change done in listStatus/GetFileStatus)( we will mark them), and behavioral corrections in other side.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well things are different at different places and TBH I don't have a strong opinion on which is the best way to do.
On a personal note changing getFileStatus() seems to be little more safe to me, As those assertions and stuffs stays as it is and changes gets restricted to viewFS only and no changes to links interpretations and stuffs. (my assumption, it should be safe, haven't digged in much)

      return new FileStatus(0, true, 0, 0, creationTime, creationTime,
          PERMISSION_555, ugi.getShortUserName(), ugi.getPrimaryGroupName(),

          new Path(theInternalDir.fullPath).makeQualified(
              myUri, ROOT_PATH));

getFileStatus() is treating it as a link only(but with isDir true), it doesn't shows target file system permissions and times. That also need to be changed similarly to HADOOP-17029, to resolve permissions and stuff from target file system. FileStatus should be same through both API's? We can make things get in sync by doing that, and get away with inconsistencies b/w these two API's as of now..

Changes in getListing() apart from making the APIs in sync(That also we need to change in getFileStatus() as well since there 'true' is hardcoded), we seems to change symlink interpretation logics as well to get that in sync with other systems and I think that might break things for people relying on checks like this : if (isDir==false and link!=null)`, May be we can have a follow up JIRA as well, to change link interpretations with bigger audience.

But in any case, I don't have oppositions to any of the approach, all up to you whichever way you want to go ahead. :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From ur previous comment:

drwxrwxr-x  2 ayush ayush 4096 Jun  8 04:25 dir/
lrwxrwxrwx  1 ayush ayush    4 Jun  8 04:25 sym_dir -> dir//
-rw-rw-r--  1 ayush ayush    0 Jun  8 11:05 txt

This is from permission bits. If you look at our FSPermission class we have a stickybit to denote its a file or not. Unfortunately it's just a boolean, it does not have 3rd option to represent link.

public FsPermission(FsAction u, FsAction g, FsAction o, boolean sb) {
    set(u, g, o, sb);
  }

(nio) Files.isDirectory returns true for a symlink if target is dir. Where as Hadoop FileStatus class does not allow if symlink represent as dir. Behaviors are confusing :-(
That means we don't have enough info captured in Permissions or FileStatus classes to represent symlink properly.

Coming to getFileStatus question:
In your above quoted code runs on internal directory, thats not a link. Here internal directory means, if you have link src as /a/b/c, here a, b are internal dirs (here it's always a dir, so we can safely hardcode to true) and c is a link dir.
GetFileStatus run in input path, so, when input path is link, it would have resolved and run getFileStatus on targetFileSystem#getFileStatus. Where listStatus gets the FileStatus of immediate childrens. So, listStatus of /a/b , since be is an internal dir, it will go to InetrnalViewFsDir and run listStatus. and one of its child c is a link. so previously treating that children FileStatus object as symlink with isDir is false, irrespective whether target is file/dir. One safe assumption we can keep is, in ViewFS target would always be a directory only. I don't think anyone would configure a file as target link. But we can not leave that case anyway for consistency. Hope this helps. I also don't have very clear opinions here. Symlinks seems to mess around here. :-( .
But I like the idea of representing as symlink as that one of fs features. but consistent behavior is what we need to find here. If we wanted to fix getFileStatus("/a/b/c"), we need to make isDir as false, but current it will return true as it runs on targetFileSystem and also give symlink with tagetFs path. Here no way user know whether the target is file or dir, unless he directly access symlinked path and see.
Let me catchup Sanjay, if he has some thoughts around.

But in any case, I don't have oppositions to any of the approach, all up to you whichever way you want to go ahead. :-)
Thank you. Let's figure out what would be the correct thing to do. I have no plans to move ahead until we find some reasonable solution here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ayushtkn

I have verified symlinks behavior in HDFS.

public void testSymlinkOnHDFS() throws Exception {
    // add ur hdfs uri here: ex hdfs://10.0.1.75:9000
    URI hdfsURI = dfs.getUri();
    FileSystem.enableSymlinks();
    try (FileSystem hdfs = new DistributedFileSystem()) {
      hdfs.initialize(hdfsURI, new HdfsConfiguration());
      final Path targetLinkDir = new Path("/user", "targetRegularDir");
      hdfs.mkdirs(targetLinkDir);

      Path symLinkDir = new Path("/links/linkDir");
      hdfs.createSymlink(targetLinkDir, symLinkDir, true);

      // ListStatus Test
      FileStatus[] listStatus = hdfs.listStatus(new Path("/links"));
      FileStatus fsFromLs = listStatus[0]; // FileStatus of /links/linkDir
      Assert.assertEquals(fsFromLs.isDirectory(), false);
      Assert.assertEquals("/links/linkDir",
          Path.getPathWithoutSchemeAndAuthority(fsFromLs.getPath()).toString());

      // GetFileStatus test
      // FileStatus of /links/linkDir
      FileStatus fileStatus = hdfs.getFileStatus(symLinkDir);
      Assert.assertEquals(true, fileStatus.isDirectory());
      // resolved to FileStatus of /user/targetRegularDir
      Assert.assertEquals("/user/targetRegularDir", Path
          .getPathWithoutSchemeAndAuthority(fileStatus.getPath()).toString());
    }
  }

It turns out that the behavior of listStatus and GetFileStatus are different. They both returning different FileStatus. Same behavior in ViewFS also.

GetFileStatus(/test), just runs on resolved path directly. So, it will not be represented as symLink.
ListStatus(/) of gets /test as children FileStatus object. But that represents as symLink.

Probably we should just clarify the behavior in user guide and API docs about the behaviors in symlinks case? Otherwise fixing this needs to be done all other places and it will be incompatible change across.
One advantage I see with the existing behavior is that, with listStatus we can know dir is symlink. If one wants to know targetFs details, then issue GetFileStatus on that path will resolved to targetFS and gets the FileStatus at targetFS.
I will also check with Sanjay about his opinions on this.

@ayushtkn
Copy link
Member

ayushtkn commented Jun 9, 2020

Thanx @umamaheswararao the test is really helpful. I tried it and found really the filestatus are different in case of HDFS as well :

Through ListStatus 
HdfsLocatedFileStatus{path=hdfs://localhost:42219/links/linkDir; isDirectory=false; length=0; replication=0; blocksize=0; modification_time=1591733953915; access_time=1591733953915; owner=ayush; group=supergroup; permission=rwxrwxrwx; isSymlink=true; symlink=/user/targetRegularDir; hasAcl=false; isEncrypted=false; isErasureCoded=false}

Through getFileStatus
HdfsLocatedFileStatus{path=hdfs://localhost:42219/user/targetRegularDir; isDirectory=true; modification_time=1591733953674; access_time=0; owner=ayushS; group=hadoop; permission=rwxr-xr-x; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false}

Probably we should just clarify the behavior in user guide and API docs about the behaviors in symlinks case? Otherwise fixing this needs to be done all other places and it will be incompatible change across.

Yahh, Documenting this properly sounds most apt as of now.

One advantage I see with the existing behavior is that, with listStatus we can know dir is symlink. If one wants to know targetFs details, then issue GetFileStatus on that path will resolved to targetFS and gets the FileStatus at targetFS.

Correct, provided people know about this behavior, this behavior may be helpful in many places . but there is probably too much lack of documentation/awareness around symlinks,

we see Ambari Files View is blocked due to this for ViewFS .

If we don't fix this, Srinivasu Majeti's problem won't get solved?

But if there aren't any further different opinions, and provided HDFS is also behaving similarlly, we don't have much scope changing just in ViewFs, So documenting properly as you said should be our final option.

This reverts commit 32f0c7f.
…t in the case of ViewFs implementation for isDirectory"

This reverts commit 66036d6.
… case of ViewFs implementation for isDirectory
@umamaheswararao
Copy link
Contributor Author

I had a discussion with Sanjay about this behaviors. He certainly agreed that, we should update the docs instead of changing them now. GetFileStatus implemented to work on resolved paths transparently. Updated the Javadocs for the APIs how they behave. Hope this helps.
No user guides updated, as they are generic fs commands and there will not be two variation of commands there, we have only ls. And also this behaviors can be fs specific.

CC: @rakeshadr @ayushtkn

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 33s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 21m 4s trunk passed
+1 💚 compile 18m 42s trunk passed
+1 💚 checkstyle 0m 54s trunk passed
+1 💚 mvnsite 1m 29s trunk passed
+1 💚 shadedclient 16m 49s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 5s trunk passed
+0 🆗 spotbugs 2m 18s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 2m 14s trunk passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 53s the patch passed
+1 💚 compile 17m 20s the patch passed
+1 💚 javac 17m 20s the patch passed
+1 💚 checkstyle 0m 56s the patch passed
+1 💚 mvnsite 1m 28s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 14m 46s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 2s the patch passed
+1 💚 findbugs 2m 15s the patch passed
_ Other Tests _
+1 💚 unit 9m 40s hadoop-common in the patch passed.
+1 💚 asflicense 0m 48s The patch does not generate ASF License warnings.
113m 15s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-2061/5/artifact/out/Dockerfile
GITHUB PR #2061
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 516742295eb2 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 635e6a1
Default Java Private Build-1.8.0_252-8u252-b09-1~18.04-b09
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-2061/5/testReport/
Max. process+thread count 1494 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-2061/5/console
versions git=2.17.1 maven=3.6.0 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 21m 25s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 20m 11s trunk passed
+1 💚 compile 18m 0s trunk passed
+1 💚 checkstyle 0m 55s trunk passed
+1 💚 mvnsite 1m 26s trunk passed
+1 💚 shadedclient 16m 27s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 6s trunk passed
+0 🆗 spotbugs 2m 10s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 2m 8s trunk passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 51s the patch passed
+1 💚 compile 16m 57s the patch passed
+1 💚 javac 16m 57s the patch passed
+1 💚 checkstyle 0m 57s the patch passed
+1 💚 mvnsite 1m 27s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 14m 12s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 1s the patch passed
+1 💚 findbugs 2m 20s the patch passed
_ Other Tests _
+1 💚 unit 9m 33s hadoop-common in the patch passed.
+1 💚 asflicense 0m 55s The patch does not generate ASF License warnings.
131m 6s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-2061/4/artifact/out/Dockerfile
GITHUB PR #2061
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 0d321b65cf68 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / b735a77
Default Java Private Build-1.8.0_252-8u252-b09-1~18.04-b09
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-2061/4/testReport/
Max. process+thread count 1948 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-2061/4/console
versions git=2.17.1 maven=3.6.0 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 36s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 0m 27s Maven dependency ordering for branch
+1 💚 mvninstall 20m 38s trunk passed
+1 💚 compile 17m 39s trunk passed
+1 💚 checkstyle 2m 50s trunk passed
+1 💚 mvnsite 2m 33s trunk passed
+1 💚 shadedclient 20m 8s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 42s trunk passed
+0 🆗 spotbugs 2m 53s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 5m 5s trunk passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 27s Maven dependency ordering for patch
+1 💚 mvninstall 1m 45s the patch passed
+1 💚 compile 19m 50s the patch passed
+1 💚 javac 19m 50s the patch passed
+1 💚 checkstyle 2m 57s the patch passed
+1 💚 mvnsite 2m 29s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 14m 47s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 37s the patch passed
+1 💚 findbugs 5m 24s the patch passed
_ Other Tests _
+1 💚 unit 10m 27s hadoop-common in the patch passed.
+1 💚 unit 2m 19s hadoop-hdfs-client in the patch passed.
+1 💚 asflicense 0m 47s The patch does not generate ASF License warnings.
133m 9s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-2061/6/artifact/out/Dockerfile
GITHUB PR #2061
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 03a795521027 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / b735a77
Default Java Private Build-1.8.0_252-8u252-b09-1~18.04-b09
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-2061/6/testReport/
Max. process+thread count 2340 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-2061/6/console
versions git=2.17.1 maven=3.6.0 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Copy link
Member

@ayushtkn ayushtkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1

@umamaheswararao umamaheswararao merged commit 93b121a into apache:trunk Jun 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants