-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-19494: [ABFS] Fix Case Sensitivity Issue for hdi_isfolder metadata #7496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🎊 +1 overall
This message was automatically generated. |
============================================================
|
String resourceType = result.getResponseHeader(dirHeaderName); | ||
return resourceType != null && resourceType.equals(TRUE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since getResponseHeader method returns either empty string or true- should we check for resourceType != EMPTY_STRING instead of null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of an empty string, resourceType.equals(TRUE) will return false, so there’s no need for an explicit check for an empty string.
The null case occurs when the header we are looking for does not exist, which is why this check is needed.
return false; | ||
} | ||
|
||
String resourceType = result.getResponseHeader(dirHeaderName); | ||
return resourceType != null && resourceType.equals(TRUE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes since we already checked for null in the getHeaderNameIgnoreCase method, this repeated check may not be needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it was an existing line of code, I didn’t change it. However, it does make sense to remove this null check.
return TRUE; | ||
} | ||
return EMPTY_STRING; | ||
} | ||
|
||
@Override | ||
public Map<String, List<String>> getResponseHeaders() { | ||
return new HashMap<>(); | ||
return Collections.singletonMap(X_MS_META_HDI_ISFOLDER, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we adding this entry in the map for all getResponse header calls ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AbfsHttpOperationWithFixedResultForGetFileStatus is used in one place where we are checking for an implicit directory. In the case of an implicit directory, I am applying the same logic as in the method httpHeader.equalsIgnoreCase(X_MS_META_HDI_ISFOLDER), where we return true if the header is X_MS_META_HDI_ISFOLDER.
* Test mkdirs with HDI folder configuration, | ||
* verifying the correct header and directory state. | ||
*/ | ||
@Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add test for other variations such as ALL Caps, only one later caps, all small. Also one test can be when we set multiple metadata keys, the directory one is correctly identified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will add the test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some thoughts.
@@ -1528,10 +1528,31 @@ public AbfsRestOperation deleteBlobPath(final Path blobPath, | |||
*/ | |||
@Override | |||
public boolean checkIsDir(AbfsHttpOperation result) { | |||
String resourceType = result.getResponseHeader(X_MS_META_HDI_ISFOLDER); | |||
String dirHeaderName = getHeaderNameIgnoreCase(result, X_MS_META_HDI_ISFOLDER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't it be better to directly get the value from this function? Something like:
getCaseInsensitiveResponseHeader()
similar to getResponseHeader
And this can be moved to AbfsHttpOperation
class itself where normal version of this method already exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be later used by other classes and other headers as well not only BlobClient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken!
@@ -167,4 +181,107 @@ public void testMkdirWithExistingFilename() throws Exception { | |||
intercept(FileAlreadyExistsException.class, () -> fs.mkdirs(new Path("/testFilePath"))); | |||
intercept(FileAlreadyExistsException.class, () -> fs.mkdirs(new Path("/testFilePath/newDir"))); | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add tests for this change in mkdir test file?
To me it looks like more of a metadata change.
We should add tests in ITestFileStatus, ITestListStatus, ITestAttributes classes.
All of them have dependency on this header to return right response.
Listing output also depends on this header is that also case-insensitive? Can we add some tests for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for adding these test cases here is we are performing mkdir calls with different cases of Hdi_isfolder. Therefore, the focus is on how mkdir behaves in these scenarios.
But yes, it makes sense it is a metadata change so we can move it to other class if that seems more appropriate to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking suggestions.
LGTM.
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
============================================================
|
@@ -104,6 +104,22 @@ public Map<String, List<String>> getResponseHeaders() { | |||
return connection.getHeaderFields(); | |||
} | |||
|
|||
/**{@inheritDoc}*/ | |||
@Override | |||
public String getResponseHeaderIgnoreCase(final String headerName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If both the children are using the same method implementation, same can be added in parent itself if it won't change in future for other children if added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are three more implementations of AbfsHttpOperation where the implementation is different. Thats the reason I have implemented it separately for all.
…isfolder metadata (apache#7496) Contributed by Manish Bhatt Reviewed by Anmol, Manika, Anuj Signed off by: Anuj Modi<anujmodi@apache.org>
…isfolder metadata (apache#7496) Contributed by Manish Bhatt Reviewed by Anmol, Manika, Anuj Signed off by: Anuj Modi<anujmodi@apache.org>
Jira: https://issues.apache.org/jira/browse/HADOOP-19494
In the blob endpoint, we determine whether the path is a file, or a directory based on the metadata attribute hdi_isfolder. When creating a directory, we set hdi_isfolder to true. Currently, our method for checking if the path is a directory involves a case-sensitive equality check. Consequently, if someone configures a directory with Hdi_isfolder, the driver will not recognize that path as a directory. We need to address this issue because, in the backend, hdi_isfolder and Hdi_isfolder are considered the same metadata attribute. Therefore, the solution involves modifying our equality check to be case-insensitive, ensuring that the driver correctly identifies directories regardless of case variations in the metadata attribute.