HADOOP-16826. ABFS: update abfs.md to include config keys for identity transformation #1785

Closed
wants to merge 4 commits into from
31 changes: 31 additions & 0 deletions hadoop-tools/hadoop-azure/src/site/markdown/abfs.md
@@ -857,6 +857,37 @@ signon page for humans, even though it is a machine calling.
1. The URL is wrong —it is pointing at a web page unrelated to OAuth2.0
1. There's a proxy server in the way trying to return helpful instructions.

### `java.io.IOException: The ownership on the staging directory /tmp/hadoop-yarn/staging/user1/.staging is not as expected. It is owned by <principal_id>. The directory must be owned by the submitter user1 or user1`

Review comment (Contributor):

There are a couple of ways to deal with this issue, such as deleting the staging directory or changing the staging directory in the config before running the job. The `identity.transformer` configs listed here also provide a workaround, specific to the ABFS driver, that lets the client assume the ownership is with the current local user.

As this is not really a store issue, please reword to highlight that this is a workaround.

Reply (Contributor Author, @karthick-rn, Jan 21, 2020):

@snvijaya The options suggested are not a suitable fix for the exception. I think `identity.transformer` is more of a fix than a workaround. I have added a short message on the fix in my last commit. Let me know if you have any further comments.

When using [Azure Managed Identities](https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/overview), the files/directories in ADLS Gen2 are by default owned by the service principal object ID (the principal ID), so submitting jobs as the local OS user 'user1' results in the above exception.

The fix is to present the ownership as belonging to the local OS user by adding the following properties to `core-site.xml`.

```xml
<property>
  <name>fs.azure.identity.transformer.service.principal.id</name>
  <value>service principal object id</value>
  <description>
    An Azure Active Directory object ID (oid) used as the replacement for names contained
    in the list specified by “fs.azure.identity.transformer.service.principal.substitution.list”.
    Note that instead of setting an oid, you can also set $superuser here.
  </description>
</property>
<property>
  <name>fs.azure.identity.transformer.service.principal.substitution.list</name>
  <value>user1</value>
  <description>
    A comma-separated list of names to be replaced with the service principal ID specified by
    “fs.azure.identity.transformer.service.principal.id”. This substitution occurs
    when setOwner, setAcl, modifyAclEntries, or removeAclEntries are invoked with identities
    contained in the substitution list. Note that in a non-secure cluster, the asterisk symbol *
    can be used to match all users/groups.
  </description>
</property>
```

Once the above properties are configured, `hdfs dfs -ls abfs://container1@abfswales1.dfs.core.windows.net/` shows that the ADLS Gen2 files/directories are owned by 'user1'.
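
To make the effect of the two properties easier to follow, here is a simplified Python model of the substitution they describe. It is only an illustrative sketch, not the ABFS driver's code: the function names, the placeholder object ID, and the reverse mapping on reads (inferred from the `hdfs dfs -ls` result above) are assumptions made for this example.

```python
from typing import Sequence

# Placeholder values for illustration only (not real identifiers).
PRINCIPAL_ID = "00000000-0000-0000-0000-000000000000"
SUBSTITUTION_LIST = ["user1"]


def in_substitution_list(name: str, substitution_list: Sequence[str]) -> bool:
    """A name matches if it is listed, or if the list contains the wildcard '*'."""
    return "*" in substitution_list or name in substitution_list


def transform_for_set_request(name: str, principal_id: str,
                              substitution_list: Sequence[str]) -> str:
    """Outbound direction (e.g. setOwner): listed names become the principal ID."""
    if in_substitution_list(name, substitution_list):
        return principal_id
    return name


def transform_for_get_request(owner: str, local_user: str, principal_id: str,
                              substitution_list: Sequence[str]) -> str:
    """Inbound direction (e.g. a listing): assumed reverse mapping, where the
    principal ID is reported as the local user if that user is listed."""
    if owner == principal_id and in_substitution_list(local_user, substitution_list):
        return local_user
    return owner


if __name__ == "__main__":
    # 'user1' is in the list, so a setOwner call would send the principal ID.
    print(transform_for_set_request("user1", PRINCIPAL_ID, SUBSTITUTION_LIST))
    # A directory owned by the principal ID is reported as owned by 'user1'.
    print(transform_for_get_request(PRINCIPAL_ID, "user1", PRINCIPAL_ID, SUBSTITUTION_LIST))
```

In this model, a user that is not in the substitution list (and with no `*` entry) would still see the raw principal ID as the owner, which is the situation that triggers the staging directory exception.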

## <a name="testing"></a> Testing ABFS

See the relevant section in [Testing Azure](testing_azure.html).