[SPARK-13478] [yarn] Use real user when fetching delegation tokens. #11358

Closed
wants to merge 3 commits into from

vanzin
Contributor

@vanzin vanzin commented Feb 25, 2016

The Hive client library does not detect that the current user is a
proxy user, so when a proxy user is in effect it fails to fetch
delegation tokens from the metastore because the current user has no
Kerberos TGT.

To fix this, run the code that fetches the delegation token as the
real logged-in user.

Tested on a Kerberos cluster, submitting both normally and with a proxy
user; Hive and HBase tokens are retrieved correctly in both cases.
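
As a rough illustration of the idea (not the code in this patch), the sketch below models "run as the real user" with plain Scala. Hadoop's `UserGroupInformation` exposes `getCurrentUser`, `getRealUser`, and `doAs`; everything here (`RealUserSketch`, `User`, `doAsRealUser`, `fetchToken`) is a hypothetical stand-in that only mimics that shape, so it runs without Hadoop or Kerberos.

```scala
object RealUserSketch {
  // Hypothetical stand-in for a Hadoop user identity. `realUser` is set only
  // for proxy users and points at the user who actually logged in.
  final case class User(name: String, realUser: Option[User] = None)

  // The identity the current thread is running as (None until doAs is used).
  private val current = new ThreadLocal[Option[User]] {
    override def initialValue(): Option[User] = None
  }

  def currentUser: Option[User] = current.get()

  // Runs `body` as `user`, restoring the previous identity afterwards.
  def doAs[T](user: User)(body: => T): T = {
    val previous = current.get()
    current.set(Some(user))
    try body finally current.set(previous)
  }

  // Runs `body` as the real user behind the current identity. If the current
  // user is not a proxy user, `body` simply runs as the current user.
  def doAsRealUser[T](body: => T): T = {
    val user = currentUser.getOrElse(
      throw new IllegalStateException("no current user"))
    doAs(user.realUser.getOrElse(user))(body)
  }

  // Hypothetical token fetch that mimics a client unable to notice proxy
  // users: it only succeeds when the caller is a real (non-proxy) user.
  def fetchToken(): String = currentUser match {
    case Some(User(name, None)) => s"token-for-$name"
    case Some(User(name, Some(_))) =>
      throw new IllegalStateException(s"$name is a proxy user without a TGT")
    case None => throw new IllegalStateException("no current user")
  }

  def main(args: Array[String]): Unit = {
    val alice = User("alice")          // real, logged-in user
    val bob = User("bob", Some(alice)) // proxy user backed by alice
    // Wrapping the fetch in doAsRealUser makes it run as alice.
    println(doAs(bob) { doAsRealUser { fetchToken() } }) // prints token-for-alice
  }
}
```

Calling `fetchToken()` directly inside `doAs(bob)` throws in this model, which is the failure mode described above; wrapping it in `doAsRealUser` is the analogue of the fix.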

Marcelo Vanzin added 2 commits February 24, 2016 18:10
@SparkQA
SparkQA commented Feb 25, 2016

Test build #51921 has finished for PR 11358 at commit 7fcec4a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@harishreedharan
Contributor

So I have not tested using the keytab-based login with proxy user stuff at all. We get delegation tokens even there - does this issue affect that as well?

@harishreedharan
Contributor

This looks like it might affect HDFS tokens as well; an error like the following might come up during the initial token renewal:

WARN UserGroupInformation: PriviledgedActionException as:hari (auth:PROXY) via hdfs@EXAMPLE (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: hari tries to renew a token with renewer hdfs

In addition to the code that gets the new tokens, I think the getTokenRenewalInterval method also needs to be run as the real user.

@vanzin
Contributor Author

vanzin commented Feb 25, 2016

> keytab-based login with proxy user stuff at all

That is not expected to work. Just use the proper user's credentials if you're going to provide a principal and a keytab.

@@ -171,8 +173,8 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
* @param username the username of the principal requesting the delegating token.
* @return a delegation token
*/
-  private[yarn] def obtainTokenForHiveMetastoreInner(conf: Configuration,
-      username: String): Option[Token[DelegationTokenIdentifier]] = {
+  private[yarn] def obtainTokenForHiveMetastoreInner(conf: Configuration):
Contributor

The only reason I thought this was passed in was for testing; is that not the case?

Contributor Author

The sole test for this code didn't seem to do any verification of the given user name, so it didn't seem necessary to have a parameter here.

@tgravescs
Contributor

the code looks fine to me. Do we need any more documentation to explain proxy user vs keytab?

@vanzin
Contributor Author

vanzin commented Feb 26, 2016

> Do we need any more documentation to explain proxy user vs keytab?

Ok, ok, I'll add something.

@tgravescs
Contributor

looks fine. +1 pending jenkins. @harishreedharan any other concerns?

@harishreedharan
Contributor

LGTM

@SparkQA
SparkQA commented Feb 26, 2016

Test build #52058 has finished for PR 11358 at commit 0159499.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor Author

vanzin commented Feb 29, 2016

Merging this.

@asfgit asfgit closed this in c7fccb5 Feb 29, 2016
@vanzin vanzin deleted the SPARK-13478 branch March 1, 2016 19:43
@WangTaoTheTonic
Contributor

Hi @vanzin, what about Spark SQL with respect to this issue, in your view? Spark SQL invokes SessionState.start, which eventually connects to the Hive metastore.

@vanzin
Contributor Author

vanzin commented Mar 8, 2016

@WangTaoTheTonic please don't comment on closed PRs. If there's a bug in Spark SQL, file a bug.

roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#11358 from vanzin/SPARK-13478.
vanzin pushed a commit to vanzin/spark that referenced this pull request Jan 20, 2017
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#11358 from vanzin/SPARK-13478.

(cherry picked from commit c7fccb5)
asfgit pushed a commit that referenced this pull request Jan 22, 2017
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #11358 from vanzin/SPARK-13478.

(cherry picked from commit c7fccb5)

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #16665 from vanzin/SPARK-13478_1.6.
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Jan 22, 2017
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#11358 from vanzin/SPARK-13478.

(cherry picked from commit c7fccb5)

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#16665 from vanzin/SPARK-13478_1.6.

(cherry picked from commit e78138a)
mgummelt pushed a commit to d2iq-archive/spark that referenced this pull request Mar 7, 2017
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#11358 from vanzin/SPARK-13478.

(cherry picked from commit c7fccb5)

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#16665 from vanzin/SPARK-13478_1.6.