[SPARK-13478] [yarn] Use real user when fetching delegation tokens. #11358
The Hive client library does not notice when the current user is a proxy user. As a result, when a proxy user is in use, it fails to fetch delegation tokens from the metastore because the current user has no Kerberos TGT. The fix is to run the code that fetches the delegation token as the real logged-in user. Tested on a Kerberos cluster, both submitting normally and with a proxy user; Hive and HBase tokens are retrieved correctly in both cases.
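The approach described above can be sketched roughly as follows. This is a minimal illustration, not necessarily the exact code in the patch: it assumes Hadoop's `UserGroupInformation` is on the classpath, and the object name `RealUserSketch` and helper name `runAsRealUser` are hypothetical.

```scala
import java.lang.reflect.UndeclaredThrowableException
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.security.UserGroupInformation

// Hypothetical helper, for illustration only.
object RealUserSketch {

  /** Runs `body` as the real (logged-in) user behind the current user, if any. */
  def runAsRealUser[T](body: => T): T = {
    val currentUser = UserGroupInformation.getCurrentUser()
    // getRealUser() returns null unless the current user is a proxy user,
    // so fall back to the current user when no proxying is involved.
    val realUser = Option(currentUser.getRealUser()).getOrElse(currentUser)
    try {
      realUser.doAs(new PrivilegedExceptionAction[T] {
        override def run(): T = body
      })
    } catch {
      // doAs wraps unexpected checked exceptions; surface the original cause.
      case e: UndeclaredThrowableException =>
        throw Option(e.getCause).getOrElse(e)
    }
  }
}
```

With a helper like this, the metastore call would be wrapped as `RealUserSketch.runAsRealUser { obtainTokenForHiveMetastoreInner(conf) }`, so the Hive client sees the user that actually holds the Kerberos TGT.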
Test build #51921 has finished for PR 11358 at commit
So I have not tested using the keytab-based login with proxy user stuff at all. We get delegation tokens even there - does this issue affect that as well?
This looks like it might affect HDFS tokens as well, and an error that looks like this might come up during the initial token renewal:
In addition to the code that gets the new tokens, I think the
That is not expected to work. Just use the proper user's credentials if you're going to provide a principal and a keytab.
@@ -171,8 +173,8 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
    * @param username the username of the principal requesting the delegating token.
    * @return a delegation token
    */
-  private[yarn] def obtainTokenForHiveMetastoreInner(conf: Configuration,
-      username: String): Option[Token[DelegationTokenIdentifier]] = {
+  private[yarn] def obtainTokenForHiveMetastoreInner(conf: Configuration):
The only reason I thought this was passed in was for testing; is that not the case?
The sole test for this code didn't seem to do any verification of the given user name, so it didn't seem necessary to have a parameter here.
The code looks fine to me. Do we need any more documentation to explain proxy user vs keytab?
Ok, ok, I'll add something.
Looks fine. +1 pending jenkins. @harishreedharan any other concerns?
LGTM
Test build #52058 has finished for PR 11358 at commit
Merging this.
Hi @vanzin, in your view, how does this issue affect Spark SQL? Spark SQL invokes SessionState.start, which eventually connects to the Hive metastore.
@WangTaoTheTonic please don't comment on closed PRs. If there's a bug in Spark SQL, file a bug.
The Hive client library is not smart enough to notice that the current user is a proxy user; so when using a proxy user, it fails to fetch delegation tokens from the metastore because of a missing kerberos TGT for the current user. To fix it, just run the code that fetches the delegation token as the real logged in user. Tested on a kerberos cluster both submitting normally and with a proxy user; Hive and HBase tokens are retrieved correctly in both cases. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes apache#11358 from vanzin/SPARK-13478.
Same commit message. (cherry picked from commit c7fccb5)
Same commit message. (cherry picked from commit c7fccb5) Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #16665 from vanzin/SPARK-13478_1.6.
Same commit message. (cherry picked from commit c7fccb5) Author: Marcelo Vanzin <vanzin@cloudera.com> Closes apache#16665 from vanzin/SPARK-13478_1.6. (cherry picked from commit e78138a)