| 1 | +<!-- |
| 2 | + Licensed to the Apache Software Foundation (ASF) under one |
| 3 | + or more contributor license agreements. See the NOTICE file |
| 4 | + distributed with this work for additional information |
| 5 | + regarding copyright ownership. The ASF licenses this file |
| 6 | + to you under the Apache License, Version 2.0 (the |
| 7 | + "License"); you may not use this file except in compliance |
| 8 | + with the License. You may obtain a copy of the License at |
| 9 | + |
| 10 | + http://www.apache.org/licenses/LICENSE-2.0 |
| 11 | + |
| 12 | + Unless required by applicable law or agreed to in writing, software |
| 13 | + distributed under the License is distributed on an "AS IS" BASIS, |
| 14 | + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| 15 | + See the License for the specific language governing permissions and |
| 16 | + limitations under the License. |
| 17 | +--> |
| 18 | + |
| 19 | +# Pluggable Authentication for HBase RPCs |
| 20 | + |
| 21 | +## Background |
| 22 | + |
| 23 | +As a distributed database, HBase must be able to authenticate users and HBase |
| 24 | +services across an untrusted network. Clients and HBase services are treated |
| 25 | +equivalently in terms of authentication (and this is the only time we will |
| 26 | +draw such a distinction). |
| 27 | + |
| 28 | +HBase currently supports three modes of authentication via the |
| 29 | +configuration property `hbase.security.authentication`: |
| 30 | + |
| 31 | +1. `SIMPLE` |
| 32 | +2. `KERBEROS` |
| 33 | +3. `TOKEN` |
| 34 | + |
| 35 | +`SIMPLE` authentication is effectively no authentication; HBase assumes the user |
| 36 | +is who they claim to be. `KERBEROS` authenticates clients via the Kerberos V5 |
| 37 | +protocol using the GSSAPI mechanism of the Java Simple Authentication and Security |
| 38 | +Layer (SASL) protocol. `TOKEN` is a username-password based authentication protocol |
| 39 | +which uses short-lived passwords that can only be obtained via a `KERBEROS` authenticated |
| 40 | +request. `TOKEN` authentication is synonymous with Hadoop-style [Delegation Tokens](https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/hadoop_tokens.html#delegation-tokens). `TOKEN` authentication uses the `DIGEST-MD5` |
| 41 | +SASL mechanism. |
| 42 | + |
| 43 | +[SASL](https://docs.oracle.com/javase/8/docs/technotes/guides/security/sasl/sasl-refguide.html) |
| 44 | +is a framework which specifies a network protocol that can authenticate a client |
| 45 | +and a server using an arbitrary mechanism. SASL ships with a [number of mechanisms](https://www.iana.org/assignments/sasl-mechanisms/sasl-mechanisms.xhtml) |
| 46 | +out of the box and it is possible to implement custom mechanisms. SASL effectively |
| 47 | +decouples an RPC client-server model from the mechanism used to authenticate those |
| 48 | +requests (e.g. the RPC code is identical whether username-password, Kerberos, or any |
| 49 | +other method is used to authenticate the request). |
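To make the decoupling concrete, the sketch below asks the JVM which client-side SASL mechanisms are currently registered. It relies only on the standard `javax.security.sasl` API. The class name `SaslMechanismLister` is invented for this illustration, and the exact list returned depends on the JDK and its configured security providers.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClientFactory;

// Illustrative helper (not part of HBase): lists the SASL client mechanisms the JVM knows about.
class SaslMechanismLister {

  /** Collects the names of every client-side SASL mechanism registered with the JVM. */
  static Set<String> clientMechanisms() {
    Set<String> names = new TreeSet<>();
    Enumeration<SaslClientFactory> factories = Sasl.getSaslClientFactories();
    while (factories.hasMoreElements()) {
      // A null property map applies no policy filter, so each factory reports all of its mechanisms.
      Collections.addAll(names, factories.nextElement().getMechanismNames(null));
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println(clientMechanisms());
  }
}
```

The RPC layer only ever needs a mechanism name from a list like this one; everything specific to the mechanism stays behind the `SaslClient` and `SaslServer` objects created for that name.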
| 50 | + |
| 51 | +RFCs define what [SASL mechanisms exist](https://www.iana.org/assignments/sasl-mechanisms/sasl-mechanisms.xml), |
| 52 | +but what the RFCs define is a superset of the mechanisms that are |
| 53 | +[implemented in Java](https://docs.oracle.com/javase/8/docs/technotes/guides/security/sasl/sasl-refguide.html#SUN). |
| 54 | +This document limits discussion to SASL mechanisms in the abstract, focusing on those which are well-defined and |
| 55 | +implemented in Java today by the JDK itself. However, it is entirely possible for a developer to implement |
| 56 | +and register their own SASL mechanism. Writing a custom mechanism is outside of the scope of this document, but |
| 57 | +not outside of the realm of possibility. |
| 58 | + |
| 59 | +The `SIMPLE` implementation does not use SASL, but instead has its own RPC logic |
| 60 | +built into the HBase RPC protocol. `KERBEROS` and `TOKEN` both use SASL to authenticate, |
| 61 | +relying on the `Token` interface that is intertwined with the Hadoop `UserGroupInformation` |
| 62 | +class. SASL decouples an RPC from the mechanism used to authenticate that request. |
| 63 | + |
| 64 | +## Problem statement |
| 65 | + |
| 66 | +Despite HBase already shipping authentication implementations which leverage SASL, |
| 67 | +it is (effectively) impossible to add a new authentication implementation to HBase. The |
| 68 | +use of the `org.apache.hadoop.hbase.security.AuthMethod` enum makes it impossible |
| 69 | +to define a new method of authentication. Also, the RPC implementation is written |
| 70 | +to only use the methods that are expressly shipped in HBase. Adding a new authentication |
| 71 | +method would require copying and modifying the RpcClient implementation, in addition |
| 72 | +to modifying the RpcServer to invoke the correct authentication check. |
| 73 | + |
| 74 | +While it is possible to add a new authentication method to HBase, it cannot be done |
| 75 | +cleanly or sustainably. This is what is meant by "impossible". |
| 76 | + |
| 77 | +## Proposal |
| 78 | + |
| 79 | +HBase should expose interfaces which allow for pluggable authentication mechanisms |
| 80 | +such that HBase can authenticate against external systems. Because the RPC implementation |
| 81 | +can already support SASL, HBase can standardize on SASL, allowing any authentication method |
| 82 | +which is capable of using SASL to negotiate authentication. `KERBEROS` and `TOKEN` methods |
| 83 | +will naturally fit into these new interfaces, but `SIMPLE` authentication will not (see the following |
| 84 | +section for a tangent on `SIMPLE` authentication today). |
| 85 | + |
| 86 | +### Tangent: on SIMPLE authentication |
| 87 | + |
| 88 | +`SIMPLE` authentication in HBase today is treated as a special case. My impression is that |
| 89 | +this stems from HBase not originally shipping an RPC solution that had any authentication. |
| 90 | + |
| 91 | +Re-implementing `SIMPLE` authentication such that it also flows through SASL (e.g. via |
| 92 | +the `PLAIN` SASL mechanism) would simplify the HBase codebase such that all authentication |
| 93 | +occurs via SASL. This was not done for the initial implementation to reduce the scope |
| 94 | +of the changeset. Changing `SIMPLE` authentication to use SASL may result in some |
| 95 | +performance impact in setting up a new RPC. The same `if (sasl) ... else SIMPLE` |
| 96 | +conditional logic is carried over into this implementation. |
| 97 | + |
| 98 | +## Implementation Overview |
| 99 | + |
| 100 | +HBASE-23347 includes a refactoring of HBase RPC authentication where all current methods |
| 101 | +are ported to a new set of interfaces, and all RPC implementations are updated to use |
| 102 | +the new interfaces. In the spirit of SASL, the expectation is that users can provide |
| 103 | +their own authentication methods at runtime, and HBase should be capable of negotiating |
| 104 | +with a client that tries to authenticate via that custom authentication method. The implementation |
| 105 | +refers to this "bundle" of client and server logic as an "authentication provider". |
| 106 | + |
| 107 | +### Providers |
| 108 | + |
| 109 | +One authentication provider includes the following pieces: |
| 110 | + |
| 111 | +1. Client-side logic (providing a credential) |
| 112 | +2. Server-side logic (validating a credential from a client) |
| 113 | +3. Client selection logic to choose a provider (from many that may be available) |
| 114 | + |
| 115 | +A provider's client and server side logic are considered to be one-to-one. A `Foo` client-side provider |
| 116 | +should never be used to authenticate against a `Bar` server-side provider. |
| 117 | + |
| 118 | +We do expect that both clients and servers will have access to multiple providers. A server may |
| 119 | +be capable of authenticating via methods which a client is unaware of. A client may attempt to authenticate |
| 120 | +using a method which the server does not know how to process. In both cases, the RPC |
| 121 | +should fail when a client and server do not have matching providers. The server identifies |
| 122 | +client authentication mechanisms via a `byte authCode` (which is already sent today with HBase RPCs). |
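A minimal sketch of this lookup, assuming a server that keeps its providers in a map keyed by the auth code. The class `AuthCodeRegistry`, its nested `Provider` type, and the numeric codes used in the usage note are hypothetical stand-ins, not the HBase classes or values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical server-side registry: maps the byte sent by the client to a provider.
class AuthCodeRegistry {

  /** Stand-in for a server-side provider; only the identifying fields are modelled. */
  static final class Provider {
    final String name;
    final byte code;

    Provider(String name, byte code) {
      this.name = name;
      this.code = code;
    }
  }

  private final Map<Byte, Provider> byCode = new HashMap<>();

  void register(Provider provider) {
    byCode.put(provider.code, provider);
  }

  /** Returns the provider for the given auth code, or fails the request if none matches. */
  Provider lookup(byte authCode) {
    Provider provider = byCode.get(authCode);
    if (provider == null) {
      // Mask with 0xFF so the code is reported as an unsigned value (0 to 255).
      throw new IllegalStateException("No provider for auth code " + (authCode & 0xFF));
    }
    return provider;
  }
}
```

With a single provider registered under an illustrative code of `81`, `lookup((byte) 81)` returns that provider, while any other code raises the exception, which corresponds to the RPC failing when client and server have no matching provider.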
| 123 | + |
| 124 | +A client may also have multiple providers available for it to use in authenticating against |
| 125 | +HBase. The client must have some logic to select which provider to use. Because we are |
| 126 | +allowing custom providers, we must also allow a custom selection logic such that the |
| 127 | +correct provider can be chosen. This is a formalization of the logic already present |
| 128 | +in `org.apache.hadoop.hbase.security.token.AuthenticationTokenSelector`. |
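One way to picture such selection logic is a preference-ordered list of providers, each able to say whether it can be used with the credentials at hand. The sketch below assumes exactly that. All type names are hypothetical, and preferring a token over a Kerberos ticket is an assumption of this example rather than a statement about what HBase does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical selector sketch: picks the first provider that can use the client's credentials.
class ProviderSelectorSketch {

  /** Hypothetical summary of the credentials a client currently holds. */
  static final class Credentials {
    final boolean hasDelegationToken;
    final boolean hasKerberosTicket;

    Credentials(boolean hasDelegationToken, boolean hasKerberosTicket) {
      this.hasDelegationToken = hasDelegationToken;
      this.hasKerberosTicket = hasKerberosTicket;
    }
  }

  /** Hypothetical client-side provider: a name plus a test for whether it is usable. */
  static final class ClientProvider {
    final String name;
    final Predicate<Credentials> usable;

    ClientProvider(String name, Predicate<Credentials> usable) {
      this.name = name;
      this.usable = usable;
    }
  }

  /** Assumed preference order for this sketch: token first, then Kerberos. */
  static List<ClientProvider> defaults() {
    return Arrays.asList(
        new ClientProvider("TOKEN", c -> c.hasDelegationToken),
        new ClientProvider("KERBEROS", c -> c.hasKerberosTicket));
  }

  /** Returns the first usable provider, or empty if none of them applies. */
  static Optional<ClientProvider> select(List<ClientProvider> preferenceOrder, Credentials creds) {
    return preferenceOrder.stream().filter(p -> p.usable.test(creds)).findFirst();
  }
}
```

A custom selector would replace `select` (or the ordering in `defaults`) with whatever rule fits the deployment, which is the extension point the paragraph above describes.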
| 129 | + |
| 130 | +To enable the above, we have some new interfaces to support the user extensibility: |
| 131 | + |
| 132 | +1. `interface SaslAuthenticationProvider` |
| 133 | +2. `interface SaslClientAuthenticationProvider extends SaslAuthenticationProvider` |
| 134 | +3. `interface SaslServerAuthenticationProvider extends SaslAuthenticationProvider` |
| 135 | +4. `interface AuthenticationProviderSelector` |
| 136 | + |
| 137 | +The `SaslAuthenticationProvider` shares logic which is common to the client and the |
| 138 | +server (though it is up to the developer to guarantee this). The client and server |
| 139 | +interfaces each have logic specific to the HBase RPC client and HBase RPC server |
| 140 | +codebase, as their names imply. As described above, an implementation |
| 141 | +of one `SaslClientAuthenticationProvider` must match exactly one implementation of |
| 142 | +`SaslServerAuthenticationProvider`. Each Authentication Provider implementation is |
| 143 | +a singleton and is intended to be shared across all RPCs. A provider selector is |
| 144 | +chosen per client based on that client's configuration. |
| 145 | + |
| 146 | +A client authentication provider is uniquely identified among other providers |
| 147 | +by the following characteristics: |
| 148 | + |
| 149 | +1. A name, e.g. "KERBEROS", "TOKEN" |
| 150 | +2. A byte (a value between 0 and 255) |
| 151 | + |
| 152 | +In addition to these attributes, a provider also must define the following attributes: |
| 153 | + |
| 154 | +3. The SASL mechanism being used. |
| 155 | +4. The Hadoop AuthenticationMethod, e.g. "TOKEN", "KERBEROS", "CERTIFICATE" |
| 156 | +5. The Token "kind", the name used to identify a TokenIdentifier, e.g. `HBASE_AUTH_TOKEN` |
| 157 | + |
| 158 | +It is allowed (even expected) that there may be multiple providers that use `TOKEN` authentication. |
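The attributes above can be sketched as a small descriptor type together with a uniqueness check on the two identifying fields. Everything here is hypothetical: the `Descriptor` class, its field names, and the values in the usage note are illustrative and do not mirror the HBase interfaces.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the attributes that describe one provider.
class ProviderIdentitySketch {

  static final class Descriptor {
    final String name;          // identifying attribute 1
    final byte code;            // identifying attribute 2
    final String saslMechanism; // e.g. "DIGEST-MD5"
    final String authMethod;    // Hadoop AuthenticationMethod, kept as a plain string here
    final String tokenKind;     // name used to identify a TokenIdentifier

    Descriptor(String name, byte code, String saslMechanism, String authMethod, String tokenKind) {
      this.name = name;
      this.code = code;
      this.saslMechanism = saslMechanism;
      this.authMethod = authMethod;
      this.tokenKind = tokenKind;
    }
  }

  /** Rejects duplicate names or codes; shared mechanisms or authentication methods are fine. */
  static void requireUnique(List<Descriptor> descriptors) {
    Set<String> names = new HashSet<>();
    Set<Byte> codes = new HashSet<>();
    for (Descriptor d : descriptors) {
      if (!names.add(d.name)) {
        throw new IllegalArgumentException("Duplicate provider name: " + d.name);
      }
      if (!codes.add(d.code)) {
        throw new IllegalArgumentException("Duplicate provider code: " + (d.code & 0xFF));
      }
    }
  }
}
```

Two descriptors that both declare the `TOKEN` authentication method, for example one named `TOKEN` and a made-up one named `CUSTOM_TOKEN` with a different code, pass the check. Reusing a name or a code does not.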
| 159 | + |
| 160 | +N.b. Hadoop requires all `TokenIdentifier` implementations to have a no-args constructor and a `ServiceLoader` |
| 161 | +entry in their packaging JAR file (e.g. `META-INF/services/org.apache.hadoop.security.token.TokenIdentifier`). |
| 162 | +Otherwise, parsing the `TokenIdentifier` on the server-side end of an RPC from a Hadoop `Token` will return |
| 163 | +`null` to the caller (often, in the `CallbackHandler` implementation). |
| 164 | + |
| 165 | +### Factories |
| 166 | + |
| 167 | +To ease development with this unknown set of providers, there are two classes which |
| 168 | +find, instantiate, and cache the provider singletons. |
| 169 | + |
| 170 | +1. Client side: `class SaslClientAuthenticationProviders` |
| 171 | +2. Server side: `class SaslServerAuthenticationProviders` |
| 172 | + |
| 173 | +These classes use [Java ServiceLoader](https://docs.oracle.com/javase/8/docs/api/java/util/ServiceLoader.html) |
| 174 | +to find implementations available on the classpath. The HBase implementations |
| 175 | +of the three out-of-the-box authentication methods all register themselves via the `ServiceLoader`. |
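As a rough illustration of this discovery step, the sketch below loads every registered implementation of a service interface using the standard `java.util.ServiceLoader` API. The `DemoAuthProvider` interface and the enclosing class are hypothetical; the real factories load the HBase provider interfaces and also cache the resulting singletons.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical discovery sketch built on java.util.ServiceLoader.
class ProviderDiscoverySketch {

  /** Hypothetical provider contract; the real HBase interfaces carry more methods. */
  public interface DemoAuthProvider {
    String name();
  }

  /** Instantiates every implementation listed in a META-INF/services registration file. */
  static List<DemoAuthProvider> discover() {
    List<DemoAuthProvider> found = new ArrayList<>();
    for (DemoAuthProvider provider : ServiceLoader.load(DemoAuthProvider.class)) {
      found.add(provider);
    }
    return found;
  }
}
```

An implementation becomes visible to `discover()` once its JAR contains a `META-INF/services/` file named after the service interface and listing the implementing class. Without such a file on the classpath, the returned list is empty.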
| 176 | + |
| 177 | +Each class also enables providers to be added via explicit configuration in hbase-site.xml. |
| 178 | +This enables unit tests to define custom implementations that may be toy/naive/unsafe without |
| 179 | +any worry that these may be inadvertently deployed onto a production HBase cluster. |
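Loading providers from explicit configuration can be pictured as instantiating classes by name. The sketch below assumes the configured value is a comma-separated list of class names. That format, the `DemoAuthProvider` interface, and the `ToyProvider` class are assumptions made for this example only, and no real HBase property name is implied.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: turns a configured list of class names into provider instances.
class ExplicitProviderLoader {

  /** Hypothetical provider contract used only for this example. */
  public interface DemoAuthProvider {
    String name();
  }

  /** A toy provider of the kind a unit test might configure explicitly. */
  public static final class ToyProvider implements DemoAuthProvider {
    public ToyProvider() {}

    @Override
    public String name() {
      return "TOY";
    }
  }

  /** Instantiates each named class through its no-args constructor. */
  static List<DemoAuthProvider> instantiate(String commaSeparatedClassNames) {
    List<DemoAuthProvider> providers = new ArrayList<>();
    for (String className : commaSeparatedClassNames.split(",")) {
      String trimmed = className.trim();
      if (trimmed.isEmpty()) {
        continue; // tolerate stray commas and whitespace
      }
      try {
        Class<? extends DemoAuthProvider> clazz =
            Class.forName(trimmed).asSubclass(DemoAuthProvider.class);
        providers.add(clazz.getDeclaredConstructor().newInstance());
      } catch (ReflectiveOperationException | ClassCastException e) {
        throw new IllegalArgumentException("Cannot load provider " + trimmed, e);
      }
    }
    return providers;
  }
}
```

Because a class like `ToyProvider` is only ever named in a test's configuration and never registered through the `ServiceLoader`, it is not picked up by classpath discovery alone.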