[KYUUBI apache#4655] [DOCS] Enrich docs for Kyuubi Hive JDBC driver
### _Why are the changes needed?_

Update the outdated wording in the Kyuubi Hive JDBC driver docs, and add more details about Kerberos authentication.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [x] Add screenshots for manual tests if appropriate

<img width="1400" alt="image" src="https://user-images.githubusercontent.com/26535726/229476374-d662c3b2-c1bc-44e9-a717-92f401586feb.png">

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request

Closes apache#4655 from pan3793/docs-v2.

Closes apache#4655

9d2cb48 [Cheng Pan] Update docs/quick_start/quick_start_with_jdbc.md
00af58e [Cheng Pan] address comments
48bf216 [Cheng Pan] Update docs/quick_start/quick_start_with_jupyter.md
054e2be [Cheng Pan] nit
a0a80b8 [Cheng Pan] nit
41ff97d [Cheng Pan] [DOCS] Enrich docs for Kyuubi Hive JDBC Driver

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
pan3793 committed Apr 3, 2023
1 parent b818c6f commit a947dcb
Showing 8 changed files with 159 additions and 98 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -84,7 +84,7 @@ HiveServer2 can identify and authenticate a caller, and then if the caller also

Kyuubi extends the use of STS in a multi-tenant model based on a unified interface and relies on the concept of multi-tenancy to interact with cluster managers to finally gain the ability of resources sharing/isolation and data security. The loosely coupled architecture of the Kyuubi server and engine dramatically improves the client concurrency and service stability of the service itself.

#### DataLake/LakeHouse Support
#### DataLake/Lakehouse Support

The vision of Kyuubi is to unify the portal and become an easy-to-use data lake management platform. Different kinds of workloads, such as ETL processing and BI analytics, can be supported by one platform, using one copy of data, with one SQL interface.

4 changes: 2 additions & 2 deletions docs/appendix/terminology.md
@@ -129,9 +129,9 @@ As an enterprise service, SLA commitment is essential. Deploying Kyuubi in High
</em>
</p>

## DataLake & LakeHouse
## DataLake & Lakehouse

Kyuubi unifies DataLake & LakeHouse access in the simplest pure SQL way, meanwhile it's also the securest way with authentication and SQL standard authorization.
Kyuubi unifies DataLake & Lakehouse access in the simplest pure SQL way, meanwhile it's also the securest way with authentication and SQL standard authorization.

### Apache Iceberg

14 changes: 7 additions & 7 deletions docs/client/jdbc/hive_jdbc.md
@@ -19,14 +19,18 @@

## Instructions

Kyuubi does not provide its own JDBC Driver so far,
as it is fully compatible with Hive JDBC and ODBC drivers that let you connect to popular Business Intelligence (BI) tools to query,
analyze and visualize data through Spark SQL engines.
Kyuubi is fully compatible with Hive JDBC and ODBC drivers that let you connect to popular Business Intelligence (BI)
tools to query, analyze and visualize data through Spark SQL engines.

It's recommended to use [Kyuubi JDBC driver](./kyuubi_jdbc.html) for new applications.

## Install Hive JDBC

For programming, the easiest way to get `hive-jdbc` is from [Maven Central](https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc). For example,

The following sections demonstrate how to use Hive JDBC driver 2.3.8 to connect to Kyuubi Server. In practice, any version
up to and including 3.1.x should work fine.

- **maven**

```xml
@@ -76,7 +80,3 @@ jdbc:hive2://<host>:<port>/<dbName>;<sessionVars>?<kyuubiConfs>#<[spark|hive]Var
jdbc:hive2://localhost:10009/default;hive.server2.proxy.user=proxy_user?kyuubi.engine.share.level=CONNECTION;spark.ui.enabled=false#var_x=y
```
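
As a rough illustration of the URL layout above, the following sketch assembles a connection string from its parts. The class `HiveJdbcUrlSketch` and its `buildUrl` helper are hypothetical names introduced only for this example; they are not part of Hive or Kyuubi, and the sketch does no escaping or validation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Hive or Kyuubi: it only concatenates strings
// following the jdbc:hive2://<host>:<port>/<dbName>;<sessionVars>?<kyuubiConfs>#<vars> layout.
public class HiveJdbcUrlSketch {

    // Joins key=value pairs with ';', keeping the map's iteration order.
    static String join(Map<String, String> props) {
        StringJoiner joiner = new StringJoiner(";");
        props.forEach((key, value) -> joiner.add(key + "=" + value));
        return joiner.toString();
    }

    static String buildUrl(String host, int port, String dbName,
                           Map<String, String> sessionVars,
                           Map<String, String> kyuubiConfs,
                           Map<String, String> vars) {
        StringBuilder url = new StringBuilder("jdbc:hive2://")
            .append(host).append(':').append(port).append('/').append(dbName);
        // Each optional section is introduced by its own separator: ';', '?' and '#'.
        if (!sessionVars.isEmpty()) {
            url.append(';').append(join(sessionVars));
        }
        if (!kyuubiConfs.isEmpty()) {
            url.append('?').append(join(kyuubiConfs));
        }
        if (!vars.isEmpty()) {
            url.append('#').append(join(vars));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> sessionVars = new LinkedHashMap<>();
        sessionVars.put("hive.server2.proxy.user", "proxy_user");

        Map<String, String> kyuubiConfs = new LinkedHashMap<>();
        kyuubiConfs.put("kyuubi.engine.share.level", "CONNECTION");
        kyuubiConfs.put("spark.ui.enabled", "false");

        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("var_x", "y");

        // Rebuilds the example URL shown above from its individual parts.
        System.out.println(buildUrl("localhost", 10009, "default", sessionVars, kyuubiConfs, vars));
    }
}
```

Keeping the three property groups in separate maps mirrors the three separators in the URL, which makes it harder to place a Kyuubi configuration in the session-variable section by mistake.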

## Unsupported Hive Features

- Connect to HiveServer2 using HTTP transport. ```transportMode=http```

115 changes: 88 additions & 27 deletions docs/client/jdbc/kyuubi_jdbc.rst
@@ -17,14 +17,14 @@ Kyuubi Hive JDBC Driver
=======================

.. versionadded:: 1.4.0
Since 1.4.0, kyuubi community maintains a forked hive jdbc driver module and provides both shaded and non-shaded packages.
Kyuubi community maintains a forked Hive JDBC driver module and provides both shaded and non-shaded packages.

This packages aims to support some missing functionalities of the original hive jdbc.
For kyuubi engines that support multiple catalogs, it provides meta APIs for better support.
The behaviors of the original hive jdbc have remained.
This package aims to support some functionalities missing from the original Hive JDBC driver.
For Kyuubi engines that support multiple catalogs, it provides meta APIs for better support.
The behaviors of the original Hive JDBC driver are retained.

To access a Hive data warehouse or new lakehouse formats, such as Apache Iceberg/Hudi, delta lake using the kyuubi jdbc driver for Apache kyuubi, you need to configure
the following:
To access a Hive data warehouse or new Lakehouse formats, such as Apache Iceberg, Apache Hudi, or Delta Lake, using the Kyuubi JDBC driver
for Apache Kyuubi, you need to configure the following:

- The list of driver library files - :ref:`referencing-libraries`.
- The Driver or DataSource class - :ref:`registering_class`.
@@ -46,28 +46,28 @@ In the code, specify the artifact `kyuubi-hive-jdbc-shaded` from `Maven Central`
Maven
^^^^^

.. code-block:: xml
.. parsed-literal::
<dependency>
<groupId>org.apache.kyuubi</groupId>
<artifactId>kyuubi-hive-jdbc-shaded</artifactId>
<version>1.5.2-incubating</version>
<version>\ |release|\</version>
</dependency>
Sbt
sbt
^^^

.. code-block:: sbt
.. parsed-literal::
libraryDependencies += "org.apache.kyuubi" % "kyuubi-hive-jdbc-shaded" % "1.5.2-incubating"
libraryDependencies += "org.apache.kyuubi" % "kyuubi-hive-jdbc-shaded" % "\ |release|\"
Gradle
^^^^^^

.. code-block:: gradle
.. parsed-literal::
implementation group: 'org.apache.kyuubi', name: 'kyuubi-hive-jdbc-shaded', version: '1.5.2-incubating'
implementation group: 'org.apache.kyuubi', name: 'kyuubi-hive-jdbc-shaded', version: '\ |release|\'
Using the Driver in a JDBC Application
**************************************
@@ -92,11 +92,9 @@ connection for JDBC:

.. code-block:: java
private static Connection connectViaDM() throws Exception
{
Connection connection = null;
connection = DriverManager.getConnection(CONNECTION_URL);
return connection;
private static Connection newKyuubiConnection() throws Exception {
Connection connection = DriverManager.getConnection(CONNECTION_URL);
return connection;
}
.. _building_url:
@@ -112,12 +110,13 @@ accessing. The following is the format of the connection URL for the Kyuubi Hive

.. code-block:: jdbc
jdbc:subprotocol://host:port/schema;<clientProperties;><[#|?]sessionProperties>
jdbc:subprotocol://host:port[/catalog]/[schema];<clientProperties;><[#|?]sessionProperties>
- subprotocol: kyuubi or hive2
- host: DNS or IP address of the kyuubi server
- port: The number of the TCP port that the server uses to listen for client requests
- dbName: Optional database name to set the current database to run the query against, use `default` if absent.
- catalog: Optional catalog name to set the current catalog to run the query against.
- schema: Optional database name to set the current database to run the query against, use `default` if absent.
- clientProperties: Optional `semicolon(;)` separated `key=value` parameters that are recognized by the client and affect its behavior locally, e.g., user=foo;password=bar.
- sessionProperties: Optional `semicolon(;)` separated `key=value` parameters used to configure the session, operation or background engines.
  For instance, `kyuubi.engine.share.level=CONNECTION` specifies that the background engine instance is used only by the current connection. `spark.ui.enabled=false` disables the Spark UI of the engine.
@@ -127,7 +126,7 @@ accessing. The following is the format of the connection URL for the Kyuubi Hive
- Properties are case-sensitive
- Do not duplicate properties in the connection URL

Connection URL over Http
Connection URL over HTTP
************************

.. versionadded:: 1.6.0
@@ -145,16 +144,78 @@ Connection URL over Service Discovery
jdbc:subprotocol://<zookeeper quorum>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi
- zookeeper quorum is the corresponding zookeeper cluster configured by `kyuubi.ha.zookeeper.quorum` at the server side.
- zooKeeperNamespace is the corresponding namespace configured by `kyuubi.ha.zookeeper.namespace` at the server side.
- zookeeper quorum is the corresponding zookeeper cluster configured by `kyuubi.ha.addresses` at the server side.
- zooKeeperNamespace is the corresponding namespace configured by `kyuubi.ha.namespace` at the server side.

Authentication
--------------
Kerberos Authentication
-----------------------
Since 1.6.0, the Kyuubi JDBC driver implements Kerberos authentication based on the JAAS framework instead of `Hadoop UserGroupInformation`_,
which means it does not require Hadoop dependencies to connect to a Kerberized Kyuubi Server.

The Kyuubi JDBC driver supports several approaches to connect to a Kerberized Kyuubi Server. First of all, please follow
the `krb5.conf instruction`_ to set up ``krb5.conf`` properly.

DataTypes
---------
Authentication by Principal and Keytab
**************************************

.. versionadded:: 1.6.0

.. tip::

It's the simplest way, with minimal setup requirements, for Kerberos authentication.

Using a principal and keytab for Kerberos authentication is straightforward: simply configure them in the JDBC URL.

.. code-block::
jdbc:subprotocol://<zookeeper quorum>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi
jdbc:kyuubi://host:port/schema;clientKeytab=<clientKeytab>;clientPrincipal=<clientPrincipal>;serverPrincipal=<serverPrincipal>
- clientKeytab: path of Kerberos ``keytab`` file for client authentication
- clientPrincipal: Kerberos ``principal`` for client authentication
- serverPrincipal: Kerberos ``principal`` configured by `kyuubi.kinit.principal` at the server side. ``serverPrincipal`` is available
  since 1.7.0; for earlier versions, use ``principal`` instead.

Authentication by Principal and TGT Cache
*****************************************

Another typical approach to Kerberos authentication is to run `kinit` to generate the TGT cache first; the application
then performs Kerberos authentication through the TGT cache.

.. code-block::
jdbc:kyuubi://host:port/schema;serverPrincipal=<serverPrincipal>
Authentication by `Hadoop UserGroupInformation`_ ``doAs`` (programming only)
****************************************************************************

.. tip::

This approach allows projects that already use `Hadoop UserGroupInformation`_ for Kerberos authentication to easily
connect to a Kerberized Kyuubi Server. It does not work in versions [1.6.0, 1.7.0] and was fixed in 1.7.1.

.. code-block::
String jdbcUrl = "jdbc:kyuubi://host:port/schema;serverPrincipal=<serverPrincipal>";
UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(clientPrincipal, clientKeytab);
ugi.doAs((PrivilegedExceptionAction<String>) () -> {
Connection conn = DriverManager.getConnection(jdbcUrl);
...
});
Authentication by Subject (programming only)
********************************************

.. code-block:: java
String jdbcUrl = "jdbc:kyuubi://host:port/schema;serverPrincipal=<serverPrincipal>;kerberosAuthType=fromSubject";
Subject kerberizedSubject = ...;
Subject.doAs(kerberizedSubject, (PrivilegedExceptionAction<String>) () -> {
Connection conn = DriverManager.getConnection(jdbcUrl);
...
});
.. _Maven Central: https://mvnrepository.com/artifact/org.apache.kyuubi/kyuubi-hive-jdbc-shaded
.. _JDBC Applications: ../bi_tools/index.html
.. _java.sql.DriverManager: https://docs.oracle.com/javase/8/docs/api/java/sql/DriverManager.html
.. _Hadoop UserGroupInformation: https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/security/UserGroupInformation.html
.. _krb5.conf instruction: https://docs.oracle.com/javase/8/docs/technotes/guides/security/jgss/tutorials/KerberosReq.html
4 changes: 2 additions & 2 deletions docs/extensions/server/authentication.rst
@@ -49,12 +49,12 @@ To create custom Authenticator class derived from the above interface, we need t

- Referencing the library

.. code-block:: xml
.. parsed-literal::
<dependency>
<groupId>org.apache.kyuubi</groupId>
<artifactId>kyuubi-common_2.12</artifactId>
<version>1.5.2-incubating</version>
<version>\ |release|\</version>
<scope>provided</scope>
</dependency>
2 changes: 1 addition & 1 deletion docs/quick_start/quick_start_with_helm.md
@@ -15,7 +15,7 @@
- limitations under the License.
-->

# Getting Started With Kyuubi on Kubernetes
# Getting Started with Helm

## Running Kyuubi with Helm

114 changes: 57 additions & 57 deletions docs/quick_start/quick_start_with_jdbc.md
@@ -15,82 +15,82 @@
- limitations under the License.
-->

# Getting Started With Hive JDBC
# Getting Started with Hive JDBC

## How to install JDBC driver
## How to get the Kyuubi JDBC driver

Kyuubi JDBC driver is fully compatible with the 2.3.* version of hive JDBC driver, so we reuse hive JDBC driver to connect to Kyuubi server.
The Kyuubi Thrift API is fully compatible with HiveServer2, so technically any Hive JDBC driver can be used to connect to
Kyuubi Server. However, it's recommended to use the [Kyuubi Hive JDBC driver](../client/jdbc/kyuubi_jdbc), which is forked from
the Hive 3.1.x JDBC driver and aims to support some functionalities missing from the original Hive JDBC driver.

Add repository to your maven configuration file which may reside in `$MAVEN_HOME/conf/settings.xml`.
The driver is available from Maven Central:

```xml
<repositories>
<repository>
<id>central maven repo</id>
<name>central maven repo https</name>
<url>https://repo.maven.apache.org/maven2</url>
</repository>
</repositories>
```

You can add below dependency to your `pom.xml` file in your application.

```xml
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>2.3.7</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<!-- keep consistent with the build hadoop version -->
<version>2.7.4</version>
<groupId>org.apache.kyuubi</groupId>
<artifactId>kyuubi-hive-jdbc-shaded</artifactId>
<version>1.7.0</version>
</dependency>
```

## Use JDBC driver with kerberos
## Connect to non-kerberized Kyuubi Server

The below java code is using a keytab file to login and connect to Kyuubi server by JDBC.

```java
package org.apache.kyuubi.examples;

import java.io.IOException;
import java.security.PrivilegedExceptionAction;
import java.sql.*;

import org.apache.hadoop.security.UserGroupInformation;

public class JDBCTest {

private static String driverName = "org.apache.hive.jdbc.HiveDriver";
private static String kyuubiJdbcUrl = "jdbc:hive2://localhost:10009/default;";

public static void main(String[] args) throws ClassNotFoundException, SQLException {
String principal = args[0]; // kerberos principal
String keytab = args[1]; // keytab file location
Configuration configuration = new Configuration();
configuration.set(HADOOP_SECURITY_AUTHENTICATION, "kerberos");
UserGroupInformation.setConfiguration(configuration);
UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab);

Class.forName(driverName);
Connection conn = ugi.doAs(new PrivilegedExceptionAction<Connection>(){
public Connection run() throws SQLException {
return DriverManager.getConnection(kyuubiJdbcUrl);
}
});
Statement st = conn.createStatement();
ResultSet res = st.executeQuery("show databases");
while (res.next()) {
System.out.println(res.getString(1));
public class KyuubiJDBC {

private static String driverName = "org.apache.kyuubi.jdbc.KyuubiHiveDriver";
private static String kyuubiJdbcUrl = "jdbc:kyuubi://localhost:10009/default;";

public static void main(String[] args) throws SQLException {
try (Connection conn = DriverManager.getConnection(kyuubiJdbcUrl)) {
try (Statement stmt = conn.createStatement()) {
try (ResultSet rs = stmt.executeQuery("show databases")) {
while (rs.next()) {
System.out.println(rs.getString(1));
}
}
}
}
}
}
```

## Connect to Kerberized Kyuubi Server

The following Java code uses a keytab file to log in and connect to Kyuubi Server by JDBC.

```java
package org.apache.kyuubi.examples;

import java.sql.*;

public class KyuubiJDBCDemo {

private static String driverName = "org.apache.kyuubi.jdbc.KyuubiHiveDriver";
private static String kyuubiJdbcUrlTemplate = "jdbc:kyuubi://localhost:10009/default;" +
"clientPrincipal=%s;clientKeytab=%s;serverPrincipal=%s";

public static void main(String[] args) throws SQLException {
String clientPrincipal = args[0]; // Kerberos principal
String clientKeytab = args[1]; // Keytab file location
String serverPrincipal = args[2]; // Kerberos principal used by Kyuubi Server
String kyuubiJdbcUrl = String.format(kyuubiJdbcUrlTemplate, clientPrincipal, clientKeytab, serverPrincipal);
try (Connection conn = DriverManager.getConnection(kyuubiJdbcUrl)) {
try (Statement stmt = conn.createStatement()) {
try (ResultSet rs = stmt.executeQuery("show databases")) {
while (rs.next()) {
System.out.println(rs.getString(1));
}
}
res.close();
st.close();
conn.close();
}
}
}
}
```
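
The example above reads its three arguments without checking them. A minimal sketch of validating them before filling the URL template might look like the following. The class `KerberosUrlSketch` and its `buildUrl` helper are hypothetical names for this illustration, and rejecting values that contain `;` is an assumption made here because semicolons separate properties in the URL; it is not a documented driver rule.

```java
// Hypothetical helper, not part of the Kyuubi driver: it only formats and checks strings.
public class KerberosUrlSketch {

    private static final String URL_TEMPLATE =
        "jdbc:kyuubi://localhost:10009/default;"
            + "clientPrincipal=%s;clientKeytab=%s;serverPrincipal=%s";

    // Expects exactly: client principal, client keytab path, server principal.
    static String buildUrl(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                "Usage: <clientPrincipal> <clientKeytab> <serverPrincipal>");
        }
        for (String arg : args) {
            // Assumption: a ';' inside a value would be read as a property separator.
            if (arg.isEmpty() || arg.contains(";")) {
                throw new IllegalArgumentException("Invalid argument: '" + arg + "'");
            }
        }
        return String.format(URL_TEMPLATE, args[0], args[1], args[2]);
    }

    public static void main(String[] args) {
        // Prints the URL that would be passed to DriverManager.getConnection.
        System.out.println(buildUrl(args));
    }
}
```

Failing fast on a missing argument gives a clearer message than the `ArrayIndexOutOfBoundsException` the unchecked version would raise.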
