Add guide for custom SQL database support with HSQLDB #986

Merged 2 commits on Dec 5, 2024
1 change: 1 addition & 0 deletions docs/StardustDocs/d.tree
Expand Up @@ -46,6 +46,7 @@
<toc-element topic="io.md">
<toc-element topic="read.md"/>
<toc-element topic="readSqlDatabases.md"/>
<toc-element topic="readSqlFromCustomDatabase.md"/>
<toc-element topic="write.md"/>
</toc-element>
<toc-element topic="info.md">
84 changes: 30 additions & 54 deletions docs/StardustDocs/topics/readSqlDatabases.md
Expand Up @@ -32,10 +32,11 @@ Also, there are a few **extension functions** available on `Connection`,
**NOTE:** This is an experimental module, and for now,
we only support five databases: MS SQL, MariaDB, MySQL, PostgreSQL, and SQLite.

Moreover, since release 0.15, you can register a custom SQL database; read more in our [guide](readSqlFromCustomDatabase.md).

Additionally, support for JSON and date-time types is limited.
Please take this into consideration when using these functions.


## Getting started with reading from SQL database in Gradle Project

First, you need to add a dependency
Expand Down Expand Up @@ -70,15 +71,15 @@ implementation("com.mysql:mysql-connector-j:$version")

The Maven Central version can be found [here](https://mvnrepository.com/artifact/com.mysql/mysql-connector-j).

For SQLite:
For **SQLite**:

```kotlin
implementation("org.xerial:sqlite-jdbc:$version")
```

The Maven Central version can be found [here](https://mvnrepository.com/artifact/org.xerial/sqlite-jdbc).

For MS SQL:
For **MS SQL**:

```kotlin
implementation("com.microsoft.sqlserver:mssql-jdbc:$version")
Expand Down Expand Up @@ -158,14 +159,17 @@ otherwise, it will be considered non-nullable for the newly created `DataFrame`
These functions read all data from a specific table in the database.
Variants with a limit parameter restrict how many rows will be read from the table.

**readSqlTable(dbConfig: DbConnectionConfig, tableName: String, limit: Int, inferNullability: Boolean): AnyFrame**
**readSqlTable(dbConfig: DbConnectionConfig, tableName: String, limit: Int, inferNullability: Boolean, dbType: DbType?): AnyFrame**

Read all data from a specific table in the SQL database and transform it into an `AnyFrame` object.

The `dbConfig: DbConnectionConfig` parameter represents the configuration for a database connection,
created under the hood and managed by the library.
Typically, it requires a URL, username, and password.

The `dbType: DbType?` parameter specifies the type of the database. It is optional (the default is `null`)
and can be a custom object provided by the user; to learn more, read the [guide](readSqlFromCustomDatabase.md).

```kotlin
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig

Expand All @@ -180,7 +184,7 @@ The `limit: Int` parameter allows setting the maximum number of records to be re
val users = DataFrame.readSqlTable(dbConfig, "Users", limit = 100)
```
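
The optional parameters can be combined in one call. The following is a minimal sketch, assuming a PostgreSQL database that contains a `Users` table; the URL and credentials are placeholders, and `PostgreSql` is one of the built-in `DbType` objects listed on this page.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.db.PostgreSql
import org.jetbrains.kotlinx.dataframe.io.readSqlTable

fun main() {
    // Placeholder connection settings; replace them with real values.
    val dbConfig = DbConnectionConfig("URL_TO_CONNECT_DATABASE", "USERNAME", "PASSWORD")

    // Read at most 100 rows and pass the database type explicitly
    // rather than relying on the default (`dbType = null`).
    val users = DataFrame.readSqlTable(
        dbConfig,
        "Users",
        limit = 100,
        inferNullability = true,
        dbType = PostgreSql,
    )

    println(users)
}
```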

**readSqlTable(connection: Connection, tableName: String, limit: Int, inferNullability: Boolean): AnyFrame**
**readSqlTable(connection: Connection, tableName: String, limit: Int, inferNullability: Boolean, dbType: DbType?): AnyFrame**

Another variant, where instead of `dbConfig: DbConnectionConfig` we use a JDBC connection: `Connection` object.

Expand Down Expand Up @@ -210,7 +214,7 @@ val users = connection.readDataFrame("Users", 100)
connection.close()
```
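
As an alternative to calling `close()` by hand, the connection can be wrapped in Kotlin's standard `use` function, which closes it even if reading fails. This is a sketch with a placeholder URL and an assumed `Users` table, not a required pattern.

```kotlin
import org.jetbrains.kotlinx.dataframe.io.readDataFrame
import java.sql.DriverManager

fun main() {
    // `use` closes the connection when the block exits, even on an exception.
    DriverManager.getConnection("URL_TO_CONNECT_DATABASE").use { connection ->
        // Read at most 100 rows from the (assumed) Users table.
        val users = connection.readDataFrame("Users", 100)
        println(users)
    }
}
```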

**Connection.readDataFrame(sqlQueryOrTableName: String, limit: Int, inferNullability: Boolean): AnyFrame**
**Connection.readDataFrame(sqlQueryOrTableName: String, limit: Int, inferNullability: Boolean, dbType: DbType?): AnyFrame**

Read all data from a specific table in the SQL database and transform it into an `AnyFrame` object.

Expand All @@ -222,7 +226,7 @@ It should not contain `;` symbol.

All other parameters are described above.

**DbConnectionConfig.readDataFrame(sqlQueryOrTableName: String, limit: Int, inferNullability: Boolean): AnyFrame**
**DbConnectionConfig.readDataFrame(sqlQueryOrTableName: String, limit: Int, inferNullability: Boolean, dbType: DbType?): AnyFrame**

If you do not have a connection object or need to run a quick,
isolated experiment reading data from an SQL database,
Expand All @@ -233,7 +237,7 @@ you can delegate the creation of the connection to `DbConnectionConfig`.
These functions execute an SQL query on the database and convert the result into a `DataFrame` object.
If a limit is provided, only that many rows will be returned from the result.

**readSqlQuery(dbConfig: DbConnectionConfig, sqlQuery: String, limit: Int, inferNullability: Boolean): AnyFrame**
**readSqlQuery(dbConfig: DbConnectionConfig, sqlQuery: String, limit: Int, inferNullability: Boolean, dbType: DbType?): AnyFrame**

Execute a specific SQL query on the SQL database and retrieve the resulting data as an AnyFrame.

Expand All @@ -249,7 +253,7 @@ val dbConfig = DbConnectionConfig("URL_TO_CONNECT_DATABASE", "USERNAME", "PASSWO
val df = DataFrame.readSqlQuery(dbConfig, "SELECT * FROM Users WHERE age > 35")
```

**readSqlQuery(connection: Connection, sqlQuery: String, limit: Int, inferNullability: Boolean): AnyFrame**
**readSqlQuery(connection: Connection, sqlQuery: String, limit: Int, inferNullability: Boolean, dbType: DbType?): AnyFrame**

Another variant, where instead of `dbConfig: DbConnectionConfig` we use a JDBC connection: `Connection` object.

Expand Down Expand Up @@ -301,16 +305,18 @@ The `dbType: DbType` parameter specifies the type of our database (e.g., Postgre
supported by a library.
Currently, the following classes are available: `H2, MsSql, MariaDb, MySql, PostgreSql, Sqlite`.

You can also pass an object describing a custom database; see the [guide](readSqlFromCustomDatabase.md) for more information.

```kotlin
import org.jetbrains.kotlinx.dataframe.io.db.PostgreSql
import java.sql.ResultSet

val df = DataFrame.readResultSet(resultSet, PostgreSql)
```
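
For context, a `ResultSet` is usually obtained from a JDBC `Statement`. The sketch below shows one way to produce it and hand it to `readResultSet`; the URL, the `Users` table, and the query are placeholders chosen for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.db.PostgreSql
import org.jetbrains.kotlinx.dataframe.io.readResultSet
import java.sql.DriverManager

fun main() {
    // Each JDBC resource is closed by `use` when its block exits.
    DriverManager.getConnection("URL_TO_CONNECT_DATABASE").use { connection ->
        connection.createStatement().use { statement ->
            statement.executeQuery("SELECT * FROM Users").use { resultSet ->
                // The database type is given explicitly because a ResultSet
                // alone does not say which database produced it.
                val df = DataFrame.readResultSet(resultSet, PostgreSql)
                println(df)
            }
        }
    }
}
```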

**readResultSet(resultSet: ResultSet, connection: Connection, limit: Int, inferNullability: Boolean): AnyFrame**
**readResultSet(resultSet: ResultSet, connection: Connection, limit: Int, inferNullability: Boolean, dbType: DbType?): AnyFrame**

Another variant, where instead of `dbType: DbType` we use a JDBC connection: `Connection` object.
Another variant, which takes a JDBC connection: a `Connection` object.

```kotlin
import java.sql.Connection
Expand Down Expand Up @@ -340,7 +346,7 @@ val df = rs.readDataFrame(connection, 10)
connection.close()
```

**ResultSet.readDataFrame(connection: Connection, limit: Int, inferNullability: Boolean): AnyFrame**
**ResultSet.readDataFrame(connection: Connection, limit: Int, inferNullability: Boolean, dbType: DbType?): AnyFrame**

Reads the data from a `ResultSet` and converts it into a `DataFrame`.

Expand All @@ -352,7 +358,7 @@ that the `ResultSet` belongs to.
These functions read all data from all tables in the connected database.
Variants with a limit parameter restrict how many rows will be read from each table.

**readAllSqlTables(dbConfig: DbConnectionConfig, limit: Int, inferNullability: Boolean): Map\<String, AnyFrame>**
**readAllSqlTables(dbConfig: DbConnectionConfig, limit: Int, inferNullability: Boolean, dbType: DbType?): Map\<String, AnyFrame>**

Retrieves data from all the non-system tables in the SQL database and returns them as a map of table names to `AnyFrame` objects.

Expand All @@ -368,7 +374,7 @@ val dbConfig = DbConnectionConfig("URL_TO_CONNECT_DATABASE", "USERNAME", "PASSWO
val dataframes = DataFrame.readAllSqlTables(dbConfig)
```
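
Because the result is a map of table names to `AnyFrame` objects, it can be iterated like any other Kotlin map. A short sketch, assuming placeholder connection settings and a per-table limit of 10 rows:

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readAllSqlTables

fun main() {
    // Placeholder connection settings; replace them with real values.
    val dbConfig = DbConnectionConfig("URL_TO_CONNECT_DATABASE", "USERNAME", "PASSWORD")

    // Read at most 10 rows from every non-system table.
    val dataframes = DataFrame.readAllSqlTables(dbConfig, limit = 10)

    // Print a one-line summary per table.
    for ((tableName, df) in dataframes) {
        println("$tableName: ${df.rowsCount()} rows, ${df.columnsCount()} columns")
    }
}
```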

**readAllSqlTables(connection: Connection, limit: Int, inferNullability: Boolean): Map\<String, AnyFrame>**
**readAllSqlTables(connection: Connection, limit: Int, inferNullability: Boolean, dbType: DbType?): Map\<String, AnyFrame>**

Another variant, where instead of `dbConfig: DbConnectionConfig` we use a JDBC connection: `Connection` object.

Expand All @@ -389,7 +395,7 @@ The purpose of these functions is to facilitate the retrieval of table schema.
By providing a table name and either a database configuration or connection,
these functions return the [DataFrameSchema](schema.md) of the specified table.

**getSchemaForSqlTable(dbConfig: DbConnectionConfig, tableName: String): DataFrameSchema**
**getSchemaForSqlTable(dbConfig: DbConnectionConfig, tableName: String, dbType: DbType?): DataFrameSchema**

This function captures the schema of a specific table from an SQL database.

Expand All @@ -405,7 +411,7 @@ val dbConfig = DbConnectionConfig("URL_TO_CONNECT_DATABASE", "USERNAME", "PASSWO
val schema = DataFrame.getSchemaForSqlTable(dbConfig, "Users")
```
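
The returned schema can be inspected without loading any rows. The following sketch lists each column with its type; the connection settings and the `Users` table are placeholders.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.getSchemaForSqlTable

fun main() {
    // Placeholder connection settings; replace them with real values.
    val dbConfig = DbConnectionConfig("URL_TO_CONNECT_DATABASE", "USERNAME", "PASSWORD")

    val schema = DataFrame.getSchemaForSqlTable(dbConfig, "Users")

    // `columns` maps each column name to its column schema.
    for ((columnName, columnSchema) in schema.columns) {
        println("$columnName: ${columnSchema.type}")
    }
}
```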

**getSchemaForSqlTable(connection: Connection, tableName: String): DataFrameSchema**
**getSchemaForSqlTable(connection: Connection, tableName: String, dbType: DbType?): DataFrameSchema**

Another variant, where instead of `dbConfig: DbConnectionConfig` we use a JDBC connection: `Connection` object.

Expand All @@ -427,7 +433,7 @@ These functions return the schema of an SQL query result.
Once you provide a database configuration or connection and an SQL query,
they return the [DataFrameSchema](schema.md) of the query result.

**getSchemaForSqlQuery(dbConfig: DbConnectionConfig, sqlQuery: String): DataFrameSchema**
**getSchemaForSqlQuery(dbConfig: DbConnectionConfig, sqlQuery: String, dbType: DbType?): DataFrameSchema**

This function executes an SQL query on the database and then retrieves the resulting schema.

Expand All @@ -443,7 +449,7 @@ val dbConfig = DbConnectionConfig("URL_TO_CONNECT_DATABASE", "USERNAME", "PASSWO
val schema = DataFrame.getSchemaForSqlQuery(dbConfig, "SELECT * FROM Users WHERE age > 35")
```

**getSchemaForSqlQuery(connection: Connection, sqlQuery: String): DataFrameSchema**
**getSchemaForSqlQuery(connection: Connection, sqlQuery: String, dbType: DbType?): DataFrameSchema**

Another variant, where instead of `dbConfig: DbConnectionConfig` we use a JDBC connection: `Connection` object.

Expand Down Expand Up @@ -472,11 +478,11 @@ val schema = connection.getDataFrameSchema("SELECT * FROM Users WHERE age > 35")

connection.close()
```
**Connection.getDataFrameSchema(sqlQueryOrTableName: String): DataFrameSchema**
**Connection.getDataFrameSchema(sqlQueryOrTableName: String, dbType: DbType?): DataFrameSchema**

Retrieves the schema of an SQL query result or an SQL table using the provided JDBC connection.

**DbConnectionConfig.getDataFrameSchema(sqlQueryOrTableName: String): DataFrameSchema**
**DbConnectionConfig.getDataFrameSchema(sqlQueryOrTableName: String, dbType: DbType?): DataFrameSchema**

Retrieves the schema of an SQL query result or an SQL table using the provided database configuration.

Expand Down Expand Up @@ -507,49 +513,19 @@ The `dbType: DbType` parameter specifies the type of our database (e.g., Postgre
supported by a library.
Currently, the following classes are available: `H2, MariaDb, MySql, PostgreSql, Sqlite`.

You can also pass an object describing a custom database; see the [guide](readSqlFromCustomDatabase.md) for more information.

```kotlin
import org.jetbrains.kotlinx.dataframe.io.db.PostgreSql
import java.sql.ResultSet

val schema = DataFrame.getSchemaForResultSet(resultSet, PostgreSql)
```

**getSchemaForResultSet(connection: Connection, sqlQuery: String): DataFrameSchema**

Another variant, where instead of `dbType: DbType` we use a JDBC connection: `Connection` object.

```kotlin
import java.sql.Connection
import java.sql.DriverManager

val connection = DriverManager.getConnection("URL_TO_CONNECT_DATABASE")

val schema = DataFrame.getSchemaForResultSet(resultSet, connection)

connection.close()
```

### Extension functions for schema reading from the ResultSet

The same example, rewritten with the extension function:

```kotlin
import java.sql.Connection
import java.sql.DriverManager

val connection = DriverManager.getConnection("URL_TO_CONNECT_DATABASE")

val schema = resultSet.getDataFrameSchema(connection)

connection.close()
```

if you are using this extension function

**ResultSet.getDataFrameSchema(connection: Connection): DataFrameSchema**

or

```kotlin
import org.jetbrains.kotlinx.dataframe.io.db.PostgreSql
import java.sql.ResultSet
Expand All @@ -566,7 +542,7 @@ based on
These functions return the [`DataFrameSchema`](schema.md) of every non-system table in the SQL database.
They can be called with either a database configuration or a connection.

**getSchemaForAllSqlTables(dbConfig: DbConnectionConfig): Map\<String, DataFrameSchema>**
**getSchemaForAllSqlTables(dbConfig: DbConnectionConfig, dbType: DbType?): Map\<String, DataFrameSchema>**

This function retrieves the schema of all tables from an SQL database
and returns them as a map of table names to [`DataFrameSchema`](schema.md) objects.
Expand All @@ -583,7 +559,7 @@ val dbConfig = DbConnectionConfig("URL_TO_CONNECT_DATABASE", "USERNAME", "PASSWO
val schemas = DataFrame.getSchemaForAllSqlTables(dbConfig)
```

**getSchemaForAllSqlTables(connection: Connection): Map\<String, DataFrameSchema>**
**getSchemaForAllSqlTables(connection: Connection, dbType: DbType?): Map\<String, DataFrameSchema>**

This function retrieves the schema of all tables using a JDBC connection: `Connection` object
and returns them as a map of table names to [`DataFrameSchema`](schema.md) objects.