
Added JDBC-integration #451


Merged: 63 commits, Oct 6, 2023

Commits
d67e25e
Created a module
zaleslaw Jul 17, 2023
13c0af2
Updated a module with the initial integration and test
zaleslaw Jul 17, 2023
b17fdda
Added a new complex example for reading with Native SQL Query
zaleslaw Jul 17, 2023
56b2484
Added an implementation for a new complex example for reading with Na…
zaleslaw Jul 17, 2023
24b8b2b
Added an implementation for a new complex example for reading with Na…
zaleslaw Jul 17, 2023
e477a74
Added idea for test
zaleslaw Jul 19, 2023
8a9b496
Added mariadb4j integration
zaleslaw Jul 19, 2023
b58fae6
Attempt with test containers
zaleslaw Jul 19, 2023
b97a6b5
Added the H2 support for testing database capabilities
zaleslaw Jul 21, 2023
65810a2
Set up a draft Kotlin logging
zaleslaw Jul 25, 2023
1306bdc
Started with ImportDataSchema changes
zaleslaw Jul 26, 2023
b4acfde
Started with ImportDataSchema changes
zaleslaw Jul 26, 2023
926b540
Added missed dependencies
zaleslaw Jul 28, 2023
bcfe733
Added simple generation for one table
zaleslaw Jul 28, 2023
7b7cb85
Finished simple prototype
zaleslaw Jul 28, 2023
b1ffdb1
Added some minor ideas
zaleslaw Aug 1, 2023
d5ab3a8
Fixed bug in the KNB test
zaleslaw Aug 2, 2023
fe96d02
Fixed bug in the KNB test
zaleslaw Aug 2, 2023
6f8dabf
Add JDBC support to dataframe gradle plugin
zaleslaw Aug 4, 2023
d6dcc9f
Added API methods
zaleslaw Aug 6, 2023
bdb91fc
Finished API methods
zaleslaw Aug 6, 2023
6bc8c3c
Added import data schema annotation support
zaleslaw Aug 6, 2023
81a84de
Added import data schema annotation support
zaleslaw Aug 6, 2023
a7147a3
Added force-classloading for drivers
zaleslaw Aug 7, 2023
0ebe177
Refactored jdbc
zaleslaw Aug 7, 2023
01c4e48
Updated tests
zaleslaw Aug 7, 2023
9bc0ebe
Support schema generation for SqlQuery
zaleslaw Aug 7, 2023
b647ace
Support schema generation for SqlQuery
zaleslaw Aug 7, 2023
24968e5
Added experimental methods
zaleslaw Aug 7, 2023
6498cc0
Added experimental methods
zaleslaw Aug 7, 2023
e8c9179
Added experimental methods
zaleslaw Aug 7, 2023
fd99a14
Added H2 types support
zaleslaw Aug 28, 2023
f9f3bb2
Added SQlite types support
zaleslaw Aug 28, 2023
6dc7e8d
Added initial Postgre test data
zaleslaw Aug 29, 2023
3b76f4c
Fixed PostgreSQL mapping
zaleslaw Aug 31, 2023
f40869f
Made test green
zaleslaw Aug 31, 2023
b19ca39
Added type mapping for mariadb and mysql
zaleslaw Aug 31, 2023
9123779
Add test for mariadb
zaleslaw Aug 31, 2023
9582a66
Added initial support for Mariadb
zaleslaw Sep 1, 2023
0de6f4c
Refactored sealed hierarchy and fixed SQlite tests
zaleslaw Sep 1, 2023
f4ba91d
Updated test and moved coverage above 60 percentage
zaleslaw Sep 5, 2023
ed0d960
Implemented readAllTables function for H2
zaleslaw Sep 6, 2023
4361cf5
Implemented readAllTables for other databases
zaleslaw Sep 7, 2023
9a3a4f7
Added documentation
zaleslaw Sep 8, 2023
db2af8d
Fixed in plugins
zaleslaw Sep 8, 2023
980b231
Merge branch 'master' into issue-212
zaleslaw Sep 8, 2023
4ed5bb4
Fixed in plugins
zaleslaw Sep 8, 2023
d1f3316
Refactored toml
zaleslaw Sep 11, 2023
309883a
Fixed symbol processing plugin
zaleslaw Sep 12, 2023
53e05b4
Added integration test
zaleslaw Sep 12, 2023
89313e8
Fixed inspections
zaleslaw Sep 12, 2023
3db5d28
Added documentation
zaleslaw Sep 13, 2023
8419bb0
Update SQL reading documentation
zaleslaw Sep 13, 2023
f9499b9
Merge branch 'master' into issue-212
zaleslaw Oct 2, 2023
4a4e988
Fixed Review Part 1
zaleslaw Oct 3, 2023
af0fc0a
Merge remote-tracking branch 'fork/issue-212' into issue-212
zaleslaw Oct 3, 2023
4ac9dfd
Fixed Review Part 1
zaleslaw Oct 3, 2023
f427331
Rename test files to match Kotlin conventions and refactor tests
zaleslaw Oct 3, 2023
97f68f6
Refactor readAllTables method name to readAllSqlTables
zaleslaw Oct 3, 2023
3d0906b
Enhance exception messages and add uniqueness check
zaleslaw Oct 4, 2023
4e22936
Added buildTableMetadata method to DbType and handle JSON type issues
zaleslaw Oct 4, 2023
f1763c5
Ignore test cases due to configuration issues.
zaleslaw Oct 6, 2023
42dfe6d
Remove SQL reading features
zaleslaw Oct 6, 2023
1 change: 1 addition & 0 deletions build.gradle.kts
@@ -37,6 +37,7 @@ dependencies {
api(project(":dataframe-arrow"))
api(project(":dataframe-excel"))
api(project(":dataframe-openapi"))
api(project(":dataframe-jdbc"))
}

allprojects {
@@ -8,21 +8,24 @@ import org.jetbrains.kotlinx.dataframe.io.JSON

/**
* Annotation preprocessing will generate a DataSchema interface from the data at `path`.
- * Data must be of supported format: CSV, JSON, Apache Arrow, Excel, OpenAPI (Swagger) in YAML/JSON.
+ * Data must be of supported format: CSV, JSON, Apache Arrow, Excel, OpenAPI (Swagger) in YAML/JSON, JDBC.
* Generated data schema has properties inferred from data and a companion object with `read method`.
* `read method` is either `readCSV` or `readJson` that returns `DataFrame<name>`
*
* @param name name of the generated interface
* @param path URL or relative path to data.
- * if path starts with protocol (http, https, ftp), it's considered a URL. Otherwise, it's treated as relative path.
+ * If a path starts with protocol (http, https, ftp, jdbc), it's considered a URL.
+ * Otherwise, it's treated as a relative path.
* By default, it will be resolved relatively to project dir, i.e. File(projectDir, path)
- * You can configure it by passing `dataframe.resolutionDir` option to preprocessor, see https://kotlinlang.org/docs/ksp-quickstart.html#pass-options-to-processors
+ * You can configure it by passing `dataframe.resolutionDir` option to preprocessor,
+ * see https://kotlinlang.org/docs/ksp-quickstart.html#pass-options-to-processors
* @param visibility visibility of the generated interface.
* @param normalizationDelimiters if not empty, split property names by delimiters,
* lowercase parts and join to camel case. Set empty list to disable normalization
* @param withDefaultPath if `true`, generate `defaultPath` property to the data schema's companion object and make it default argument for a `read method`
* @param csvOptions options to parse CSV data. Not used when data is not Csv
* @param jsonOptions options to parse JSON data. Not used when data is not Json
* @param jdbcOptions options to parse data from a database via JDBC. Not used when data is not stored in the database
*/
@Retention(AnnotationRetention.SOURCE)
@Target(AnnotationTarget.FILE)
@@ -35,6 +38,7 @@ public annotation class ImportDataSchema(
val withDefaultPath: Boolean = true,
val csvOptions: CsvOptions = CsvOptions(','),
val jsonOptions: JsonOptions = JsonOptions(),
val jdbcOptions: JdbcOptions = JdbcOptions(),
)

public enum class DataSchemaVisibility {
@@ -45,6 +49,12 @@ public annotation class CsvOptions(
public val delimiter: Char,
)

public annotation class JdbcOptions(
public val user: String = "", // TODO: I'm not sure about the default parameters
public val password: String = "", // TODO: I'm not sure about the default parameters
public val sqlQuery: String = ""
)

public annotation class JsonOptions(

/** Allows the choice of how to handle type clashes when reading a JSON file. */
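The new `jdbcOptions` parameter plugs into the existing `@ImportDataSchema` entry point. A minimal usage sketch, assuming the annotation shape shown in this diff; the connection URL, credentials, and query below are illustrative values, not taken from the PR:

```kotlin
// Illustrative only: database URL, user, password, and query are made up;
// only the annotation shape comes from this diff.
@file:ImportDataSchema(
    name = "Customer",
    path = "jdbc:mariadb://localhost:3306/testdb",
    jdbcOptions = JdbcOptions(
        user = "root",
        password = "secret",
        sqlQuery = "SELECT id, name FROM customer",
    ),
)

package example

import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
import org.jetbrains.kotlinx.dataframe.annotations.JdbcOptions
```

Because `path` starts with the `jdbc` protocol, the preprocessor would treat it as a URL rather than a file path, per the updated KDoc above.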
@@ -24,6 +24,7 @@ private const val verify = "verify" // cast(true) is obscure, i think it's bette
private const val readCSV = "readCSV"
private const val readTSV = "readTSV"
private const val readJson = "readJson"
private const val readJdbc = "readJdbc"

public abstract class AbstractDefaultReadMethod(
private val path: String?,
@@ -5,7 +5,7 @@ import org.jetbrains.kotlinx.dataframe.schema.ColumnSchema
import org.jetbrains.kotlinx.dataframe.schema.CompareResult
import org.jetbrains.kotlinx.dataframe.schema.DataFrameSchema

-internal class DataFrameSchemaImpl(override val columns: Map<String, ColumnSchema>) : DataFrameSchema {
+public class DataFrameSchemaImpl(override val columns: Map<String, ColumnSchema>) : DataFrameSchema {

override fun compare(other: DataFrameSchema): CompareResult {
require(other is DataFrameSchemaImpl)
@@ -169,14 +169,15 @@ internal class Integration(
if (version != null) {
dependencies(
"org.jetbrains.kotlinx:dataframe-excel:$version",
"org.jetbrains.kotlinx:dataframe-jdbc:$version",
"org.jetbrains.kotlinx:dataframe-arrow:$version",
"org.jetbrains.kotlinx:dataframe-openapi:$version",
)
}

try {
setMinimalKernelVersion(MIN_KERNEL_VERSION)
-} catch (_: NoSuchMethodError) { // will be thrown on version < 0.11.0.198
+} catch (_: NoSuchMethodError) { // will be thrown when a version < 0.11.0.198
throw IllegalStateException(
getKernelUpdateMessage(notebook.kernelVersion, MIN_KERNEL_VERSION, notebook.jupyterClientType)
)
@@ -57,7 +57,7 @@ internal class DefaultReadExcelMethod(path: String?) : AbstractDefaultReadMethod
private const val readExcel = "readExcel"

/**
- * @param sheetName sheet to read. By default, first sheet in the document
+ * @param sheetName sheet to read. By default, the first sheet in the document
* @param columns comma separated list of Excel column letters and column ranges (e.g. “A:E” or “A,C,E:F”)
* @param skipRows number of rows before header
* @param rowsCount number of rows to read.
@@ -77,7 +77,7 @@ public fun DataFrame.Companion.readExcel(
}

/**
- * @param sheetName sheet to read. By default, first sheet in the document
+ * @param sheetName sheet to read. By default, the first sheet in the document
* @param columns comma separated list of Excel column letters and column ranges (e.g. “A:E” or “A,C,E:F”)
* @param skipRows number of rows before header
* @param rowsCount number of rows to read.
@@ -97,7 +97,7 @@ public fun DataFrame.Companion.readExcel(
}

/**
- * @param sheetName sheet to read. By default, first sheet in the document
+ * @param sheetName sheet to read. By default, the first sheet in the document
* @param columns comma separated list of Excel column letters and column ranges (e.g. “A:E” or “A,C,E:F”)
* @param skipRows number of rows before header
* @param rowsCount number of rows to read.
@@ -114,7 +114,7 @@ public fun DataFrame.Companion.readExcel(
): AnyFrame = readExcel(asURL(fileOrUrl), sheetName, skipRows, columns, rowsCount, nameRepairStrategy)

/**
- * @param sheetName sheet to read. By default, first sheet in the document
+ * @param sheetName sheet to read. By default, the first sheet in the document
* @param columns comma separated list of Excel column letters and column ranges (e.g. “A:E” or “A,C,E:F”)
* @param skipRows number of rows before header
* @param rowsCount number of rows to read.
@@ -134,7 +134,7 @@ public fun DataFrame.Companion.readExcel(
}

/**
- * @param sheetName sheet to read. By default, first sheet in the document
+ * @param sheetName sheet to read. By default, the first sheet in the document
* @param columns comma separated list of Excel column letters and column ranges (e.g. “A:E” or “A,C,E:F”)
* @param skipRows number of rows before header
* @param rowsCount number of rows to read.
@@ -446,18 +446,18 @@ private fun Cell.setCellValueByGuessedType(any: Any) {

/**
* Set LocalDateTime value correctly also if date have zero value in Excel.
- * Zero date is usually used fore storing time component only,
- * is displayed as 00.01.1900 in Excel and as 30.12.1899 in LibreOffice Calc and also in POI.
+ * Zero dates are usually used for storing a time component only,
+ * are displayed as 00.01.1900 in Excel and as 30.12.1899 in LibreOffice Calc and also in POI.
* POI can not set 1899 year directly.
*/
private fun Cell.setTime(localDateTime: LocalDateTime) {
this.setCellValue(DateUtil.getExcelDate(localDateTime.plusDays(1)) - 1.0)
}

/**
- * Set Date value correctly also if date have zero value in Excel.
- * Zero date is usually used fore storing time component only,
- * is displayed as 00.01.1900 in Excel and as 30.12.1899 in LibreOffice Calc and also in POI.
+ * Set Date value correctly also if date has zero value in Excel.
+ * Zero dates are usually used for storing a time component only,
+ * are displayed as 00.01.1900 in Excel and as 30.12.1899 in LibreOffice Calc and also in POI.
* POI can not set 1899 year directly.
*/
private fun Cell.setDate(date: Date) {
43 changes: 43 additions & 0 deletions dataframe-jdbc/build.gradle.kts
@@ -0,0 +1,43 @@
plugins {
kotlin("jvm")
kotlin("libs.publisher")
id("org.jetbrains.kotlinx.kover")
kotlin("jupyter.api")
}

group = "org.jetbrains.kotlinx"

val jupyterApiTCRepo: String by project

repositories {
mavenCentral()
maven(jupyterApiTCRepo)
}

dependencies {
api(project(":core"))
implementation(libs.mariadb)
implementation(libs.kotlinLogging)
testImplementation(libs.sqlite)
testImplementation(libs.postgresql)
testImplementation(libs.mysql)
testImplementation(libs.h2db)
testImplementation(libs.junit)
testImplementation(libs.sl4j)
testImplementation(libs.kotestAssertions) {
exclude("org.jetbrains.kotlin", "kotlin-stdlib-jdk8")
}
}

kotlinPublications {
publication {
publicationName.set("dataframeJDBC")
artifactId.set(project.name)
description.set("JDBC support for Kotlin Dataframe")
packageName.set(artifactId)
}
}

kotlin {
explicitApi()
}
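For downstream projects, consuming the new module should follow the same pattern as the sibling artifacts (`dataframe-excel`, `dataframe-arrow`). A hedged sketch of a consumer build script; the `<version>` placeholder and the choice of MariaDB driver are illustrative, not part of this PR:

```kotlin
// build.gradle.kts of a hypothetical consuming project.
dependencies {
    implementation("org.jetbrains.kotlinx:dataframe-jdbc:<version>")
    // A JDBC driver for the target database must also be on the
    // runtime classpath, e.g. for MariaDB:
    runtimeOnly("org.mariadb.jdbc:mariadb-java-client:3.1.4")
}
```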
@@ -0,0 +1,53 @@
package org.jetbrains.kotlinx.dataframe.io

import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.codeGen.AbstractDefaultReadMethod
import org.jetbrains.kotlinx.dataframe.codeGen.DefaultReadDfMethod
import org.jetbrains.kotlinx.jupyter.api.Code
import java.io.File
import java.io.InputStream

// TODO: https://github.com/Kotlin/dataframe/issues/450
public class Jdbc : SupportedCodeGenerationFormat, SupportedDataFrameFormat {
public override fun readDataFrame(stream: InputStream, header: List<String>): AnyFrame = DataFrame.readJDBC(stream)

public override fun readDataFrame(file: File, header: List<String>): AnyFrame = DataFrame.readJDBC(file)
override fun readCodeForGeneration(
stream: InputStream,
name: String,
generateHelperCompanionObject: Boolean
): Code {
TODO("Not yet implemented")
}

override fun readCodeForGeneration(
file: File,
name: String,
generateHelperCompanionObject: Boolean
): Code {
TODO("Not yet implemented")
}

override fun acceptsExtension(ext: String): Boolean = ext == "jdbc"

override fun acceptsSample(sample: SupportedFormatSample): Boolean = true // Extension is enough

override val testOrder: Int = 40000

override fun createDefaultReadMethod(pathRepresentation: String?): DefaultReadDfMethod {
return DefaultReadJdbcMethod(pathRepresentation)
}
}

private fun DataFrame.Companion.readJDBC(stream: File): DataFrame<*> {
TODO("Not yet implemented")
}

private fun DataFrame.Companion.readJDBC(stream: InputStream): DataFrame<*> {
TODO("Not yet implemented")
}

internal class DefaultReadJdbcMethod(path: String?) : AbstractDefaultReadMethod(path, MethodArguments.EMPTY, readJDBC)

private const val readJDBC = "readJDBC"
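The `Jdbc` format above is selected purely by the `jdbc` extension/URL prefix; the actual reading entry points are still `TODO` in this file. A sketch of the end-user call this module enables, using the `readSqlTable`/`readAllSqlTables` names from the commit log ("Refactor readAllTables method name to readAllSqlTables"); the exact signatures in the merged code may differ:

```kotlin
// Hypothetical usage; method names come from the commit messages,
// signatures are assumptions.
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.readSqlTable
import java.sql.DriverManager

fun main() {
    // H2 in-memory database, as used by the PR's own tests.
    DriverManager.getConnection("jdbc:h2:mem:testdb", "sa", "").use { connection ->
        val df = DataFrame.readSqlTable(connection, "customer")
        println(df)
    }
}
```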
@@ -0,0 +1,44 @@
package org.jetbrains.kotlinx.dataframe.io.db

import org.jetbrains.kotlinx.dataframe.io.TableColumnMetadata
import org.jetbrains.kotlinx.dataframe.io.TableMetadata
import org.jetbrains.kotlinx.dataframe.schema.ColumnSchema
import java.sql.ResultSet

/**
* The `DbType` class represents a database type used for reading dataframe from the database.
*
* @property [dbTypeInJdbcUrl] The name of the database as specified in the JDBC URL.
*/
public abstract class DbType(public val dbTypeInJdbcUrl: String) {
/**
* Converts the data from the given [ResultSet] into the specified [TableColumnMetadata] type.
*
* @param rs The [ResultSet] containing the data to be converted.
* @param tableColumnMetadata The [TableColumnMetadata] representing the target type of the conversion.
* @return The converted data as an instance of [Any].
*/
public abstract fun convertDataFromResultSet(rs: ResultSet, tableColumnMetadata: TableColumnMetadata): Any?

/**
* Returns a [ColumnSchema] produced from [tableColumnMetadata].
*/
public abstract fun toColumnSchema(tableColumnMetadata: TableColumnMetadata): ColumnSchema

/**
* Checks if the given table name is a system table for the specified database type.
*
* @param [tableMetadata] the table object representing the table from the database.
* @return True if the table is a system table for the specified database type, false otherwise.
*/
public abstract fun isSystemTable(tableMetadata: TableMetadata): Boolean

/**
* Builds the table metadata based on the database type and the ResultSet from the query.
*
* @param [tables] the ResultSet containing the table's meta-information.
* @return the TableMetadata object representing the table metadata.
*/
public abstract fun buildTableMetadata(tables: ResultSet): TableMetadata
}
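Supporting a new database means subclassing `DbType` and filling in its four abstract members. An illustrative-only subclass for H2; the diff does not show the fields of `TableColumnMetadata` or `TableMetadata`, so the property names and constructor used below are assumptions:

```kotlin
// Hypothetical H2 implementation; everything beyond the DbType contract
// shown above is assumed.
import org.jetbrains.kotlinx.dataframe.io.TableColumnMetadata
import org.jetbrains.kotlinx.dataframe.io.TableMetadata
import org.jetbrains.kotlinx.dataframe.io.db.DbType
import org.jetbrains.kotlinx.dataframe.schema.ColumnSchema
import java.sql.ResultSet
import kotlin.reflect.typeOf

public object H2Sketch : DbType("h2") {
    override fun convertDataFromResultSet(rs: ResultSet, tableColumnMetadata: TableColumnMetadata): Any? =
        rs.getObject(tableColumnMetadata.name) // assumes a `name` property

    override fun toColumnSchema(tableColumnMetadata: TableColumnMetadata): ColumnSchema =
        ColumnSchema.Value(typeOf<Any?>()) // coarse fallback; real mapping is per SQL type

    override fun isSystemTable(tableMetadata: TableMetadata): Boolean =
        tableMetadata.name.startsWith("INFORMATION_SCHEMA", ignoreCase = true) // assumes a `name` property

    override fun buildTableMetadata(tables: ResultSet): TableMetadata =
        // TABLE_NAME / TABLE_SCHEM / TABLE_CAT are standard columns of
        // DatabaseMetaData.getTables result sets.
        TableMetadata(
            tables.getString("TABLE_NAME"),
            tables.getString("TABLE_SCHEM"),
            tables.getString("TABLE_CAT"),
        )
}
```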