Extracted subchapters #391
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,248 +11,33 @@ It ignores order of columns in [`DataFrame`](DataFrame.md), but tracks column hi | |
|
||
In Jupyter environment compile-time [`DataFrame`](DataFrame.md) schema is synchronized with real-time data after every cell execution. | ||
|
||
In IDEA projects, you can use the [Gradle plugin](gradle.md#configuration) to extract schema from the dataset | ||
In IDEA projects, you can use the [Gradle plugin](schemasGradle.md#configuration) to extract schema from the dataset | ||
and generate extension properties. | ||
|
||
## DataSchema workflow in Jupyter | ||
|
||
After execution of cell | ||
## Popular use cases with Data Schemas | ||
|
||
<!---FUN createDfNullable--> | ||
Here's a list of the most popular use cases with Data Schemas. | ||
zaleslaw marked this conversation as resolved.
|
||
|
||
```kotlin | ||
val df = dataFrameOf("name", "age")( | ||
"Alice", 15, | ||
"Bob", null | ||
) | ||
``` | ||
* [**Data Schemas in Gradle projects**](schemasGradle.md) <br/> | ||
If you are developing a server application and building it with Gradle. | ||
|
||
<!---END--> | ||
* [**DataSchema workflow in Jupyter**](schemasJupyter.md) <br/> | ||
If you prefer Notebooks. | ||
|
||
the following actions take place: | ||
* [**Schema inheritance**](schemasInheritance.md) <br/> | ||
It's worth knowing how to reuse Data Schemas generated earlier. | ||
|
||
1. Columns in `df` are analyzed to extract data schema | ||
2. Empty interface with [`DataSchema`](schema.md) annotation is generated: | ||
* [**Custom Data Schemas**](schemasCustom.md) <br/> | ||
Sometimes it is necessary to create your own scheme. | ||
Review comment: *schema
Review comment: Yes, but we call it "schema" everywhere else. Also "scheme" means something different than "schema". A scheme is a plan, an idea. A schema is a written down or drawn-out idea. So schema is rightfully used across the docs.
||
|
||
```kotlin | ||
@DataSchema | ||
interface DataFrameType | ||
``` | ||
* [**Use external Data Schemas in Jupyter**](schemasExternalJupyter.md) <br/> | ||
Sometimes it is convenient to extract reusable code from Jupyter Notebook into the Kotlin JVM library. | ||
Review comment: *a Kotlin JVM library
Review comment: All grammar checkers including Grammarly and Grazie put here the
Review comment: yes, but they don't know the context and concepts. They don't know whether there are one or multiple Kotlin JVM libraries.
||
Schema interfaces should also be extracted if this code uses Custom Data Schemas. | ||
Review comment: custom should not have a capital C
Review comment: Also, probably all the words should be or not should be capitalized
Review comment: custom data schemas
Review comment: agreed :)
||
|
||
3. Extension properties for this [`DataSchema`](schema.md) are generated: | ||
* [**Import OpenAPI Schemas in Gradle project**](schemasImportOpenApiGradle.md) <br/> | ||
When you need to take data from the endpoint with OpenAPI Schema. | ||
zaleslaw marked this conversation as resolved.
|
||
|
||
```kotlin | ||
val ColumnsContainer<DataFrameType>.age: DataColumn<Int?> @JvmName("DataFrameType_age") get() = this["age"] as DataColumn<Int?> | ||
val DataRow<DataFrameType>.age: Int? @JvmName("DataFrameType_age") get() = this["age"] as Int? | ||
val ColumnsContainer<DataFrameType>.name: DataColumn<String> @JvmName("DataFrameType_name") get() = this["name"] as DataColumn<String> | ||
val DataRow<DataFrameType>.name: String @JvmName("DataFrameType_name") get() = this["name"] as String | ||
``` | ||
|
||
Every column produces two extension properties: | ||
|
||
* Property for `ColumnsContainer<DataFrameType>` returns column | ||
* Property for `DataRow<DataFrameType>` returns cell value | ||
|
||
4. `df` variable is typed by schema interface: | ||
|
||
```kotlin | ||
val temp = df | ||
``` | ||
|
||
```kotlin | ||
val df = temp.cast<DataFrameType>() | ||
``` | ||
|
||
> Note that the object instance remains the same after casting. See [cast](cast.md). | |
|
||
To log all these additional code executions, use cell magic | ||
|
||
``` | ||
%trackExecution -all | ||
``` | ||
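The four steps above can be illustrated without the DataFrame library. The following is a minimal sketch, assuming hypothetical stand-in types `Row`, `Columns`, and a marker interface `DemoSchema` (none of them are library API). It only shows how a phantom type parameter plus extension properties gives typed access, and that a cast re-types the same instance.

```kotlin
// Hypothetical stand-ins for illustration only; not the DataFrame API.
interface DemoSchema // plays the role of the generated marker interface

// A row is a name -> value map; T is a phantom type used only for typing.
class Row<out T>(private val values: Map<String, Any?>) {
    operator fun get(name: String): Any? = values[name]
}

// A column container is a name -> list-of-values map with the same phantom type.
class Columns<out T>(private val columns: Map<String, List<Any?>>) {
    operator fun get(name: String): List<Any?> = columns.getValue(name)

    // Re-types the container without copying it, like `cast` in the text.
    @Suppress("UNCHECKED_CAST")
    fun <R> cast(): Columns<R> = this as Columns<R>

    // Builds a typed row from the values at the given index.
    fun row(index: Int): Row<T> = Row(columns.mapValues { it.value[index] })
}

// Two accessors per column: one returns the whole column, one the cell value.
@Suppress("UNCHECKED_CAST")
val Columns<DemoSchema>.age: List<Int?> get() = this["age"] as List<Int?>
val Row<DemoSchema>.age: Int? get() = this["age"] as Int?

@Suppress("UNCHECKED_CAST")
val Columns<DemoSchema>.name: List<String> get() = this["name"] as List<String>
val Row<DemoSchema>.name: String get() = this["name"] as String

fun main() {
    val untyped = Columns<Any?>(
        mapOf("name" to listOf("Alice", "Bob"), "age" to listOf(15, null))
    )
    val typed = untyped.cast<DemoSchema>()
    println(typed === untyped)  // identity check: the cast does not copy
    println(typed.age)          // column access through the extension property
    println(typed.row(0).name)  // cell access through the extension property
}
```

The real generated code differs in detail (for example, it adds `@JvmName` to avoid JVM signature clashes), but the typing idea is the same.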
|
||
## Schema inheritance | ||
|
||
To reduce the amount of generated code, previously generated [`DataSchema`](schema.md) interfaces are reused and only new | |
properties are introduced. | |
|
||
Let's filter out all `null` values from `age` column and add one more column of type `Boolean`: | ||
|
||
```kotlin | ||
val filtered = df.filter { age != null }.add("isAdult") { age!! > 18 } | ||
``` | ||
|
||
New schema interface for `filtered` variable will be derived from previously generated `DataFrameType`: | ||
|
||
```kotlin | ||
@DataSchema | ||
interface DataFrameType1 : DataFrameType | ||
``` | ||
|
||
Extension properties for data access are generated only for new and overridden members of the `DataFrameType1` interface: | |
|
||
```kotlin | ||
val ColumnsContainer<DataFrameType1>.age: DataColumn<Int> get() = this["age"] as DataColumn<Int> | ||
val DataRow<DataFrameType1>.age: Int get() = this["age"] as Int | ||
val ColumnsContainer<DataFrameType1>.isAdult: DataColumn<Boolean> get() = this["isAdult"] as DataColumn<Boolean> | ||
val DataRow<DataFrameType1>.isAdult: Boolean get() = this["isAdult"] as Boolean | |
``` | ||
|
||
Then variable `filtered` is cast to new interface: | ||
|
||
```kotlin | ||
val temp = filtered | ||
``` | ||
|
||
```kotlin | ||
val filtered = temp.cast<DataFrameType1>() | ||
``` | ||
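The inheritance mechanism described in this section can be sketched in plain Kotlin. This is a minimal illustration under stated assumptions: `Row`, `BaseSchema`, and `DerivedSchema` are hypothetical stand-ins, not library types. It shows that when the row type is covariant in its schema parameter, an accessor declared for the derived schema takes precedence over the one declared for the base schema, while accessors that were not redeclared remain available.

```kotlin
// Hypothetical stand-ins for illustration only; not the DataFrame API.
class Row<out T>(private val values: Map<String, Any?>) {
    operator fun get(name: String): Any? = values[name]
}

interface BaseSchema                  // like the first generated interface
interface DerivedSchema : BaseSchema  // like the interface generated after filter/add

// Accessors for the base schema: age may be null here.
val Row<BaseSchema>.age: Int?
    @JvmName("BaseSchema_age") get() = this["age"] as Int?
val Row<BaseSchema>.name: String
    @JvmName("BaseSchema_name") get() = this["name"] as String

// Only new or narrowed members get accessors for the derived schema.
val Row<DerivedSchema>.age: Int
    @JvmName("DerivedSchema_age") get() = this["age"] as Int
val Row<DerivedSchema>.isAdult: Boolean
    @JvmName("DerivedSchema_isAdult") get() = this["isAdult"] as Boolean

fun main() {
    val row = Row<DerivedSchema>(mapOf("name" to "Bob", "age" to 20, "isAdult" to true))
    val age: Int = row.age  // resolves to the more specific, non-nullable accessor
    println(age + 1)
    println(row.name)       // accessor declared for BaseSchema still applies
    println(row.isAdult)
}
```

The `@JvmName` annotations keep the JVM method names distinct, since both `age` getters take the same erased receiver type.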
|
||
## Custom data schemas | ||
|
||
You can define your own [`DataSchema`](schema.md) interfaces and use them in functions and classes to represent a [`DataFrame`](DataFrame.md) with | |
a specific set of columns: | |
|
||
```kotlin | ||
@DataSchema | ||
interface Person { | ||
val name: String | ||
val age: Int | ||
} | ||
``` | ||
|
||
After execution of this cell in Jupyter or annotation processing in IDEA, extension properties for data access will be | ||
generated. Now we can use these properties to create functions for typed [`DataFrame`](DataFrame.md): | ||
|
||
```kotlin | ||
fun DataFrame<Person>.splitName() = split { name }.by(",").into("firstName", "lastName") | ||
fun DataFrame<Person>.adults() = filter { age > 18 } | ||
``` | ||
|
||
In Jupyter these functions will work automatically for any [`DataFrame`](DataFrame.md) that matches `Person` schema: | ||
|
||
<!---FUN extendedDf--> | ||
|
||
```kotlin | ||
val df = dataFrameOf("name", "age", "weight")( | ||
"Merton, Alice", 15, 60.0, | ||
"Marley, Bob", 20, 73.5 | ||
) | ||
``` | ||
|
||
<!---END--> | ||
|
||
Schema of `df` is compatible with `Person`, so auto-generated schema interface will inherit from it: | ||
|
||
```kotlin | ||
@DataSchema(isOpen = false) | ||
interface DataFrameType : Person | ||
|
||
val ColumnsContainer<DataFrameType>.weight: DataColumn<Double> get() = this["weight"] as DataColumn<Double> | ||
val DataRow<DataFrameType>.weight: Double get() = this["weight"] as Double | ||
``` | ||
|
||
Although `df` has an additional column `weight`, previously defined functions for `DataFrame<Person>` will work for it: | |
|
||
<!---FUN splitNameWorks--> | ||
|
||
```kotlin | ||
df.splitName() | ||
``` | ||
|
||
<!---END--> | ||
|
||
```text | ||
firstName lastName age weight | ||
Merton Alice 15 60.000 | ||
Marley Bob 20 73.500 | |
``` | ||
|
||
<!---FUN adultsWorks--> | ||
|
||
```kotlin | ||
df.adults() | ||
``` | ||
|
||
<!---END--> | ||
|
||
```text | ||
name age weight | ||
Marley, Bob 20 73.5 | ||
``` | ||
|
||
In a JVM project, you have to [cast](cast.md) the [`DataFrame`](DataFrame.md) explicitly to the target interface: | |
|
||
```kotlin | ||
df.cast<Person>().splitName() | ||
``` | ||
|
||
## Use external data schemas in Jupyter | ||
|
||
Sometimes it is convenient to extract reusable code from a Jupyter notebook into a Kotlin JVM library. If this code | |
uses [custom data schemas](#custom-data-schemas), the schema interfaces should also be extracted. To enable support for | |
them in Jupyter, register them in the | |
library [integration class](https://github.com/Kotlin/kotlin-jupyter/blob/master/docs/libraries.md) with the `useSchema` | |
function: | |
|
||
```kotlin | ||
@DataSchema | ||
interface Person { | ||
val name: String | ||
val age: Int | ||
} | ||
|
||
fun DataFrame<Person>.countAdults() = count { it[Person::age] > 18 } | ||
|
||
@JupyterLibrary | ||
internal class Integration : JupyterIntegration() { | ||
|
||
override fun Builder.onLoaded() { | ||
onLoaded { | ||
useSchema<Person>() | ||
} | ||
} | ||
} | ||
``` | ||
|
||
After loading this library into a Jupyter notebook, schema interfaces for all [`DataFrame`](DataFrame.md) variables that match the `Person` | |
schema will derive from `Person`. | |
|
||
<!---FUN createDf--> | ||
|
||
```kotlin | ||
val df = dataFrameOf("name", "age")( | ||
"Alice", 15, | ||
"Bob", 20 | ||
) | ||
``` | ||
|
||
<!---END--> | ||
|
||
Now `df` is assignable to `DataFrame<Person>` and `countAdults` is available: | ||
|
||
```kotlin | ||
df.countAdults() | ||
``` | ||
|
||
## Import Data Schemas, e.g. from OpenAPI, in Jupyter | ||
|
||
Similar to [importing OpenAPI data schemas in Gradle projects](gradle.md#openapi-schemas), you can also | ||
do this in Jupyter notebooks. There is only a slight difference in notation: | ||
|
||
Import the schema using any path (`String`), `URL`, or `File`: | ||
|
||
```kotlin | ||
val PetStore = importDataSchema("https://petstore3.swagger.io/api/v3/openapi.json") | ||
``` | ||
|
||
Then, from the next cell onwards, you can call, for example: | |
|
||
```kotlin | ||
val df = PetStore.Pet.readJson("https://petstore3.swagger.io/api/v3/pet/findByStatus?status=available") | ||
``` | ||
|
||
So, very similar indeed! | ||
|
||
(Note: The type of `PetStore` will be generated as `PetStoreDataSchema`, but this doesn't affect the way you can use | ||
it.) | ||
* [**Import Data Schemas, e.g. from OpenAPI, in Jupyter**](schemasImportOpenApiJupyter.md) <br/> | ||
Similar to [importing OpenAPI Data Schemas in Gradle projects](schemasImportOpenApiGradle.md), | ||
you can also do this in Jupyter Notebooks. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
[//]: # (title: Custom Data Schemas) | ||
|
||
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Schemas--> | ||
|
||
You can define your own [`DataSchema`](schema.md) interfaces and use them in functions and classes to represent a [`DataFrame`](DataFrame.md) with | |
a specific set of columns: | |
|
||
```kotlin | ||
@DataSchema | ||
interface Person { | ||
val name: String | ||
val age: Int | ||
} | ||
``` | ||
|
||
After execution of this cell in Jupyter or annotation processing in IDEA, extension properties for data access will be | ||
generated. Now we can use these properties to create functions for typed [`DataFrame`](DataFrame.md): | ||
|
||
```kotlin | ||
fun DataFrame<Person>.splitName() = split { name }.by(",").into("firstName", "lastName") | ||
fun DataFrame<Person>.adults() = filter { age > 18 } | ||
``` | ||
|
||
In Jupyter these functions will work automatically for any [`DataFrame`](DataFrame.md) that matches `Person` schema: | ||
|
||
<!---FUN extendedDf--> | ||
|
||
```kotlin | ||
val df = dataFrameOf("name", "age", "weight")( | ||
"Merton, Alice", 15, 60.0, | ||
"Marley, Bob", 20, 73.5 | ||
) | ||
``` | ||
|
||
<!---END--> | ||
|
||
Schema of `df` is compatible with `Person`, so auto-generated schema interface will inherit from it: | ||
|
||
```kotlin | ||
@DataSchema(isOpen = false) | ||
interface DataFrameType : Person | ||
|
||
val ColumnsContainer<DataFrameType>.weight: DataColumn<Double> get() = this["weight"] as DataColumn<Double> | ||
val DataRow<DataFrameType>.weight: Double get() = this["weight"] as Double | ||
``` | ||
|
||
Although `df` has an additional column `weight`, previously defined functions for `DataFrame<Person>` will work for it: | |
|
||
<!---FUN splitNameWorks--> | ||
|
||
```kotlin | ||
df.splitName() | ||
``` | ||
|
||
<!---END--> | ||
|
||
```text | ||
firstName lastName age weight | ||
Merton Alice 15 60.000 | ||
Marley Bob 20 73.500 | |
``` | ||
|
||
<!---FUN adultsWorks--> | ||
|
||
```kotlin | ||
df.adults() | ||
``` | ||
|
||
<!---END--> | ||
|
||
```text | ||
name age weight | ||
Marley, Bob 20 73.5 | ||
``` | ||
|
||
In a JVM project, you have to [cast](cast.md) the [`DataFrame`](DataFrame.md) explicitly to the target interface: | |
|
||
```kotlin | ||
df.cast<Person>().splitName() | ||
``` |
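Why a function written for `Person` also accepts a wider frame, and why an untyped frame needs an explicit cast, can be sketched in plain Kotlin. This is an illustration under stated assumptions: `Frame`, `PersonWithWeight`, and the map-based rows are hypothetical stand-ins, not the DataFrame API, and this `cast` performs no validation of the data.

```kotlin
// Hypothetical stand-ins for illustration only; not the DataFrame API.
interface Person                     // hand-written schema
interface PersonWithWeight : Person  // plays the role of the auto-generated wider schema

class Frame<out T>(val rows: List<Map<String, Any?>>) {
    // Re-types the same instance; no data is copied or validated here.
    @Suppress("UNCHECKED_CAST")
    fun <R> cast(): Frame<R> = this as Frame<R>
}

// A function written once against the Person schema; it keeps the caller's schema type.
fun <T : Person> Frame<T>.adults(): Frame<T> =
    Frame(rows.filter { (it["age"] as Int) > 18 })

fun main() {
    val data = listOf(
        mapOf("name" to "Merton, Alice", "age" to 15, "weight" to 60.0),
        mapOf("name" to "Marley, Bob", "age" to 20, "weight" to 73.5),
    )

    // A frame typed by a schema that inherits from Person works directly.
    val wide = Frame<PersonWithWeight>(data)
    println(wide.adults().rows)

    // A frame with no known schema must be cast first.
    val untyped = Frame<Any?>(data)
    // untyped.adults()  // does not compile: Any? is not a Person
    println(untyped.cast<Person>().adults().rows)
}
```

In Jupyter the wider schema type is assigned automatically after each cell, which is why the explicit cast is only needed in a JVM project.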