Converted all the types mentions to the links on the mentioned types pages #278

Merged: 2 commits, Feb 22, 2023
2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/DataColumn.md
@@ -156,7 +156,7 @@ df.filter { year > 2000 }

<!---END-->

To convert `ColumnAccessor` into [`DataColumn`](DataColumn.md) add values using `withValues` function:
To convert [`ColumnAccessor`](columnAccessorsApi.md) into [`DataColumn`](DataColumn.md) add values using `withValues` function:

<!---FUN columnAccessorToColumn-->
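The sample behind the marker above is collapsed in this diff view. As a hedged sketch of the `withValues` conversion the changed line describes (column name and values are invented, and import paths may differ by library version):

```kotlin
import org.jetbrains.kotlinx.dataframe.api.column
import org.jetbrains.kotlinx.dataframe.api.withValues

// An accessor carries only a name and a type; it holds no data yet
val year by column<Int>()

// withValues attaches actual values, producing a DataColumn<Int>
val yearColumn = year.withValues(1999, 2005, 2010)
```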

6 changes: 4 additions & 2 deletions docs/StardustDocs/topics/KPropertiesApi.md
@@ -3,7 +3,8 @@
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.ApiLevels-->

[`DataFrame`](DataFrame.md) can be used as an intermediate structure for data transformation between two data formats.
If either source or destination is a Kotlin object, e.g. data class, it is convenient to use its properties for typed data access in [`DataFrame`](DataFrame.md).
If either source or destination is a Kotlin object, e.g. data class, it is convenient to use its properties
for typed data access in [`DataFrame`](DataFrame.md).
This can be done using `::` expression that provides [property references](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.reflect/-k-property/)

<!---FUN kproperties1-->
@@ -29,7 +30,8 @@ val passengers = DataFrame.read("titanic.csv")

<!---END-->

By default, [`DataFrame`](DataFrame.md) uses `name` and `returnType` of `KProperty` for typed access to data. When column name differs from property name, use `ColumnName` annotation:
By default, [`DataFrame`](DataFrame.md) uses `name` and `returnType` of `KProperty` for typed access to data.
When column name differs from property name, use `@ColumnName` annotation:

<!---FUN kproperties2-->
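The referenced sample is collapsed here. A hedged illustration of the `@ColumnName` usage the corrected line describes (the `Passenger` class and column names are invented for the example):

```kotlin
import org.jetbrains.kotlinx.dataframe.annotations.ColumnName

data class Passenger(
    // property name differs from the source column header "first_name"
    @ColumnName("first_name") val firstName: String,
    val age: Int,
)

// Typed access by KProperty then resolves to the annotated column name, e.g.:
// passengers.filter { it[Passenger::age] > 18 }
```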

8 changes: 5 additions & 3 deletions docs/StardustDocs/topics/adjustSchema.md
@@ -1,10 +1,12 @@
[//]: # (title: Adjust schema)

[`DataFrame`](DataFrame.md) interface has type argument `T` that doesn't affect contents of `DataFrame`, but marks `DataFrame` with a type that represents data schema that this `DataFrame` is supposed to have.
[`DataFrame`](DataFrame.md) interface has type argument `T` that doesn't affect contents of [`DataFrame`](DataFrame.md),
but marks [`DataFrame`](DataFrame.md) with a type that represents data schema that this [`DataFrame`](DataFrame.md) is supposed to have.
This argument is used to generate [extension properties](extensionPropertiesApi.md) for typed data access.

Actual data in [`DataFrame`](DataFrame.md) may diverge from compile-time schema marker `T` due to dynamic nature of data inside `DataFrame`. However, at some points of code you may know exactly what `DataFrame` schema is expected.
Actual data in [`DataFrame`](DataFrame.md) may diverge from compile-time schema marker `T` due to dynamic nature of data inside [`DataFrame`](DataFrame.md).
However, at some points of code you may know exactly what [`DataFrame`](DataFrame.md) schema is expected.
To match your knowledge with expected real-time [`DataFrame`](DataFrame.md) contents you can use one of two functions:
* [`cast`](cast.md) — change type argument of [`DataFrame`](DataFrame.md) to the expected schema without changing data in `DataFrame`.
* [`cast`](cast.md) — change type argument of [`DataFrame`](DataFrame.md) to the expected schema without changing data in [`DataFrame`](DataFrame.md).
* [`convertTo`](convertTo.md) — convert [`DataFrame`](DataFrame.md) contents to match the expected schema.
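A sketch contrasting the two functions in the list above; the `Person` schema is invented for illustration:

```kotlin
@DataSchema
interface Person {
    val name: String
    val age: Int
}

// cast only changes the compile-time type argument; the data is untouched,
// so later access will fail if the real columns don't match the schema
val typed = df.cast<Person>()

// convertTo actually converts column types and layout to match the schema
val converted = df.convertTo<Person>()
```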

12 changes: 7 additions & 5 deletions docs/StardustDocs/topics/apiLevels.md
@@ -6,7 +6,7 @@ By nature data frames are dynamic objects, column labels depend on the input sources, new columns could be added
or deleted while wrangling. Kotlin, in contrast, is a statically typed language and all types are defined and verified
ahead of execution. That's why creating a flexible, handy, and, at the same time, safe API to a data frame is tricky.

In Kotlin DataFrame library we provide four different ways to access columns, and, while they are essentially different, they
In the Kotlin DataFrame library we provide four different ways to access columns, and, while they are essentially different, they
look pretty similar in the data wrangling DSL.

## List of Access APIs
@@ -141,16 +141,18 @@ df.add("weight") { ... } // add a new column `weight`, calculated by some expression
We don't need to interrupt a function call chain and declare a column accessor or generate new properties.

In contrast, generated [extension properties](extensionPropertiesApi.md) are the most convenient and the safest API.
Using it, you can always be sure that you work with correct data and types. But its bottleneck is the moment of generation.
To get new extension properties you have to run a cell in a notebook, which could lead to unnecessary variable declarations.
Using it, you can always be sure that you work with correct data and types.
But its bottleneck is the moment of generation.
To get new extension properties you have to run a cell in a notebook,
which could lead to unnecessary variable declarations.
Currently, we are working on a compiler plugin that generates these properties on the fly while typing!
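As a hedged illustration of the trade-off discussed above (the column name `age` and the generated extension property are assumed, not taken from this PR):

```kotlin
// String API: nothing to declare up front, but typos and wrong types
// surface only at runtime
df.filter { "age"<Int>() > 18 }

// Generated extension properties: the same filter, checked at compile time
df.filter { age > 18 }
```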

The [Column Accessors API](columnAccessorsApi.md) is a trade-off between safety and the need to write type declarations
ahead of execution. It was designed to make it easier to write code in an IDE without a notebook experience.
It provides type-safe access to columns but doesn't ensure that the columns really exist in a particular [`DataFrame`](DataFrame.md).
It provides type-safe access to columns but doesn't ensure that the columns really exist in a particular data frame.

The [KProperties API](KPropertiesApi.md) is useful when you have already declared classes in your application business
logic with fields that correspond columns of [`DataFrame`](DataFrame.md).
logic with fields that correspond to columns of a data frame.

<table>
<tr>
11 changes: 7 additions & 4 deletions docs/StardustDocs/topics/collectionsInterop.md
@@ -7,9 +7,10 @@ _Kotlin DataFrame_ and _Kotlin Collection_ represent two different approaches to data storage:
* [`DataFrame`](DataFrame.md) stores data by fields/columns
* `Collection` stores data by records/rows

Although [`DataFrame`](DataFrame.md) doesn't implement `Collection` or `Iterable` interface, it has many similar operations, such as [`filter`](filter.md), [`take`](sliceRows.md#take), [`first`](first.md), [`map`](map.md), [`groupBy`](groupBy.md) etc.
Although [`DataFrame`](DataFrame.md) doesn't implement [`Collection`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-collection/#kotlin.collections.Collection) or [`Iterable`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-iterable/) interface, it has many similar operations,
such as [`filter`](filter.md), [`take`](sliceRows.md#take), [`first`](first.md), [`map`](map.md), [`groupBy`](groupBy.md) etc.

[`DataFrame`](DataFrame.md) has two-way compatibility with `Map` and `List`:
[`DataFrame`](DataFrame.md) has two-way compatibility with [`Map`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-map/) and [`List`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-list/):
* `List<T>` -> `DataFrame<T>`: [toDataFrame](createDataFrame.md#todataframe)
* `DataFrame<T>` -> `List<T>`: [toList](toList.md)
* `Map<String, List<*>>` -> `DataFrame<*>`: [toDataFrame](createDataFrame.md#todataframe)
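The conversions listed above can be sketched end to end; the `Point` class and values are invented for the example:

```kotlin
data class Point(val x: Int, val y: Int)

val points = listOf(Point(1, 2), Point(3, 4))

val df = points.toDataFrame()           // List<Point> -> DataFrame<Point>
val back: List<Point> = df.toList()     // DataFrame<Point> -> List<Point>

val columns = mapOf("x" to listOf(1, 3), "y" to listOf(2, 4))
val fromMap = columns.toDataFrame()     // Map<String, List<*>> -> DataFrame<*>
```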
@@ -68,11 +69,13 @@ val df2 = df.add("c") { a + b }

<tip>

To enable extension properties generation you should use [dataframe plugin](gradle.md) for Gradle or [Kotlin jupyter kernel](installation.md)
To enable extension properties generation, you should use the [dataframe plugin](gradle.md)
for Gradle or the [Kotlin Jupyter kernel](installation.md)

</tip>

After data is transformed, [`DataFrame`](DataFrame.md) can be exported into `List` of another data class using [toList](toList.md) or [toListOf](toList.md#tolistof) extensions:
After data is transformed, [`DataFrame`](DataFrame.md) can be exported
into [`List`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-list/) of another data class using [toList](toList.md) or [toListOf](toList.md#tolistof) extensions:

<!---FUN listInterop4-->
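The sample is collapsed in this view. Following the `df2` example shown earlier in this diff, a hedged sketch of exporting back to a data class (the `Output` class is invented):

```kotlin
data class Output(val a: Int, val b: Int, val c: Int)

// toList relies on the DataFrame's own type argument;
// toListOf names the target class explicitly
val rows: List<Output> = df2.toListOf<Output>()
```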

2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/convertTo.md
@@ -1,7 +1,7 @@
[//]: # (title: convertTo)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->

[Converts](convert.md) columns in [`DataFrame`](DataFrame.md) to match given schema `Schema`
[Converts](convert.md) columns in [`DataFrame`](DataFrame.md) to match the given schema [`Schema`](schema.md).

```kotlin
convertTo<Schema>(excessiveColumns = ExcessiveColumns.Keep)
2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/createColumn.md
@@ -84,7 +84,7 @@ values.toColumn("data", Infer.Nulls) // type: Any

### toColumnOf

Converts `Iterable` of values into column of given type
Converts [`Iterable`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-iterable/) of values into a column of the given type

<!---FUN createValueColumnOfType-->
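The sample is collapsed here. A hedged sketch of `toColumnOf` (values and column name are invented):

```kotlin
// The values are statically typed as Any?, so plain toColumn would infer Any?;
// toColumnOf forces the declared element type instead
val values: List<Any?> = listOf("Alice", "Bob", null)
val name = values.toColumnOf<String?>("name") // DataColumn<String?>
```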

5 changes: 3 additions & 2 deletions docs/StardustDocs/topics/createDataFrame.md
@@ -134,7 +134,7 @@ map.toDataFrame()

<!---END-->

`DataFrame` from `Iterable` of objects:
[`DataFrame`](DataFrame.md) from [`Iterable`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-iterable/) of objects:

<!---FUN readDataFrameFromObject-->

@@ -148,7 +148,8 @@ val df = persons.toDataFrame()

<!---END-->

Scans object properties using reflection and creates [ValueColumn](DataColumn.md#valuecolumn) for every property. Scope of properties for scanning is defined at compile-time by formal types of objects in `Iterable`, so properties of implementation classes will not be scanned.
Scans object properties using reflection and creates [ValueColumn](DataColumn.md#valuecolumn) for every property.
Scope of properties for scanning is defined at compile-time by formal types of objects in [`Iterable`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-iterable/), so properties of implementation classes will not be scanned.

Specify `depth` parameter to perform deep object graph traversal and convert nested objects into [ColumnGroups](DataColumn.md#columngroup) and [FrameColumns](DataColumn.md#framecolumn):
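A hedged sketch of the deep traversal described above; the classes are invented, and the parameter is written as `maxDepth` here (the text calls it `depth`; the exact name may vary by library version):

```kotlin
data class Address(val city: String, val street: String)
data class Person(val name: String, val address: Address)

val persons = listOf(Person("Alice", Address("Paris", "Rue A")))

// With a depth greater than 1, nested Address objects become a ColumnGroup
// instead of a single column of type Any
val df = persons.toDataFrame(maxDepth = 2)
```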

2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/cumSum.md
@@ -16,7 +16,7 @@ Returns a [`DataFrame`](DataFrame.md) or [`DataColumn`](DataColumn.md) containin
**Available for:**
* [`DataFrame`](DataFrame.md)
* [`DataColumn`](DataColumn.md)
* [`GroupBy`](groupBy.md) — cumulative sum per every data group
* [`GroupBy DataFrame`](groupBy.md#transformation) — cumulative sum per every data group

<!---FUN cumSum-->
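The sample is collapsed in this view. A minimal sketch of `cumSum` on a single column (values are invented):

```kotlin
val values by columnOf(1, 2, 3, 4)

val running = values.cumSum() // cumulative sums: 1, 3, 6, 10
```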

3 changes: 2 additions & 1 deletion docs/StardustDocs/topics/duplicate.md
@@ -5,7 +5,8 @@ Returns [`DataFrame`](DataFrame.md) with original [`DataRow`](DataRow.md) repeated `n` times.
DataRow.duplicate(n): DataFrame
```

Returns [`FrameColumn`](DataColumn.md#framecolumn) with original [`DataFrame`](DataFrame.md) repeated `n` times. Resulting `FrameColumn` will have an empty [`name`](DataColumn.md#properties).
Returns [`FrameColumn`](DataColumn.md#framecolumn) with original [`DataFrame`](DataFrame.md) repeated `n` times.
Resulting [`FrameColumn`](DataColumn.md#framecolumn) will have an empty [`name`](DataColumn.md#properties).
```text
DataFrame.duplicate(n): FrameColumn
```
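A hedged sketch of both overloads described above (`df` stands for any existing DataFrame):

```kotlin
// DataRow.duplicate: repeat one row n times, producing a DataFrame
val firstRowRepeated = df.first().duplicate(3)

// DataFrame.duplicate: repeat the whole frame n times, producing a FrameColumn
// (the resulting column's name is empty, as noted above)
val frames = df.duplicate(2)
```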
2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/explode.md
@@ -68,7 +68,7 @@ df.explode { a and b }

<!---END-->

Explode `DataColumn<Collection>`:
Explode [`DataColumn<Collection>`](DataColumn.md):

<!---FUN explodeColumnList-->
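The sample is collapsed here. A hedged sketch of exploding a list column (column names and values are invented):

```kotlin
val df = dataFrameOf("id", "tags")(
    1, listOf("a", "b"),
    2, listOf("c"),
)

// Each list element becomes its own row; "id" values are repeated accordingly
val exploded = df.explode("tags")
```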

6 changes: 4 additions & 2 deletions docs/StardustDocs/topics/extensionPropertiesApi.md
@@ -26,9 +26,11 @@ df.add("lastName") { name.split(",").last() }

<!---END-->

Extension properties are generated for DataSchema that is extracted from [`DataFrame`](DataFrame.md) instance after REPL line execution. After that `DataFrame` variable is typed with its own `DataSchema`, so only valid extension properties corresponding to actual columns in DataFrame will be allowed by the compiler and suggested by completion.
Extension properties are generated for the [`DataSchema`](schemas.md) that is extracted from a [`DataFrame`](DataFrame.md)
instance after REPL line execution.
After that, the [`DataFrame`](DataFrame.md) variable is typed with its own [`DataSchema`](schemas.md), so only valid extension properties corresponding to actual columns in the DataFrame will be allowed by the compiler and suggested by completion.

Also, extension properties [can be generated in IntelliJ IDEA](gradle.md) using [Kotlin Dataframe Gradle plugin](installation.md#data-schema-preprocessor).
Also, extension properties [can be generated in IntelliJ IDEA](gradle.md) using the [Kotlin Dataframe Gradle plugin](installation.md#data-schema-preprocessor).

<warning>
In notebooks, generated properties won't appear or be updated until the cell has been executed. This often means that you have to introduce a new variable frequently to sync extension properties with the actual schema.
2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/gather.md
@@ -24,7 +24,7 @@ valueTransform: (value) -> R
See [column selectors](ColumnSelectors.md)

Configuration options:
* `explodeLists` — gathered values of type `List` will be exploded into their elements, so `where`, `cast`, `notNull` and `mapValues` will be applied to list elements instead of lists themselves
* `explodeLists` — gathered values of type [`List`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-list/) will be exploded into their elements, so `where`, `cast`, `notNull` and `mapValues` will be applied to list elements instead of lists themselves
* `cast` — inform compiler about the expected type of gathered elements. This type will be passed to `where` and `mapKeys` lambdas
* `notNull` — skip gathered `null` values
* `where` — filter gathered values
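A hedged sketch combining several of the options listed above (column names and values are invented, and the exact chaining order is an assumption):

```kotlin
val wide = dataFrameOf("product", "q1", "q2")(
    "widget", 10, 20,
)

// Wide to long: gathered column names go to "quarter", values to "sales"
val long = wide.gather { cols("q1", "q2") }
    .cast<Int>()            // expected type of the gathered values
    .where { it > 0 }       // keep only positive values
    .into("quarter", "sales")
```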
6 changes: 3 additions & 3 deletions docs/StardustDocs/topics/gettingStarted.md
@@ -1,13 +1,13 @@
[//]: # (title: Getting started)

Kotlin DataFrame Library is a JVM Kotlin library for in-memory data manipulation.
The Kotlin DataFrame library is a JVM Kotlin library for in-memory data manipulation.

![getting started image](gettingStarted.png)

## Installation
First, pick up the latest version of Kotlin DataFrame Library [here](https://search.maven.org/artifact/org.jetbrains.kotlinx/dataframe).
First, pick up the latest version of the Kotlin DataFrame library [here](https://search.maven.org/artifact/org.jetbrains.kotlinx/dataframe).
* If you wish to play with data in interactive mode, set up [Kotlin Kernel for Jupyter](installation.md#jupyter-notebook) and run DataFrame there
* If you have some JVM project, just add a dependency on Kotlin DataFrame library like it's described on [Maven Central search](https://search.maven.org/artifact/org.jetbrains.kotlinx/dataframe) site
* If you have some JVM project, just add a dependency on the Kotlin DataFrame library like it's described on [Maven Central search](https://search.maven.org/artifact/org.jetbrains.kotlinx/dataframe) site

## Examples
We hope this documentation helps you implement all the things you want to do with your data. For inspiration, take a look at the [examples](https://github.com/Kotlin/dataframe/tree/master/examples) folder. [Puzzles](https://github.com/Kotlin/dataframe/blob/master/examples/jupyter-notebooks/puzzles/40%20puzzles.ipynb) will quickly familiarize you with the power of DataFrame, and other notebooks and projects will show you some applications of DataFrame in practical data analysis.
16 changes: 8 additions & 8 deletions docs/StardustDocs/topics/gradle.md
@@ -2,21 +2,21 @@

<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Schemas-->

In Gradle project Kotlin DataFrame library provides
In a Gradle project, the Kotlin DataFrame library provides

1. Annotation processing for generation of extension properties
2. Annotation processing for `DataSchema` inference from datasets.
3. Gradle task for `DataSchema` inference from datasets.
2. Annotation processing for [`DataSchema`](schemas.md) inference from datasets.
3. Gradle task for [`DataSchema`](schemas.md) inference from datasets.

### Configuration

To use the [extension properties API](extensionPropertiesApi.md) in a Gradle project, you
should [configure Kotlin DataFrame plugin](installation.md#data-schema-preprocessor).
should [configure the Kotlin DataFrame plugin](installation.md#data-schema-preprocessor).

### Annotation processing

Declare data schemas in your code and use them to access data in [`DataFrames`](DataFrame.md).
A data schema is a class or interface annotated with `@DataSchema`:
A data schema is a class or interface annotated with [`@DataSchema`](schemas.md):

```kotlin
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
@@ -51,7 +51,7 @@ Specify schema with preferred method and execute the `build` task.
<tabs>
<tab title="Method 1. Annotation processing">

ImportDataSchema annotation must be above package directive. You can put this annotation in the same file as data
The `@ImportDataSchema` annotation must be above the package directive. You can put this annotation in the same file as data
processing code. You can import a schema from a URL or a relative file path. A relative path is by default resolved to
the project root directory. You can configure it
by [passing](https://kotlinlang.org/docs/ksp-quickstart.html#pass-options-to-processors) `dataframe.resolutionDir`
@@ -72,7 +72,7 @@ the same package as file containing the annotation.
import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
```

See KDocs for `ImportDataSchema` in IDE
See KDocs for `@ImportDataSchema` in IDE
or [github](https://github.com/Kotlin/dataframe/blob/master/core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/annotations/ImportDataSchema.kt)
for more details.

@@ -149,7 +149,7 @@ dataframes {
```

The only difference is that the name provided is now irrelevant, since the type names are provided by the OpenAPI spec.
(If you were wondering, yes, Kotlin DataFrame library can tell the difference between an OpenAPI spec and normal JSON data)
(If you were wondering, yes, the Kotlin DataFrame library can tell the difference between an OpenAPI spec and normal JSON data)

After importing the data schema, you can now start to import any JSON data you like using the generated schemas.
For instance, one of the types in the schema above is `PetStore.Pet` (which can also be
8 changes: 4 additions & 4 deletions docs/StardustDocs/topics/groupBy.md
@@ -77,7 +77,7 @@ Returns `GroupBy` object.

## Transformation

`GroupBy` is a [`DataFrame`](DataFrame.md) with one chosen [`FrameColumn`](DataColumn.md#framecolumn) containing data groups.
`GroupBy DataFrame` is a [`DataFrame`](DataFrame.md) with one chosen [`FrameColumn`](DataColumn.md#framecolumn) containing data groups.

It supports the following operations:
* [`add`](add.md)
@@ -86,7 +86,7 @@ It supports the following operations:
* [`pivot`](pivot.md#pivot-groupby)
* [`concat`](concat.md)

Any [`DataFrame`](DataFrame.md) with `FrameColumn` can be reinterpreted as `GroupBy`:
Any [`DataFrame`](DataFrame.md) with `FrameColumn` can be reinterpreted as `GroupBy DataFrame`:

<!---FUN dataFrameToGroupBy-->

@@ -100,7 +100,7 @@ df.asGroupBy { data } // convert dataframe to GroupBy by interpreting 'data' column

<!---END-->

And any `GroupBy` can be reinterpreted as `DataFrame` with `FrameColumn`:
And any [`GroupBy DataFrame`](groupBy.md#transformation) can be reinterpreted as [`DataFrame`](DataFrame.md) with `FrameColumn`:

<!---FUN groupByToFrame-->

@@ -230,7 +230,7 @@ df.groupBy("city").aggregate { maxBy("age")["name"] }
</tab></tabs>
<!---END-->

Most common aggregation functions can be computed directly at `GroupBy`:
Most common aggregation functions can be computed directly at [`GroupBy DataFrame`](groupBy.md#transformation):
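A hedged sketch of such direct aggregations (the column names `city` and `age` are illustrative):

```kotlin
val byCity = df.groupBy("city")

byCity.count()        // number of rows in every group
byCity.max("age")     // max "age" per group
byCity.mean("age")    // average "age" per group
```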

<!---FUN groupByDirectAggregations-->
<tabs>