-You can access the analyzed logical plan of a structured query using [Dataset.explain](dataset-operators.md#explain) basic action (with `extended` flag enabled) or SQL's `EXPLAIN EXTENDED` SQL command.
+You can access the analyzed logical plan of a structured query using [Dataset.explain](dataset/index.md#explain) basic action (with `extended` flag enabled) or SQL's `EXPLAIN EXTENDED` SQL command.
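For illustration, a minimal sketch of both routes, assuming a `SparkSession` in scope as `spark` (as in `spark-shell`); the query is made up:

```scala
// Dataset.explain with the extended flag prints the parsed, analyzed and
// optimized logical plans together with the physical plan.
val q = spark.range(10).where("id > 5")
q.explain(extended = true)

// The SQL route prints the same set of plans.
spark.sql("EXPLAIN EXTENDED SELECT * FROM range(10) WHERE id > 5").show(truncate = false)
```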
docs/CacheManager.md (+6 -6)
@@ -12,7 +12,7 @@ spark.sharedState.cacheManager
## Dataset.cache and persist Operators

-A structured query (as [Dataset](Dataset.md)) can be [cached](#cacheQuery) and registered with `CacheManager` using [Dataset.cache](caching-and-persistence.md#cache) or [Dataset.persist](caching-and-persistence.md#persist) high-level operators.
+A structured query (as [Dataset](dataset/index.md)) can be [cached](#cacheQuery) and registered with `CacheManager` using [Dataset.cache](caching-and-persistence.md#cache) or [Dataset.persist](caching-and-persistence.md#persist) high-level operators.
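A minimal sketch of the two operators (assuming a `spark` session in scope):

```scala
import org.apache.spark.storage.StorageLevel

// cache() registers the query with CacheManager using the default
// MEMORY_AND_DISK storage level; persist() lets you pick the level explicitly.
val cached    = spark.range(100).cache()
val persisted = spark.range(100).persist(StorageLevel.MEMORY_ONLY)
```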
-* [Dataset.storageLevel](dataset-operators.md#storageLevel) action is used
+* [Dataset.storageLevel](dataset/index.md#storageLevel) action is used

* `CatalogImpl` is requested to [isCached](CatalogImpl.md#isCached)

* `CacheManager` is requested to [cacheQuery](#cacheQuery) and [useCachedData](#useCachedData)
@@ -116,7 +116,7 @@ uncacheQuery(
`uncacheQuery` is used when:

-* [Dataset.unpersist](dataset-operators.md#unpersist) basic action is used
+* [Dataset.unpersist](dataset/index.md#unpersist) basic action is used

* `DropTableCommand` and [TruncateTableCommand](logical-operators/TruncateTableCommand.md) logical commands are executed

* `CatalogImpl` is requested to [uncache](CatalogImpl.md#uncacheTable) and [refresh](CatalogImpl.md#refreshTable) a table or view, [dropTempView](CatalogImpl.md#dropTempView) and [dropGlobalTempView](CatalogImpl.md#dropGlobalTempView)
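The first trigger in the list can be sketched as follows (assuming a `spark` session in scope):

```scala
// Dataset.unpersist reaches CacheManager.uncacheQuery and drops the
// cached entry for the query.
val q = spark.range(100).cache()
q.count()      // materializes the cache
q.unpersist()  // removes the entry from the cachedData registry
```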
@@ -129,9 +129,9 @@ cacheQuery(
  storageLevel: StorageLevel = MEMORY_AND_DISK): Unit
```

-`cacheQuery` adds the [analyzed logical plan](Dataset.md#logicalPlan) of the input [Dataset](Dataset.md) to the [cachedData](#cachedData) internal registry of cached queries.
+`cacheQuery` adds the [analyzed logical plan](dataset/index.md#logicalPlan) of the input [Dataset](dataset/index.md) to the [cachedData](#cachedData) internal registry of cached queries.

-Internally, `cacheQuery` requests the `Dataset` for the [analyzed logical plan](Dataset.md#logicalPlan) and creates a [InMemoryRelation](logical-operators/InMemoryRelation.md) with the following:
+Internally, `cacheQuery` requests the `Dataset` for the [analyzed logical plan](dataset/index.md#logicalPlan) and creates an [InMemoryRelation](logical-operators/InMemoryRelation.md) with the following:
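A sketch of driving `cacheQuery` directly through the shared `CacheManager` (an internal API; `Dataset.cache` and `Dataset.persist` are the usual entry points). The query and table name are made up:

```scala
import org.apache.spark.storage.StorageLevel

val q = spark.range(10).selectExpr("id % 2 AS gid", "id")
spark.sharedState.cacheManager.cacheQuery(q, Some("q"), StorageLevel.MEMORY_AND_DISK)

// The analyzed logical plan of q is now registered in cachedData.
assert(spark.sharedState.cacheManager.lookupCachedData(q).isDefined)
```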
-When [created](#creating-instance), `DataFrameWriter` converts the [Dataset](#ds) to a [DataFrame](Dataset.md#toDF).
+When [created](#creating-instance), `DataFrameWriter` converts the [Dataset](#ds) to a [DataFrame](dataset/index.md#toDF).

## <span id="format"> Name of Data Source { #source }
@@ -55,7 +55,7 @@ insertInto(
  tableName: String): Unit
```

-`insertInto` requests the [DataFrame](#df) for the [SparkSession](Dataset.md#sparkSession).
+`insertInto` requests the [DataFrame](#df) for the [SparkSession](dataset/index.md#sparkSession).

`insertInto` tries to [look up the TableProvider](#lookupV2Provider) for the [data source](#source).
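A usage sketch (the table name is made up; `insertInto` expects the table to exist already and resolves columns by position):

```scala
spark.sql("CREATE TABLE IF NOT EXISTS demo_numbers (id BIGINT, doubled BIGINT) USING parquet")

spark.range(5)
  .selectExpr("id", "id * 2 AS doubled")
  .write
  .insertInto("demo_numbers")
```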
@@ -106,7 +106,7 @@ saveAsTable(
  tableName: String): Unit
```

-`saveAsTable` requests the [DataFrame](#df) for the [SparkSession](Dataset.md#sparkSession).
+`saveAsTable` requests the [DataFrame](#df) for the [SparkSession](dataset/index.md#sparkSession).

`saveAsTable` tries to [look up the TableProvider](#lookupV2Provider) for the [data source](#source).
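Unlike `insertInto`, `saveAsTable` can create the table itself; a sketch with a made-up table name:

```scala
spark.range(5)
  .withColumnRenamed("id", "n")
  .write
  .mode("overwrite")
  .saveAsTable("demo_saved")
```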
@@ -174,7 +174,7 @@ Saves a `DataFrame` (the result of executing a structured query) to a data sourc
Internally, `save` uses `DataSource` to [look up the class of the requested data source](DataSource.md#lookupDataSource) (for the [source](#source) option and the [SQLConf](SessionState.md#conf)).

!!! note
-    `save` uses [SparkSession](Dataset.md#sparkSession) to access the [SessionState](SparkSession.md#sessionState) and in turn the [SQLConf](SessionState.md#conf).
+    `save` uses [SparkSession](dataset/index.md#sparkSession) to access the [SessionState](SparkSession.md#sessionState) and in turn the [SQLConf](SessionState.md#conf).
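A sketch of the `save` path; the format name is what gets resolved via `DataSource.lookupDataSource`, and the output path is made up:

```scala
spark.range(5)
  .write
  .format("parquet")
  .mode("overwrite")
  .save("/tmp/demo_parquet")
```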
-`saveToV1Source` creates a [DataSource](DataSource.md#apply) (for the [source](#source) class name, the [partitioningColumns](#partitioningColumns) and the [extraOptions](#extraOptions)) and requests it for the [logical command for writing](DataSource.md#planForWriting) (with the [mode](#mode) and the [analyzed logical plan](Dataset.md#logicalPlan) of the structured query).
+`saveToV1Source` creates a [DataSource](DataSource.md#apply) (for the [source](#source) class name, the [partitioningColumns](#partitioningColumns) and the [extraOptions](#extraOptions)) and requests it for the [logical command for writing](DataSource.md#planForWriting) (with the [mode](#mode) and the [analyzed logical plan](dataset/index.md#logicalPlan) of the structured query).

!!! note
-    While requesting the [analyzed logical plan](Dataset.md#logicalPlan) of the structured query, `saveToV1Source` triggers execution of logical commands.
+    While requesting the [analyzed logical plan](dataset/index.md#logicalPlan) of the structured query, `saveToV1Source` triggers execution of logical commands.

In the end, `saveToV1Source` [runs the logical command for writing](#runCommand).
@@ -336,7 +336,7 @@ createTable(
`createTable` creates a [CatalogTable](CatalogTable.md) (with the [bucketSpec](CatalogTable.md#bucketSpec) per [getBucketSpec](#getBucketSpec)).

-In the end, `createTable` creates a [CreateTable](logical-operators/CreateTable.md) logical command (with the `CatalogTable`, [mode](#mode) and the [logical query plan](Dataset.md#planWithBarrier) of the [dataset](#df)) and [runs](#runCommand) it.
+In the end, `createTable` creates a [CreateTable](logical-operators/CreateTable.md) logical command (with the `CatalogTable`, [mode](#mode) and the [logical query plan](dataset/index.md#planWithBarrier) of the [dataset](#df)) and [runs](#runCommand) it.
docs/DataFrameWriterV2.md (+3 -3)
@@ -1,6 +1,6 @@
# DataFrameWriterV2

-`DataFrameWriterV2` is an API for Spark SQL developers to describe how to write a [Dataset](Dataset.md) to an external storage using the DataSource V2.
+`DataFrameWriterV2` is an API for Spark SQL developers to describe how to write a [Dataset](dataset/index.md) to an external storage using the DataSource V2.

`DataFrameWriterV2` is a [CreateTableWriter](CreateTableWriter.md) (and thus a [WriteConfigMethods](WriteConfigMethods.md)).
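A sketch of the API as reached through `Dataset.writeTo`; the table identifier is hypothetical and must resolve to a configured V2 catalog:

```scala
import org.apache.spark.sql.functions.col

spark.range(5)
  .selectExpr("id", "id % 2 AS bucket")
  .writeTo("catalog.db.numbers")
  .using("parquet")
  .partitionedBy(col("bucket"))
  .create()
```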
It is fair to say that `Dataset` is a Spark SQL developer-friendly layer over the following two low-level entities:
@@ -29,23 +25,6 @@ It is fair to say that `Dataset` is a Spark SQL developer-friendly layer over th
When created, `Dataset` requests [QueryExecution](#queryExecution) to [assert analyzed phase is successful](QueryExecution.md#assertAnalyzed).

-`Dataset` is created when:
-
-* [Dataset.apply](#apply) (for a [LogicalPlan](logical-operators/LogicalPlan.md) and a [SparkSession](SparkSession.md) with the [Encoder](Encoder.md) in a Scala implicit scope)
-
-* [Dataset.ofRows](#ofRows) (for a [LogicalPlan](logical-operators/LogicalPlan.md) and a [SparkSession](SparkSession.md))
-
-* [Dataset.toDF](dataset-untyped-transformations.md#toDF) untyped transformation is used
-
-* [Dataset.select](dataset-typed-transformations.md#select), [Dataset.randomSplit](dataset-typed-transformations.md#randomSplit) and [Dataset.mapPartitions](dataset-typed-transformations.md#mapPartitions) typed transformations are used
-
-* [KeyValueGroupedDataset.agg](KeyValueGroupedDataset.md#agg) operator is used (that requests `KeyValueGroupedDataset` to [aggUntyped](KeyValueGroupedDataset.md#aggUntyped))
-
-* [SparkSession.emptyDataset](SparkSession.md#emptyDataset) and [SparkSession.range](SparkSession.md#range) operators are used
-
-* `CatalogImpl` is requested to
-  [makeDataset](CatalogImpl.md#makeDataset) (when requested to [list databases](CatalogImpl.md#listDatabases), [tables](CatalogImpl.md#listTables), [functions](CatalogImpl.md#listFunctions) and [columns](CatalogImpl.md#listColumns))
-The <<dataset-operators.md#, Dataset API>> offers declarative and type-safe operators that makes for an improved experience for data processing (comparing to [DataFrames](DataFrame.md) that were a set of index- or column name-based [Row](Row.md)s).
+The <<dataset/index.md#, Dataset API>> offers declarative and type-safe operators that make for an improved experience for data processing (compared to [DataFrames](DataFrame.md) that were a set of index- or column name-based [Row](Row.md)s).

`Dataset` offers convenience of RDDs with the performance optimizations of DataFrames and the strong static type-safety of Scala. The last feature of bringing the strong type-safety to [DataFrame](DataFrame.md) makes Dataset so appealing. All the features together give you a more functional programming interface to work with structured data.
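A small sketch of the type safety referred to above (the case class and data are made up; assumes `spark.implicits._` as in `spark-shell`):

```scala
import spark.implicits._

final case class Person(name: String, age: Long)

val people = Seq(Person("Ann", 30), Person("Bo", 17)).toDS()
val adults = people.filter(_.age >= 18)   // field access checked at compile time
val names  = adults.map(_.name)           // Dataset[String], not a DataFrame of Rows
```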
@@ -504,13 +483,13 @@ A `Dataset` is <<Queryable, Queryable>> and `Serializable`, i.e. can be saved to
NOTE: SparkSession.md[SparkSession] and [QueryExecution](QueryExecution.md) are transient attributes of a `Dataset` and therefore do not participate in Dataset serialization. The only _firmly-tied_ feature of a `Dataset` is the [Encoder](Encoder.md).

-You can request the ["untyped" view](dataset-operators.md#toDF) of a Dataset or access the dataset-operators.md#rdd[RDD] that is generated after executing the query. It is supposed to give you a more pleasant experience while transitioning from the legacy RDD-based or DataFrame-based APIs you may have used in the earlier versions of Spark SQL or encourage migrating from Spark Core's RDD API to Spark SQL's Dataset API.
+You can request the ["untyped" view](dataset/index.md#toDF) of a Dataset or access the dataset/index.md#rdd[RDD] that is generated after executing the query. It is supposed to give you a more pleasant experience while transitioning from the legacy RDD-based or DataFrame-based APIs you may have used in the earlier versions of Spark SQL or encourage migrating from Spark Core's RDD API to Spark SQL's Dataset API.

The default storage level for `Datasets` is spark-rdd-caching.md[MEMORY_AND_DISK] because recomputing the in-memory columnar representation of the underlying table is expensive. You can however [persist a `Dataset`](caching-and-persistence.md#persist).

NOTE: Spark 2.0 has introduced a new query model called spark-structured-streaming.md[Structured Streaming] for continuous incremental execution of structured queries. That made possible to consider Datasets a static and bounded as well as streaming and unbounded data sets with a single unified API for different execution models.

-A `Dataset` is dataset-operators.md#isLocal[local] if it was created from local collections using SparkSession.md#emptyDataset[SparkSession.emptyDataset] or SparkSession.md#createDataset[SparkSession.createDataset] methods and their derivatives like <<toDF,toDF>>. If so, the queries on the Dataset can be optimized and run locally, i.e. without using Spark executors.
+A `Dataset` is dataset/index.md#isLocal[local] if it was created from local collections using SparkSession.md#emptyDataset[SparkSession.emptyDataset] or SparkSession.md#createDataset[SparkSession.createDataset] methods and their derivatives like <<toDF,toDF>>. If so, the queries on the Dataset can be optimized and run locally, i.e. without using Spark executors.

NOTE: `Dataset` makes sure that the underlying `QueryExecution` is [analyzed](QueryExecution.md#analyzed) and CheckAnalysis.md#checkAnalysis[checked].
@@ -531,7 +510,7 @@ Used when:
* `Dataset` is <<apply, created>> (for a logical plan in a given `SparkSession`)

-* dataset-operators.md#dataset-operators.md[Dataset.toLocalIterator] operator is used (to create a Java `Iterator` of objects of type `T`)
+* dataset/index.md#dataset/index.md[Dataset.toLocalIterator] operator is used (to create a Java `Iterator` of objects of type `T`)

* `Dataset` is requested to <<collectFromPlan, collect all rows from a spark plan>>
-NOTE: `collectFromPlan` is used for dataset-operators.md#head[Dataset.head], dataset-operators.md#collect[Dataset.collect] and dataset-operators.md#collectAsList[Dataset.collectAsList] operators.
+NOTE: `collectFromPlan` is used for dataset/index.md#head[Dataset.head], dataset/index.md#collect[Dataset.collect] and dataset/index.md#collectAsList[Dataset.collectAsList] operators.
@@ -752,7 +731,7 @@ Internally, `sortInternal` firstly builds ordering expressions for the given `so
In the end, `sortInternal` <<withTypedPlan, creates a Dataset>> with <<Sort.md#, Sort>> unary logical operator (with the ordering expressions, the given `global` flag, and the <<logicalPlan, logicalPlan>> as the <<Sort.md#child, child logical plan>>).

-NOTE: `sortInternal` is used for the <<dataset-operators.md#sort, sort>> and <<dataset-operators.md#sortWithinPartitions, sortWithinPartitions>> typed transformations in the Dataset API (with the only change of the `global` flag being enabled and disabled, respectively).
+NOTE: `sortInternal` is used for the <<dataset/index.md#sort, sort>> and <<dataset/index.md#sortWithinPartitions, sortWithinPartitions>> typed transformations in the Dataset API (with the only change of the `global` flag being enabled and disabled, respectively).
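Both operators land in `sortInternal` and differ only in the `global` flag; a small sketch (assuming a `spark` session in scope):

```scala
import org.apache.spark.sql.functions.col

val df = spark.range(10).repartition(3)
val globalSort = df.sort(col("id").desc)                  // global = true
val localSort  = df.sortWithinPartitions(col("id").desc)  // global = false, per-partition only
```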
=== [[withPlan]] Helper Method for Untyped Transformations and Basic Actions -- `withPlan` Internal Method
NOTE: `withPlan` is annotated with Scala's https://www.scala-lang.org/api/current/scala/inline.html[@inline] annotation that requests the Scala compiler to try especially hard to inline it.

-`withPlan` is used in [untyped transformations](dataset-untyped-transformations.md)
+`withPlan` is used in [untyped transformations](dataset/untyped-transformations.md)
docs/Encoder.md (+1 -1)
@@ -11,7 +11,7 @@
`Encoder` is also called _"a container of serde expressions in Dataset"_.

-`Encoder` is a part of [Dataset](Dataset.md)s (to serialize and deserialize the records of this dataset).
+`Encoder` is a part of [Dataset](dataset/index.md)s (to serialize and deserialize the records of this dataset).

`Encoder` knows the [schema](#schema) of the records and that is how they offer significantly faster serialization and deserialization (comparing to the default Java or Kryo serializers).
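A sketch of obtaining an `Encoder` explicitly (they are usually derived implicitly via `spark.implicits._`); the case class is made up:

```scala
import org.apache.spark.sql.{Encoder, Encoders}

final case class Order(id: Long, amount: Double)

val orderEncoder: Encoder[Order] = Encoders.product[Order]
println(orderEncoder.schema)   // the schema of the records this Encoder handles
```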