Rename all mentions in docs of DataSet to Dataset (#3148)
Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
3 people authored Oct 10, 2023
1 parent 2bc1cbc commit bb61b17
Showing 30 changed files with 293 additions and 294 deletions.
25 changes: 12 additions & 13 deletions docs/source/conf.py
@@ -112,16 +112,15 @@
"typing.Type",
"typing.Set",
"kedro.config.config.ConfigLoader",
"kedro.io.core.AbstractDataSet",
"kedro.io.core.AbstractVersionedDataSet",
"kedro.io.core.DataSetError",
"kedro.io.core.AbstractDataset",
"kedro.io.core.AbstractVersionedDataset",
"kedro.io.core.DatasetError",
"kedro.io.core.Version",
"kedro.io.data_catalog.DataCatalog",
"kedro.io.memory_dataset.MemoryDataSet",
"kedro.io.partitioned_dataset.PartitionedDataSet",
"kedro.io.memory_dataset.MemoryDataset",
"kedro.io.partitioned_dataset.PartitionedDataset",
"kedro.pipeline.pipeline.Pipeline",
"kedro.runner.runner.AbstractRunner",
"kedro.runner.parallel_runner._SharedMemoryDataSet",
"kedro.runner.parallel_runner._SharedMemoryDataset",
"kedro.framework.context.context.KedroContext",
"kedro.framework.startup.ProjectMetadata",
@@ -136,7 +135,7 @@
"CONF_SOURCE",
"integer -- return number of occurrences of value",
"integer -- return first index of value.",
"kedro_datasets.pandas.json_dataset.JSONDataSet",
"kedro_datasets.pandas.json_dataset.JSONDataset",
"pluggy._manager.PluginManager",
"PluginManager",
"_DI",
@@ -165,7 +164,7 @@
"ValueError",
"BadConfigException",
"MissingConfigException",
"DataSetError",
"DatasetError",
"ImportError",
"KedroCliError",
"Exception",
@@ -347,16 +346,16 @@ def autolink_replacements(what: str) -> list[tuple[str, str, str]]:
is a reStructuredText link to their documentation.
For example, if the docstring reads:
This LambdaDataSet loads and saves ...
This LambdaDataset loads and saves ...
Then the word ``LambdaDataSet``, will be replaced by
:class:`~kedro.io.LambdaDataSet`
Then the word ``LambdaDataset``, will be replaced by
:class:`~kedro.io.LambdaDataset`
Works for plural as well, e.g:
These ``LambdaDataSet``s load and save
These ``LambdaDataset``s load and save
Will convert to:
These :class:`kedro.io.LambdaDataSet` load and save
These :class:`kedro.io.LambdaDataset` load and save
Args:
what: The objects to create replacement tuples for. Possible values
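
As a rough illustration of the substitution this docstring describes, here is a minimal sketch (not the actual `conf.py` implementation) of turning a plain mention into a reST cross-reference:

```python
# Hypothetical sketch: a plain mention of ``LambdaDataset`` in a docstring is
# rewritten into a reST :class: cross-reference, matching the example above.
docstring = "These ``LambdaDataset``s load and save"
print(docstring.replace("``LambdaDataset``s", ":class:`kedro.io.LambdaDataset`"))
# -> These :class:`kedro.io.LambdaDataset` load and save
```
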
22 changes: 11 additions & 11 deletions docs/source/configuration/advanced_configuration.md
@@ -60,8 +60,8 @@ bucket_name: "my_s3_bucket"
key_prefix: "my/key/prefix/"

datasets:
csv: "pandas.CSVDataSet"
spark: "spark.SparkDataSet"
csv: "pandas.CSVDataset"
spark: "spark.SparkDataset"

folders:
raw: "01_raw"
@@ -99,7 +99,7 @@ Alternatively, you can declare which values to fill in the template through a di
"bucket_name": "another_bucket_name",
"non_string_key": 10,
"key_prefix": "my/key/prefix",
"datasets": {"csv": "pandas.CSVDataSet", "spark": "spark.SparkDataSet"},
"datasets": {"csv": "pandas.CSVDataset", "spark": "spark.SparkDataset"},
"folders": {
"raw": "01_raw",
"int": "02_intermediate",
@@ -117,7 +117,7 @@ CONFIG_LOADER_ARGS = {
"bucket_name": "another_bucket_name",
"non_string_key": 10,
"key_prefix": "my/key/prefix",
"datasets": {"csv": "pandas.CSVDataSet", "spark": "spark.SparkDataSet"},
"datasets": {"csv": "pandas.CSVDataset", "spark": "spark.SparkDataset"},
"folders": {
"raw": "01_raw",
"int": "02_intermediate",
@@ -185,7 +185,7 @@ From version 0.17.0, `TemplatedConfigLoader` also supports the [Jinja2](https://
type: MemoryDataset
{{ speed }}-cars:
type: pandas.CSVDataSet
type: pandas.CSVDataset
filepath: s3://${bucket_name}/{{ speed }}-cars.csv
save_args:
index: true
@@ -205,13 +205,13 @@ The output Python dictionary will look as follows:
{
"fast-trains": {"type": "MemoryDataset"},
"fast-cars": {
"type": "pandas.CSVDataSet",
"type": "pandas.CSVDataset",
"filepath": "s3://my_s3_bucket/fast-cars.csv",
"save_args": {"index": True},
},
"slow-trains": {"type": "MemoryDataset"},
"slow-cars": {
"type": "pandas.CSVDataSet",
"type": "pandas.CSVDataset",
"filepath": "s3://my_s3_bucket/slow-cars.csv",
"save_args": {"index": True},
},
@@ -260,7 +260,7 @@ companies:
and a file containing the template values called `catalog_globals.yml`:
```yaml
_pandas:
type: pandas.CSVDataSet
type: pandas.CSVDataset
```

Since both of the file names (`catalog.yml` and `catalog_globals.yml`) match the config pattern for catalogs, the `OmegaConfigLoader` will load the files and resolve the placeholders correctly.
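
As a quick programmatic sketch, assuming the folded `companies` entry above references the template value as `${_pandas.type}` (that part of the diff is collapsed, so this is an assumption):

```python
from kedro.config import OmegaConfigLoader

# Sketch: load conf/base containing the catalog.yml and catalog_globals.yml shown
# above; the ${_pandas.type} placeholder resolves to "pandas.CSVDataset".
config_loader = OmegaConfigLoader(conf_source="conf")
catalog_conf = config_loader["catalog"]
print(catalog_conf["companies"]["type"])  # expected: pandas.CSVDataset
```
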
@@ -279,7 +279,7 @@ Suppose you have global variables located in the file `conf/base/globals.yml`:
```yaml
my_global_value: 45
dataset_type:
csv: pandas.CSVDataSet
csv: pandas.CSVDataset
```
You can access these global variables in your catalog or parameters config files with a `globals` resolver like this:
`conf/base/parameters.yml`:
@@ -318,7 +318,7 @@ kedro run --params random=3
You can also specify a default value to be used in case the runtime parameter is not specified with the `kedro run` command. Consider this catalog entry:
```yaml
companies:
type: pandas.CSVDataSet
type: pandas.CSVDataset
filepath: "${runtime_params:folder, 'data/01_raw'}/companies.csv"
```
If the `folder` parameter is not passed through the CLI `--params` option with `kedro run`, the default value `'data/01_raw/'` is used for the `filepath`.
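
A small sketch of how the default behaves when the config is loaded programmatically; the override value `data/00_external` is purely illustrative:

```python
from kedro.config import OmegaConfigLoader

# With a runtime override, ${runtime_params:folder, 'data/01_raw'} resolves to it...
loader = OmegaConfigLoader(conf_source="conf", runtime_params={"folder": "data/00_external"})
print(loader["catalog"]["companies"]["filepath"])  # data/00_external/companies.csv

# ...and without one, the declared default is used instead.
loader = OmegaConfigLoader(conf_source="conf")
print(loader["catalog"]["companies"]["filepath"])  # data/01_raw/companies.csv
```
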
@@ -366,7 +366,7 @@ types to the catalog entry.

```yaml
my_polars_dataset:
type: polars.CSVDataSet
type: polars.CSVDataset
filepath: data/01_raw/my_dataset.csv
load_args:
dtypes:
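
For context, such a resolver is typically registered in the project's `settings.py`; a minimal sketch, assuming `OmegaConfigLoader`'s `custom_resolvers` argument and that the folded `dtypes` entry references something like `${polars:Utf8}`:

```python
# settings.py (sketch): expose a `polars` resolver so catalog entries can pass
# non-primitive Polars dtypes, e.g. ${polars:Utf8}, through load_args.
import polars as pl

from kedro.config import OmegaConfigLoader

CONFIG_LOADER_CLASS = OmegaConfigLoader
CONFIG_LOADER_ARGS = {
    "custom_resolvers": {"polars": lambda attr: getattr(pl, attr)},
}
```
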
12 changes: 6 additions & 6 deletions docs/source/configuration/config_loader_migration.md
@@ -132,8 +132,8 @@ Suppose you are migrating a templated **catalog** file from using `TemplatedConf

- datasets:
+ _datasets:
csv: "pandas.CSVDataSet"
spark: "spark.SparkDataSet"
csv: "pandas.CSVDataset"
spark: "spark.SparkDataset"

```

@@ -175,8 +175,8 @@ bucket_name: "my_s3_bucket"
key_prefix: "my/key/prefix/"

datasets:
csv: "pandas.CSVDataSet"
spark: "spark.SparkDataSet"
csv: "pandas.CSVDataset"
spark: "spark.SparkDataset"

folders:
raw: "01_raw"
@@ -218,11 +218,11 @@ If you take the example from [the `TemplatedConfigLoader` with Jinja2 documentat
- {% for speed in ['fast', 'slow'] %}
- {{ speed }}-trains:
+ "{speed}-trains":
type: MemoryDataSet
type: MemoryDataset
- {{ speed }}-cars:
+ "{speed}-cars":
type: pandas.CSVDataSet
type: pandas.CSVDataset
- filepath: s3://${bucket_name}/{{ speed }}-cars.csv
+ filepath: s3://${bucket_name}/{speed}-cars.csv
save_args:
30 changes: 15 additions & 15 deletions docs/source/data/advanced_data_catalog_usage.md
@@ -11,29 +11,29 @@ In the following code, we use several pre-built data loaders documented in the [
```python
from kedro.io import DataCatalog
from kedro_datasets.pandas import (
CSVDataSet,
SQLTableDataSet,
SQLQueryDataSet,
ParquetDataSet,
CSVDataset,
SQLTableDataset,
SQLQueryDataset,
ParquetDataset,
)

io = DataCatalog(
{
"bikes": CSVDataSet(filepath="../data/01_raw/bikes.csv"),
"cars": CSVDataSet(filepath="../data/01_raw/cars.csv", load_args=dict(sep=",")),
"cars_table": SQLTableDataSet(
"bikes": CSVDataset(filepath="../data/01_raw/bikes.csv"),
"cars": CSVDataset(filepath="../data/01_raw/cars.csv", load_args=dict(sep=",")),
"cars_table": SQLTableDataset(
table_name="cars", credentials=dict(con="sqlite:///kedro.db")
),
"scooters_query": SQLQueryDataSet(
"scooters_query": SQLQueryDataset(
sql="select * from cars where gear=4",
credentials=dict(con="sqlite:///kedro.db"),
),
"ranked": ParquetDataSet(filepath="ranked.parquet"),
"ranked": ParquetDataset(filepath="ranked.parquet"),
}
)
```
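
Once registered, entries are loaded and saved by name through the catalog; a brief usage sketch for the `io` object built above:

```python
# Usage sketch for the catalog defined above.
cars = io.load("cars")   # pandas DataFrame read from data/01_raw/cars.csv
io.save("ranked", cars)  # written to ranked.parquet via the ParquetDataset entry
```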

When using `SQLTableDataSet` or `SQLQueryDataSet` you must provide a `con` key containing a [SQLAlchemy-compatible](https://docs.sqlalchemy.org/en/13/core/engines.html#database-urls) database connection string. In the example above we pass it as part of the `credentials` argument. An alternative to `credentials` is to put `con` into `load_args` and `save_args` (`SQLTableDataSet` only).
When using `SQLTableDataset` or `SQLQueryDataset` you must provide a `con` key containing a [SQLAlchemy-compatible](https://docs.sqlalchemy.org/en/13/core/engines.html#database-urls) database connection string. In the example above we pass it as part of the `credentials` argument. An alternative to `credentials` is to put `con` into `load_args` and `save_args` (`SQLTableDataset` only).
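
A short sketch of that alternative, passing `con` through `load_args`/`save_args` rather than `credentials`:

```python
from kedro_datasets.pandas import SQLTableDataset

# Sketch: supply the SQLAlchemy connection string via load_args/save_args
# (SQLTableDataset only), as described above, instead of credentials.
cars_table = SQLTableDataset(
    table_name="cars",
    load_args={"con": "sqlite:///kedro.db"},
    save_args={"con": "sqlite:///kedro.db"},
)
```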

## How to view the available data sources

@@ -130,7 +130,7 @@ my_gcp_credentials:
Your code will look as follows:
```python
CSVDataSet(
CSVDataset(
filepath="s3://test_bucket/data/02_intermediate/company/motorbikes.csv",
load_args=dict(sep=",", skiprows=5, skipfooter=1, na_values=["#NA", "NA"]),
credentials=dict(key="token", secret="key"),
@@ -145,7 +145,7 @@ If you require programmatic control over load and save versions of a specific da

```python
from kedro.io import DataCatalog, Version
from kedro_datasets.pandas import CSVDataSet
from kedro_datasets.pandas import CSVDataset
import pandas as pd

data1 = pd.DataFrame({"col1": [1, 2], "col2": [4, 5], "col3": [5, 6]})
@@ -155,7 +155,7 @@ version = Version(
save=None, # generate save version automatically on each save operation
)

test_dataset = CSVDataSet(
test_dataset = CSVDataset(
filepath="data/01_raw/test.csv", save_args={"index": False}, version=version
)
io = DataCatalog({"test_dataset": test_dataset})
@@ -179,7 +179,7 @@ version = Version(
save="my_exact_version", # save to exact version
)

test_dataset = CSVDataSet(
test_dataset = CSVDataset(
filepath="data/01_raw/test.csv", save_args={"index": False}, version=version
)
io = DataCatalog({"test_dataset": test_dataset})
@@ -212,7 +212,7 @@ version = Version(
save="my_data_20230818.csv", # save to exact version
)

test_dataset = CSVDataSet(
test_dataset = CSVDataset(
filepath="data/01_raw/test.csv", save_args={"index": False}, version=version
)
io = DataCatalog({"test_dataset": test_dataset})
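
A brief usage sketch for the versioned dataset registered above, reusing the `data1` frame from the earlier snippet:

```python
# Save and reload through the catalog; Kedro resolves the versioned file paths
# (here the exact versions pinned above) internally.
io.save("test_dataset", data1)
reloaded = io.load("test_dataset")
```
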
18 changes: 9 additions & 9 deletions docs/source/data/data_catalog.md
@@ -12,15 +12,15 @@ The example below registers two `csv` datasets, and an `xlsx` dataset. The minim

```yaml
companies:
type: pandas.CSVDataSet
type: pandas.CSVDataset
filepath: data/01_raw/companies.csv

reviews:
type: pandas.CSVDataSet
type: pandas.CSVDataset
filepath: data/01_raw/reviews.csv

shuttles:
type: pandas.ExcelDataSet
type: pandas.ExcelDataset
filepath: data/01_raw/shuttles.xlsx
load_args:
engine: openpyxl # Use modern Excel engine (the default since Kedro 0.18.0)
@@ -63,7 +63,7 @@ For example, to load or save a CSV on a local file system, using specified load/

```yaml
cars:
type: pandas.CSVDataSet
type: pandas.CSVDataset
filepath: data/01_raw/company/cars.csv
load_args:
sep: ','
@@ -116,7 +116,7 @@ and the Data Catalog is specified in `catalog.yml` as follows:

```yaml
motorbikes:
type: pandas.CSVDataSet
type: pandas.CSVDataset
filepath: s3://your_bucket/data/02_intermediate/company/motorbikes.csv
credentials: dev_s3
load_args:
@@ -132,7 +132,7 @@ Kedro enables dataset and ML model versioning through the `versioned` definition

```yaml
cars:
type: pandas.CSVDataSet
type: pandas.CSVDataset
filepath: data/01_raw/company/cars.csv
versioned: True
```
@@ -148,7 +148,7 @@ where `--load-version` is dataset name and version timestamp separated by `:`.

A dataset offers versioning support if it extends the [`AbstractVersionedDataset`](/kedro.io.AbstractVersionedDataset) class, accepts a `version` keyword argument in its constructor, and adapts its `_save` and `_load` methods to use the versioned data paths obtained from `_get_save_path` and `_get_load_path` respectively.

To verify whether a dataset can undergo versioning, you should examine the dataset class code to inspect its inheritance [(you can find contributed datasets within the `kedro-datasets` repository)](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-datasets/kedro_datasets). Check if the dataset class inherits from the `AbstractVersionedDataset`. For instance, if you encounter a class like `CSVDataSet(AbstractVersionedDataset[pd.DataFrame, pd.DataFrame])`, this indicates that the dataset is set up to support versioning.
To verify whether a dataset can undergo versioning, you should examine the dataset class code to inspect its inheritance [(you can find contributed datasets within the `kedro-datasets` repository)](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-datasets/kedro_datasets). Check if the dataset class inherits from the `AbstractVersionedDataset`. For instance, if you encounter a class like `CSVDataset(AbstractVersionedDataset[pd.DataFrame, pd.DataFrame])`, this indicates that the dataset is set up to support versioning.
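
A tiny sketch of that inheritance check, assuming `AbstractVersionedDataset` is importable from `kedro.io`:

```python
from kedro.io import AbstractVersionedDataset
from kedro_datasets.pandas import CSVDataset

# True means the dataset participates in Kedro's versioning machinery.
print(issubclass(CSVDataset, AbstractVersionedDataset))
```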

```{note}
Note that HTTP(S) is a supported file system in the dataset implementations, but if you use it, you can't also use versioning.
@@ -166,12 +166,12 @@ To illustrate this, consider the following catalog entry for a dataset named `ca
```yaml
cars:
filepath: s3://my_bucket/cars.csv
type: pandas.CSVDataSet
type: pandas.CSVDataset
```
You can overwrite this catalog entry in `conf/local/catalog.yml` to point to a locally stored file instead:
```yaml
cars:
filepath: data/01_raw/cars.csv
type: pandas.CSVDataSet
type: pandas.CSVDataset
```
In your pipeline code, when the `cars` dataset is used, it will use the overwritten catalog entry from `conf/local/catalog.yml`, relying on Kedro to detect which definition of the `cars` dataset to use in your pipeline.
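
A minimal sketch of how the two environments are layered when the config is loaded, assuming the default `base`/`local` setup:

```python
from kedro.config import OmegaConfigLoader

# conf/local/catalog.yml overrides conf/base/catalog.yml for the `cars` entry.
loader = OmegaConfigLoader(conf_source="conf", base_env="base", default_run_env="local")
print(loader["catalog"]["cars"]["filepath"])  # data/01_raw/cars.csv (local wins)
```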