Enhance Documentation for Dataset Previews (#2074)
* update preview example

Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>

* add further examples

Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>

* changes based on review

Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>

---------

Signed-off-by: Sajid Alam <sajid_alam@mckinsey.com>
SajidAlamQB committed Sep 10, 2024
1 parent cc1a119 commit d3fc51f
Showing 3 changed files with 62 additions and 16 deletions.
Binary file added docs/source/images/preview_datasets_json.png
76 changes: 60 additions & 16 deletions docs/source/preview_custom_datasets.md
@@ -11,13 +11,20 @@ PlotlyPreview = NewType("PlotlyPreview", dict)
JSONPreview = NewType("JSONPreview", dict)
```

Arbitrary arguments can be included in the `preview()` function, which can be later specified in the `catalog.yml` file.
## TablePreview
For `TablePreview`, the returned dictionary must contain the following keys:

- `index`: A list of row indices.
- `columns`: A list of column names.
- `data`: A list of rows, where each row is itself a list of values.

Arbitrary arguments can be included in the `preview()` function, which can be later specified in the `catalog.yml` file. Ensure that these arguments (like `nrows`, `ncolumns`, and `filters`) match the structure of your dataset.

Below is an example demonstrating how to implement the `preview()` function with user-specified arguments for a `CustomDataset` class that utilizes `TablePreview` to enable previewing tabular data on Kedro-Viz:

```yaml
companies:
  type: CustomDataset
  type: CustomTableDataset
  filepath: ${_base_location}/01_raw/companies.csv
  metadata:
    kedro-viz:
@@ -34,36 +41,73 @@ companies:

from kedro_datasets._typing import TablePreview

class CustomDataset:
class CustomTableDataset:
    def preview(self, nrows, ncolumns, filters) -> TablePreview:
        filtered_data = self.data
        data = self.load()
        for column, value in filters.items():
            filtered_data = filtered_data[filtered_data[column] == value]
        subset = filtered_data.iloc[:nrows, :ncolumns]
        df_dict = {}
        for column in subset.columns:
            df_dict[column] = subset[column]
        return df_dict

            data = data[data[column] == value]
        subset = data.iloc[:nrows, :ncolumns]
        preview_data = {
            'index': list(subset.index),  # List of row indices
            'columns': list(subset.columns),  # List of column names
            'data': subset.values.tolist()  # List of rows, where each row is a list of values
        }
        return preview_data
```

![](./images/preview_datasets_expanded.png)
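
To make the required shape concrete, here is a minimal, dependency-free sketch of the same filter-then-slice logic. It is not taken from the Kedro source: `build_table_preview` and the sample rows are hypothetical, and `TablePreview` is redefined locally as a stand-in for the type in `kedro_datasets._typing` so the snippet runs without Kedro or pandas installed.

```python
from typing import Any, NewType

# Local stand-in for kedro_datasets._typing.TablePreview (assumption: a NewType over dict).
TablePreview = NewType("TablePreview", dict)


def build_table_preview(
    rows: list[dict[str, Any]], nrows: int, ncolumns: int, filters: dict[str, Any]
) -> TablePreview:
    """Hypothetical helper: filter rows, then keep the first nrows rows and ncolumns columns."""
    # Keep each matching row together with its original position, like a DataFrame index.
    matching = [
        (position, row)
        for position, row in enumerate(rows)
        if all(row.get(column) == value for column, value in filters.items())
    ]
    subset = matching[:nrows]
    columns = list(rows[0].keys())[:ncolumns] if rows else []
    return TablePreview(
        {
            "index": [position for position, _ in subset],
            "columns": columns,
            "data": [[row[column] for column in columns] for _, row in subset],
        }
    )


# Illustrative sample data, not from the Kedro starter project.
rows = [
    {"id": 1, "location": "Isle of Man", "rating": 0.9},
    {"id": 2, "location": "Niue", "rating": 0.7},
    {"id": 3, "location": "Isle of Man", "rating": 0.8},
]
preview = build_table_preview(rows, nrows=5, ncolumns=2, filters={"location": "Isle of Man"})
print(preview)
```

Each inner list in `data` lines up positionally with `columns`, and `index` holds one entry per returned row.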

## Examples of Previews
## ImagePreview
For `ImagePreview`, the function should return a base64-encoded string representing the image. This is typically used for datasets that output visual data such as plots or images.

1. TablePreview
Below is an example implementation:

![](./images/preview_datasets_expanded.png)
```python

import base64

from kedro_datasets._typing import ImagePreview

2. ImagePreview
class CustomImageDataset:
    def preview(self) -> ImagePreview:
        image_path = self._get_image_path()
        # Read the raw image bytes and encode them as a base64 string
        with open(image_path, "rb") as image_file:
            encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
        return ImagePreview(encoded_string)
```

![](./images/pipeline_visualisation_matplotlib_expand.png)
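
The encoding step can be tried in isolation. The sketch below is an assumption-laden illustration rather than Kedro code: `encode_image` is a hypothetical helper, `ImagePreview` is a local stand-in for the type in `kedro_datasets._typing`, and the bytes are just the eight-byte PNG signature instead of a real image file.

```python
import base64
from typing import NewType

# Local stand-in for kedro_datasets._typing.ImagePreview (assumption: a NewType over str).
ImagePreview = NewType("ImagePreview", str)


def encode_image(raw: bytes) -> ImagePreview:
    """Hypothetical helper: base64-encode raw image bytes into a UTF-8 string."""
    return ImagePreview(base64.b64encode(raw).decode("utf-8"))


png_signature = b"\x89PNG\r\n\x1a\n"  # first eight bytes of any PNG file
preview = encode_image(png_signature)

# Decoding the preview string gives back the original bytes.
assert base64.b64decode(preview) == png_signature
print(preview)
```

The file must be opened in binary mode (`"rb"`), because `base64.b64encode` operates on bytes, not text.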

## PlotlyPreview
For `PlotlyPreview`, the function should return a dictionary containing Plotly figure data. This includes the figure's `data` and `layout` keys.

Below is an example implementation:

3. PlotlyPreview
```python

from kedro_datasets._typing import PlotlyPreview

class CustomPlotlyDataset:
    def preview(self) -> PlotlyPreview:
        figure = self._load_plotly_figure()
        return PlotlyPreview({
            "data": figure["data"],
            "layout": figure["layout"]
        })
```

![](./images/pipeline_visualisation_plotly_expand_1.png)
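
As a rough illustration of the returned structure, the sketch below treats the figure as a plain dictionary. What `_load_plotly_figure()` actually returns is not shown in this change, so the `figure` contents, the helper name `to_plotly_preview`, and the local `PlotlyPreview` stand-in are all assumptions.

```python
from typing import Any, NewType

# Local stand-in for kedro_datasets._typing.PlotlyPreview (a NewType over dict).
PlotlyPreview = NewType("PlotlyPreview", dict)


def to_plotly_preview(figure: dict[str, Any]) -> PlotlyPreview:
    """Hypothetical helper: keep only the `data` and `layout` keys of a figure dict."""
    return PlotlyPreview({"data": figure["data"], "layout": figure["layout"]})


# Illustrative figure dictionary; the extra "frames" key is dropped from the preview.
figure = {
    "data": [{"type": "bar", "x": ["a", "b"], "y": [1, 2]}],
    "layout": {"title": {"text": "Example"}},
    "frames": [],
}
preview = to_plotly_preview(figure)
print(sorted(preview))
```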

## JSONPreview
For `JSONPreview`, the function should return a dictionary representing the `JSON` data. This is useful for previewing complex nested data structures.

Below is an example implementation:

```python

import json

from kedro_datasets._typing import JSONPreview

class CustomJSONDataset:
    def preview(self) -> JSONPreview:
        json_data = self._load_json_data()
        # Serialize the loaded data to a JSON string for the preview
        return JSONPreview(json.dumps(json_data))
```
![](./images/preview_datasets_json.png)
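
Note that the prose above describes a dictionary, whereas the example passes the result of `json.dumps`, which is a string. The sketch below only illustrates the difference between the two forms; which one a given Kedro-Viz version expects is not stated here and should be checked against the installed `kedro_datasets._typing`. The `nested` sample data and the local `JSONPreview` stand-in are assumptions.

```python
import json
from typing import NewType

# Local stand-in (assumption), mirroring the NewType definition quoted at the top of this diff.
JSONPreview = NewType("JSONPreview", dict)

# Illustrative nested data, not from a real pipeline.
nested = {"pipeline": {"nodes": ["split", "train"], "params": {"seed": 42}}}

as_dict = JSONPreview(nested)  # dictionary form, as the prose describes
as_text = json.dumps(nested)   # string form, as produced by json.dumps in the example

print(type(as_dict).__name__, type(as_text).__name__)

# The string form round-trips back to the same dictionary.
assert json.loads(as_text) == as_dict
```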
2 changes: 2 additions & 0 deletions docs/source/preview_datasets.md
@@ -57,6 +57,8 @@ companies:
preview: false
```
You can also disable previews globally through the settings menu on Kedro-Viz.
```{note}
Starting from Kedro-Viz 9.2.0, previews are disabled by default for the CLI commands `kedro viz deploy` and `kedro viz build`. You can control this behavior using the `--include-previews` flag with these commands. For `kedro viz run`, previews are enabled by default and can be controlled from the publish modal dialog. Refer to [Publish and share](./share_kedro_viz) for more instructions.
```
