Update demo project to use OmegaConfigLoader (#1590)
* Add missing dependency (seaborn)

Signed-off-by: Alain Anghelidi <alainanghelidi@gmail.com>

* Replace deprecated kedro install command in README

Signed-off-by: Alain Anghelidi <alainanghelidi@gmail.com>

* Bump kedro init version

Signed-off-by: Alain Anghelidi <alainanghelidi@gmail.com>

* Use OmegaConfigLoader in project settings

Signed-off-by: Alain Anghelidi <alainanghelidi@gmail.com>

* Remove unneeded CONFIG_LOADER_ARGS

Signed-off-by: Alain Anghelidi <alainanghelidi@gmail.com>

* Update configuration to use OmegaConfigLoader

Signed-off-by: Alain Anghelidi <alainanghelidi@gmail.com>

* Fix parameters of modeling pipeline

Since scikit-learn 1.1, the default value of the max_features
parameter of RandomForestRegressor has been 1.0 instead of
'auto'. The old 'auto' value was deprecated in the same release
and has since been removed.

This commit fixes the issue by setting max_features to 1.0.

Signed-off-by: Alain Anghelidi <alainanghelidi@gmail.com>

* Fix Pandas SettingWithCopyWarning by using loc

Signed-off-by: Alain Anghelidi <alainanghelidi@gmail.com>

---------

Signed-off-by: Alain Anghelidi <alainanghelidi@gmail.com>
aanghelidi authored Oct 30, 2023
1 parent dca4581 commit d25db88
Showing 19 changed files with 42 additions and 76 deletions.
4 changes: 2 additions & 2 deletions demo-project/README.md
@@ -4,7 +4,7 @@ This project is designed to be a realistic example of what Kedro looks like when

 ## Setup

-1. Run `pip install kedro==0.18.4`
-2. Run `kedro install --build-reqs`
+1. Run `pip install kedro~=0.18.0`
+2. Run `pip install -r src/demo_project/requirements.in`
 3. Run `kedro run`
 4. Run `kedro viz`
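The README now pins Kedro with the compatible-release operator (`~=0.18.0`) instead of an exact version, and `requirements.in` uses the same operator for `seaborn~=0.11.2`. The sketch below shows what PEP 440 says such a specifier accepts. It assumes purely numeric version strings (PEP 440 also covers pre-, post- and dev-releases, which this ignores), and `compatible_release` is a hypothetical helper, not part of pip; for real use, the `packaging` library's `SpecifierSet` is the appropriate tool.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Split a purely numeric version such as "0.18.14" into (0, 18, 14)."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(spec_version: str, candidate: str) -> bool:
    """Return True if `candidate` satisfies the specifier `~=spec_version`.

    PEP 440 defines ~=X.Y.Z as >=X.Y.Z combined with ==X.Y.*: the last
    component may increase, every component before it must stay fixed.
    """
    spec = parse_release(spec_version)
    if len(spec) < 2:
        raise ValueError("~= requires at least two release components")
    cand = parse_release(candidate)
    # Pad with zeros so that "0.18" and "0.18.0" compare as equal.
    width = max(len(spec), len(cand))
    spec_padded = spec + (0,) * (width - len(spec))
    cand_padded = cand + (0,) * (width - len(cand))
    prefix = spec[:-1]
    return cand_padded >= spec_padded and cand_padded[: len(prefix)] == prefix


print(compatible_release("0.18.0", "0.18.14"))  # True
print(compatible_release("0.18.0", "0.19.0"))   # False
```

Under this reading, `kedro~=0.18.0` admits 0.18.14 (the `kedro_init_version` set in `pyproject.toml`) but not 0.19.x.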
12 changes: 5 additions & 7 deletions demo-project/conf/base/catalog_01_raw.yml
@@ -1,6 +1,6 @@
 companies:
   type: pandas.CSVDataSet
-  filepath: ${base_location}/01_raw/companies.csv
+  filepath: ${_base_location}/01_raw/companies.csv
   metadata:
     kedro-viz:
       layer: raw
@@ -9,21 +9,19 @@ companies:

 reviews:
   type: pandas.CSVDataSet
-  filepath: ${base_location}/01_raw/reviews.csv
+  filepath: ${_base_location}/01_raw/reviews.csv
   metadata:
     kedro-viz:
       layer: raw
-      preview_args:
+      preview_args:
         nrows: 10


 shuttles:
   type: pandas.ExcelDataSet
-  filepath: ${base_location}/01_raw/shuttles.xlsx
+  filepath: ${_base_location}/01_raw/shuttles.xlsx
   metadata:
     kedro-viz:
       layer: raw
       preview_args:
-        nrows: 15
-
-
+        nrows: 15
8 changes: 4 additions & 4 deletions demo-project/conf/base/catalog_02_int.yml
@@ -1,27 +1,27 @@
 ingestion.int_typed_companies:
   type: pandas.ParquetDataSet
-  filepath: ${base_location}/02_intermediate/typed_companies.pq
+  filepath: ${_base_location}/02_intermediate/typed_companies.pq
   metadata:
     kedro-viz:
       layer: intermediate

 ingestion.int_typed_shuttles@pandas1:
   type: pandas.ParquetDataSet
-  filepath: ${base_location}/02_intermediate/typed_shuttles.pq
+  filepath: ${_base_location}/02_intermediate/typed_shuttles.pq
   metadata:
     kedro-viz:
       layer: intermediate

 ingestion.int_typed_shuttles@pandas2:
   type: pandas.ParquetDataSet
-  filepath: ${base_location}/02_intermediate/typed_shuttles.pq
+  filepath: ${_base_location}/02_intermediate/typed_shuttles.pq
   metadata:
     kedro-viz:
       layer: intermediate

 ingestion.int_typed_reviews:
   type: pandas.ParquetDataSet
-  filepath: ${base_location}/02_intermediate/typed_reviews.pq
+  filepath: ${_base_location}/02_intermediate/typed_reviews.pq
   metadata:
     kedro-viz:
       layer: intermediate
5 changes: 2 additions & 3 deletions demo-project/conf/base/catalog_03_prm.yml
@@ -1,14 +1,13 @@
 prm_shuttle_company_reviews:
   type: pandas.ParquetDataSet
-  filepath: ${base_location}/03_primary/prm_shuttle_company_reviews.pq
+  filepath: ${_base_location}/03_primary/prm_shuttle_company_reviews.pq
   metadata:
     kedro-viz:
       layer: primary

 prm_spine_table:
   type: pandas.ParquetDataSet
-  filepath: ${base_location}/03_primary/prm_spine_table.pq
+  filepath: ${_base_location}/03_primary/prm_spine_table.pq
   metadata:
     kedro-viz:
       layer: primary
-
40 changes: 8 additions & 32 deletions demo-project/conf/base/catalog_04_feature.yml
@@ -1,36 +1,12 @@
-# Jinja is super powerful, but does come at the cost of readability
-# Set your IDE to Jinja YAML to ensure this is highlighted correctly
+# Use dataset factories to reduce duplication
+"feature_engineering.feat_{metric_type}_metrics":
+  type: pandas.ParquetDataSet
+  filepath: ${_base_location}/04_feature/feat_{metric_type}_metrics.pq
+  layer: feature

-{% set namespace = 'feature_engineering' %}
-{% set metric_types = ['weighting', 'scaling'] %}
-{% for metric_type in metric_types %}
-{{ namespace }}.feat_{{ metric_type }}_metrics:
-  type: pandas.ParquetDataSet
-  filepath: ${base_location}/04_feature/feat_{{ metric_type }}_metrics.pq
-  metadata:
-    kedro-viz:
-      layer: feature
-
-{% endfor %}
-
-# This will render to generate the records below...
-#
-# feature_engineering.feat_weighting_metrics:
-# type: pandas.ParquetDataSet
-# filepath: ${base_location}/04_feature/feat_weighting_metrics.pq
-# layer: feature
-#
-# feature_engineering.feat_scaling_metrics:
-# type: pandas.ParquetDataSet
-# filepath: ${base_location}/04_feature/feat_scaling_metrics.pq
-# layer: feature
-
-
-feature_importance_output:
+feature_importance_output:
   type: pandas.CSVDataSet
-  filepath: ${base_location}/04_feature/feature_importance_output.csv
+  filepath: ${_base_location}/04_feature/feature_importance_output.csv
   metadata:
     kedro-viz:
-      layer: feature
-
-
+      layer: feature
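The new entry replaces the Jinja loop with a Kedro dataset factory: the quoted key is a pattern, and `{metric_type}` is filled in from whichever dataset name the pipeline requests. The sketch below imitates that matching with a regular expression to show how the two names the removed template used to generate, `feature_engineering.feat_weighting_metrics` and `feature_engineering.feat_scaling_metrics`, both resolve from the single pattern. `pattern_to_regex` and `resolve_entry` are hypothetical helpers; as we understand it, Kedro's own resolver is built on the `parse` library and also ranks competing patterns, which this sketch ignores.

```python
import re
from typing import Any, Dict, Optional

# Matches "{name}" placeholders in a factory pattern or entry value.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Turn "feature_engineering.feat_{metric_type}_metrics" into a regex with named groups."""
    # split() with a capturing group alternates literal text and placeholder names.
    parts = PLACEHOLDER.split(pattern)
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
        for i, part in enumerate(parts)
    )
    return re.compile(regex)


def resolve_entry(pattern: str, entry: Dict[str, Any], name: str) -> Optional[Dict[str, Any]]:
    """Fill the entry's {placeholders} from `name`, or return None if it does not match."""
    match = pattern_to_regex(pattern).fullmatch(name)
    if match is None:
        return None
    values = match.groupdict()

    def fill(text: str) -> str:
        # Only captured placeholders are replaced, so "${_base_location}" is left
        # untouched for the config loader's own interpolation.
        return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), text)

    return {key: fill(value) if isinstance(value, str) else value for key, value in entry.items()}


PATTERN = "feature_engineering.feat_{metric_type}_metrics"
ENTRY = {
    "type": "pandas.ParquetDataSet",
    "filepath": "${_base_location}/04_feature/feat_{metric_type}_metrics.pq",
    "layer": "feature",
}
print(resolve_entry(PATTERN, ENTRY, "feature_engineering.feat_scaling_metrics"))
```

A name that does not fit the pattern, such as `feature_importance_output`, yields `None`, which is why that dataset keeps its own explicit entry.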
3 changes: 1 addition & 2 deletions demo-project/conf/base/catalog_05_model_input.yml
@@ -1,7 +1,6 @@
 model_input_table:
   type: pandas.ParquetDataSet
-  filepath: ${base_location}/05_model_input/model_input_table.pq
+  filepath: ${_base_location}/05_model_input/model_input_table.pq
   metadata:
     kedro-viz:
       layer: model_input
-
4 changes: 2 additions & 2 deletions demo-project/conf/base/catalog_06_models.yml
@@ -1,9 +1,9 @@
 train_evaluation.linear_regression.regressor:
   type: pickle.PickleDataSet
-  filepath: ${base_location}/06_models/linear_regression.pkl
+  filepath: ${_base_location}/06_models/linear_regression.pkl
   versioned: True

 train_evaluation.random_forest.regressor:
   type: pickle.PickleDataSet
-  filepath: ${base_location}/06_models/random_forest.pkl
+  filepath: ${_base_location}/06_models/random_forest.pkl
   versioned: True
11 changes: 5 additions & 6 deletions demo-project/conf/base/catalog_08_reporting.yml
@@ -1,6 +1,6 @@
 reporting.cancellation_policy_breakdown:
   type: plotly.PlotlyDataSet # Constructed via plotly_args below
-  filepath: ${base_location}/08_reporting/cancellation_breakdown.json
+  filepath: ${_base_location}/08_reporting/cancellation_breakdown.json
   metadata:
     kedro-viz:
       layer: reporting
@@ -16,26 +16,25 @@ reporting.cancellation_policy_breakdown:

 reporting.price_histogram:
   type: plotly.JSONDataSet # Constructed via Python API
-  filepath: ${base_location}/08_reporting/price_histogram.json
+  filepath: ${_base_location}/08_reporting/price_histogram.json
   metadata:
     kedro-viz:
       layer: reporting
   versioned: true

 reporting.feature_importance:
   type: plotly.JSONDataSet # Constructed via Python API
-  filepath: ${base_location}/08_reporting/feature_importance_plot.json
+  filepath: ${_base_location}/08_reporting/feature_importance_plot.json
   metadata:
     kedro-viz:
       layer: reporting
   versioned: true

 reporting.cancellation_policy_grid:
   type: demo_project.extras.datasets.image_dataset.ImageDataSet
-  filepath: ${base_location}/08_reporting/cancellation_policy_grid.png
+  filepath: ${_base_location}/08_reporting/cancellation_policy_grid.png

 reporting.confusion_matrix:
   type: matplotlib.MatplotlibWriter
-  filepath: ${base_location}/08_reporting/confusion_matrix.png
+  filepath: ${_base_location}/08_reporting/confusion_matrix.png
   versioned: true
-
8 changes: 4 additions & 4 deletions demo-project/conf/base/catalog_09_tracking.yml
@@ -1,19 +1,19 @@
 train_evaluation.linear_regression.r2_score:
   type: tracking.MetricsDataSet
-  filepath: ${base_location}/09_tracking/linear_score.json
+  filepath: ${_base_location}/09_tracking/linear_score.json
   versioned: True

 train_evaluation.random_forest.r2_score:
   type: tracking.MetricsDataSet
-  filepath: ${base_location}/09_tracking/rf_score.json
+  filepath: ${_base_location}/09_tracking/rf_score.json
   versioned: True

 train_evaluation.linear_regression.experiment_params:
   type: tracking.JSONDataSet
-  filepath: ${base_location}/09_tracking/linear_params.json
+  filepath: ${_base_location}/09_tracking/linear_params.json
   versioned: True

 train_evaluation.random_forest.experiment_params:
   type: tracking.JSONDataSet
-  filepath: ${base_location}/09_tracking/rf_params.json
+  filepath: ${_base_location}/09_tracking/rf_params.json
   versioned: True
1 change: 1 addition & 0 deletions demo-project/conf/base/catalog_globals.yml
@@ -0,0 +1 @@
+_base_location: data/
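`catalog_globals.yml` replaces `globals.yml`: with OmegaConfigLoader, a value such as `_base_location` can be defined in one catalog file and referenced as `${_base_location}` in the others, and, as we understand it, top-level keys starting with an underscore are dropped from the final catalog so they are not mistaken for dataset definitions. The sketch below mimics that two-step behaviour with the standard library. `resolve_catalog` is a hypothetical helper, not Kedro's code (Kedro delegates interpolation to OmegaConf), and it only handles `${key}` references to top-level values within a single dictionary.

```python
import re
from typing import Any, Dict

# Matches "${name}" interpolation references.
INTERPOLATION = re.compile(r"\$\{(\w+)\}")


def resolve_catalog(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Substitute ${key} with top-level values, then drop "_"-prefixed keys."""

    def substitute(value: Any) -> Any:
        if isinstance(value, str):
            # Purely textual replacement; a missing key raises KeyError.
            return INTERPOLATION.sub(lambda m: str(raw[m.group(1)]), value)
        if isinstance(value, dict):
            return {key: substitute(item) for key, item in value.items()}
        return value

    resolved = {key: substitute(value) for key, value in raw.items()}
    # Underscore-prefixed keys were only template variables, not datasets.
    return {key: value for key, value in resolved.items() if not key.startswith("_")}


catalog = resolve_catalog(
    {
        "_base_location": "data/",
        "companies": {
            "type": "pandas.CSVDataSet",
            "filepath": "${_base_location}/01_raw/companies.csv",
        },
    }
)
print(catalog)
```

Because the substitution is textual, `data/` followed by `/01_raw/...` resolves to `data//01_raw/companies.csv`.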
1 change: 0 additions & 1 deletion demo-project/conf/base/globals.yml

This file was deleted.

5 changes: 0 additions & 5 deletions demo-project/conf/base/parameters/feature_engineering.yml
@@ -1,8 +1,3 @@
-# This is a boilerplate parameters config generated for pipeline 'feature_engineering'
-# using Kedro 0.18.1.
-#
-# Documentation for this file format can be found in "Parameters"
-# Link: https://kedro.readthedocs.io/en/0.18.1/kedro_project_setup/configuration.html#parameters
 feature_engineering:
   feature:
     static:
2 changes: 1 addition & 1 deletion demo-project/conf/base/parameters/modelling.yml
@@ -21,7 +21,7 @@ train_evaluation:
 min_samples_split: 2
 min_samples_leaf: 1
 min_weight_fraction_leaf: 0
-max_features: 'auto'
+max_features: 1.0
 min_impurity_decrease: 0
 bootstrap: True
 oob_score: False
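With `max_features: 1.0`, the regressor again considers every feature at each split, which is what 'auto' meant for `RandomForestRegressor`. The sketch below restates scikit-learn's documented rules for turning `max_features` into the number of candidate features per split; `features_per_split` is a hypothetical helper for illustration, not a scikit-learn function.

```python
import math
from typing import Union

MaxFeatures = Union[int, float, str, None]


def features_per_split(max_features: MaxFeatures, n_features: int) -> int:
    """Candidate features per split, following scikit-learn's documented rules."""
    if max_features is None:
        return n_features
    if isinstance(max_features, str):
        if max_features == "sqrt":
            return max(1, int(math.sqrt(n_features)))
        if max_features == "log2":
            return max(1, int(math.log2(n_features)))
        # Releases that removed 'auto' reject it; for regressors it meant 1.0.
        raise ValueError(f"unsupported max_features={max_features!r}")
    if isinstance(max_features, float):
        # A float is a fraction of the features, with at least one kept.
        return max(1, int(max_features * n_features))
    # An int is an absolute count.
    return max_features


print(features_per_split(1.0, 10))  # 10
print(features_per_split(1, 10))    # 1
```

The decimal point matters when editing `modelling.yml`: YAML reads `1.0` as a float ("all features"), whereas `1` would be an integer meaning a single feature.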
1 change: 1 addition & 0 deletions demo-project/conf/prod/catalog_globals.yml
@@ -0,0 +1 @@
+_base_location: s3://my_bucket/production/
1 change: 0 additions & 1 deletion demo-project/conf/prod/globals.yml

This file was deleted.

2 changes: 1 addition & 1 deletion demo-project/pyproject.toml
@@ -1,7 +1,7 @@
 [tool.kedro]
 package_name = "demo_project"
 project_name = "modular-spaceflights"
-project_version = "0.18.4"
+kedro_init_version = "0.18.14"

 [tool.isort]
 multi_line_output = 3
2 changes: 1 addition & 1 deletion demo-project/src/demo_project/pipelines/reporting/nodes.py
@@ -57,7 +57,7 @@ def make_price_histogram(model_input_data: pd.DataFrame) -> go.Figure:
     Returns:
         BaseFigure: Plotly object which is serialised as JSON for rendering
     """
-    price_data_df = model_input_data[["price", "engine_type"]]
+    price_data_df = model_input_data.loc[:, ["price", "engine_type"]]
     p = np.random.dirichlet([1, 1, 1])
     price_data_df["engine_type"] = np.random.choice(
         ["Quantum", "Plasma", "Nuclear"], len(price_data_df), p=p
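The node change above switches the column selection to `.loc` to address pandas' SettingWithCopyWarning, which pandas emits when a column is assigned on a frame that may be a slice of another DataFrame. A minimal sketch of the same pattern, assuming pandas and NumPy are installed. `randomise_engine_type` is hypothetical; it adds an explicit `.copy()` (not in the commit) so the result's independence does not depend on how a given pandas version tracks copies, and it swaps the global `np.random` calls for a seeded `numpy.random.Generator` so results are reproducible.

```python
import numpy as np
import pandas as pd


def randomise_engine_type(model_input_data: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper mirroring the first lines of make_price_histogram."""
    # Select the two columns with .loc, as the commit does; .copy() makes the
    # result explicitly independent of model_input_data.
    price_data_df = model_input_data.loc[:, ["price", "engine_type"]].copy()
    rng = np.random.default_rng(seed)
    # Random category probabilities from a flat Dirichlet, as in the node.
    p = rng.dirichlet([1, 1, 1])
    price_data_df["engine_type"] = rng.choice(
        ["Quantum", "Plasma", "Nuclear"], len(price_data_df), p=p
    )
    return price_data_df


df = pd.DataFrame({"price": [1.0, 2.0, 3.0], "engine_type": ["A", "B", "C"], "other": [0, 0, 0]})
print(randomise_engine_type(df))
```

Assigning to the returned frame leaves `model_input_data` itself unchanged, which is the behaviour the node relies on.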
1 change: 1 addition & 0 deletions demo-project/src/demo_project/requirements.in
@@ -16,3 +16,4 @@ wheel>=0.35, <0.37
 pillow~=9.0
 matplotlib==3.5.0
 pre-commit~=1.17
+seaborn~=0.11.2
7 changes: 3 additions & 4 deletions demo-project/src/demo_project/settings.py
@@ -11,7 +11,7 @@
 SESSION_STORE_CLASS = SQLiteStore
 SESSION_STORE_ARGS = {"path": str(Path(__file__).parents[2] / "data")}

-#Setup for collaborative experiment tracking.
+# Setup for collaborative experiment tracking.
 # SESSION_STORE_ARGS = {"path": str(Path(__file__).parents[2] / "data"),
 # "remote_path": "s3://{path-to-session_store}" }

@@ -21,7 +21,6 @@
 # Define the configuration folder. Defaults to `conf`
 # CONF_ROOT = "conf"

-from kedro.config import TemplatedConfigLoader # NOQA
+from kedro.config import OmegaConfigLoader # NOQA

-CONFIG_LOADER_CLASS = TemplatedConfigLoader
-CONFIG_LOADER_ARGS = {"globals_pattern": "*globals.yml", "globals_dict": {}}
+CONFIG_LOADER_CLASS = OmegaConfigLoader