Kedro starters are a useful way to create a new project that contains code to run as-is, or to adapt and extend.
A team may find it useful to build Kedro starters to create reusable projects that bootstrap a common base and can be extended.
Please note that users are expected to have Git
installed for the kedro new
flow, which is used in this section.
A Kedro starter is a Cookiecutter template that contains the boilerplate code for a Kedro project. First install cookiecutter
as follows:
pip install cookiecutter
To create a Kedro starter, you need a base project to convert to a template, which forms the boilerplate for all projects that use it. You then need to decide which are:
- the common, boilerplate parts of the project
- the configurable elements, which need to be replaced by
cookiecutter
strings
When you create a new project using a Kedro starter, kedro new
prompts you for a project name. This variable (project_name
) is set in the default starter setup inprompts.yml
.
Kedro automatically generates the following two variables from the entered project_name
:
repo_name
- A name for the directory that holds the project repositorypython_package
- A Python package name for the project package (see Python package naming conventions)
As a starter creator, you can customise the prompts triggered by kedro new
by adding your own prompts into the prompts.yml
file in the root of your template. This is an example of a custom prompt:
custom_prompt:
title: "Prompt title"
text: |
Prompt description that explains to the user what
information they should provide.
At the very least, the prompt title
must be defined for the prompt to be valid. After Kedro receives the user's input for each prompt, it passes the value to Cookiecutter, so every key in prompts.yml
must have a corresponding key in cookiecutter.json
.
If the input to the prompts needs to be validated, for example to make sure it only has alphanumeric characters, you can add regex validation rules via the regex_validator
key. Consider using cookiecutter pre/post-generate hooks for more complex validation.
If you want cookiecutter
to provide sensible default values, in case a user doesn't provide any input, you can add those to cookiecutter.json
. See the default starter cookiecutter.json
as example.
To review an example Kedro starter, check out the spaceflights-pandas
starter on GitHub.
When a new spaceflights-pandas
project is created with kedro new --starter=spaceflights-pandas
, the user is asked to enter a project_name
variable, which is then used to generate the repo_name
and python_package
variables by default.
If you use a configuration file, you must supply all three variables in the file. You can see how these variables are used by inspecting the template:
The human-readable project_name
variable is used in the README.md for the new project.
The top-level folder labelled {{ cookiecutter.repo_name }}
, which forms the top-level folder to contain the starter project when it is created.
Within the parent folder, inside the src
subfolder, is another configurable variable {{ cookiecutter.python_package }} which contains the source code for the example pipelines. The variable is also used within __main__.py
.
Here is the layout of the project as a Cookiecutter template:
{{ cookiecutter.repo_name }} # Parent directory of the template
├── conf # Project configuration files
├── data # Local project data (not committed to version control)
├── docs # Project documentation
├── notebooks # Project related Jupyter notebooks (can be used for experimental code before moving the code to src)
├── pyproject.toml #
├── README.md # Project README
├── requirements.txt
└── src # Project source code
└── {{ cookiecutter.python_package }}
├── __init.py__
├── pipelines
├── pipeline_registry.py
├── __main__.py
└── settings.py
└── tests
You can add an alias by creating a plugin using kedro.starters
entry point which enables you to call kedro new --starter=your_starters
. That is, it can be used directly through the starter
argument in kedro new
rather than needing to explicitly provide the template
and directory
arguments.
A custom starter alias behaves in the same way as an official Kedro starter alias and is also picked up by the command kedro starter list
.
You need to extend the starters by providing a list of KedroStarterSpec
, in this example it is defined in a file called plugin.py
.
Example for a non-git repository starter:
# plugin.py
starters = [
KedroStarterSpec(
alias="test_plugin_starter",
template_path="your_local_directory/starter_folder",
)
]
Example for a git repository starter:
# plugin.py
starters = [
KedroStarterSpec(
alias="test_plugin_starter",
template_path="https://github.com/kedro-org/kedro-starters/",
directory="spaceflights-pandas",
)
]
The directory
argument is optional and should be used when you have multiple templates in one repository as for the official kedro-starters. If you only have one template, your top-level directory will be treated as the template.
In your pyproject.toml
, you need to register the specifications to kedro.starters
:
[project.entry-points."kedro.starters"]
starter = "plugin:starters"
After that you can use this starter with kedro new --starter=test_plugin_starter
.
If your starter is stored on a git repository, Kedro defaults to use a tag or branch labelled with your version of Kedro, e.g. `0.18.12`. This means that you can host different versions of your starter template on the same repository, and the correct one will be used automatically. If you prefer not to follow this structure, you should override it with the `checkout` flag, e.g. `kedro new --starter=test_plugin_starter --checkout=main`.