[KED-2970] Improve micropackaging CLI experience #1224

Merged on Feb 17, 2022 (41 commits). The diff below shows changes from 33 commits.

Commits:
90788b1 micropkg implemented (SajidAlamQB, Feb 8, 2022)
698b4aa adding micropkg to cli (SajidAlamQB, Feb 8, 2022)
45d8fb5 lint (SajidAlamQB, Feb 8, 2022)
48e5d82 test (SajidAlamQB, Feb 8, 2022)
fb7b5d9 lint (SajidAlamQB, Feb 8, 2022)
d0cea36 fix tests (SajidAlamQB, Feb 9, 2022)
7c7c6ee Update test_cli.py (SajidAlamQB, Feb 9, 2022)
5b42a2d Update test_cli.py (SajidAlamQB, Feb 9, 2022)
bd9da9a Merge branch 'main' into Add-micropackaging-command-group#ked-2970 (SajidAlamQB, Feb 9, 2022)
2a8c848 revert change to test_cli (SajidAlamQB, Feb 9, 2022)
7481c5c added tests for micropkg (SajidAlamQB, Feb 9, 2022)
07a57cc Merge branch 'main' into Add-micropackaging-command-group#ked-2970 (SajidAlamQB, Feb 10, 2022)
ea017d2 Create test_micropkg_requirements.py (SajidAlamQB, Feb 10, 2022)
7071e99 Create test_micropkg.py (SajidAlamQB, Feb 10, 2022)
2653ed9 adding coverage tests for micropkg (SajidAlamQB, Feb 10, 2022)
66a380a Merge branch 'main' into Add-micropackaging-command-group#ked-2970 (SajidAlamQB, Feb 10, 2022)
a582d05 try adding copy pipline_tests (SajidAlamQB, Feb 10, 2022)
b623b61 Revert "try adding copy pipline_tests" (SajidAlamQB, Feb 10, 2022)
df68d0d removing sync_dirs from micropkg (SajidAlamQB, Feb 10, 2022)
d0437cd import applicable methods from pipeline cli (SajidAlamQB, Feb 10, 2022)
e811c03 Update RELEASE.md (SajidAlamQB, Feb 10, 2022)
fe32040 update user facing help messages (SajidAlamQB, Feb 10, 2022)
24a12c1 changes based on review (SajidAlamQB, Feb 11, 2022)
3147674 Merge branch 'main' into Add-micropackaging-command-group#ked-2970 (SajidAlamQB, Feb 11, 2022)
d2d1e43 update tests (SajidAlamQB, Feb 11, 2022)
ee20ed4 Merge branch 'main' into Add-micropackaging-command-group#ked-2970 (SajidAlamQB, Feb 14, 2022)
af30aeb Update RELEASE.md (SajidAlamQB, Feb 14, 2022)
34f0bdd Update test_micropkg.py (SajidAlamQB, Feb 14, 2022)
49ea6a2 set micropkg package to dist (SajidAlamQB, Feb 14, 2022)
46e1565 Merge branch 'main' into Add-micropackaging-command-group#ked-2970 (SajidAlamQB, Feb 14, 2022)
3a890eb lint (SajidAlamQB, Feb 14, 2022)
1903cfb remove empty test (SajidAlamQB, Feb 15, 2022)
50720ad changes based on review (SajidAlamQB, Feb 15, 2022)
b18c011 Merge branch 'main' into Add-micropackaging-command-group#ked-2970 (SajidAlamQB, Feb 16, 2022)
58a752a Merge branch 'main' into Add-micropackaging-command-group#ked-2970 (SajidAlamQB, Feb 17, 2022)
bf0755c testing CI fix (SajidAlamQB, Feb 17, 2022)
51d999e Revert "testing CI fix" (SajidAlamQB, Feb 17, 2022)
e6b7c69 Clear old modules in fake_project_cli (antonymilne, Feb 17, 2022)
2372f63 Revert "Clear old modules in fake_project_cli" (antonymilne, Feb 17, 2022)
a262749 Clear only modules that start with PACKAGE_NAME (antonymilne, Feb 17, 2022)
e279748 Tidy and comment (antonymilne, Feb 17, 2022)
2 changes: 2 additions & 0 deletions RELEASE.md
@@ -3,6 +3,7 @@
 ## Major features and improvements
 * `pipeline` now accepts `tags` and a collection of `Node`s and/or `Pipeline`s rather than just a single `Pipeline` object. `pipeline` should be used in preference to `Pipeline` when creating a Kedro pipeline.
 * `pandas.SQLTableDataSet` and `pandas.SQLQueryDataSet` now only open one connection per database, at instantiation time (therefore at catalog creation time), rather than one per load/save operation.
+* Added new command group, `micropkg`, to replace `kedro pipeline pull` and `kedro pipeline package` with `kedro micropkg pull` and `kedro micropkg package` for Kedro 0.18.0. `kedro micropkg package` saves packages to `project/dist` while `kedro pipeline package` saves packages to `project/src/dist`.

 ## Bug fixes and other changes
 * Added tutorial documentation for experiment tracking (`03_tutorial/07_set_up_experiment_tracking.md`).
@@ -16,6 +17,7 @@
 ## Minor breaking changes to the API

 ## Upcoming deprecations for Kedro 0.18.0
+* `kedro pipeline pull` and `kedro pipeline package` will be deprecated. Please use `kedro micropkg` instead.

 # Release 0.17.6

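The release note above names two default output locations: `kedro micropkg package` writes wheels to `<project>/dist`, while `kedro pipeline package` writes them to `<project>/src/dist`. The sketch below encodes that rule in a hypothetical helper, `default_wheel_destination`, which is not part of Kedro. It assumes a default project layout whose source directory is `<project>/src`, and it assumes that both commands let an explicit `--destination` win, as `_package_pipeline` does in `micropkg.py`.

```python
from pathlib import Path
from typing import Optional


def default_wheel_destination(
    project_path: Path, command_group: str, destination: Optional[str] = None
) -> Path:
    """Hypothetical helper: where `kedro <group> package` writes its wheel.

    Assumes the project's source directory is `<project>/src`.
    """
    if destination:
        # An explicit --destination/-d value overrides the default.
        return Path(destination)
    if command_group == "micropkg":
        return project_path / "dist"  # new default: <project>/dist
    if command_group == "pipeline":
        return project_path / "src" / "dist"  # old default: <project>/src/dist
    raise ValueError(f"Unknown command group: {command_group!r}")


project = Path("my-project")
print(default_wheel_destination(project, "micropkg"))  # my-project/dist on POSIX
print(default_wheel_destination(project, "pipeline"))  # my-project/src/dist on POSIX
```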
2 changes: 1 addition & 1 deletion docs/source/03_tutorial/02_tutorial_template.md
@@ -50,7 +50,7 @@ Edit your `src/requirements.txt` file to include the following lines:
 kedro[pandas.CSVDataSet, pandas.ExcelDataSet, pandas.ParquetDataSet]==0.17.6 # Specify optional Kedro dependencies
 kedro-viz~=4.0 # Visualise your pipelines
 openpyxl>=3.0.6, <4.0 # Use modern Excel engine (will not be required in 0.18.0)
-scikit-learn~=1.0 # For modelling in the data science pipeline
+scikit-learn~=1.0 # For modelling in the data science pipeline
 ```

 To install all the project-specific dependencies, navigate to the root directory of the project and run:
2 changes: 1 addition & 1 deletion docs/source/03_tutorial/04_create_pipelines.md
@@ -226,7 +226,7 @@ You should see output similar to the following:

 ### Visualise the pipeline

-Kedro-Viz at this point will render a visualisation of a very simple, but valid, pipeline. To show the visualisation, run:
+Kedro-Viz at this point will render a visualisation of a very simple, but valid, pipeline. To show the visualisation, run:

 ```bash
 kedro viz
10 changes: 9 additions & 1 deletion kedro/framework/cli/cli.py
@@ -20,6 +20,7 @@
 from kedro.framework.cli.catalog import catalog_cli
 from kedro.framework.cli.hooks import get_cli_hook_manager
 from kedro.framework.cli.jupyter import jupyter_cli
+from kedro.framework.cli.micropkg import micropkg_cli
 from kedro.framework.cli.pipeline import pipeline_cli
 from kedro.framework.cli.project import project_group
 from kedro.framework.cli.registry import registry_cli
@@ -206,7 +207,14 @@ def project_groups(self) -> Sequence[click.MultiCommand]:
         if not self._metadata:
             return []

-        built_in = [catalog_cli, jupyter_cli, pipeline_cli, project_group, registry_cli]
+        built_in = [
+            catalog_cli,
+            jupyter_cli,
+            pipeline_cli,
+            micropkg_cli,
+            project_group,
+            registry_cli,
+        ]

         plugins = load_entry_points("project")

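Adding `micropkg_cli` to `built_in` is what makes `kedro micropkg` reachable: each entry is a `click` group whose subcommands are merged into the top-level `kedro` command. The stdlib-only model below sketches that merge under two assumptions. First, the top-level command resolves names the way `click.CommandCollection` does, so the first source that defines a name wins. Second, `project_groups` lists the built-in groups ahead of the `plugins` loaded from entry points. Plain dictionaries stand in for the real `click` groups, and `resolve_command` and `some_plugin` are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Stand-in for a click group: subcommand name -> handler.
Group = Dict[str, Callable[[], str]]


def resolve_command(groups: List[Group], name: str) -> Optional[Callable[[], str]]:
    """Return the handler from the first group that defines `name`, else None."""
    for group in groups:
        if name in group:
            return group[name]
    return None


# Illustrative stand-ins for two of the built-in groups in the diff.
pipeline_cli: Group = {"pipeline": lambda: "built-in pipeline group"}
micropkg_cli: Group = {"micropkg": lambda: "built-in micropkg group"}
# Hypothetical plugin that also defines a `micropkg` subcommand.
some_plugin: Group = {"micropkg": lambda: "plugin micropkg group"}

before = [pipeline_cli]  # built-ins before this PR (abridged)
after = [pipeline_cli, micropkg_cli] + [some_plugin]  # assumption: built-ins first

print(resolve_command(before, "micropkg"))  # None: `kedro micropkg` is unknown
print(resolve_command(after, "micropkg")())  # built-in micropkg group
```

Under the first-match rule, order in the list matters: a plugin that reuses a built-in name would be shadowed rather than override it.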
293 changes: 293 additions & 0 deletions kedro/framework/cli/micropkg.py
@@ -0,0 +1,293 @@
"""A collection of CLI commands for working with Kedro micro-packages."""

import re
import sys
import tempfile
from pathlib import Path

import click

from kedro.framework.cli.pipeline import (
    _append_package_reqs,
    _assert_pkg_name_ok,
    _check_module_path,
    _check_pipeline_name,
    _find_config_files,
    _generate_wheel_file,
    _get_default_version,
    _get_pipeline_artifacts,
    _install_files,
    _unpack_wheel,
    _validate_dir,
)
from kedro.framework.cli.utils import (
    KedroCliError,
    _clean_pycache,
    _get_requirements_in,
    command_with_verbosity,
    env_option,
)
from kedro.framework.startup import ProjectMetadata

_SETUP_PY_TEMPLATE = """# -*- coding: utf-8 -*-
from setuptools import setup, find_packages

setup(
    name="{name}",
    version="{version}",
    description="Modular pipeline `{name}`",
    packages=find_packages(),
    include_package_data=True,
    package_data={package_data},
    install_requires={install_requires},
)
"""


# pylint: disable=missing-function-docstring
@click.group(name="Kedro")
def micropkg_cli():  # pragma: no cover
    pass


@micropkg_cli.group()
def micropkg():
    """Commands for working with micro-packages."""


@command_with_verbosity(micropkg, "pull")
@click.argument("package_path", nargs=1, required=False)
@click.option(
    "--all",
    "-a",
    "all_flag",
    is_flag=True,
    help="Pull and unpack all micro-packages in the `pyproject.toml` package manifest section.",
)
@env_option(
    help="Environment to install the micro-package configuration to. Defaults to `base`."
)
@click.option(
    "--alias",
    type=str,
    default="",
    callback=_check_pipeline_name,
    help="Alternative name to unpackage under.",
)
@click.option(
    "--fs-args",
    type=click.Path(
        exists=True, file_okay=True, dir_okay=False, readable=True, resolve_path=True
    ),
    default=None,
    help="Location of a configuration file for the fsspec filesystem used to pull the package.",
)
@click.pass_obj  # this will pass the metadata as first argument
def pull_package(  # pylint:disable=unused-argument, too-many-arguments
    metadata: ProjectMetadata, package_path, env, alias, fs_args, all_flag, **kwargs
) -> None:
    """Pull and unpack a modular pipeline and other micro-packages in your project."""
    if not package_path and not all_flag:
        click.secho(
            "Please specify a package path or add '--all' to pull all micro-packages in the "
            "`pyproject.toml` package manifest section."
        )
        sys.exit(1)

    if all_flag:
        _pull_packages_from_manifest(metadata)
        return

    _pull_package(package_path, metadata, env=env, alias=alias, fs_args=fs_args)
    as_alias = f" as `{alias}`" if alias else ""
    message = f"Micro-package {package_path} pulled and unpacked{as_alias}!"
    click.secho(message, fg="green")


def _pull_package(
    package_path: str,
    metadata: ProjectMetadata,
    env: str = None,
    alias: str = None,
    fs_args: str = None,
):
    with tempfile.TemporaryDirectory() as temp_dir:
        temp_dir_path = Path(temp_dir).resolve()

        _unpack_wheel(package_path, temp_dir_path, fs_args)

        dist_info_file = list(temp_dir_path.glob("*.dist-info"))
        if len(dist_info_file) != 1:
            raise KedroCliError(
                f"More than 1 or no dist-info files found from {package_path}. "
                f"There has to be exactly one dist-info directory."
            )
        # Extract package name, based on the naming convention for wheel files
        # https://www.python.org/dev/peps/pep-0427/#file-name-convention
        package_name = dist_info_file[0].stem.split("-")[0]
        package_metadata = dist_info_file[0] / "METADATA"

        req_pattern = r"Requires-Dist: (.*?)\n"
        package_reqs = re.findall(req_pattern, package_metadata.read_text())
        if package_reqs:
            requirements_in = _get_requirements_in(
                metadata.source_dir, create_empty=True
            )
            _append_package_reqs(requirements_in, package_reqs, package_name)

        _clean_pycache(temp_dir_path)
        _install_files(metadata, package_name, temp_dir_path, env, alias)


def _pull_packages_from_manifest(metadata: ProjectMetadata) -> None:
    # pylint: disable=import-outside-toplevel
    import anyconfig  # for performance reasons

    config_dict = anyconfig.load(metadata.config_file)
    config_dict = config_dict["tool"]["kedro"]
    build_specs = config_dict.get("micropkg", {}).get("pull")

    if not build_specs:
        click.secho(
            "Nothing to pull. Please update the `pyproject.toml` package manifest section.",
            fg="yellow",
        )
        return

    for package_path, specs in build_specs.items():
        if "alias" in specs:
            _assert_pkg_name_ok(specs["alias"])
        _pull_package(package_path, metadata, **specs)
        click.secho(f"Pulled and unpacked `{package_path}`!")

    click.secho("Micro-packages pulled and unpacked!", fg="green")


def _package_pipelines_from_manifest(metadata: ProjectMetadata) -> None:
    # pylint: disable=import-outside-toplevel
    import anyconfig  # for performance reasons

    config_dict = anyconfig.load(metadata.config_file)
    config_dict = config_dict["tool"]["kedro"]
    build_specs = config_dict.get("micropkg", {}).get("package")

    if not build_specs:
        click.secho(
            "Nothing to package. Please update the `pyproject.toml` package manifest section.",
            fg="yellow",
        )
        return

    for pipeline_name, specs in build_specs.items():
        if "alias" in specs:
            _assert_pkg_name_ok(specs["alias"])
        _package_pipeline(pipeline_name, metadata, **specs)
        click.secho(f"Packaged `{pipeline_name}` micro-package!")

    click.secho("Micro-packages packaged!", fg="green")


@micropkg.command("package")
@env_option(
    help="Environment where the micro-package configuration lives. Defaults to `base`."
)
@click.option(
    "--alias",
    type=str,
    default="",
    callback=_check_pipeline_name,
    help="Alternative name to package under.",
)
@click.option(
    "-d",
    "--destination",
    type=click.Path(resolve_path=True, file_okay=False),
    help="Location where to create the wheel file. Defaults to `dist/`.",
)
@click.option(
    "-v",
    "--version",
    type=str,
    help="Version to package under. "
    "Defaults to micro-package package version or, "
    "if that is not defined, the project package version.",
)
@click.option(
    "--all",
    "-a",
    "all_flag",
    is_flag=True,
    help="Package all micro-packages in the `pyproject.toml` package manifest section.",
)
@click.argument("name", nargs=1, required=False, callback=_check_module_path)
@click.pass_obj  # this will pass the metadata as first argument
def package_pipeline(
    metadata: ProjectMetadata, name, env, alias, destination, version, all_flag
):  # pylint: disable=too-many-arguments
    """Package up a modular pipeline or micro-package as a Python .whl."""
    if not name and not all_flag:
        click.secho(
            "Please specify a micro-package name or add '--all' to package all micro-packages in "
            "the `pyproject.toml` package manifest section."
        )
        sys.exit(1)

    if all_flag:
        _package_pipelines_from_manifest(metadata)
        return

    result_path = _package_pipeline(
        name, metadata, alias=alias, destination=destination, env=env, version=version
    )

    as_alias = f" as `{alias}`" if alias else ""
    message = f"Micro-package `{name}` packaged{as_alias}! Location: {result_path}"
    click.secho(message, fg="green")


def _package_pipeline(  # pylint: disable=too-many-arguments
    pipeline_name: str,
    metadata: ProjectMetadata,
    alias: str = None,
    destination: str = None,
    env: str = None,
    version: str = None,
) -> Path:
    package_dir = metadata.source_dir / metadata.package_name
    env = env or "base"

    artifacts_to_package = _get_pipeline_artifacts(
        metadata, pipeline_name=pipeline_name, env=env
    )
    # as the wheel file will only contain parameters, we aren't listing other
    # config files not to confuse users and avoid useless file copies
    configs_to_package = _find_config_files(
        artifacts_to_package.pipeline_conf,
        [f"parameters*/**/{pipeline_name}.yml", f"parameters*/**/{pipeline_name}/**/*"],
    )

    source_paths = (
        artifacts_to_package.pipeline_dir,
        artifacts_to_package.pipeline_tests,
        configs_to_package,
    )

    # Check that pipeline directory exists and not empty
    _validate_dir(artifacts_to_package.pipeline_dir)

    destination = Path(destination) if destination else metadata.project_path / "dist"
    version = version or _get_default_version(metadata, pipeline_name)

    _generate_wheel_file(
        pipeline_name=pipeline_name,
        destination=destination.resolve(),
        source_paths=source_paths,
        version=version,
        metadata=metadata,
        alias=alias,
    )

    _clean_pycache(package_dir)
    _clean_pycache(metadata.project_path)

    return destination
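`_pull_package` derives the micro-package name from the single `*.dist-info` directory in the unpacked wheel: it keeps the text before the first `-`, following the PEP 427 file-name convention. It then collects dependencies by matching `Requires-Dist: (.*?)\n` against the `METADATA` file. The stdlib-only sketch below repeats those two steps on in-memory data. `read_wheel_metadata` and the sample metadata are hypothetical, not Kedro code.

```python
import re
from pathlib import PurePath
from typing import List, Tuple

# Same pattern as `req_pattern` in `_pull_package`.
REQ_PATTERN = r"Requires-Dist: (.*?)\n"


def read_wheel_metadata(dist_info_dir: str, metadata_text: str) -> Tuple[str, List[str]]:
    """Hypothetical helper: package name and requirements of an unpacked wheel.

    `dist_info_dir` is a directory name such as "<name>-<version>.dist-info":
    `.stem` drops the ".dist-info" suffix and the name is the text before the
    first "-". `metadata_text` is the content of the METADATA file.
    """
    package_name = PurePath(dist_info_dir).stem.split("-")[0]
    requirements = re.findall(REQ_PATTERN, metadata_text)
    return package_name, requirements


sample_metadata = (
    "Metadata-Version: 2.1\n"
    "Name: data_science\n"
    "Version: 0.1\n"
    "Requires-Dist: pandas (>=1.3)\n"
    "Requires-Dist: scikit-learn (~=1.0)\n"
)
name, reqs = read_wheel_metadata("data_science-0.1.dist-info", sample_metadata)
print(name)  # data_science
print(reqs)  # ['pandas (>=1.3)', 'scikit-learn (~=1.0)']
```

Splitting on the first `-` is safe because PEP 427 escapes hyphens in the distribution-name component of a wheel file name as underscores.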
10 changes: 10 additions & 0 deletions kedro/framework/cli/pipeline.py
@@ -280,6 +280,11 @@ def pull_package(  # pylint:disable=unused-argument, too-many-arguments
     metadata: ProjectMetadata, package_path, env, alias, fs_args, all_flag, **kwargs
 ) -> None:
     """Pull and unpack a modular pipeline in your project."""
+    deprecation_message = (
+        "DeprecationWarning: Command `kedro pipeline pull` will be deprecated in Kedro 0.18.0. "
+        "In future please use `kedro micropkg pull` instead."
+    )
+    click.secho(deprecation_message, fg="red")
     if not package_path and not all_flag:
         click.secho(
             "Please specify a package path or add '--all' to pull all pipelines in the "
@@ -418,6 +423,11 @@ def package_pipeline(
     metadata: ProjectMetadata, name, env, alias, destination, version, all_flag
 ):  # pylint: disable=too-many-arguments
     """Package up a modular pipeline as a Python .whl."""
+    deprecation_message = (
+        "DeprecationWarning: Command `kedro pipeline package` will be deprecated in Kedro 0.18.0. "
+        "In future please use `kedro micropkg package` instead."
+    )
+    click.secho(deprecation_message, fg="red")
     if not name and not all_flag:
         click.secho(
             "Please specify a pipeline name or add '--all' to package all pipelines in "
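Both deprecated `kedro pipeline` commands announce the change by printing a red message with `click.secho` rather than raising a `DeprecationWarning` through Python's `warnings` module. The PR does not say why. One relevant fact: under the default warning filters of Python 3.7 and later, `DeprecationWarning` is displayed only when triggered by code in `__main__`, so a warning raised inside an installed CLI package would normally stay silent. The sketch below rebuilds the PR's message text and shows a hypothetical `warnings`-based alternative, `warn_deprecated`. It forces the filter to `"always"` so the result does not depend on the interpreter's settings.

```python
import warnings


def deprecation_message(subcommand: str) -> str:
    # Same wording as the messages added to `kedro/framework/cli/pipeline.py`.
    return (
        f"DeprecationWarning: Command `kedro pipeline {subcommand}` will be deprecated "
        f"in Kedro 0.18.0. In future please use `kedro micropkg {subcommand}` instead."
    )


def warn_deprecated(subcommand: str) -> None:
    """Hypothetical alternative: signal the deprecation via the warnings module.

    Unlike `click.secho`, which always prints (to stdout by default), this
    goes through the warning filters, and DeprecationWarning is ignored by
    default unless triggered from code in __main__.
    """
    warnings.warn(deprecation_message(subcommand), DeprecationWarning, stacklevel=2)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # force the warning to be recorded for this demo
    warn_deprecated("pull")

print(len(caught), caught[0].category.__name__)  # 1 DeprecationWarning
```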
2 changes: 2 additions & 0 deletions tests/framework/cli/conftest.py
@@ -19,6 +19,7 @@
 from kedro.framework.cli.catalog import catalog_cli
 from kedro.framework.cli.cli import cli
 from kedro.framework.cli.jupyter import jupyter_cli
+from kedro.framework.cli.micropkg import micropkg_cli
 from kedro.framework.cli.pipeline import pipeline_cli
 from kedro.framework.cli.project import project_group
 from kedro.framework.cli.registry import registry_cli
@@ -103,6 +104,7 @@ def fake_kedro_cli():
 catalog_cli,
 jupyter_cli,
 pipeline_cli,
+micropkg_cli,
 project_group,
 registry_cli,
 ],
Empty file.
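With `--all`, `kedro micropkg pull` and `kedro micropkg package` read their work list from `pyproject.toml`. `_pull_packages_from_manifest` and `_package_pipelines_from_manifest` in `micropkg.py` look up the `[tool.kedro.micropkg.pull]` or `[tool.kedro.micropkg.package]` table, validate any `alias` with `_assert_pkg_name_ok`, and forward each entry's settings as keyword arguments (`**specs`). The sketch below repeats that lookup on an already-parsed dictionary, so it needs no TOML parser. It then calls a hypothetical `package_stub` that has the same keyword parameters as `_package_pipeline`, minus `metadata`. The package names and settings are illustrative. Because the settings are unpacked as keyword arguments, a misspelt key fails with a `TypeError` rather than being silently ignored.

```python
from typing import Any, Dict, Optional

# Parsed form of an illustrative pyproject.toml such as:
#
#   [tool.kedro.micropkg.package]
#   "pipelines.data_science" = {alias = "ds", env = "local"}
#
#   [tool.kedro.micropkg.pull]
#   "dist/data_science-0.1-py3-none-any.whl" = {alias = "ds2"}
PYPROJECT: Dict[str, Any] = {
    "tool": {
        "kedro": {
            "micropkg": {
                "package": {"pipelines.data_science": {"alias": "ds", "env": "local"}},
                "pull": {"dist/data_science-0.1-py3-none-any.whl": {"alias": "ds2"}},
            }
        }
    }
}


def manifest_section(config: Dict[str, Any], action: str) -> Optional[Dict[str, Any]]:
    # Same lookup chain as the manifest helpers in micropkg.py.
    return config["tool"]["kedro"].get("micropkg", {}).get(action)


def package_stub(
    pipeline_name: str,
    alias: Optional[str] = None,
    destination: Optional[str] = None,
    env: Optional[str] = None,
    version: Optional[str] = None,
) -> str:
    """Hypothetical stand-in sharing `_package_pipeline`'s keyword parameters."""
    return f"{pipeline_name} -> {alias or pipeline_name} (env={env or 'base'})"


for name, specs in manifest_section(PYPROJECT, "package").items():
    print(package_stub(name, **specs))  # pipelines.data_science -> ds (env=local)

try:
    package_stub("pipelines.data_science", **{"aliass": "ds"})  # misspelt key
except TypeError as exc:
    print("rejected:", exc)
```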