[AUTO-MERGE] Merge main into develop via merge-main-to-develop (#1195)
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>

Co-authored-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com>
Co-authored-by: Antony Milne <49395058+AntonyMilneQB@users.noreply.github.com>
Co-authored-by: Punitvara <punitvara@gmail.com>
Co-authored-by: Isaac <lazzeri89@gmail.com>
Co-authored-by: Georgios Gerogiokas <drago121012@gmail.com>
Co-authored-by: Ajinkya Bokade <acbokade@gmail.com>
Co-authored-by: Antony Milne <antony.milne@quantumblack.com>
8 people authored Feb 2, 2022
1 parent 7388098 commit 49e1f06
Showing 12 changed files with 67 additions and 13 deletions.
2 changes: 2 additions & 0 deletions .circleci/config.yml
@@ -178,6 +178,8 @@ jobs:
python_version:
type: string
executor: win/default
+environment:
+  PIP_DISABLE_PIP_VERSION_CHECK: 1
steps:
- checkout
- win_setup_conda:
2 changes: 2 additions & 0 deletions .github/dco.yml
@@ -0,0 +1,2 @@
+require:
+  members: false
8 changes: 7 additions & 1 deletion Makefile
@@ -10,7 +10,7 @@ clean:
pre-commit clean || true

install-pip-setuptools:
-python -m pip install -U "pip>=20.0" "setuptools>=38.0" wheel
+python -m pip install -U "pip~=21.2" "setuptools>=38.0" wheel

lint:
pre-commit run -a --hook-stage manual $(hook)
@@ -55,3 +55,9 @@ uninstall-pre-commit:

print-python-env:
@./tools/print_env.sh

+sign-off:
+echo "git interpret-trailers --if-exists doNothing \c" >> .git/hooks/commit-msg
+echo '--trailer "Signed-off-by: $$(git config user.name) <$$(git config user.email)>" \c' >> .git/hooks/commit-msg
+echo '--in-place "$$1"' >> .git/hooks/commit-msg
+chmod +x .git/hooks/commit-msg
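
The `sign-off` target writes a `commit-msg` hook that runs `git interpret-trailers --if-exists doNothing --trailer "Signed-off-by: ..." --in-place` on each commit message, so a `Signed-off-by` trailer is added only when the message does not already carry one. To make that rule concrete, here is a simplified Python model of it. `add_sign_off` is a hypothetical helper, not part of Kedro or Git, and it skips most of Git's trailer parsing (comment lines, custom separators, trailer configuration).

```python
import re

# A trailer line looks like "Token: value", e.g. "Co-authored-by: A <a@example.com>".
TRAILER_LINE = re.compile(r"^[A-Za-z0-9-]+:\s")
SIGN_OFF_KEY = "Signed-off-by"


def add_sign_off(message: str, name: str, email: str) -> str:
    """Append a Signed-off-by trailer unless the message already has one.

    Simplified model of `git interpret-trailers --if-exists doNothing`:
    only the last paragraph is treated as a possible trailer block.
    """
    body = message.rstrip("\n")
    paragraphs = body.split("\n\n")
    # The subject paragraph is never treated as a trailer block.
    last = paragraphs[-1].splitlines() if len(paragraphs) > 1 else []
    has_trailer_block = bool(last) and all(TRAILER_LINE.match(line) for line in last)

    # --if-exists doNothing: any existing Signed-off-by trailer means no change.
    if has_trailer_block and any(line.startswith(SIGN_OFF_KEY + ":") for line in last):
        return message

    trailer = f"{SIGN_OFF_KEY}: {name} <{email}>"
    # Extend an existing trailer block, otherwise start a new paragraph.
    separator = "\n" if has_trailer_block else "\n\n"
    return f"{body}{separator}{trailer}\n"


print(add_sign_off("Fix typo\n", "Jane Doe", "jane@example.com"))
```

Because `doNothing` keys on the trailer name only, a commit that already has someone's sign-off is left untouched, which the model reproduces.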
@@ -180,7 +180,7 @@ The sign-off can be added automatically to your commit message using the `-s` op
git commit -s -m "This is my commit message"
```

-To avoid needing to remember the `-s` flag on every commit, you might like to set up an [alias](https://git-scm.com/book/en/v2/Git-Basics-Git-Aliases) for `git commit -s`.
+To avoid needing to remember the `-s` flag on every commit, you might like to set up an [alias](https://git-scm.com/book/en/v2/Git-Basics-Git-Aliases) for `git commit -s`. Alternatively, run `make sign-off` to setup a [`commit-msg` Git hook](https://git-scm.com/docs/githooks#_commit_msg) that automatically signs off all commits (including merge commits) you make while working on the Kedro repository.

If your PR is blocked due to unsigned commits then you will need to follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

21 changes: 21 additions & 0 deletions docs/source/data/data_catalog.md
@@ -313,6 +313,27 @@ dev_abs:
account_name: accountname
account_key: key
```
+Example 16: Loading a CSV file stored in a remote location through SSH
+
+```eval_rst
+.. note:: This example requires [Paramiko](https://www.paramiko.org) to be installed (`pip install paramiko`).
+```
+```yaml
+cool_dataset:
+  type: pandas.CSVDataSet
+  filepath: "sftp:///path/to/remote_cluster/cool_data.csv"
+  credentials: cluster_credentials
+```
+All parameters required to establish the SFTP connection can be defined through `fs_args` or in `credentials.yml` as follows:
+
+```yaml
+cluster_credentials:
+  username: my_username
+  host: host_address
+  port: 22
+  password: password
+```
+The list of all available parameters is given in the [Paramiko documentation](https://docs.paramiko.org/en/2.4/api/client.html#paramiko.client.SSHClient.connect).

## Creating a Data Catalog YAML configuration file via CLI

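The SFTP `filepath` in Example 16 has an empty host part (`sftp:///...`), which is consistent with the host being supplied through `cluster_credentials` rather than through the URL. A quick standard-library check of how that URL splits; this only illustrates URL parsing and assumes nothing about how Kedro or fsspec resolve the path internally:

```python
from urllib.parse import urlsplit

# The same URL as the `cool_dataset` entry in Example 16.
parts = urlsplit("sftp:///path/to/remote_cluster/cool_data.csv")

print(parts.scheme)        # sftp
print(repr(parts.netloc))  # '' (no host in the URL itself)
print(parts.hostname)      # None
print(parts.path)          # /path/to/remote_cluster/cool_data.csv
```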
4 changes: 4 additions & 0 deletions docs/source/logging/logging.md
@@ -18,6 +18,10 @@ log.warning("Issue warning")
log.info("Send information")
```

+```eval_rst
+.. note:: The name of a logger corresponds to a key in the ``loggers`` section in ``logging.yml`` (e.g. ``kedro.io``). See `Python's logging documentation <https://docs.python.org/3/library/logging.html#logger-objects>`_ for more information.
+```
+
## Logging for `anyconfig`

By default, [anyconfig](https://github.com/ssato/python-anyconfig) library that is used by `kedro` to read configuration files emits a log message with `INFO` level on every read. To reduce the amount of logs being sent for CLI calls, default project logging configuration in `conf/base/logging.yml` sets the level for `anyconfig` logger to `WARNING`.
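
The logging note says that a logger's name is the key under `loggers` in `logging.yml`, and the `anyconfig` paragraph relies on the same mechanism to raise that logger to `WARNING`. Both can be seen with Python's standard `logging.config.dictConfig`, using a dictionary shaped like (but much smaller than) a project's `logging.yml`. The `console` handler and the `kedro.io` level below are illustrative assumptions, not Kedro's shipped defaults.

```python
import logging
import logging.config

# Minimal stand-in for conf/base/logging.yml (illustrative, not Kedro's full default).
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler", "level": "INFO"},
    },
    "loggers": {
        # Keys are logger names: logging.getLogger("kedro.io") picks this entry up.
        "kedro.io": {"level": "INFO", "handlers": ["console"], "propagate": False},
        # Quieten anyconfig's INFO message on every configuration read.
        "anyconfig": {"level": "WARNING", "handlers": ["console"], "propagate": False},
    },
}

logging.config.dictConfig(LOGGING_CONFIG)

log = logging.getLogger("kedro.io")
log.info("Send information")  # emitted: kedro.io is set to INFO
logging.getLogger("anyconfig").info("Loading files")  # suppressed: below WARNING
```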
8 changes: 3 additions & 5 deletions features/environment.py
@@ -51,10 +51,8 @@ def _setup_context_with_venv(context, venv_dir):
# this is because exe resolution in subprocess doesn't respect a passed env
if os.name == "posix":
bin_dir = context.venv_dir / "bin"
-path_sep = ":"
else:
bin_dir = context.venv_dir / "Scripts"
-path_sep = ";"
context.bin_dir = bin_dir
context.pip = str(bin_dir / "pip")
context.python = str(bin_dir / "python")
@@ -63,11 +61,11 @@ def _setup_context_with_venv(context, venv_dir):

# clone the environment, remove any condas and venvs and insert our venv
context.env = os.environ.copy()
-path = context.env["PATH"].split(path_sep)
+path = context.env["PATH"].split(os.pathsep)
path = [p for p in path if not (Path(p).parent / "pyvenv.cfg").is_file()]
path = [p for p in path if not (Path(p).parent / "conda-meta").is_dir()]
path = [str(bin_dir)] + path
-context.env["PATH"] = path_sep.join(path)
+context.env["PATH"] = os.pathsep.join(path)

# Create an empty pip.conf file and point pip to it
pip_conf_path = context.venv_dir / "pip.conf"
@@ -107,7 +105,7 @@ def _setup_minimal_env(context):
"pip",
"install",
"-U",
-"pip>=20.0",
+"pip~=21.2",
"setuptools>=38.0",
"wheel",
],
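
The removed `path_sep` variable duplicated something the standard library already provides: `os.pathsep` is `:` on POSIX and `;` on Windows, where a plain `:` would also collide with drive letters such as `C:`. A self-contained sketch of the same PATH clean-up, runnable outside behave. The function name `isolate_venv_path` is hypothetical, and the temporary directories only stand in for real environments.

```python
import os
import tempfile
from pathlib import Path


def isolate_venv_path(path_value: str, venv_bin: Path) -> str:
    """Remove other venv/conda bin dirs from a PATH string and put venv_bin first."""
    entries = path_value.split(os.pathsep)  # ':' on POSIX, ';' on Windows
    # A venv's bin/ (or Scripts/) directory sits next to its pyvenv.cfg file.
    entries = [p for p in entries if not (Path(p).parent / "pyvenv.cfg").is_file()]
    # A conda environment keeps a conda-meta/ directory at its root.
    entries = [p for p in entries if not (Path(p).parent / "conda-meta").is_dir()]
    return os.pathsep.join([str(venv_bin)] + entries)


with tempfile.TemporaryDirectory() as tmp:
    old_venv = Path(tmp, "old_venv")
    (old_venv / "bin").mkdir(parents=True)
    (old_venv / "pyvenv.cfg").write_text("home = /usr/bin\n")
    tools = Path(tmp, "tools")
    tools.mkdir()
    new_bin = Path(tmp, "new_venv", "bin")

    original = os.pathsep.join([str(old_venv / "bin"), str(tools)])
    result = isolate_venv_path(original, new_bin)
    print(result.split(os.pathsep))  # new_venv/bin first, old_venv/bin gone
```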
2 changes: 1 addition & 1 deletion features/windows_reqs.txt
@@ -2,7 +2,7 @@
# e2e tests on Windows are slow but we don't need to install
# everything, so just this subset will be enough for CI
behave==1.2.6
-pandas~=1.2
+pandas~=1.3
psutil==5.8.0
requests~=2.20
toml~=0.10.1
23 changes: 22 additions & 1 deletion kedro/extras/datasets/json/json_dataset.py
@@ -21,7 +21,28 @@ class JSONDataSet(AbstractVersionedDataSet):
"""``JSONDataSet`` loads/saves data from/to a JSON file using an underlying
filesystem (e.g.: local, S3, GCS). It uses native json to handle the JSON file.
-Example:
+Example adding a catalog entry with
+`YAML API <https://kedro.readthedocs.io/en/stable/05_data/\
+01_data_catalog.html#using-the-data-catalog-with-the-yaml-api>`_:
+.. code-block:: yaml
+>>> json_dataset:
+>>>   type: json.JSONDataSet
+>>>   filepath: data/01_raw/location.json
+>>>   load_args:
+>>>     lines: True
+>>>
+>>> cars:
+>>>   type: json.JSONDataSet
+>>>   filepath: gcs://your_bucket/cars.json
+>>>   fs_args:
+>>>     project: my-project
+>>>   credentials: my_gcp_credentials
+>>>   load_args:
+>>>     lines: True
+Example using Python API:
::
>>> from kedro.extras.datasets.json import JSONDataSet
2 changes: 1 addition & 1 deletion kedro/framework/startup.py
@@ -142,7 +142,7 @@ def _add_src_to_path(source_dir: Path, project_path: Path) -> None:

python_path = os.getenv("PYTHONPATH") or ""
if str(source_dir) not in python_path:
-sep = ";" if python_path else ""
+sep = os.pathsep if python_path else ""
os.environ["PYTHONPATH"] = f"{str(source_dir)}{sep}{python_path}"


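`_add_src_to_path` previously joined entries with a hard-coded `;`, which is the `PYTHONPATH` separator only on Windows; POSIX systems expect `:`. A minimal stand-alone version of the prepend logic, working on a plain dict instead of `os.environ` so it can be tried without side effects (`prepend_to_pythonpath` is a hypothetical name):

```python
import os
from typing import MutableMapping


def prepend_to_pythonpath(source_dir: str, env: MutableMapping[str, str]) -> None:
    """Put source_dir at the front of env["PYTHONPATH"] unless it is already there."""
    python_path = env.get("PYTHONPATH") or ""
    if source_dir not in python_path:
        # Only add a separator when there is an existing value to separate from.
        sep = os.pathsep if python_path else ""
        env["PYTHONPATH"] = f"{source_dir}{sep}{python_path}"


env = {"PYTHONPATH": os.pathsep.join(["/opt/lib_a", "/opt/lib_b"])}
prepend_to_pythonpath("/project/src", env)
print(env["PYTHONPATH"].split(os.pathsep))
# ['/project/src', '/opt/lib_a', '/opt/lib_b']
```

Like the code in the diff, this uses a substring test, so `/project/src` would also count as present if `/project/src2` were already on the path; splitting on `os.pathsep` and comparing whole entries would be stricter.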
4 changes: 2 additions & 2 deletions kedro/runner/parallel_runner.py
@@ -187,7 +187,7 @@ def _validate_nodes(cls, nodes: Iterable[Node]):
f"In order to utilize multiprocessing you need to make sure all nodes "
f"are serializable, i.e. nodes should not include lambda "
f"functions, nested functions, closures, etc.\nIf you "
-f"are using custom decorators ensure they are correctly using "
+f"are using custom decorators ensure they are correctly decorated using "
f"functools.wraps()."
)

@@ -217,7 +217,7 @@ def _validate_catalog(cls, catalog: DataCatalog, pipeline: Pipeline):
f"need to make sure all data sets are serializable, i.e. data sets "
f"should not make use of lambda functions, nested functions, closures "
f"etc.\nIf you are using custom decorators ensure they are correctly "
-f"using functools.wraps()."
+f"decorated using functools.wraps()."
)

memory_datasets = []
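
The reworded messages point at `functools.wraps()` because `ParallelRunner` hands nodes to worker processes and therefore needs them to pickle. Pickle stores a plain function by its module and qualified name, and a wrapper built without `functools.wraps` keeps a name like `decorator.<locals>.wrapper` that cannot be looked up again. A small sketch with hypothetical decorator and function names, assuming it is run as a script so the functions live in `__main__`:

```python
import functools
import pickle


def plain_decorator(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def wrapping_decorator(func):
    @functools.wraps(func)  # copies __module__, __name__, __qualname__, __doc__, ...
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@plain_decorator
def add_one_plain(x):
    return x + 1


@wrapping_decorator
def add_one_wrapped(x):
    return x + 1


def is_picklable(obj) -> bool:
    """Return True if pickle can serialise obj (by reference, for functions)."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        return False
    return True


print(add_one_plain.__qualname__, is_picklable(add_one_plain))
print(add_one_wrapped.__qualname__, is_picklable(add_one_wrapped))
```

With `functools.wraps`, the wrapper's qualified name resolves back to the module-level name it is bound to, so pickle finds the same object again; without it, the lookup goes through `<locals>` and fails.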
2 changes: 1 addition & 1 deletion tools/circleci/requirements.txt
@@ -1,3 +1,3 @@
-pip>=20.0
+pip~=21.2
setuptools>=38.0
twine~=3.0
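
Several files in this commit replace `pip>=20.0` with `pip~=21.2` (and `pandas~=1.2` with `pandas~=1.3`). Under PEP 440, `~=21.2` is a "compatible release" clause, equivalent to `>=21.2, ==21.*`. A stdlib-only sketch for plain numeric versions; `compatible_release` is a hypothetical helper, and unlike real tools such as `packaging` it ignores pre-releases, epochs and local versions, and does not normalise trailing zeros.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Split a plain numeric version such as '21.3.1' into (21, 3, 1)."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(candidate: str, spec: str) -> bool:
    """Return True if `candidate` satisfies `~=spec` (numeric release segments only)."""
    base = parse(spec)
    if len(base) < 2:
        raise ValueError("'~=' needs at least two release segments, e.g. '21.2'")
    version = parse(candidate)
    # '~=21.2' means '>=21.2, ==21.*': every segment but the last must match ...
    same_series = version[: len(base) - 1] == base[:-1]
    # ... and the candidate must not be older than the spec itself.
    return same_series and version >= base


for candidate in ["21.1", "21.2", "21.3.1", "22.0"]:
    print(candidate, compatible_release(candidate, "21.2"))
# 21.1 False, 21.2 True, 21.3.1 True, 22.0 False
```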
