feat: Add metadata attribute to datasets #189

AhdraMeraliQB · 2023-04-25T08:56:54Z

Description

This is connected to kedro-org/kedro#2537 which adds the metadata attribute to the datasets within kedro.io (MemoryDataSet, LambdaDataSet, PartitionedDataSet)

Also addresses some changes made in #184

Development notes

The metadata attribute is defined solely within self.metadata. Some datasets make use of a _describe method to return a dictionary of the dataset's attributes. I have not included the metadata in these methods as in some instances it would necessitate defining it twice, and I find the use redundant - however I would like to hear the reviewer's opinions on this matter.

Metadata is accessible through dataset_name.metadata. Depending on the implementation this is at times inconsistent within the dataset, but it remains consistent across all datasets.

These changes have been tested manually using both the Python API and through catalog.yml using hooks. The hook I used to access the dataset's metadata is as follows:

from typing import Any, Dict

from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog


class MetadataHooks:
    @hook_impl
    def after_catalog_created(
        self,
        catalog: DataCatalog,
        conf_catalog: Dict[str, Any],
        conf_creds: Dict[str, Any],
        feed_dict: Dict[str, Any],
        save_version: str,
        load_versions: Dict[str, str],
    ):
        # Print each dataset's metadata; this reads the catalog's internal __dict__.
        for k, v in catalog.datasets.__dict__.items():
            print(k + " metadata: \n" + str(v.metadata))

Checklist

Opened this PR as a 'Draft Pull Request' if it is work-in-progress
Updated the documentation to reflect the code changes
Added a description of this change in the relevant RELEASE.md file
Added tests to cover my changes

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

noklam · 2023-04-25T15:43:31Z

Just thinking off the top of my head: do we need to change this in all the datasets? How was layer being used before without being an argument in DataSet? I think it should be yes.

It would also be great to think about how metadata could be accessed via hooks, i.e. how will viz use this new field? The after_catalog_created hook, I guess?

AhdraMeraliQB · 2023-04-25T15:54:10Z

How was layer being used before without being an argument in DataSet? I think it should be yes.

layer is currently consumed by the DataCatalog and the general consensus is that this is not ideal. Previously it was defined on every dataset as in this PR

It would also be great to think about how metadata could be accessed via hooks, i.e. how will viz use this new field? The after_catalog_created hook, I guess?

metadata is accessible in the dummy hook implementation that I used to test the datasets (in the PR description). As for the specifics of how viz (and other plugins) would consume and use it, however, I am not sure.

merelcht · 2023-04-26T16:17:10Z

kedro-datasets/kedro_datasets/api/api_dataset.py

@@ -76,6 +74,8 @@ def __init__(
            credentials: Allows specifying secrets in credentials.yml.
                Expected format is ``('login', 'password')`` if given as a tuple or list.
                An ``AuthBase`` instance can be provided for more complex cases.
+            metadata: Any arbitrary user metadata.


I think it would be good to mention here that Kedro doesn't do anything with this metadata, but that it can be consumed by plugins or directly by the user.
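A minimal sketch of what that docstring wording could look like on a dataset. `ExampleDataSet` and its `filepath` argument are hypothetical stand-ins for illustration, not classes or parameters from kedro-datasets; the only point is that the value is stored and never read by the dataset itself.

```python
from typing import Any, Dict, Optional


class ExampleDataSet:
    """Hypothetical dataset used only to illustrate the docstring wording."""

    def __init__(
        self, filepath: str, metadata: Optional[Dict[str, Any]] = None
    ) -> None:
        """
        Args:
            filepath: Location of the data (illustrative only).
            metadata: Any arbitrary metadata. This is ignored by Kedro itself,
                but may be consumed by users or external plugins.
        """
        self._filepath = filepath
        # Stored as given; nothing in the dataset reads it back.
        self.metadata = metadata


ds = ExampleDataSet(
    "data/01_raw/cars.csv", metadata={"kedro-viz": {"layer": "raw"}}
)
print(ds.metadata)
```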

How was layer being used before without being an argument in DataSet? I think it should be yes.

layer is currently consumed by the DataCatalog and the general consensus is that this is not ideal. Previously it was defined on every dataset as in this PR

It would also be great to think about how metadata could be accessed via hooks, i.e. how will viz use this new field? The after_catalog_created hook, I guess?

metadata is accessible in the dummy hook implementation that I used to test the datasets (in the PR description). As for the specifics of how viz (and other plugins) would consume and use it, however, I am not sure.

Hey @AhdraMeraliQB, I think this is very important because currently we get layer information when we access context.catalog. Would the metadata information also be available that way?

We would need to change how it's consumed on kedro-viz so that instead of looking at catalog.layers we look at get_dataset(dataset_name).metadata["kedro-viz"]["layer"] (where get_dataset is defined in CatalogRepository).
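As a rough sketch of the lookup described above, assuming the layer lives under metadata["kedro-viz"]["layer"]: `get_layer` and `StubDataSet` are hypothetical names for illustration and are not part of kedro-viz. The helper tolerates datasets with no metadata or with no kedro-viz entry.

```python
from typing import Any, Optional


def get_layer(dataset: Any) -> Optional[str]:
    """Return the layer stored under metadata["kedro-viz"]["layer"], if any."""
    metadata = getattr(dataset, "metadata", None)
    if not isinstance(metadata, dict):
        return None
    viz_config = metadata.get("kedro-viz")
    if not isinstance(viz_config, dict):
        return None
    return viz_config.get("layer")


class StubDataSet:
    """Hypothetical stand-in for a dataset that exposes `metadata`."""

    def __init__(self, metadata=None):
        self.metadata = metadata


print(get_layer(StubDataSet({"kedro-viz": {"layer": "raw"}})))
print(get_layer(StubDataSet()))
```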

merelcht · 2023-04-26T16:20:01Z

kedro-datasets/kedro_datasets/snowflake/snowpark_dataset.py

-class SnowparkTableDataSet(AbstractDataSet):
+class SnowparkTableDataSet(
+    AbstractDataSet
+):  # pylint:disable=too-many-instance-attributes


Maybe we should just ignore this on the repo level, it's not really a relevant check for the datasets.

noklam · 2023-04-26T16:44:06Z

@AhdraMeraliQB @merelcht Not sure if we have a ticket already, but we should have docs explaining how this metadata should be consumed, i.e. via the hook.

The implementation in the description is hacky, as we are accessing the internal dict in a weird way. I think this is related to the discussion we had about the datasets and frozen datasets.

class MetadataHooks:
    @hook_impl
    def after_catalog_created(self,
        catalog: DataCatalog,
        conf_catalog: Dict[str, Any],
        conf_creds: Dict[str, Any],
        feed_dict: Dict[str, Any],
        save_version: str,
        load_versions: Dict[str, str],
    ):
        for k,v in catalog.datasets.__dict__.items():
            print(k + "metadata: \n" + str(v.metadata))

noklam · 2023-04-26T16:45:21Z

~~Changed my thought~~ Originally I thought 3 was better, but then it will always load the full catalog - in the case of viz, you only want to load the data that is associated with the target pipeline. I just wrote down my reasoning here. I think 1 & 2 are equally bad but we don't have a better option, so I think we go with 1.

1. catalog.datasets.__dict__, as @AhdraMeraliQB did, which uses a private variable
2. catalog._data_sets - still using an internal variable
3. conf_catalog - this doesn't use any internal variable, but it may contain the _ entries?

antonymilne · 2023-05-15T11:29:39Z

@noklam you're right that going through catalog.list() would only show those that are explicitly defined in catalog.yml and would not cover patterned ones. If you want all the datasets actually used in a pipeline then you should use pipelines.data_sets instead. Either way, catalog._get_dataset(dataset_name) will work for both explicitly defined and pattern-matched datasets.
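A small sketch of that lookup, under the assumption that the caller already has the dataset names it cares about. `StubCatalog` and `StubDataSet` are hypothetical stand-ins that mimic only the two calls named above (list() and _get_dataset()); they are not Kedro's DataCatalog, and `collect_metadata` is an illustrative helper.

```python
from typing import Any, Dict, Iterable


class StubDataSet:
    """Hypothetical dataset exposing a public `metadata` attribute."""

    def __init__(self, metadata=None):
        self.metadata = metadata


class StubCatalog:
    """Hypothetical catalog exposing only list() and _get_dataset()."""

    def __init__(self, datasets: Dict[str, Any]):
        self._datasets = dict(datasets)

    def list(self):
        return list(self._datasets)

    def _get_dataset(self, name: str):
        return self._datasets[name]


def collect_metadata(catalog: Any, dataset_names: Iterable[str]) -> Dict[str, Any]:
    """Look up each named dataset and read its metadata (None if absent)."""
    return {
        name: getattr(catalog._get_dataset(name), "metadata", None)
        for name in dataset_names
    }


catalog = StubCatalog(
    {"cars": StubDataSet({"kedro-viz": {"layer": "raw"}}), "model": StubDataSet()}
)
print(collect_metadata(catalog, catalog.list()))
```

Passing the names in explicitly keeps the helper agnostic about where they come from (the catalog listing or the datasets of a target pipeline).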

antonymilne · 2023-05-15T11:37:42Z

As for the major change here: I think I may disagree with @merelcht here and wonder whether we should actually use metadata rather than _metadata. It looks inconsistent with other attributes, but since metadata will by definition not get used anywhere in kedro we don't really need to mark it as protected. It's a dataset implementation attribute and not part of AbstractDataSet so we don't currently have any sort of interface associated with exposing _metadata. Hence there's currently no way for consumers (e.g. plugins) to use metadata through a public interface, which is not very encouraging for people who want to actually use it.

In theory we could have a @property for metadata to show the underlying _metadata attribute but I don't see any particular need for that when we could just make metadata public in the first place.

Curious what @idanov thinks here...

Aside from that, this and the other PR LGTM.

The metadata attribute is defined solely within self.metadata. Some datasets make use of a _describe method to return a dictionary of the dataset's attributes. I have not included the metadata in these methods as in some instances it would necessitate defining it twice, and I find the use redundant - however I would like to hear the reviewer's opinions on this matter.

Fine by me to not put it in _describe, but I don't understand what you mean by defining it twice here - can you give an example?

Metadata is accessible through dataset_name.metadata. Depending on the implementation this is at times inconsistent within the dataset, but it remains consistent across all datasets.

Also curious what you mean by this. Where is the inconsistency?

AhdraMeraliQB · 2023-05-16T16:14:43Z

@antonymilne

As for the major change here: I think I may disagree with @merelcht here and wonder whether we should actually use metadata rather than _metadata. It looks inconsistent with other attributes, but since metadata will by definition not get used anywhere in kedro we don't really need to mark it as protected.

I did a little digging and it looks like the _preview function added by Viz is the same, in that it's not used by Kedro itself. For this reason, I'd argue to keep _metadata and maintain consistency.

However, I also agree that marking it as protected implies that accessing it is somewhat hacky. In this case I'd actually vouch for introducing a metadata property that accesses self._metadata.

Fine by me to not put it in _describe, but I don't understand what you mean by defining it twice here - can you give an example?

This really only came up in APIDataSet - the dataset has an instance variable _request_args which stores most of the arguments provided to the class. It's this instance variable that is then accessed by _describe(), so to include the metadata in _describe() it would need to be stored in two separate places - self._metadata and self._request_args.

Also curious what you mean by this. Where is the inconsistency?

Another example born from APIDataSet: it mostly doesn't make use of instance variables, instead having everything stored within the one variable self._request_args. This implementation ignores that and always defines metadata as an instance variable for every dataset, regardless of any variance in the dataset implementations.
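To make the APIDataSet point concrete, here is a simplified sketch. `ApiLikeDataSet` is a hypothetical class, not the real APIDataSet, and the attribute is shown as `metadata` although its final name was still under discussion at this point in the thread. The request arguments live in one dict that _describe() reads, while metadata is kept as its own instance variable, so it is stored exactly once and stays out of _describe().

```python
from typing import Any, Dict, Optional


class ApiLikeDataSet:
    """Hypothetical, simplified dataset loosely modelled on the description above."""

    def __init__(
        self,
        url: str,
        method: str = "GET",
        metadata: Optional[Dict[str, Any]] = None,
    ) -> None:
        # Most constructor arguments are bundled into one dict...
        self._request_args = {"url": url, "method": method}
        # ...while metadata is kept separately, as on every other dataset.
        self.metadata = metadata

    def _describe(self) -> Dict[str, Any]:
        # Only the request arguments are described; metadata is not duplicated here.
        return dict(self._request_args)


ds = ApiLikeDataSet("https://example.com/api", metadata={"owner": "team-a"})
print(ds._describe())
print(ds.metadata)
```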

antonymilne · 2023-05-17T08:36:18Z

I did a little digging and it looks like the _preview function added by Viz is the same, in that it's not used by Kedro itself. For this reason, I'd argue to keep _metadata and maintain consistency.

However, I also agree that marking it as protected implies that accessing it is somewhat hacky. In this case I'd actually vouch for introducing a metadata property that accesses self._metadata.

I don't think the comparison with _preview is quite accurate here for various reasons, but I do agree that it looks a little odd to have one public attribute when everything else is protected.

Having a metadata property is ok, but I think it should probably live in AbstractDataSet. It's then maybe a bit weird that the _metadata attribute is defined in implementations but not the abstract dataset, but I guess it's consistent with how e.g. the public AbstractDataSet.exists function wraps the implementation's _exists function (this type of template method pattern I'm also not a fan of, but that's for another time).

So yeah, if you and others think the property is a good idea then fine by me 👍 It does at least make metadata read-only after dataset instantiation, which is probably a good thing, and also provides a public interface for users.
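A sketch of the property option under discussion, assuming implementations set self._metadata in their constructors. `AbstractLikeDataSet` and `InMemoryExample` are hypothetical classes, not Kedro's AbstractDataSet. The base class exposes a read-only property in the same way its public exists() wraps the implementation's _exists().

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class AbstractLikeDataSet(ABC):
    """Hypothetical base class; not Kedro's AbstractDataSet."""

    @property
    def metadata(self) -> Optional[Dict[str, Any]]:
        # Public, read-only view of whatever the implementation stored.
        return getattr(self, "_metadata", None)

    def exists(self) -> bool:
        # Public method wrapping the implementation's protected hook.
        return self._exists()

    @abstractmethod
    def _exists(self) -> bool:
        ...


class InMemoryExample(AbstractLikeDataSet):
    def __init__(self, data: Any = None, metadata: Optional[Dict[str, Any]] = None):
        self._data = data
        self._metadata = metadata

    def _exists(self) -> bool:
        return self._data is not None


ds = InMemoryExample(data=[1, 2], metadata={"owner": "team-a"})
print(ds.metadata)
```

Because the property has no setter, rebinding ds.metadata raises AttributeError, although the dict it returns can still be mutated in place.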

Thanks for explaining about APIDataSet. I think what you've done here is good.

kedro-datasets/RELEASE.md

Co-authored-by: Antony Milne <49395058+antonymilne@users.noreply.github.com>

AhdraMeraliQB · 2023-05-17T14:31:52Z

I agree that the metadata property should live in the AbstractDataSet - I'll make those changes in #2537. I suppose I'll have to open a separate PR to pass it through the datasets, which can then be merged in after the changes on Kedro are released.

@merelcht do you have any thoughts on this?

merelcht · 2023-05-17T15:01:06Z

I agree that the metadata property should live in the AbstractDataSet - I'll make those changes in #2537. I suppose I'll have to open a separate PR to pass it through the datasets, which can then be merged in after the changes on Kedro are released.

@merelcht do you have any thoughts on this?

I think the comments and observations @antonymilne made are very valid. I hadn't properly thought it through. So I'd prefer to just make metadata public then and not add a property. If we add the property, it would be a breaking change to remove it again in future and we don't know yet how this metadata feature is going to be used so I'd prefer to not tie ourselves into this additional property.

AhdraMeraliQB · 2023-05-17T15:57:37Z

If we add the property, it would be a breaking change to remove it again in future and we don't know yet how this metadata feature is going to be used so I'd prefer to not tie ourselves into this additional property.

Wouldn't it be breaking either way?

noklam

The discussion has mainly been about how we should implement this new attribute. Three options were discussed in this PR, and we went with option 2:

1. self._metadata
2. self.metadata as an instance attribute
3. self.metadata (as a property)

If we want to change 2/3 in the future, it will be a breaking change since both are public. I do think this should go into the abstract class: in future, if a new dataset PR comes in and it doesn't have the metadata field, we will reject it. For this reason we should just define it explicitly in the contract (the abstract class).

Option 3 seems to address the point about the contract better, but it doesn't actually enforce what we want. What we really want is to enforce that metadata is part of the signature of the class constructor. I think the right way to do that is to enforce it in the abstract class constructor, i.e. super().__init__(xxx, xxx)
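A rough sketch of what routing metadata through the abstract class constructor could look like. `AbstractLikeDataSet` and `CsvLikeDataSet` are hypothetical; Kedro's AbstractDataSet did not take constructor arguments like this in the code discussed here.

```python
from typing import Any, Dict, Optional


class AbstractLikeDataSet:
    """Hypothetical base class that owns the metadata attribute."""

    def __init__(self, metadata: Optional[Dict[str, Any]] = None) -> None:
        self.metadata = metadata


class CsvLikeDataSet(AbstractLikeDataSet):
    """Hypothetical implementation that forwards metadata to the base class."""

    def __init__(
        self, filepath: str, metadata: Optional[Dict[str, Any]] = None
    ) -> None:
        super().__init__(metadata=metadata)
        self._filepath = filepath


ds = CsvLikeDataSet("data/cars.csv", metadata={"kedro-viz": {"layer": "raw"}})
print(ds.metadata)
```

Note that this only makes the contract visible in one place: a subclass that never calls super().__init__() would still end up without the attribute, so review or tests would need to catch that case.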

Follow up actions:

We should add example hooks for how to consume this new metadata. Viz will likely be the first consumer.
Removal of the layer attribute in 0.19.0 in favor of the metadata attribute

antonymilne · 2023-05-22T10:20:59Z

@noklam agree with everything you say here, but I believe that adding metadata to the AbstractDataSet constructor was difficult for some reason. I don't know the details (I think @merelcht does) but maybe it comes down to the things discussed here and the related issue about where AbstractDataSet lives: kedro-org/kedro#1076 (comment).

If we want to change 2/3 in the future, it will be a breaking change since both are public.

Agree, although a breaking change to a dataset is less awkward to deal with than one to the framework.

antonymilne

🌟

validation methods Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Remove unused versioning functions Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Fix exception catching for invalid schema, add test for invalid schema Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Add pylint ignore Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Add tests/databricks to ignore for no-spark tests Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py Co-authored-by: Nok Lam Chan <mediumnok@gmail.com> * Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py Co-authored-by: Nok Lam Chan <mediumnok@gmail.com> * Remove spurious mlflow test dependency Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Add explicit check for database existence Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Remove character limit for table names Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Refactor validation steps in ManagedTable Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Remove spurious checks for table and schema name existence Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> --------- Signed-off-by: Danny Farah <danny_farah@mckinsey.com> Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> Co-authored-by: Danny Farah <danny.farah@quantumblack.com> Co-authored-by: Danny Farah <danny_farah@mckinsey.com> Co-authored-by: Nok Lam Chan <mediumnok@gmail.com> Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com> * docs: Update APIDataset docs and refactor (#217) * Update APIDataset docs and refactor * Acknowledge community contributor * Fix more broken doc Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com> * Lint Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Fix release notes of upcoming kedro-datasets --------- Signed-off-by: 
Nok Chan <nok.lam.chan@quantumblack.com> Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com> Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com> * feat: Release `kedro-datasets` version `1.3.0` (#219) * Modify release version and RELEASE.md Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Add proper name for ManagedTableDataSet Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Update kedro-datasets/RELEASE.md Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Revert lost semicolon for release 1.2.0 Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> --------- Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com> * docs: Fix APIDataSet docstring (#220) * Fix APIDataSet docstring Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Add release notes Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Separate [docs] extras from [all] in kedro-datasets Fix gh-143. 
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> --------- Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com> * Update kedro-datasets/tests/spark/test_spark_streaming_dataset.py Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com> * Update kedro-datasets/kedro_datasets/spark/spark_streaming_dataset.py Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com> * Update kedro-datasets/setup.py Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com> * fix linting issue Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com> --------- Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com> Signed-off-by: Juan Luis Cano Rodríguez <hello@juanlu.space> Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com> Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com> Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai> Signed-off-by: <jmcdonnell@fieldbox.ai> Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> Signed-off-by: Danny Farah <danny_farah@mckinsey.com> Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> Co-authored-by: Juan Luis Cano Rodríguez <hello@juanlu.space> Co-authored-by: Tingting Wan <110382691+Tingting711@users.noreply.github.com> Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Co-authored-by: Nok Lam Chan <mediumnok@gmail.com> Co-authored-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> Co-authored-by: Juan Luis Cano Rodríguez 
<juan_luis_cano@mckinsey.com> Co-authored-by: Tom Kurian <tom_kurian@mckinsey.com> Co-authored-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com> Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Co-authored-by: McDonnellJoseph <90898184+McDonnellJoseph@users.noreply.github.com> Co-authored-by: jmcdonnell <jmcdonnell@fieldbox.ai> Co-authored-by: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com> Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com> Co-authored-by: Danny Farah <danny.farah@quantumblack.com> Co-authored-by: Danny Farah <danny_farah@mckinsey.com> Co-authored-by: kuriantom369 <116743025+kuriantom369@users.noreply.github.com>

Ahdra Merali added 3 commits April 24, 2023 11:15

Add metadata to APIDataSet

28e2aaf

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

Add metadata attribute to all datasets

4c73dbf

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

Lint

0d87307

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

AhdraMeraliQB requested review from rashidakanchwala and antonymilne April 25, 2023 08:57

Ahdra Merali added 2 commits April 25, 2023 15:10

Lint pt 2

f685935

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

Update RELEASE.md

3a90021

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

AhdraMeraliQB mentioned this pull request Apr 25, 2023

sync APIDataSet from kedro's develop #184

Merged

4 tasks

Merge branch 'main' into add-metadata-attribute

0f50f61

AhdraMeraliQB mentioned this pull request Apr 25, 2023

Add metadata attribute to kedro.io datasets kedro-org/kedro#2537

Merged

5 tasks

Lint pt 3

7bf3c9f

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

AhdraMeraliQB marked this pull request as ready for review April 25, 2023 14:42

Lint pt 4

a86a8f2

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

merelcht reviewed Apr 26, 2023

deepyaman mentioned this pull request Apr 26, 2023

ci: don't run checks on both push/pull_request #192

Merged

4 tasks

Ahdra Merali added 2 commits April 28, 2023 14:09

Adjust wording in docstring

a27dbbb

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

Lint pt n

fb913df

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

AhdraMeraliQB changed the title ~~Add metadata attribute to datasets~~ feat: Add metadata attribute to datasets Apr 28, 2023

AhdraMeraliQB requested a review from merelcht April 28, 2023 13:32

Fix bad paste

b8f5597

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

merelcht reviewed May 2, 2023

kedro-datasets/kedro_datasets/api/api_dataset.py

Add underscore suggestion from reviews

c4e76d9

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

merelcht reviewed May 3, 2023

kedro-datasets/kedro_datasets/svmlight/svmlight_dataset.py

kedro-datasets/kedro_datasets/video/video_dataset.py Outdated

kedro-datasets/kedro_datasets/video/video_dataset.py Outdated

Ahdra Merali added 2 commits May 8, 2023 14:08

Remove metadata from non-dataset classes

11f713b

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

Add SVMLightDataSet init docstring

760dc4c

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

antonymilne reviewed May 17, 2023

kedro-datasets/RELEASE.md Outdated

Add more description to RELEASE.md

1a6962d

Co-authored-by: Antony Milne <49395058+antonymilne@users.noreply.github.com>

AhdraMeraliQB requested a review from antonymilne May 17, 2023 14:32

AhdraMeraliQB marked this pull request as draft May 17, 2023 14:54

Ahdra Merali and others added 2 commits May 18, 2023 23:07

Revert addition of underscore to metadata attribute

edfa0bb

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

Merge branch 'main' into add-metadata-attribute

1fe3420

AhdraMeraliQB marked this pull request as ready for review May 18, 2023 22:11

Ahdra Merali and others added 3 commits May 18, 2023 23:36

Fix lint

49e18ee

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

Merge branch 'main' into add-metadata-attribute

68c1138

Merge branch 'main' into add-metadata-attribute

2330bd8

noklam self-requested a review May 22, 2023 09:31

noklam approved these changes May 22, 2023

antonymilne approved these changes May 22, 2023

Merge branch 'main' into add-metadata-attribute

5a52727

merelcht enabled auto-merge (squash) May 22, 2023 14:33

merelcht merged commit de8b833 into main May 22, 2023

merelcht deleted the add-metadata-attribute branch May 22, 2023 14:43

This was referenced Jun 13, 2023

Document datasets' metadata attribute in catalog definition kedro-org/kedro#2676

Closed

Ignore pylint too-many-instance-attributes on repo level #235

Closed

astrojuanlu mentioned this pull request Jun 19, 2023

Move AbstractDataSet to Kedro-Plugins kedro-org/kedro#2409

Closed
