feat: Dataset API add `save` method #180

McDonnellJoseph · 2023-04-18T12:27:35Z

Description

We sometimes want to save data through a REST API (for example, one built with Django REST Framework), usually by issuing a POST request. So far, APIDataSet is read-only: its _save method raises a DataSetError. We would like to extend it so that saving issues a request.

Closes #166

Development notes

If applied, this commit will:

Enable the save method for APIDataSet, sending the data to the server in chunks of size chunk_size (default 100). The default HTTP method is POST, and the method returns a requests.Response object.
Add a save_args dictionary, for consistency with other datasets.
Raise an error when trying to save non-JSON data.
Raise an error on save in case of an HTTP error.
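The chunked save described in the list above can be sketched roughly as follows. This is not the code merged in this PR: it is a minimal, standard-library-only illustration that assumes the data is a list of records and replaces the real HTTP call with a hypothetical `send` callable (which receives a JSON body and returns a status code), so it runs without a server. `iter_chunks`, `save_in_chunks` and the local `DataSetError` stand-in are hypothetical names.

```python
import json
from typing import Any, Callable, Iterator, List, Optional


class DataSetError(Exception):
    """Stand-in for kedro's DataSetError so this sketch runs on its own."""


def iter_chunks(records: List[Any], chunk_size: int = 100) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most ``chunk_size`` records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def save_in_chunks(
    records: List[Any],
    send: Callable[[str], int],
    chunk_size: int = 100,
) -> Optional[int]:
    """Serialise each chunk to JSON and hand it to ``send``.

    ``send`` is a hypothetical transport callable; in a real dataset it
    would wrap an HTTP POST (or PUT). Returns the status code of the last
    request, or None if ``records`` is empty.
    """
    status: Optional[int] = None
    for chunk in iter_chunks(records, chunk_size):
        try:
            body = json.dumps(chunk)
        except TypeError as exc:  # e.g. sets, datetimes, arbitrary objects
            raise DataSetError("Data is not JSON-serialisable") from exc
        status = send(body)
        if status >= 400:  # treat 4xx/5xx as failures, like raise_for_status()
            raise DataSetError(f"Failed to send data: HTTP {status}")
    return status
```

With 250 records and the default chunk_size of 100, `send` is called three times, with 100, 100 and 50 records. Using `range(0, len(records), chunk_size)` means no empty trailing chunk is sent when the length is an exact multiple of the chunk size.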

Checklist

Opened this PR as a 'Draft Pull Request' if it is work-in-progress
Updated the documentation to reflect the code changes
Added a description of this change in the relevant RELEASE.md file
Added tests to cover my changes

McDonnellJoseph · 2023-04-18T15:21:39Z

Hello, do you know if there is any documentation that needs to be updated apart from the class docstring?

astrojuanlu · 2023-04-18T16:44:35Z

Hello @McDonnellJoseph, thanks for this contribution! In this case "documentation" only refers to the docstrings, yes.

The CI is failing because some lines are not covered by tests:

Name                                                    Stmts   Miss  Cover   Missing
kedro_datasets/api/api_dataset.py                          67      4    94%   181-185

McDonnellJoseph · 2023-04-19T07:37:30Z

Thanks for pointing this out, I just added full test coverage!

McDonnellJoseph · 2023-04-19T10:00:48Z

I haven't changed any code since the last commit, which passed CI.
The returned error is "ERROR: Could not build wheels for import-linter, which is required to install pyproject.toml-based projects",
but I don't see what change in my code could cause such an error.

McDonnellJoseph · 2023-04-19T10:03:05Z

Otherwise I think this is ready for review

astrojuanlu · 2023-04-19T10:48:55Z

@McDonnellJoseph it's an intermittent issue with unknown root cause (see kedro-org/kedro#2570), I restarted it but I'm confident the tests will pass.

McDonnellJoseph · 2023-04-19T12:15:09Z

@astrojuanlu Yes, that's what I thought, but I don't think contributors can restart the tests. Anyway, I think the code is ready for review now: it has full test coverage and better documentation for the new save feature and how it works.

noklam

Thank you for the PR, it looks pretty solid to me! Just thinking off the top of my head: does anyone ever use this for a POST request? It seems possible, but I find it a bit weird. Maybe the right thing to do is GET for load and PUT/DELETE/POST for save?

kedro-datasets/kedro_datasets/api/api_dataset.py

noklam

I think it makes sense to only enable GET for load and POST/PUT for save. In fact, we have already refactored some of the arguments in #181 (it was done in kedro develop, but we forgot to sync it).

So we should have load_args and save_args, which makes the dataset more consistent with other datasets too.
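The load/save split proposed here (GET for load; POST or PUT for save, each configured through its own load_args or save_args) can be sketched as below. This is an illustrative, hypothetical class, not the APIDataSet merged in this PR: the class name, the default dictionaries and the `_check_method` helper are assumptions. The default-merging pattern (copy class-level defaults, then update them with user arguments) follows the convention used by datasets such as CSVDataSet.

```python
from copy import deepcopy
from typing import Any, Dict, Optional, Set


class DataSetError(Exception):
    """Stand-in for kedro's DataSetError so this sketch runs on its own."""


class SketchAPIDataSet:
    """Hypothetical, trimmed-down dataset illustrating the load/save split."""

    # Hypothetical defaults: GET for load, POST with chunks of 100 for save.
    DEFAULT_LOAD_ARGS: Dict[str, Any] = {"method": "GET"}
    DEFAULT_SAVE_ARGS: Dict[str, Any] = {"method": "POST", "chunk_size": 100}

    def __init__(
        self,
        url: str,
        load_args: Optional[Dict[str, Any]] = None,
        save_args: Optional[Dict[str, Any]] = None,
    ) -> None:
        self._url = url
        # Copy the defaults so instances never mutate the class-level dicts,
        # then let user-supplied arguments override them.
        self._load_args = deepcopy(self.DEFAULT_LOAD_ARGS)
        self._load_args.update(load_args or {})
        self._save_args = deepcopy(self.DEFAULT_SAVE_ARGS)
        self._save_args.update(save_args or {})
        # Reject HTTP methods that make no sense for each operation.
        self._check_method(self._load_args["method"], {"GET"}, "load")
        self._check_method(self._save_args["method"], {"POST", "PUT"}, "save")

    @staticmethod
    def _check_method(method: str, allowed: Set[str], operation: str) -> None:
        if method.upper() not in allowed:
            raise DataSetError(
                f"Method {method!r} is not supported for {operation}; "
                f"use one of {sorted(allowed)}"
            )
```

For example, `SketchAPIDataSet(url, save_args={"method": "PUT", "chunk_size": 10})` is accepted, while `save_args={"method": "DELETE"}` raises DataSetError, in line with the later decision in this thread to remove DELETE support.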

kedro-datasets/kedro_datasets/api/api_dataset.py

noklam · 2023-04-20T20:19:22Z

Addition - I would suggest doing the refactoring after kedro-org/kedro#2570 is merged, as it changes the arguments a bit.

McDonnellJoseph · 2023-04-21T15:31:22Z

Addition - I would suggest doing the refactoring after kedro-org/kedro#2570 is merged, as it changes the arguments a bit.

Hello, maybe I misunderstood, but don't you mean PR #184?

noklam · 2023-04-24T12:34:39Z

@McDonnellJoseph Sorry about that, you are right! #184 is merged now, please update this branch. There are quite a few changes because of the change in arguments. Let me know if you get stuck.

McDonnellJoseph · 2023-04-26T08:42:40Z

Hello again, I added your suggested modifications to the code and I also merged the recent modifications made to the APIDataSet with my work.
Looking forward to hearing from you

McDonnellJoseph · 2023-05-09T07:31:39Z

Hello, I'm still waiting for a review and would be happy to contribute to the project. Is something missing from the PR before it can be reviewed?

merelcht

Thanks for the contribution @McDonnellJoseph ! I've left some comments. I'll be happy to approve when those are addressed 🙂

kedro-datasets/kedro_datasets/api/api_dataset.py

noklam

@McDonnellJoseph Thanks for persevering on this. I missed the GitHub notification; feel free to @ me or find me on Slack if I haven't responded for a long time.

kedro-datasets/kedro_datasets/api/api_dataset.py

kedro-datasets/tests/api/test_api_dataset.py

McDonnellJoseph · 2023-05-11T12:24:00Z

Hello @noklam @merelcht, thank you for your comments; I have addressed them in the latest version. I spent a lot of time this morning trying to rebase my branch to sign an old commit, and this has created a new problem that stops the pipeline from passing. Is this an easy fix for you?

kedro-datasets/tests/api/test_api_dataset.py

astrojuanlu · 2023-05-11T13:23:55Z

Looks like the unsigned commit is 07eb097, which is already in main. @McDonnellJoseph let me have a quick look locally at the right git spell to run to perform this rebase correctly, and I'll post it here.

McDonnellJoseph · 2023-05-11T14:07:51Z

Looks like the unsigned commit is 07eb097, which is already in main. @McDonnellJoseph let me have a quick look locally at the right git spell to run to perform this rebase correctly, and I'll post it here.

Thanks! I got lost during the rebase, which is why the PR has so many commits.

McDonnellJoseph · 2023-05-11T15:06:48Z

Git magic

The conflicts here are quite hairy. Initially I managed to at least linearize the history by doing

git rebase -i 3b42fae --rebase-merges

and using this rebase-todo:

label onto

# Branch main
reset onto
label main

reset onto
pick 5d374cd [FEAT] add save method to APIDataset
pick 09e627e [ENH] create save_args parameter for api_dataset
pick f6147a0 [ENH] add tests for socket + http errors
pick 9580c38 [ENH] check save data is json
pick 440d249 [FIX] clean code
pick 1ae242c [ENH] handle different data types
pick ec8af04 [FIX] test coverage for exceptions
pick cb51b97 [ENH] add examples in APIDataSet docstring
fixup aa7c1fc make git happy'
pick 07eb097 sync APIDataSet  from kedro's `develop` (#184)
pick 3977607 [FIX] remove support for delete method
fixup 7374b2d [FEAT] add save method to APIDataset
drop f90c20f [FEAT] add save method to APIDataset
pick 318b90e [FIX] lint files
drop a9246d1 [FIX] fix conflicts
drop cc5a368 [FEAT] add save method to APIDataset
drop 30dfc59 [FIX] remove support for delete method
drop 9654232 [FEAT] add save method to APIDataset
pick af518cf [FIX] fix conflicts  # remove _convert_type
drop 58d217c [FIX] lint files # empty
pick d99d28c [FIX] remove fail save test
pick dab7584 [ENH] review suggestions
merge -C cf25924 main # Merge branch 'main' into dataset-api-add_save_method
pick 2ade519 [ENH] fix tests

# Rebase 3b42fae..2ade519 onto 3b42fae (41 commands)
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit
# f, fixup [-C | -c] <commit> = like "squash" but keep only the previous
#                    commit's log message, unless -C is used, in which case
#                    keep only this commit's message; -c is same as -C but
#                    opens the editor
# x, exec <command> = run command (the rest of the line) using shell
# b, break = stop here (continue rebase later with 'git rebase --continue')
# d, drop <commit> = remove commit
# l, label <label> = label current HEAD with a name
# t, reset <label> = reset HEAD to a label
# m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
#         create a merge commit using the original merge commit's
#         message (or the oneline, if no original merge commit was
#         specified); use -c <commit> to reword the commit message
# u, update-ref <ref> = track a placeholder for the <ref> to be updated
#                       to this position in the new commits. The <ref> is
#                       updated at the end of the rebase
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#

After that, most of the repeated commits are dropped. However, running git rebase main or git merge main afterwards yields a lot of conflicts. I'm not sure whether it's worth going through the hassle of resolving them, or whether it's better to squash the result of the first interactive rebase into one commit and just push that.

Git magic worked! Thanks a lot. Won't the commits be squashed once the branch is merged?

astrojuanlu · 2023-05-11T15:10:52Z

👽 git magic is indeed miraculous!

Yes, the commits will be squashed on merge. But to achieve that, even with this nice linearized history, first you'll need to fix the current conflicts.

astrojuanlu · 2023-05-11T16:32:00Z

Well done @McDonnellJoseph, thanks for persevering! 🙌🏽 I think this is ready for another round of reviews.

noklam

Thank you, the PR is looking better now.

If the DCO check fails, I suggest just leaving it until everything is reviewed, as the PR is a little hard to review now that all the previous review comments are gone.

In general, it's good to keep the documentation consistent with other datasets; I would suggest having a look at a popular one (e.g. CSVDataSet).

kedro-datasets/kedro_datasets/api/api_dataset.py

McDonnellJoseph · 2023-05-12T14:38:59Z

Thank you, the PR is looking better now.

If the DCO check fails, I suggest just leaving it until everything is reviewed, as the PR is a little hard to review now that all the previous review comments are gone.

In general, it's good to keep the documentation consistent with other datasets; I would suggest having a look at a popular one (e.g. CSVDataSet).

Do you think there is a more appropriate place in the file for the paragraph explaining how the save method works? Or should I just drop it altogether? @noklam

noklam · 2023-05-19T09:31:00Z

I think it's fine to leave it in the docstring. @stichbury, any comments?

astrojuanlu

Well done @McDonnellJoseph!

kedro-datasets/kedro_datasets/api/api_dataset.py


astrojuanlu · 2023-05-22T10:17:44Z

Congrats @McDonnellJoseph and thank you for your contribution! 🙌🏽

validation methods Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Remove unused versioning functions Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Fix exception catching for invalid schema, add test for invalid schema Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Add pylint ignore Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Add tests/databricks to ignore for no-spark tests Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py Co-authored-by: Nok Lam Chan <mediumnok@gmail.com> * Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py Co-authored-by: Nok Lam Chan <mediumnok@gmail.com> * Remove spurious mlflow test dependency Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Add explicit check for database existence Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Remove character limit for table names Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Refactor validation steps in ManagedTable Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Remove spurious checks for table and schema name existence Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> --------- Signed-off-by: Danny Farah <danny_farah@mckinsey.com> Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> Co-authored-by: Danny Farah <danny.farah@quantumblack.com> Co-authored-by: Danny Farah <danny_farah@mckinsey.com> Co-authored-by: Nok Lam Chan <mediumnok@gmail.com> Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com> * docs: Update APIDataset docs and refactor (#217) * Update APIDataset docs and refactor * Acknowledge community contributor * Fix more broken doc Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com> * Lint Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Fix release notes of upcoming kedro-datasets --------- Signed-off-by: 
Nok Chan <nok.lam.chan@quantumblack.com> Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com> Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com> * feat: Release `kedro-datasets` version `1.3.0` (#219) * Modify release version and RELEASE.md Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Add proper name for ManagedTableDataSet Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> * Update kedro-datasets/RELEASE.md Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Revert lost semicolon for release 1.2.0 Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> --------- Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com> * docs: Fix APIDataSet docstring (#220) * Fix APIDataSet docstring Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Add release notes Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Separate [docs] extras from [all] in kedro-datasets Fix gh-143. 
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> --------- Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com> * Update kedro-datasets/tests/spark/test_spark_streaming_dataset.py Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com> * Update kedro-datasets/kedro_datasets/spark/spark_streaming_dataset.py Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com> * Update kedro-datasets/setup.py Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com> * fix linting issue Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com> --------- Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com> Signed-off-by: Juan Luis Cano Rodríguez <hello@juanlu.space> Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com> Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com> Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai> Signed-off-by: <jmcdonnell@fieldbox.ai> Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> Signed-off-by: Danny Farah <danny_farah@mckinsey.com> Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com> Co-authored-by: Juan Luis Cano Rodríguez <hello@juanlu.space> Co-authored-by: Tingting Wan <110382691+Tingting711@users.noreply.github.com> Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Co-authored-by: Nok Lam Chan <mediumnok@gmail.com> Co-authored-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> Co-authored-by: Juan Luis Cano Rodríguez 
<juan_luis_cano@mckinsey.com> Co-authored-by: Tom Kurian <tom_kurian@mckinsey.com> Co-authored-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com> Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Co-authored-by: McDonnellJoseph <90898184+McDonnellJoseph@users.noreply.github.com> Co-authored-by: jmcdonnell <jmcdonnell@fieldbox.ai> Co-authored-by: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com> Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com> Co-authored-by: Danny Farah <danny.farah@quantumblack.com> Co-authored-by: Danny Farah <danny_farah@mckinsey.com> Co-authored-by: kuriantom369 <116743025+kuriantom369@users.noreply.github.com>

McDonnellJoseph force-pushed the dataset-api-add_save_method branch from 8cd49e7 to 96ecbdc on April 18, 2023 12:34

McDonnellJoseph marked this pull request as ready for review April 18, 2023 13:43

astrojuanlu mentioned this pull request May 9, 2023

Intermittent pip install failures make Windows tests flaky kedro-org/kedro#2570

Closed

astrojuanlu mentioned this pull request Apr 19, 2023

Add a save method for the APIDataSet #166

Closed
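Issue #166 asked for a save method on the APIDataSet. According to the PR description, the merged implementation sends the data to the server in chunks of `chunk_size` records (default 100), uses POST by default, and raises an error for data that is not JSON. The sketch below only illustrates that chunking idea with the standard library; `iter_chunks`, `save_in_chunks` and the `send` callback are hypothetical names, not the kedro-datasets API, and a real dataset would make an HTTP request (and raise kedro's own error type) where this sketch calls `send` and raises `ValueError`.

```python
import json
from typing import Any, Callable, Iterator, List


def iter_chunks(data: List[Any], chunk_size: int = 100) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most ``chunk_size`` records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def save_in_chunks(
    data: List[Any],
    send: Callable[[str], Any],
    chunk_size: int = 100,
) -> List[Any]:
    """Hypothetical helper: JSON-encode each chunk once and pass it to ``send``.

    ``send`` stands in for the HTTP call (for example a POST made with
    ``requests``); whatever it returns for each chunk is collected.
    """
    responses = []
    for chunk in iter_chunks(data, chunk_size):
        try:
            body = json.dumps(chunk)
        except TypeError as exc:
            # Rejected before this chunk is sent; earlier chunks may already
            # have gone out, since chunks are sent one at a time.
            raise ValueError(f"Data is not JSON serialisable: {exc}") from exc
        responses.append(send(body))
    return responses


# Collect the encoded bodies instead of making network calls.
sent = []
save_in_chunks(list(range(250)), sent.append, chunk_size=100)
print([len(json.loads(body)) for body in sent])  # [100, 100, 50]
```

Passing `sent.append` as `send` keeps the example offline: 250 records become three payloads of 100, 100 and 50 items. Sending one request per chunk bounds the size of each request body, at the cost of a save that can fail partway through.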

astrojuanlu changed the title from "Closes #166 Dataset api add save method" to "Dataset API add save method" Apr 19, 2023

noklam reviewed Apr 19, 2023

kedro-datasets/kedro_datasets/api/api_dataset.py (outdated, resolved)

McDonnellJoseph requested a review from noklam April 19, 2023 13:42

noklam reviewed Apr 20, 2023

kedro-datasets/kedro_datasets/api/api_dataset.py (outdated, resolved)

noklam added the Community label ("Issue/PR opened by the open-source community") Apr 24, 2023

McDonnellJoseph requested a review from noklam April 28, 2023 10:00

merelcht reviewed May 9, 2023

McDonnellJoseph force-pushed the dataset-api-add_save_method branch from 855912a to b396ad3 on May 11, 2023 10:12

noklam reviewed May 11, 2023

McDonnellJoseph force-pushed the dataset-api-add_save_method branch from b396ad3 to dab7584 on May 11, 2023 12:12

astrojuanlu reviewed May 11, 2023

kedro-datasets/tests/api/test_api_dataset.py (outdated, resolved)

jmcdonnell added 5 commits May 11, 2023 17:03

[FIX] lint files

399578e

Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

[FIX] fix conflicts

3b053f0

Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

[FIX] remove fail save test

81a1c19

Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

[ENH] review suggestions

821b671

Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

[ENH] fix tests

0e35eef

Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

McDonnellJoseph force-pushed the dataset-api-add_save_method branch from 134ab8a to 0e35eef on May 11, 2023 15:04

McDonnellJoseph force-pushed the dataset-api-add_save_method branch from 9f37087 to f8631cc on May 11, 2023 15:32

astrojuanlu changed the title from "Dataset API add save method" to "feat: Dataset API add save method" May 11, 2023

astrojuanlu requested review from noklam and merelcht May 11, 2023 16:32

noklam reviewed May 11, 2023

Merge branch 'main' into dataset-api-add_save_method

1f0b3de

McDonnellJoseph force-pushed the dataset-api-add_save_method branch from f8631cc to 1f0b3de on May 12, 2023 14:41

astrojuanlu approved these changes May 19, 2023

noklam approved these changes May 19, 2023

kedro-datasets/kedro_datasets/api/api_dataset.py (resolved)

jmcdonnell and others added 2 commits May 19, 2023 14:06

[FIX] reorder arguments

a803d58

Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

Merge branch 'main' into dataset-api-add_save_method

1fa8630

astrojuanlu enabled auto-merge (squash) May 22, 2023 09:48

astrojuanlu merged commit 4570cb0 into kedro-org:main May 22, 2023

noklam mentioned this pull request May 22, 2023

docs: Update APIDataset docs and refactor #217

Merged


noklam mentioned this pull request Aug 10, 2023

fix(datasets): do not double encode the data as json when saving an A… #301

Merged

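The follow-up fix in #301 concerns data being encoded as JSON twice when saving. The sketch below uses only the standard library to show what double encoding does to a payload; it does not reproduce the code from #301, and the `payload` records are made up for illustration. One common way this happens with `requests` is passing a string that has already been through `json.dumps` as the `json=` argument, because `requests` serialises that argument itself; whether that was the exact cause here is an assumption.

```python
import json

# Hypothetical records that a pipeline node might hand to an API dataset.
payload = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# Encoding once produces a JSON array that the server decodes back to a list.
encoded_once = json.dumps(payload)

# Encoding the resulting string a second time quotes and escapes it, so the
# server would decode a single JSON string rather than a list of objects.
encoded_twice = json.dumps(encoded_once)

print(type(json.loads(encoded_once)).__name__)   # list
print(type(json.loads(encoded_twice)).__name__)  # str
```

The remedy is to serialise exactly once: either pass the Python object itself as `json=`, or pass a pre-encoded string as the request body (for example via `data=`) with an explicit `Content-Type: application/json` header.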

feat: Dataset API add `save` method #180

McDonnellJoseph commented Apr 18, 2023 (edited by astrojuanlu)

McDonnellJoseph commented Apr 18, 2023

astrojuanlu commented Apr 18, 2023

McDonnellJoseph commented Apr 19, 2023

McDonnellJoseph commented Apr 19, 2023

McDonnellJoseph commented Apr 19, 2023

astrojuanlu commented Apr 19, 2023

McDonnellJoseph commented Apr 19, 2023

noklam left a comment

noklam left a comment

noklam commented Apr 20, 2023

McDonnellJoseph commented Apr 21, 2023

noklam commented Apr 24, 2023

McDonnellJoseph commented Apr 26, 2023

McDonnellJoseph commented May 9, 2023

merelcht left a comment

noklam left a comment

McDonnellJoseph commented May 11, 2023

astrojuanlu commented May 11, 2023

McDonnellJoseph commented May 11, 2023

McDonnellJoseph commented May 11, 2023

astrojuanlu commented May 11, 2023

astrojuanlu commented May 11, 2023

noklam left a comment

noklam left a comment

McDonnellJoseph commented May 12, 2023 (edited)

noklam commented May 19, 2023

astrojuanlu left a comment

astrojuanlu commented May 22, 2023
