-
Notifications
You must be signed in to change notification settings - Fork 906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add attribute to flag persistence in Dataset classes #3520
Conversation
Signed-off-by: lrcouto <laurarccouto@gmail.com>
Signed-off-by: lrcouto <laurarccouto@gmail.com>
Signed-off-by: lrcouto <laurarccouto@gmail.com>
Signed-off-by: lrcouto <laurarccouto@gmail.com>
Signed-off-by: lrcouto <laurarccouto@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left some questions, but the general gist looks good. I think you'll also need to check if there's any datasets in kedro-datasets
that will need _EPHEMERAL
set to True.
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: L. R. Couto <57910428+lrcouto@users.noreply.github.com>
Signed-off-by: lrcouto <laurarccouto@gmail.com>
One question: What prevents users from creating non-ephemeral datasets with Also, I don't see an issue linked to this, but I guess this is related to kedro-org/kedro-viz#1709 ? |
That is an interest question. Do we want this to be something users have control over (with the possibility for error)? We could use the other implementation suggestion and define a dataset as ephemeral if it inherits from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to add this change to the release notes. I also think it would be helpful to add a sentence here https://docs.kedro.org/en/stable/data/how_to_create_a_custom_dataset.html about the _EPHEMERAL
attribute.
Otherwise this LGTM 👍
Signed-off-by: lrcouto <laurarccouto@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs changes are spot on, thanks! 🌟
Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com> Signed-off-by: L. R. Couto <57910428+lrcouto@users.noreply.github.com>
Signed-off-by: lrcouto <laurarccouto@gmail.com>
Thank you! I just added one extra sentence to clarify the example using MemoryDataset, as it's meant to be an example of a dataset that's not persistent. The phrasing was a little ambiguous IMO. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 👍
RELEASE.md
Outdated
@@ -19,6 +19,7 @@ | |||
* Added `source_dir` explicitly in `pyproject.toml` for non-src layout project. | |||
* `MemoryDataset` entries are now included in free outputs. | |||
* Removed black dependency and replaced it functionality with `ruff format`. | |||
* Added the `_EPHEMERAL` attribute to `AbstractDataset` and other Dataset classes that inherit from it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be moved up to the Upcoming release 0.19.3
section.
Signed-off-by: lrcouto <laurarccouto@gmail.com>
Description
Adds the
_EPHEMERAL
attribute to theAbstractDataset
class. This attribute will serve to indicate if aDataset
object is persistent or not. It is set to be False by default, and should be changed to True in case a Dataset is not meant to be persistent, as seen, for example, inMemoryDataset
.Development notes
Developer Certificate of Origin
We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a
Signed-off-by
line in the commit message. See our wiki for guidance.If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.
Checklist
RELEASE.md
file