Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add attribute to flag persistence in Dataset classes #3520

Merged
merged 18 commits into from
Jan 24, 2024

Conversation

lrcouto
Copy link
Contributor

@lrcouto lrcouto commented Jan 17, 2024

Description

Adds the _EPHEMERAL attribute to the AbstractDataset class. This attribute will serve to indicate if a Dataset object is persistent or not. It is set to be False by default, and should be changed to True in case a Dataset is not meant to be persistent, as seen, for example, in MemoryDataset.

Development notes

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

Signed-off-by: lrcouto <laurarccouto@gmail.com>
@lrcouto lrcouto changed the base branch from main to develop January 17, 2024 01:20
lrcouto and others added 3 commits January 17, 2024 11:25
Signed-off-by: lrcouto <laurarccouto@gmail.com>
Signed-off-by: lrcouto <laurarccouto@gmail.com>
@lrcouto lrcouto linked an issue Jan 17, 2024 that may be closed by this pull request
Signed-off-by: lrcouto <laurarccouto@gmail.com>
Signed-off-by: lrcouto <laurarccouto@gmail.com>
kedro/io/core.py Outdated Show resolved Hide resolved
Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some questions, but the general gist looks good. I think you'll also need to check if there's any datasets in kedro-datasets that will need _EPHEMERAL set to True.

kedro/io/lambda_dataset.py Outdated Show resolved Hide resolved
kedro/io/shared_memory_dataset.py Outdated Show resolved Hide resolved
lrcouto and others added 4 commits January 18, 2024 13:11
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: L. R. Couto <57910428+lrcouto@users.noreply.github.com>
Signed-off-by: lrcouto <laurarccouto@gmail.com>
@lrcouto lrcouto self-assigned this Jan 19, 2024
@lrcouto lrcouto marked this pull request as ready for review January 19, 2024 03:00
@astrojuanlu
Copy link
Member

One question: What prevents users from creating non-ephemeral datasets with _EPHEMERAL = True? Or the other way around?

Also, I don't see an issue linked to this, but I guess this is related to kedro-org/kedro-viz#1709 ?

@merelcht
Copy link
Member

@astrojuanlu It's Add an attribute to dataset classes to flag persistence

@lrcouto
Copy link
Contributor Author

lrcouto commented Jan 19, 2024

One question: What prevents users from creating non-ephemeral datasets with _EPHEMERAL = True? Or the other way around?

Also, I don't see an issue linked to this, but I guess this is related to kedro-org/kedro-viz#1709 ?

That is an interest question. Do we want this to be something users have control over (with the possibility for error)? We could use the other implementation suggestion and define a dataset as ephemeral if it inherits from MemoryDataset. I had understood that this would be something that users would set up themselves, to allow for this option to be used in user-created datasets.

Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to add this change to the release notes. I also think it would be helpful to add a sentence here https://docs.kedro.org/en/stable/data/how_to_create_a_custom_dataset.html about the _EPHEMERAL attribute.

Otherwise this LGTM 👍

Copy link
Contributor

@stichbury stichbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs changes are spot on, thanks! 🌟

lrcouto and others added 3 commits January 24, 2024 11:38
Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Signed-off-by: L. R. Couto <57910428+lrcouto@users.noreply.github.com>
Signed-off-by: lrcouto <laurarccouto@gmail.com>
@lrcouto
Copy link
Contributor Author

lrcouto commented Jan 24, 2024

Docs changes are spot on, thanks! 🌟

Thank you! I just added one extra sentence to clarify the example using MemoryDataset, as it's meant to be an example of a dataset that's not persistent. The phrasing was a little ambiguous IMO.

Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👍

RELEASE.md Outdated
@@ -19,6 +19,7 @@
* Added `source_dir` explicitly in `pyproject.toml` for non-src layout project.
* `MemoryDataset` entries are now included in free outputs.
* Removed black dependency and replaced it functionality with `ruff format`.
* Added the `_EPHEMERAL` attribute to `AbstractDataset` and other Dataset classes that inherit from it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be moved up to the Upcoming release 0.19.3 section.

@lrcouto lrcouto merged commit 7b0f701 into develop Jan 24, 2024
35 checks passed
@lrcouto lrcouto deleted the add-attribute-to-flag-persistence branch January 24, 2024 19:10
@merelcht merelcht mentioned this pull request Mar 20, 2024
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add an attribute to dataset classes to flag persistence
4 participants