Skip to content
This repository was archived by the owner on Feb 21, 2026. It is now read-only.

[WIP] Adds data stores and references #37

Closed
MSeal wants to merge 2 commits into
nteract:masterfrom
MSeal:references
Closed

[WIP] Adds data stores and references #37
MSeal wants to merge 2 commits into
nteract:masterfrom
MSeal:references

Conversation

@MSeal
Copy link
Copy Markdown
Contributor

@MSeal MSeal commented Apr 1, 2019

This is still a WIP, more testing and internal API contracts need to be evaluated

In the glue api there is now a store and data_path optional arguments. These allow for specifying the path one wishes to use in saving data, as well as the storage mechanism for that data (default "notebook"). In most cases to support remote saves the caller simply has to add data_path="s3://mybucket/my/path/to/data.json to the glue call to save data to that remote path. When recalling the named data scrap the library will automatically know how to fetch and translate the data saved at the referenced path.

Internally, In place of just specifying an encoder, there's now an encoder_name and a store_name ("json", "notebook") associated with data managers. Data managers can implement encoding or storing, or both capabilities. One can implement ("text", None) and (None, "s3") independently. This lets simple encoders return string or bytes that the stores can save while allowing for more complex store and recall mechanism to encode and store at the same time (e.g. dataframe to multi-file s3 parquet via ("arrow", "s3")).

To facilitate the contract changes there is now a scrapbook v2 schema. Loading v1 or v2 data is transparent to the library user.

@MSeal MSeal requested review from mpacer, rgbkrk and willingc April 1, 2019 03:29
@mpacer mpacer mentioned this pull request Apr 1, 2019
Comment thread requirements.txt Outdated
@@ -1,5 +1,6 @@
pandas
six
retry
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this hasn't been updated in 3 years. Is this something that we really need as a dependency? If so, is there a better maintained alternative.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could change it over to tenacity. I think I left this here by accident while I had GCS code in my dev, so might not need it in this repo at all.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Let's double check and remove if possible.

@MSeal MSeal requested a review from captainsafia April 7, 2019 21:51
@todo
Copy link
Copy Markdown

todo Bot commented Apr 7, 2019

Handle missing depedency as papermill does

# TODO: Handle missing depedency as papermill does
class S3(object): pass
from .scraps import scrap_to_payload
from .exceptions import ScrapbookMissingEncoder, ScrapbookMissingStore


This comment was generated by todo based on a TODO comment in 3d24c67 in #37. cc @MSeal.

Comment thread scrapbook/managers.py

for manager in reversed(list(self.values())):
# Only check for managers here that can both encode and store
if not (manager.encoder_name and manager.store_name):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this function a little hard to read. I feel like it would be easier to read if this if-statement was inverted and the logic inside the try statement was placed inside it. Something like:

if manager.encoder_name and manager.store_name:
    blah

Assuming I understand that the intent of this function is to run through the list of valid managers and retrieve the ones that work with the scrap passed as a parameter.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll take a close look at this section next pass

Comment thread scrapbook/managers.py
if scrap.encoder is not None:
return self.get((scrap.encoder, None))

for encoder in reversed(list(self.values())):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious: is there a reason you reverse the list of encoders here and elsewhere? Does it have something to do with the properties of an OrderedDict?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. You can't prefix to OrderedDicts easily, which we need to be able to do, it requires rebuilding the whole structure.

@MSeal
Copy link
Copy Markdown
Contributor Author

MSeal commented Dec 4, 2019

Found a few issues in the PR implementation when I tried to get the conflicts resolved. I'm going to close break this into some smaller PRs instead.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants