[WIP] Adds data stores and references by MSeal · Pull Request #37 · nteract/scrapbook

MSeal · 2019-04-01T03:29:27Z

This is still a WIP; more testing is needed, and the internal API contracts still need to be evaluated.

The glue API now takes two optional arguments, `store` and `data_path`. These let the caller specify the path where data should be saved, as well as the storage mechanism for that data (default `"notebook"`). In most cases, supporting a remote save only requires adding `data_path="s3://mybucket/my/path/to/data.json"` to the `glue` call, which saves the data to that remote path. When the named data scrap is later recalled, the library automatically knows how to fetch and translate the data saved at the referenced path.
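The PR text does not show how the store is chosen when only `data_path` is given. A minimal sketch, assuming the store name is inferred from the URL scheme of `data_path` and falls back to `"notebook"` when no path is given. `infer_store`, `make_reference`, and the record's field names are hypothetical, not scrapbook's API.

```python
from urllib.parse import urlparse

# Hypothetical sketch: assumes the store is inferred from data_path's URL
# scheme; the actual selection logic in this PR is not shown.
DEFAULT_STORE = "notebook"


def infer_store(data_path=None, store=None):
    """Return the store name to use for a glue call (hypothetical helper)."""
    if store is not None:
        return store  # an explicit store always wins
    if data_path is None:
        return DEFAULT_STORE  # data is embedded in the notebook itself
    scheme = urlparse(data_path).scheme
    # Plain local paths have no scheme; treating them as "file" is an assumption.
    return scheme or "file"


def make_reference(name, data_path=None, store=None, encoder="json"):
    """Build the (hypothetical) reference record kept in the notebook."""
    return {
        "name": name,
        "encoder": encoder,
        "store": infer_store(data_path, store),
        "data_path": data_path,
    }
```

With this sketch, `make_reference("results", data_path="s3://mybucket/my/path/to/data.json")` records `"s3"` as the store, which is the information a later recall would need to pick the right fetcher.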

Internally, instead of just specifying an encoder, data managers now have an `encoder_name` and a `store_name` (e.g. `"json"`, `"notebook"`). A data manager can implement encoding, storing, or both, so `("text", None)` and `(None, "s3")` can be implemented independently. This lets simple encoders return strings or bytes for the stores to save, while still allowing more complex mechanisms to encode and store in a single step (e.g. a dataframe saved as multi-file Parquet on S3 via `("arrow", "s3")`).
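To make the `(encoder_name, store_name)` split concrete, here is a minimal sketch of a registry keyed by those tuples: a combined manager is used when one exists, otherwise an encoder-only manager and a store-only manager are composed. The class names, `save_scrap`, and the `"memory"` store (standing in for `"s3"` so the example runs offline) are hypothetical, not the PR's implementation.

```python
import json


class JsonEncoder(object):
    """Encoder-only manager, registered as ("json", None)."""

    encoder_name = "json"
    store_name = None

    def encode(self, data):
        return json.dumps(data)

    def decode(self, text):
        return json.loads(text)


class MemoryStore(object):
    """Store-only manager, registered as (None, "memory").

    Hypothetical stand-in for a remote store such as "s3".
    """

    encoder_name = None
    store_name = "memory"

    def __init__(self):
        self.blobs = {}

    def save(self, path, payload):
        self.blobs[path] = payload

    def load(self, path):
        return self.blobs[path]


def save_scrap(managers, data, path, encoder_name, store_name):
    """Save via a combined manager if one exists, else encoder + store."""
    combined = managers.get((encoder_name, store_name))
    if combined is not None:
        # e.g. ("arrow", "s3"): one manager encodes and stores together.
        return combined.save(path, data)
    # Otherwise compose the two halves; a KeyError here is roughly where the
    # PR's ScrapbookMissingEncoder / ScrapbookMissingStore would apply.
    encoder = managers[(encoder_name, None)]
    store = managers[(None, store_name)]
    store.save(path, encoder.encode(data))


# Registry keyed by (encoder_name, store_name) tuples.
managers = {}
for manager in (JsonEncoder(), MemoryStore()):
    managers[(manager.encoder_name, manager.store_name)] = manager
```

Calling `save_scrap(managers, {"a": 1}, "results.json", "json", "memory")` encodes with the JSON manager and hands the resulting string to the store, without either manager knowing about the other.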

To facilitate the contract changes there is now a scrapbook v2 schema. Loading v1 or v2 data is transparent to the library user.
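One way "transparent" loading can work is to upgrade older payloads to the newest shape as they are read. A minimal sketch, assuming a `version` field on each scrap payload; the v1 and v2 field names below are hypothetical and are not the actual scrapbook schemas.

```python
# Hypothetical sketch: field names are assumptions, not the real
# scrapbook v1/v2 schemas.


def normalize_scrap(payload):
    """Return a v2-shaped scrap whether `payload` is v1 or v2."""
    version = payload.get("version", 1)
    if version == 1:
        # Assume v1 scraps carried inline data plus an encoder and always
        # lived in the notebook, with no external data path.
        return {
            "version": 2,
            "name": payload["name"],
            "data": payload.get("data"),
            "encoder": payload.get("encoder"),
            "store": "notebook",
            "data_path": None,
        }
    if version == 2:
        return payload
    raise ValueError("Unsupported scrap schema version: {}".format(version))
```

Downstream code then only ever handles the v2 shape, which is what keeps the schema change invisible to library users.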

willingc · 2019-04-05T12:03:28Z

@@ -1,5 +1,6 @@
 pandas
 six
+retry


Looks like this hasn't been updated in 3 years. Do we really need it as a dependency? If so, is there a better-maintained alternative?

MSeal: I could switch it over to tenacity. I think I left this in by accident while I had GCS code in my dev environment, so it might not be needed in this repo at all.

willingc: Cool. Let's double-check and remove it if possible.

todo · 2019-04-07T22:09:52Z

Handle missing dependency as papermill does

scrapbook/scrapbook/managers.py

Lines 20 to 25 in 3d24c67

    
    # TODO: Handle missing dependency as papermill does
    class S3(object): pass

from .scraps import scrap_to_payload
from .exceptions import ScrapbookMissingEncoder, ScrapbookMissingStore

This comment was generated by todo based on a `TODO` comment in `3d24c67` in #37. cc @MSeal.
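The TODO replaces a missing optional dependency with an empty `class S3(object): pass`, which fails confusingly only later. A minimal sketch of the usual alternative: substitute a placeholder that raises a clear `ImportError` as soon as it is used. `optional_import` and `missing_dependency` are hypothetical helpers in the spirit of papermill's approach, not papermill's actual API, and the `pip install scrapbook[...]` extras name in the message is an assumption.

```python
import importlib


def missing_dependency(package, extra):
    """Build a stand-in class that fails loudly only when instantiated.

    Hypothetical helper; the install hint below is an assumption.
    """

    class MissingDependency(object):
        def __init__(self, *args, **kwargs):
            raise ImportError(
                "The {0} optional dependency is missing. "
                "Install it with: pip install scrapbook[{1}]".format(package, extra)
            )

    return MissingDependency


def optional_import(module_name, attr, package, extra):
    """Import `module_name.attr`, or return a placeholder if the module is absent."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return missing_dependency(package, extra)
    return getattr(module, attr)
```

Importing the module stays cheap and safe: users who never touch S3 never see an error, and users who do get an actionable message instead of an `AttributeError` from an empty class.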

captainsafia · 2019-04-07T23:33:21Z

+
+        for manager in reversed(list(self.values())):
+            # Only check for managers here that can both encode and store
+            if not (manager.encoder_name and manager.store_name):


I found this function a little hard to read. I think it would be easier to follow if this if-statement were inverted and the logic inside the try statement were moved inside it. Something like:

if manager.encoder_name and manager.store_name: blah

This assumes I understand the intent of this function correctly: to run through the list of valid managers and retrieve the ones that work with the scrap passed as a parameter.

MSeal: Good point. I'll take a closer look at this section on the next pass.
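The reviewer's suggestion (test the positive condition, then do the work inside it) might look like the sketch below. The body of the PR's try statement is not shown, so `supports(scrap)` and the small `Manager` class are hypothetical stand-ins for whatever check decides that a manager can handle the scrap.

```python
class Manager(object):
    """Hypothetical manager used only to exercise the lookup below."""

    def __init__(self, encoder_name, store_name, accepts):
        self.encoder_name = encoder_name
        self.store_name = store_name
        self._accepts = accepts  # predicate standing in for the PR's checks

    def supports(self, scrap):
        return self._accepts(scrap)


def find_combined_manager(managers, scrap):
    """Return the newest manager that can both encode and store `scrap`."""
    for manager in reversed(list(managers.values())):
        # Positive condition first: only managers implementing both
        # capabilities are considered at all.
        if manager.encoder_name and manager.store_name:
            if manager.supports(scrap):
                return manager
    return None
```

The nesting now reads in the same order as the intent: "for each manager, if it can encode and store, and it accepts this scrap, use it".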

captainsafia · 2019-04-07T23:35:42Z

+        if scrap.encoder is not None:
+            return self.get((scrap.encoder, None))
+
+        for encoder in reversed(list(self.values())):


Curious: is there a reason you reverse the list of encoders here and elsewhere? Does it have something to do with the properties of an OrderedDict?

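MSeal: Yes. You can't easily prefix entries to an OrderedDict, which we need to be able to do; it requires rebuilding the whole structure. So new entries are appended and the values are iterated in reverse, giving the most recently added entries priority.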
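A minimal sketch of the two options discussed: append and walk `reversed(...)` so the newest registration is checked first (what the PR does), or, on Python 3 only, move a key to the front with `OrderedDict.move_to_end(key, last=False)`, which avoids rebuilding the dict. The `six` dependency suggests Python 2 support at the time, where `move_to_end` is unavailable. The function names are hypothetical.

```python
from collections import OrderedDict


def priority_order(managers):
    """Return managers newest-first, as the PR's reversed() loops do."""
    return list(reversed(list(managers.values())))


def prepend(managers, key, value):
    """Insert `key` at the front of an OrderedDict (Python 3 only)."""
    managers[key] = value
    managers.move_to_end(key, last=False)  # no rebuild of the structure needed
```

For example, registering `("json", None)`, then `("text", None)`, then `("arrow", "s3")` makes `priority_order` return the `("arrow", "s3")` manager first.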

MSeal · 2019-12-04T21:02:07Z

Found a few issues in the PR implementation when I tried to resolve the conflicts. I'm going to close this and break it into some smaller PRs instead.

Initial pass on adding data references

03d8761

MSeal requested review from mpacer, rgbkrk and willingc April 1, 2019 03:29

mpacer mentioned this pull request Apr 1, 2019: Dev Docs #38 (Open)

willingc reviewed Apr 5, 2019

MSeal requested a review from captainsafia April 7, 2019 21:51

Fixed flake issues. Fixed several code issues found as a result.

3d24c67

captainsafia reviewed Apr 7, 2019

MSeal mentioned this pull request May 12, 2019: How to connect papermill to a remote server? nteract/papermill#361 (Closed)

MSeal mentioned this pull request Oct 22, 2019: Allow saving of dataframes #59 (Closed)

MSeal closed this Dec 4, 2019

MSeal mentioned this pull request Dec 15, 2019: Pandas glue support #62 (Merged)

MSeal mentioned this pull request Feb 4, 2020: Workflows: Executing notebooks as a DAG? nteract/papermill#468 (Closed)

MSeal mentioned this pull request Apr 3, 2020: Scrap of type Int not supported #71 (Closed)

MSeal mentioned this pull request Jul 7, 2020: Add complete data ref for basic data payload #25 (Open)

MSeal deleted the references branch August 18, 2020 00:36

MSeal wants to merge 2 commits into nteract:master from MSeal:references

MSeal commented Apr 1, 2019
willingc Apr 5, 2019
MSeal Apr 5, 2019
willingc Apr 5, 2019
todo Bot commented Apr 7, 2019
captainsafia Apr 7, 2019
MSeal Apr 8, 2019
captainsafia Apr 7, 2019
MSeal Apr 8, 2019
MSeal commented Dec 4, 2019

3 participants
