-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added non-git source puller functionality #194
base: main
Are you sure you want to change the base?
Conversation
The optional third command-line argument, repo_dir, is now a keyword argument and the name is changed to target-dir; this allows us to avoid having to pass an empty branch_name as the second argument if you want to pass in the repo_dir(now target-dir) as the third argument. In the previous configuration, we used positional arguments: gitpuller git_url branch_name repo_dir Now, we use a keyword argument for target-dir: gitpuller git_url [branch_name] --target-dir [TARGET_DIR]
d02f6cd
to
656420e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very quick review, particularly around plugin discovery with pluggy. I'll provide another pass early next week.
Excited to get this landed!
Handles non-git source compressed archives from google drive, dropbox, and any publicly available web address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more thorough review! I wrote this up yesterday but GitHub ate it :(
hey all - what's the status on this one? @sean-morris do you have time to finish this up? |
@choldgraf somehow this message slipped by me .... yes I have been talking to @yuvipanda. It is embarrassing how long this took. The move to async was rough ... ! |
480828a
to
387ead8
Compare
Ok, I have a few pieces:
|
@sean-morris great! I'll try to do a review this week. From now on, can we just add additional commits and not force push to make reviewing changes easier? You also need to rebase against master locally at some point to resolve the conflicts |
@yuvipanda Maybe I blew this but I think all the changes since the last review are in the second commit on September 12th? Yes on the rebase -- of course. I read the error wrong. good deal. |
This includes moving most of the functions in plugin_helper.py to async generators. The GitSyncView also opens the console view whenever anything is written to it -- error or just progress output.
61ce7c5
to
ab80daf
Compare
@sean-morris wow this looks like a very thorough PR with lots of functionality added! Could you write an overview of this PR to help me review it? To help you, help me, help you, help everyone with this contribution, I considered what I would like to learn from such overview, and here are some guiding questions that would be helpful to me and perhaps other reviewers.
|
I wondered about doing this. Yes, no problem. I started writing something
up. I can work on it. I had similar thoughts -- a ton of changes to make it
all come together. Although that said, there is very little that changed in
regards to pulling sources based in git.
…On Tue, Oct 26, 2021 at 4:01 PM Erik Sundell ***@***.***> wrote:
@sean-morris <https://github.com/sean-morris> wow this looks like a very
thorough PR with lots of functionality added! Could you write an overview
of this PR to help me review it?
To help you, help me, help you, help everyone with this contribution, I
considered what I would like to learn from such overview, and here are some
guiding questions that would be helpful to me and perhaps other reviewers.
- About the goal outcome as understandable from an end user of
nbgitpuller
- About what changes you've opted to make towards the goal
- Your expectations about it being a breaking change or not, and in
what regard you could suspect it could be breaking if it would be breaking
in any way.
- About how you test the functionality
- About anything not included in PR you consider relevant to
strengthen in the code to make it more robust and maintainable
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#194 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AALR342SNYZXLXND7J3QK6DUI46NHANCNFSM5AYFIAUQ>
.
|
About the goal outcome as understandable from an end-user of nbgitpuller:
About what changes you've opted to make towards the goal
Your expectations about it being a breaking change or not, and in what regard you could suspect it could be breaking if it would be breaking in any way.
About how you test the functionality
About anything not included in PR you consider relevant to strengthen in the code to make it more robust and maintainable A summary of what is involved: If it does exist:
From here, the original Gitpuller takes over.
I could also get on zoom and move through what I did to pull all this off. Please tell me if I can add more detail. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks amazing! I felt I left very few comments on changes to consider given how large this PR is.
Other suggestions
- Add a README.md file to the root of the plugin folder, describing the plugin systems use a bit.
- Clarify if these plugins are installed by default, or if they are something you need to enable or similar.
- Add a README.md file within each plugin folder where the plugin is very briefly described. I'm thinking for example one paragraph with 3-4 sentences and perhaps an example of what kind of URL you may specify to use that specific plugin.
As you may note, I put quite a bit of focus here on documentation for comprehension.
What are your thoughts on how this is tested and maintained long term- do you think the tests you've added are sufficient for us to release future versions, or merge PRs not directly related to this functionality, without worrying about things accidentally breaking? Or would you recommend some manual testing? |
- removed the global declaration of TEMP_DOWNLOAD_REPO_DIR - the directory is now created by tempfile.TemporaryDirectory
@manics @consideRatio
I got this in -- good deal |
@sean-morris I think |
- can now pass in a custom download function and/or custom download function parameters. - the temp_download_file is added to custom download_func_params -- I did this so that a plugin does not need to know about the temp_download_file nor try to handle it. - the download_archive function now uses keywords: repo and temp_download_file - moved the tempfile cleanup to a finally - removed the dir specification for tempfile
Removing the TEMP_DOWNLOAD_DIR from the plugin_helper.py meant that the plugins did not have access to the temporary directory created by plugin_helper.handle_files_helper. It got a little confusing as to how to handle this. I ended up having the plugin_helper handle adding the temp_download_dir to the download_func_params. I am not sure this is the best method. I was trying to give plugins' the flexibility to customize their download functionality but leverage the other pieces of the download process against what I had written in plugin_helper. It ends up getting a bit complicated on the plugin developer side. Open to ideas here. |
We needed this to deal with the dir_names being set in the async generator but needing to be returned by handle_files_helper
@manics @consideRatio |
Seems ok i think! Seems like a more robust and easy to understand logic. / Brief review check fron mobile |
@sean-morris wrote:
I'm thinking that a lot of the complexity stem from not having a clear abstraction layer. Ideally in my mind, the default git content provider should be able to act as a plugin - the default one. It could be a feature creep, it could also be a helpful strategy on how to arrive at a clear logical separation of concern for the various parts of code we work with. I don't yet understand the technical details or hurdles well enough so think clearly about this though. Perhaps we can have a video meet at the end of the next week @sean-morris to sync a bit? I'd like to make sure we get this to land without you overworking yourself. I'm projecting, but if I'd worked so thoroughly and for so long on a feature, I'd be quite tired on feeling the goal post moving around. I'd like to ensure we get into the goal while also ensuring you are able to not feel alone with the responsibility of making it happen. |
This makes sense to me to have all content providers including git sources be a plug-in although it doesn't solve some of these temporary download directory problems; the git sources don't need it. Side note, I actually think I have a solution to my problem with the temporary download directory that will be clear. But generally, it makes sense to me to bring git sources into a plugin as well. Thanks for your concern. It is definitely a bit of a push. I felt pressure to keep it moving in the face of bunches of other work piling up! Yes, I can definitely talk over video. I had wanted to suggest this. Some of the details are hard to get clear in writing. In the meantime, I am going to move git sources to a plugin and implement the temp download directory solution. We can see what it looks like. So you are nine hours ahead of me. I could do a 6:00 in the morning meeting which puts you around 3:00 and gives me enough flexibility to not screw up the rest of my life! name the day. |
@sean-morris ❤️ I'm contacting you on slack! |
The downloader-plugin utilities and tests are now in their own repo with the other downloader plugins.
The downloader-plugin utilities and tests are now in their own repo with the other downloader plugins.
nbgitpuller/__init__.py
Outdated
# this allows us to nest usage of the event_loop from asyncio | ||
# being used by tornado in jupyter distro | ||
# Ref: https://medium.com/@vyshali.enukonda/how-to-get-around-runtimeerror-this-event-loop-is-already-running-3f26f67e762e | ||
nest_asyncio.apply() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This monkeypatches the asyncio event loop, is this event loop shared across the whole of jupyter-server? If so are there any potential issues that might be caused in jupyter-server, extensions, or kernels that we should be aware of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@consideRatio @manics Erik do you have any sense of this? I completely based this solution on the reference I linked to in the comments. He seemed to be trying to solve a similar nested event loop issue in Jupyter.
I read a bunch more pieces.
- Here is a discussion about where the error occurs(Runtime error related to trying to start an already running loop); the ipykernel seems to be the issue. The nested_asyncio solution becomes a part of the thread in August 2020.
- Here is another from the jupyter blog in 2018; at this point, they don't want to integrate the nested_asyncio package until there is more testing.
- I also tried upgrading the tornado and ipykernel packages; a random comment seemed to indicate it would solve the problem; after upgrading tornado to version 6.1(no change in my environment) and ipykernel to 6.6.1(up from 6.6.0) -- it didn't work. I still need the nested_asyncio package.
pip3 install --upgrade tornado
pip3 install --upgrade ipykernel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the nest_asyncio calls to the nbgitpuller-downloader-plugins themselves as well as added notes to the documentation that the import and call to nest_asyncio need to be included in any nbgitpuller-downloader-plugin.
The nested_asyncio import and call are moved to the nbgitpuller downloader plugins
The logic related to downloading compressed archives now is initiated by the the same Thread GitPuller uses.
We moved all the logic related to pulling compressed archives to the GitPuller class;
These changes reflect changes to the downloader-plugins that now encapsulate classes and store the handle_files outputs in an instance variable that we access from nbgitpuller when the downloader plugin is complete. This required the addition of a function to automatically register nbgitpuller-downloader-plugin classes with pluggy.
Where is this project? Users want the cool new functionality! |
It is swallowed up in edx. I am looking forward this getting back to it.
…On Thu, 6 Oct 2022 at 16:41 Eric Van Dusen ***@***.***> wrote:
Where is this project? Users want the cool new functionality!
—
Reply to this email directly, view it on GitHub
<#194 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AALR347RYQ4W2XAOCRLXP33WB5PSRANCNFSM5AYFIAUQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Just another note that I think this would be useful as well. Jupyter book creates ipynb files that are bundled with the websites that are generated, and I think it'd be useful if we could generate interact links that pointed to those ipynb files rather than the ones that are on GitHub. |
It is one the goals for the new year! We got buried trying to get EdX Data
8x up and then a bunch of Community College pilots over the last 12 months.
…On Sat, Dec 31, 2022 at 12:12 AM Chris Holdgraf ***@***.***> wrote:
Just another note that I think this would be useful as well. Jupyter book
creates ipynb files that are bundled with the websites that are generated,
and I think it'd be useful if we could generate interact links that pointed
to those ipynb files rather than the ones that are on GitHub.
—
Reply to this email directly, view it on GitHub
<#194 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AALR346NEYJKUC2IOVPL3J3WP7THTANCNFSM5AYFIAUQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I thought I stopped by this old issue once a year to say hi, but, it looks like its been two whole years! |
This PR adds support for downloading non-git-based compressed archives(zip and tar-compatible) of jupyter notebooks and files to nbgitpuller. The documentation of plugin development, as well as examples of plugins that download from Google Drive, Dropbox, and any publicly available web address, are found in this issue.