Added non-git source puller functionality #194

Open
wants to merge 45 commits into
base: main
45 commits
ea87f2b
Command-line argument repo_dir is changed
sean-morris Jun 24, 2021
10385bb
Added non-git source puller functionality
sean-morris Jun 24, 2021
ab80daf
Added async functionality to non-git archives
sean-morris Aug 11, 2021
71ca2f4
Update nbgitpuller/plugin_helper.py
sean-morris Nov 3, 2021
ae66e53
Update nbgitpuller/hookspecs.py
sean-morris Nov 3, 2021
8934f5f
renamed and simplified the test_files
sean-morris Nov 4, 2021
ac2072c
added README to plugins
sean-morris Nov 4, 2021
a84096d
added docstring to progress_loop function
sean-morris Nov 4, 2021
86fd7bf
Update tests/test_download_puller.py
sean-morris Nov 4, 2021
c686651
Update tests/test_download_puller.py
sean-morris Nov 4, 2021
f8e04f1
Removed Downloader Plugins from Repo
sean-morris Nov 6, 2021
958b0b1
Added Custom Exception for Bad Provider
sean-morris Nov 6, 2021
2048e8d
Merge branch 'main' of https://github.com/jupyterhub/nbgitpuller
sean-morris Nov 8, 2021
398a03f
merged from master and fixed conflicts
sean-morris Nov 8, 2021
9a8fcab
Removed unused import from test file
sean-morris Nov 8, 2021
78e31c3
Added packages to dev-requirements.txt
sean-morris Nov 8, 2021
a131b93
Moved the two constants and REPO_PARENT_DIR out of __init__.py
sean-morris Nov 10, 2021
55da5e1
Revert some trivial formatting changes
consideRatio Nov 17, 2021
0ca6cf9
Apply suggestions from code review
sean-morris Nov 17, 2021
9e808e5
Changes from code review
sean-morris Nov 17, 2021
8d63ee4
Apply suggestions from code review
sean-morris Nov 19, 2021
deecc7b
Removed setTerminalVisibility from automatically opening in UI
sean-morris Nov 23, 2021
a9e08c4
Reverted a mistaken change to command-line args
sean-morris Nov 23, 2021
09c9249
Hookspecs renamed and documented
sean-morris Nov 23, 2021
0085fab
Hookspecs name and seperate helper_args
sean-morris Nov 23, 2021
88ec806
Renamed for clarity
sean-morris Nov 24, 2021
8592d1f
Seperated actual query_line_args from helper_args
sean-morris Nov 24, 2021
21d8f0f
fixed conflicts
sean-morris Nov 24, 2021
ab5dd10
Fixed tests
sean-morris Nov 24, 2021
e8ae5ca
Removed changes not meant to merged
sean-morris Nov 26, 2021
56ad1ee
Apply suggestions from code review
sean-morris Nov 29, 2021
af567ca
Refactored docstrings
sean-morris Nov 29, 2021
782a35b
Refactored docstrings
sean-morris Nov 29, 2021
d034d37
Merge branch 'non-git' of https://github.com/sean-morris/nbgitpuller …
sean-morris Nov 29, 2021
9729464
Fix temp download dir to use the package tempfile
sean-morris Nov 30, 2021
602ef01
provider is now contentProvider in the html/js/query parameters
sean-morris Nov 30, 2021
3ebdc7e
The download_func and download_func_params brought in separately
sean-morris Nov 30, 2021
e22d076
Moved the handle_files_helper in Class
sean-morris Dec 1, 2021
3b14405
Moved downloader-plugin util to own repo
sean-morris Dec 20, 2021
613f863
Moved downloader-plugin util to own repo
sean-morris Dec 20, 2021
5f39c68
Merge branch 'non-git' of https://github.com/sean-morris/nbgitpuller …
sean-morris Dec 20, 2021
f618560
Removed nested_asyncio from init.py
sean-morris Jan 11, 2022
367f3c7
Moved downloader-plugin handling to puller thread
sean-morris Jan 15, 2022
8893970
Moved downloader plugins handling to pull.py
sean-morris Jan 19, 2022
7590c38
Access downloader-plugin results from plugin instance variable
sean-morris Jan 19, 2022
7 changes: 6 additions & 1 deletion .gitignore
@@ -14,11 +14,16 @@ data8assets/
summer/
test-repo/
venv/
.idea/

.ipynb_checkpoints
docs/_build

jupyterhub.sqlite
jupyterhub_cookie_secret
/jupyterhub-proxy.pid

node_modules/
package-lock.json
nbgitpuller/static/dist

nbgitpuller/static/dist
2 changes: 1 addition & 1 deletion dev-requirements.txt
@@ -2,4 +2,4 @@ jupyter-packaging>=0.10
pytest
pytest-cov
flake8
nbclassic
nbclassic
82 changes: 51 additions & 31 deletions nbgitpuller/handlers.py
@@ -6,13 +6,14 @@
import threading
import json
import os
from queue import Queue, Empty
from queue import Queue
import jinja2

from .pull import GitPuller
from .version import __version__



class SyncHandler(IPythonHandler):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
@@ -38,6 +39,37 @@ def emit(self, data):
self.write('data: {}\n\n'.format(serialized_data))
yield self.flush()
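
Note on the `emit` method above: it writes each message in the server-sent events framing, a `data:` line followed by a blank line. A minimal standalone sketch of that framing, assuming the payload is JSON-serialized before being written (the helper name `format_sse` is hypothetical and not part of nbgitpuller):

```python
import json


def format_sse(data):
    """Serialize data as a single server-sent event frame (hypothetical helper)."""
    # json.dumps escapes newlines inside strings, so the payload fits on one data: line
    serialized = json.dumps(data)
    # the blank line after the data: line terminates the event
    return "data: {}\n\n".format(serialized)


frame = format_sse({"phase": "finished"})
print(repr(frame))
```

The browser's `EventSource` treats the blank line as the end of one event, which is why the handler flushes after every write.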

    @gen.coroutine
    def _wait_for_sync_progress_queue(self, queue):
        """
        Poll the queue for progress messages and forward them to the UI, so the
        user is kept aware of progress while archives are downloaded and files
        are merged into the user's home folder.

        A None item signals that the work is finished; an Exception item is
        reported to the UI as an error.

        :param queue: download_queue or the original pull queue
        """
        while True:
            if queue.empty():
                yield gen.sleep(0.5)
                continue
            progress = queue.get_nowait()
            if progress is None:
                return
            if isinstance(progress, Exception):
                self.emit({
                    'phase': 'error',
                    'message': str(progress),
                    'output': '\n'.join([
                        line.strip()
                        for line in traceback.format_exception(
                            type(progress), progress, progress.__traceback__
                        )
                    ])
                })
                return

            self.emit({'output': progress, 'phase': 'syncing'})
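
Note on the method above: the puller thread and the request handler communicate through a queue, with `None` as the end-of-work sentinel and an `Exception` instance as the error signal. A minimal sketch of that handoff using only the standard library; it assumes a blocking `time.sleep` in place of Tornado's `gen.sleep`, and the names `drain` and `worker` are hypothetical:

```python
import queue
import threading
import time


def drain(q, poll_interval=0.01):
    """Collect progress messages until the None sentinel or an Exception arrives."""
    messages = []
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            # nothing queued yet; the handler yields gen.sleep(0.5) at this point
            time.sleep(poll_interval)
            continue
        if item is None:
            return messages, None
        if isinstance(item, Exception):
            return messages, item
        messages.append(item)


def worker(q):
    """Stand-in for the puller thread: queue progress lines, then the sentinel."""
    try:
        for line in ("cloning", "merging"):
            q.put_nowait(line)
        q.put_nowait(None)
    except Exception as e:
        q.put_nowait(e)


q = queue.Queue()
t = threading.Thread(target=worker, args=(q,))
t.start()
messages, error = drain(q)
t.join()
print(messages, error)
```

Catching `queue.Empty` around `get_nowait()` keeps the emptiness check and the read in one step; with a single consumer, the `empty()` check used in the diff behaves the same way.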

@web.authenticated
@gen.coroutine
def get(self):
@@ -51,8 +83,11 @@ def get(self):
return

try:
q = Queue()

repo = self.get_argument('repo')
branch = self.get_argument('branch', None)
content_provider = self.get_argument('contentProvider', None)
depth = self.get_argument('depth', None)
if depth:
depth = int(depth)
@@ -73,11 +108,8 @@ def get(self):
self.set_header('content-type', 'text/event-stream')
self.set_header('cache-control', 'no-cache')

gp = GitPuller(repo, repo_dir, branch=branch, depth=depth, parent=self.settings['nbapp'])

q = Queue()

def pull():
gp = GitPuller(repo, repo_dir, branch=branch, depth=depth, parent=self.settings['nbapp'], content_provider=content_provider, repo_parent_dir=repo_parent_dir, other_kw_args=self.request.arguments.items())
try:
for line in gp.pull():
q.put_nowait(line)
@@ -86,34 +118,12 @@ def pull():
except Exception as e:
q.put_nowait(e)
raise e
self.gp_thread = threading.Thread(target=pull)

self.gp_thread = threading.Thread(target=pull)
self.gp_thread.start()

while True:
try:
progress = q.get_nowait()
except Empty:
yield gen.sleep(0.5)
continue
if progress is None:
break
if isinstance(progress, Exception):
self.emit({
'phase': 'error',
'message': str(progress),
'output': '\n'.join([
line.strip()
for line in traceback.format_exception(
type(progress), progress, progress.__traceback__
)
])
})
return

self.emit({'output': progress, 'phase': 'syncing'})

yield self._wait_for_sync_progress_queue(q)
self.emit({'phase': 'finished'})

except Exception as e:
self.emit({
'phase': 'error',
@@ -151,6 +161,7 @@ def get(self):
repo = self.get_argument('repo')
branch = self.get_argument('branch', None)
depth = self.get_argument('depth', None)
content_provider = self.get_argument('contentProvider', None)
urlPath = self.get_argument('urlpath', None) or \
self.get_argument('urlPath', None)
subPath = self.get_argument('subpath', None) or \
@@ -171,10 +182,19 @@ def get(self):
else:
path = 'tree/' + path

if content_provider is not None:
path = "tree/"

self.write(
self.render_template(
'status.html',
repo=repo, branch=branch, path=path, depth=depth, targetpath=targetpath, version=__version__
repo=repo,
branch=branch,
path=path,
depth=depth,
contentProvider=content_provider,
targetpath=targetpath,
version=__version__
))
self.flush()

48 changes: 48 additions & 0 deletions nbgitpuller/plugin_hook_specs.py
@@ -0,0 +1,48 @@
import pluggy

# This hookspec decorates the handle_files function below. The decorator defines the interface
# (hook specification) for any implementing content-provider plugin. The project name, nbgitpuller,
# is passed to the constructors of HookspecMarker and HookimplMarker as well as to the constructor of the
# PluginManager in pull.py so that the PluginManager.add_hookspecs method can automatically discover
# all marked functions.
hookspec = pluggy.HookspecMarker("nbgitpuller")

# As a convenience, content-provider plugins can use the hookimpl field to decorate their implementations of the
# handle_files function. A content-provider plugin could create the HookimplMarker itself, but in order to register
# with the PluginManager it must use the same name ('nbgitpuller') as we do here.
hookimpl = pluggy.HookimplMarker("nbgitpuller")


@hookspec(firstresult=True)
def handle_files(repo_parent_dir, other_kw_args):
    """
    This function must be implemented by content-provider plugins in order to handle the downloading and decompression
    of a non-git sourced compressed archive.

    The repo_parent_dir is where you will save your downloaded archive.

    The parameter other_kw_args contains all the arguments you put on the nbgitpuller URL link or passed to GitPuller
    via the CLI. This gives you the flexibility to pass any information your content-provider download plugin may need
    to successfully download source files.

    This function needs to return two pieces of information as a json object:
        - output_dir -- this is the name of the directory that will hold all the files you want GitPuller to expose;
            for comparison, when git is the source, this is the name of the git repository you are pulling
        - origin_repo_path -- this is the path to the local git repo that "acts" like the remote origin you would use
            if the content provider were git.

    Once the files are saved to the directory, git puller can handle all the standard functions needed to make sure
    source files are updated or created as needed.

    I suggest you study the function handle_files_helper in the file plugin_helper.py found in the
    nbgitpuller-downloader-plugins repository to get a deep sense of how
    we handle the downloading of compressed archives. There is also more documentation in the docs section of
    nbgitpuller. Finally, you can always implement the entire download process yourself and not use the
    handle_files_helper function, but please be sure you understand what is being passed into and back to the
    nbgitpuller handlers.

    :param str repo_parent_dir: save your downloaded archive here
    :param dict other_kw_args: this includes any argument you put on the nbgitpuller URL or pass via CLI as a dict
    :return: json object with two keys, output_dir and origin_repo_path
    :rtype: json object
    """
97 changes: 87 additions & 10 deletions nbgitpuller/pull.py
@@ -4,9 +4,23 @@
import time
import argparse
import datetime
import pluggy
import importlib_metadata
import inspect
from traitlets import Integer, default
from traitlets.config import Configurable
from functools import partial
from . import plugin_hook_specs


class ContentProviderException(Exception):
    """
    Custom exception raised when the downloader plugin specified by the
    content_provider key is not installed or cannot be found by the
    name given
    """
    def __init__(self, response=None):
        # pass the message on to Exception so that str(exception) returns it
        super().__init__(response)
        self.response = response


def execute_cmd(cmd, **kwargs):
@@ -45,6 +59,40 @@ def flush():
raise subprocess.CalledProcessError(ret, cmd)


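def load_downloader_plugin_classes_from_entrypoints(group, content_provider):
    """
    Search the installed distributions for an entry point in the given group whose
    name matches content_provider, and return the class defined in that entry point's
    module that implements handle_files, or None if there is no such class.
    """
    for dist in importlib_metadata.distributions():
        for ep in dist.entry_points:
            if ep.group == group:
                plugin = ep.load()
                for name, cls in inspect.getmembers(plugin, inspect.isclass):
                    if cls.__module__ == ep.value and ep.name == content_provider:
                        for fn_name, fn in inspect.getmembers(cls, inspect.isfunction):
                            if fn_name == "handle_files":
                                return cls
    return None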
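
Note on the lookup above: it walks every installed distribution and filters entry points by group and name. A minimal sketch of the same lookup against the standard library's `importlib.metadata` (Python 3.8+) rather than the `importlib_metadata` backport used in this diff; the function name `find_entry_point` is hypothetical:

```python
from importlib import metadata


def find_entry_point(group, name):
    """Return the entry point registered under group with the given name, or None."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points can be filtered by group directly
        candidates = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name
        candidates = eps.get(group, [])
    for ep in candidates:
        if ep.name == name:
            return ep
    return None


# An unknown group yields None, which the caller can turn into an error
print(find_entry_point("no-such-group-for-this-sketch", "dropbox"))
```

Matching on the name before calling `ep.load()` would also avoid importing plugins that were not requested.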


def setup_plugins(content_provider):
    """
    Search for and load the plugin registered under the nbgitpuller entry-point group whose name
    matches content_provider. If found, the plugin is registered with a plugin manager that is
    used to execute the hook implemented by the plugin.

    :param content_provider: the name of the content provider; each plugin is named to identify the
        content provider of the archive to be loaded (e.g. googledrive, dropbox, etc.)
    :return: a dict holding the PluginManager used to call the plugin's hooks ("plugin_manager")
        and the plugin instance ("downloader_obj")
    :raises ContentProviderException: when no plugin is found for the content_provider parameter
    """
    plugin_manager = pluggy.PluginManager("nbgitpuller")
    plugin_manager.add_hookspecs(plugin_hook_specs)
    download_class = load_downloader_plugin_classes_from_entrypoints("nbgitpuller", content_provider)
    # check before instantiating; calling None would raise a TypeError instead
    if download_class is None:
        raise ContentProviderException(f"The content_provider key you supplied in the URL could not be found: {content_provider}")
    downloader_obj = download_class()
    plugin_manager.register(downloader_obj)
    return {"plugin_manager": plugin_manager, "downloader_obj": downloader_obj}
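
Note on the plugin contract: pull.py in this diff iterates over the hook's output for progress messages and then reads `handle_files_results` from the plugin instance. A minimal sketch of a plugin class shaped that way, using only the standard library. The class name `ExampleDownloader`, the directory names, and the URL are hypothetical; no archive is downloaded, the "origin" is a plain directory rather than a real git repository, and the `hookimpl` decorator is left out so the sketch runs without pluggy:

```python
import os
import tempfile


class ExampleDownloader:
    """Hypothetical plugin; a real one would decorate handle_files with hookimpl."""

    def __init__(self):
        self.handle_files_results = None

    def handle_files(self, repo_parent_dir, other_kw_args):
        # A real plugin would download and unpack the archive here
        yield "Downloading archive...\n"
        output_dir = "example-archive"
        os.makedirs(os.path.join(repo_parent_dir, output_dir), exist_ok=True)
        yield "Preparing local origin...\n"
        # Placeholder only: a real plugin would create a git repository at this path
        origin_repo_path = os.path.join(repo_parent_dir, "origin", output_dir)
        os.makedirs(origin_repo_path, exist_ok=True)
        # Results are stored on the instance for the caller to read afterwards
        self.handle_files_results = {
            "output_dir": output_dir,
            "origin_repo_path": origin_repo_path,
        }


with tempfile.TemporaryDirectory() as parent:
    plugin = ExampleDownloader()
    progress = list(plugin.handle_files(parent, {"repo": "https://example.org/archive.zip"}))
    results = plugin.handle_files_results
    print(progress, results["output_dir"])
```

Because `handle_files` is a generator, the results dict is only populated once the caller has consumed every progress message.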


class GitPuller(Configurable):
depth = Integer(
config=True,
@@ -71,12 +119,9 @@ def __init__(self, git_url, repo_dir, **kwargs):

self.git_url = git_url
self.branch_name = kwargs.pop("branch")

if self.branch_name is None:
self.branch_name = self.resolve_default_branch()
elif not self.branch_exists(self.branch_name):
raise ValueError(f"Branch: {self.branch_name} -- not found in repo: {self.git_url}")

self.content_provider = kwargs.pop("content_provider")
self.repo_parent_dir = kwargs.pop("repo_parent_dir")
self.other_kw_args = kwargs.pop("other_kw_args")
self.repo_dir = repo_dir
newargs = {k: v for k, v in kwargs.items() if v is not None}
super(GitPuller, self).__init__(**newargs)
@@ -135,11 +180,37 @@ def resolve_default_branch(self):
logging.exception(m)
raise ValueError(m)

    def handle_archive_download(self):
        """
        Run the content-provider plugin, relaying its progress messages, then point
        repo_dir and git_url at the local repository the plugin created.
        """
        # a ContentProviderException raised by setup_plugins propagates to the caller
        plugin_info = setup_plugins(self.content_provider)
        plugin_manager = plugin_info["plugin_manager"]
        downloader_obj = plugin_info["downloader_obj"]
        # request arguments arrive as (name, [bytes, ...]) pairs; keep the first value of each
        other_kw_args = {k: v[0].decode() for k, v in self.other_kw_args}
        yield from plugin_manager.hook.handle_files(repo_parent_dir=self.repo_parent_dir, other_kw_args=other_kw_args)
        results = downloader_obj.handle_files_results
        self.repo_dir = self.repo_parent_dir + results["output_dir"]
        self.git_url = "file://" + results["origin_repo_path"]
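
Note on `other_kw_args`: the handler passes `self.request.arguments.items()`, and Tornado's request arguments map each name to a list of byte strings, which is why the method above takes `v[0].decode()`. A minimal sketch of that conversion on sample data; the helper name `decode_query_arguments` and the sample values are hypothetical:

```python
def decode_query_arguments(arguments):
    """Keep the first value of each query argument and decode it to str."""
    return {key: values[0].decode() for key, values in arguments.items()}


# Shaped like Tornado's request.arguments: name -> list of byte strings
raw = {
    "repo": [b"https://example.org/archive.zip"],
    "contentProvider": [b"generic_web"],
}
decoded = decode_query_arguments(raw)
print(decoded)
```

Any repeated query parameter loses all but its first value in this conversion.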

    def handle_branch_name(self):
        if self.branch_name is None:
            self.branch_name = self.resolve_default_branch()
        elif not self.branch_exists(self.branch_name):
            raise ValueError(f"Branch: {self.branch_name} -- not found in repo: {self.git_url}")

def pull(self):
"""
Pull selected repo from a remote git repository,
if compressed archive download first.
Execute pull of repo from a git repository(remote or temporary local created for compressed archives),
while preserving user changes
"""
# if content_provider is specified then we are dealing with compressed archive and not a git repo
if self.content_provider is not None:
yield from self.handle_archive_download()

self.handle_branch_name()

if not os.path.exists(self.repo_dir):
yield from self.initialize_repo()
else:
@@ -303,14 +374,20 @@ def main():

parser = argparse.ArgumentParser(description='Synchronizes a github repository with a local repository.')
parser.add_argument('git_url', help='Url of the repo to sync')
parser.add_argument('branch_name', default=None, help='Branch of repo to sync', nargs='?')
parser.add_argument('repo_dir', default='.', help='Path to clone repo under', nargs='?')
parser.add_argument('repo_dir', help='Path to clone repo under', nargs='?')
parser.add_argument('--branch_name', default=None, help='Branch of repo to sync', nargs='?')
parser.add_argument('--content_provider', default=None, help='Set this if downloading a compressed archive instead of using a git repo (e.g. dropbox, googledrive, generic_web)', nargs='?')
parser.add_argument('--repo_parent_dir', default='.', help='Only used if downloading a compressed archive; location of the download', nargs='?')
parser.add_argument('--other_kw_args', default=None, help='Any additional keyword args, passed as a dict, e.g. {"arg1":"value1","arg2":"value2"}; could be used in downloader plugins', nargs='?')
args = parser.parse_args()

for line in GitPuller(
args.git_url,
args.repo_dir,
branch=args.branch_name if args.branch_name else None
branch=args.branch_name if args.branch_name else None,
content_provider=args.content_provider if args.content_provider else None,
repo_parent_dir=args.repo_parent_dir if args.repo_parent_dir else None,
other_kw_args=args.other_kw_args if args.other_kw_args else None
).pull():
print(line)
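
Note on `--other_kw_args`: the CLI above hands the option through as a plain string, while `handle_archive_download` iterates over `other_kw_args` as name/value pairs. One possible way to bridge the two, shown here purely as an assumption and not as what this PR does, is to let argparse parse the value as JSON:

```python
import argparse
import json

parser = argparse.ArgumentParser(description="Sketch of JSON-valued keyword args")
parser.add_argument("git_url", help="Url of the repo to sync")
# type=json.loads turns the option's string value into a dict at parse time
parser.add_argument("--other_kw_args", type=json.loads, default=None)

args = parser.parse_args([
    "https://example.org/archive.zip",
    "--other_kw_args", '{"arg1": "value1", "arg2": "value2"}',
])
print(args.other_kw_args)
```

The resulting dict holds plain strings, so the byte-decoding step applied to web request arguments would not apply to it.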
