Make better use of OSF capabilities #106

Merged: 25 commits merged into master from enh-notitle on Jul 15, 2020

mih (Member) commented Jun 27, 2020

This PR enables all kinds of things. Here is a demo of the most important aspect:

(datalad3-dev) mih@meiner /tmp % datalad create mytest
[INFO   ] Creating a new annex repo at /tmp/mytest 
[INFO   ] Scanning for unlocked files (this may take some time) 
create(ok): /tmp/mytest (dataset)

(datalad3-dev) mih@meiner /tmp % cd mytest
(datalad3-dev) mih@meiner /tmp/mytest (git)-[master] % echo 123 > dummy
(datalad3-dev) mih@meiner /tmp/mytest (git)-[master] % datalad save
add(ok): /tmp/mytest/dummy (file)
save(ok): /tmp/mytest (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)

(datalad3-dev) mih@meiner /tmp/mytest (git)-[master] % datalad create-sibling-osf
create-sibling-osf(ok): https://osf.io/yjmwb/
[INFO   ] Configure additional publication dependency on "osf-storage" 
configure-sibling(ok): /tmp/mytest (sibling)
(datalad3-dev) mih@meiner /tmp/mytest (git)-[master] % datalad push --to osf
copy(ok): /tmp/mytest/dummy (file) [to osf-storage...]                                       
publish(ok): /tmp/mytest (dataset) [refs/heads/master->osf:refs/heads/master [new branch]]   
publish(ok): /tmp/mytest (dataset) [refs/heads/git-annex->osf:refs/heads/git-annex [new branch]]                                                                                          

And, importantly, it enables this:

(datalad3-dev) mih@meiner /tmp % datalad clone osf://yjmwb mytest_clone
[INFO   ] Scanning for unlocked files (this may take some time)                              
install(ok): /tmp/mytest_clone (dataset)

(datalad3-dev) mih@meiner /tmp % cd mytest_clone
(datalad3-dev) mih@meiner /tmp/mytest_clone (git)-[master] % datalad get dummy
get(ok): /tmp/mytest_clone/dummy (file) [from osf-storage...]
(datalad3-dev) mih@meiner /tmp/mytest_clone (git)-[master] % cat dummy
123

None of this is polished or optimized, but it works in principle!

I would very much appreciate help writing tests for this new functionality and also for adjusting the docs.

mih added 12 commits June 20, 2020 20:23
…rojects

It essentially copies and adjusts https://github.com/datalad/git-remote-rclone
in that it uses a local repo mirror to push and fetch refs to and from,
and uploads a compressed archive to `.git/` of an OSF project that is
identified by a URL of type `osf://<projectid>`.

Because request latency is high, the entire repo is represented as two
files:

- a small text file listing the refs in the repo
- a 7z archive containing all of the actual content
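
The two-file layout described above can be sketched as follows. This is only an illustration of the idea, not the code from this PR: the one-line-per-ref text format and the helper names `format_refs`, `parse_refs`, and `archive_is_stale` are assumptions made for the example. The point is that the small refs file can be fetched and compared first, so the large archive is only transferred when something actually changed.

```python
def format_refs(refs):
    """Render {refname: sha} as one 'sha refname' line per ref, sorted by name.

    The exact on-disk format is an assumption made for this sketch.
    """
    return "".join(f"{sha} {name}\n" for name, sha in sorted(refs.items()))


def parse_refs(text):
    """Inverse of format_refs(): parse the small refs file into a dict."""
    refs = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        sha, name = line.split(" ", 1)
        refs[name] = sha
    return refs


def archive_is_stale(local_refs, remote_refs_text):
    """True if the local mirror differs from what the remote refs file lists.

    Only in that case would the (much larger) repository archive need to be
    downloaded or re-uploaded.
    """
    return parse_refs(remote_refs_text) != local_refs


if __name__ == "__main__":
    refs = {"refs/heads/master": "c552b2b"}
    listing = format_refs(refs)
    print(listing, end="")
    print("transfer needed:", archive_is_stale(refs, listing))
```

With high request latency, a single cheap comparison like this avoids walking many small objects over the network.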

Here is what it can do:

```
% mkdir newrepo
% cd newrepo
% git init
Initialized empty Git repository in /tmp/newrepo/.git/
% touch some
% git add some
% git commit -m initial
[master (root-commit) c552b2b] initial
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 some
% git remote add osf osf://vtha6
% git push --set-upstream osf master
Enumerating objects: 3, done.
Counting objects: 100% (3/3), done.
Writing objects: 100% (3/3), done.
Building bitmaps: 100% (1/1), done.
Total 3 (delta 0), reused 0 (delta 0)
Computing commit graph generation numbers: 100% (1/1), done.
Upload repository archive
To osf://vtha6
 * [new branch]      master -> master
Branch 'master' set up to track remote branch 'master' from 'osf'.

% cd ..
% git clone osf://vtha6 newrepoclone
Cloning into 'newrepoclone'...
fatal: bad revision 'HEAD'
100%|██████████████████████████████████████████████████| 83.0/83.0 [00:00<00:00, 519kbytes/s]
Downloading repository archive
100%|███████████████████████████████████████████████| 7.99k/7.99k [00:00<00:00, 1.04Mbytes/s]
Extracting repository archive

% git -C newrepoclone log -1 --oneline |cat
c552b2b initial
% git -C newrepo log -1 --oneline |cat
c552b2b initial
```

TODO:

- there is substantial code overlap with https://github.com/datalad/git-remote-rclone
  that should ideally be refactored
- there is also some overlap with the special remote implementation
- a `clone` yields an immediate `fatal: bad revision 'HEAD'` output,
  which seems to appear before any of this code is executed; it is
  unclear where it comes from

and generate a default one, based on dataset ID and root path name

Will auto-add 'DataLad dataset', and the dataset ID to improve
searchability on OSF.

A project is no more than a node of a particular category. With child-nodes
and a more comprehensive use of OSF capabilities the discrepancy between
terminologies becomes more and more problematic. This change replaces
'project' with 'node' in all internal places, and the name of the
'project' parameter of the special remote.

This is also a sensible move as the created "projects" are actually
nodes of type "data" by default (not "project"), and the category is
configurable.
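
To make the node terminology and the defaults from the commits above concrete, here is a minimal sketch of a request body for creating a node. It is not taken from this PR. The JSON:API-style layout (`data.type`, `attributes.title`, `attributes.category`, `attributes.tags`) reflects my understanding of the OSF API v2 and should be checked against its documentation. The helper names and the exact default-title format are hypothetical; the commits only state that the title is derived from the dataset ID and the root path name.

```python
import json
from pathlib import Path


def default_title(dataset_id, root_path):
    """Hypothetical default title built from the root path name and dataset ID."""
    return f"{Path(root_path).name} ({dataset_id})"


def node_payload(dataset_id, root_path, title=None, category="data"):
    """Build a request body for a new OSF node (assumed JSON:API layout).

    A "project" is simply a node whose category is "project"; the default
    here is "data", as described in the commit message.
    """
    return {
        "data": {
            "type": "nodes",
            "attributes": {
                "title": title or default_title(dataset_id, root_path),
                "category": category,
                # tags added to improve searchability on OSF
                "tags": ["DataLad dataset", dataset_id],
            },
        }
    }


if __name__ == "__main__":
    # the dataset ID below is a made-up placeholder
    body = node_payload("0000-example-id", "/tmp/mytest")
    print(json.dumps(body, indent=2))
```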
mih changed the title from "Make OSF project title optional" to "Make better use of OSF capabilities" on Jun 27, 2020
mih added 3 commits June 27, 2020 17:37
This enables storing the entire VCS. This can be combined with any
mode of the git-annex special remote, and a publication dependency
is set up automatically.

mih (Member, Author) commented Jul 2, 2020

Both tarfile and zipfile in the stdlib support lzma compression. Use them instead of an external dependency on 7z.

mih added 2 commits July 4, 2020 10:55
This is achieved by replacing the `repo.7z` at the remote end with an
LZMA-compressed `repo.zip`. This has two advantages:

- we no longer require users to install a 3rd-party tool, but stay
  within the capabilities of the standard lib

- OSF is capable of inspecting ZIP files, so users have the ability
  to explore their content, instead of seeing only an opaque blob.
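
A minimal sketch of what an LZMA-compressed ZIP looks like with only the standard library follows. It is not the implementation from this PR; the function names `pack_repo` and `unpack_repo` are made up for illustration. It relies on `zipfile.ZipFile` with `compression=zipfile.ZIP_LZMA`, which requires Python to be built with the `lzma` module.

```python
import zipfile
from pathlib import Path


def pack_repo(repo_dir, archive_path):
    """Write all files below repo_dir into an LZMA-compressed ZIP archive.

    Note: this sketch stores files only; empty directories are skipped.
    """
    repo_dir = Path(repo_dir)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_LZMA) as zf:
        for path in sorted(repo_dir.rglob("*")):
            if path.is_file():
                # store paths relative to the repository root
                zf.write(path, arcname=path.relative_to(repo_dir).as_posix())


def unpack_repo(archive_path, target_dir):
    """Extract an archive created by pack_repo() into target_dir."""
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target_dir)
```

Because the result is an ordinary ZIP container, any tool that can list ZIP members can show its contents, even if it cannot decompress LZMA itself.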

mih (Member, Author) commented Jul 4, 2020

So it seems that the code has some kind of line-ending issue (oh how I love Windows).

In the logs I see:

[DEBUG] Non-progress stderr: b'fatal: This version of fast-import does not support feature done\r'

Somehow a carriage return makes it into the stream that fast-export sends to fast-import(?). I failed to discover how and why.

I am giving up here. If someone stumbles upon this and can figure out why this is happening on Windows, please share! Thx.
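
The cause was not identified in this thread, so the following is only an illustration of the symptom, not a diagnosis. It shows how a stray carriage return would produce exactly this kind of message: a strict parser that strips only the trailing LF sees the feature name `done\r` instead of `done`. The parser, the `SUPPORTED` set, and `to_crlf` are hypothetical stand-ins, not git's actual code. One way such a CR could appear is a pipe or file opened in text mode on Windows, which rewrites `\n` as `\r\n`.

```python
SUPPORTED = {"done"}  # hypothetical, minimal set of known features


def check_feature(line):
    """Strictly parse a b'feature <name>\\n' line, stripping only the LF."""
    prefix = b"feature "
    if not line.startswith(prefix):
        raise ValueError("not a feature line")
    name = line[len(prefix):].rstrip(b"\n").decode()
    if name not in SUPPORTED:
        raise ValueError(f"unsupported feature {name!r}")
    return name


def to_crlf(data):
    """Simulate text-mode newline translation as done on Windows."""
    return data.replace(b"\n", b"\r\n")


if __name__ == "__main__":
    stream = b"feature done\n"
    print(check_feature(stream))       # accepted
    try:
        check_feature(to_crlf(stream))  # name now carries a trailing CR
    except ValueError as exc:
        print("rejected:", exc)
```

If this is what happens, passing the stream between the two processes strictly as bytes would avoid the translation.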

sappelhoff (Contributor) commented

Can we get this feature for Linux/OSX anyhow? :-) Or is it a strong blocker if the Windows tests are not passing?

I have no idea what is going on. This needs a Git-Windows person to
figure it out.

mih (Member, Author) commented Jul 15, 2020

OK, so the remaining test failures are not unique to this PR.

mih merged commit 20bae98 into master on Jul 15, 2020
mih deleted the enh-notitle branch on July 15, 2020 06:16

adswa (Member) commented Jul 15, 2020

FTR: I will overhaul the documentation today

sappelhoff (Contributor) commented

Let me know if you need/could use help @adswa

adswa (Member) commented Jul 15, 2020

thx much @sappelhoff, I'm still typing, but will create a PR later and appreciate feedback and ideas & commits with improvements
