iterable_subprocess based annexworktree() #539
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #539      +/-   ##
==========================================
+ Coverage   92.73%   92.87%   +0.14%
==========================================
  Files         137      143       +6
  Lines       10157    10365     +208
  Branches     1103     1141      +38
==========================================
+ Hits         9419     9627     +208
  Misses        714      714
  Partials       24       24
This commit renames the module `datalad_next.processors` to `datalad_next.itertools`. This makes more sense since the functions that are defined in the module operate on iterables and their results are themselves iterable. Renaming was suggested in the review comment: datalad#539 (comment)
After normalizing the benchmark conditions further, I have the following stats:
This compared to a shell pipe (using
~8% difference, and nearly within the margin of error. This works for me!
49765e5 to e0e21dd
This addresses an issue brought up in datalad#539 (comment). This changeset removes the default argument to avoid the impression that "line-processing" is the main target. Neither the code nor the existing usage implies that. The possibility to do line-splitting is not touched (or removed). The documentation needs no adjustment.
This commit uses a `b'\n'` separator when itemizing the output of `git annex find` and does not keep line endings. This simplifies the call to `itemize` and the test for a non-empty key in the splitter function of the enclosing `route_out`. It also requires adding `b'\n'` to drive the consuming `git annex examinekey` subprocess. This is done with `intersperse`.
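The splitting and re-joining described in this commit message can be illustrated with a minimal sketch. The two helpers below are simplified stand-ins written for this example; they are not the `datalad_next.itertools` implementations, and their names and signatures are assumptions. The byte chunks stand in for subprocess output.

```python
from typing import Iterable, Iterator


def itemize_sketch(chunks: Iterable[bytes], sep: bytes) -> Iterator[bytes]:
    """Yield `sep`-delimited items from arbitrarily chunked bytes.

    The separator is not kept (hypothetical helper, for illustration only).
    """
    remainder = b''
    for chunk in chunks:
        parts = (remainder + chunk).split(sep)
        # The last part may be incomplete; hold it back until more data arrives.
        remainder = parts.pop()
        yield from parts
    if remainder:
        yield remainder


def intersperse_sketch(sep: bytes, items: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the items with `sep` placed between consecutive items."""
    first = True
    for item in items:
        if not first:
            yield sep
        first = False
        yield item


# Chunk boundaries need not coincide with line boundaries.
chunks = [b'key-1\nke', b'y-2\n', b'\nkey-3\n']

# Without line endings, an empty line is simply an empty item.
keys = [key for key in itemize_sketch(chunks, b'\n') if key]

# Re-add the separator before feeding a line-oriented consumer.
stdin_feed = b''.join(intersperse_sketch(b'\n', keys))
```

Because the items carry no line endings, the non-empty test reduces to plain truthiness, at the price of adding the separator again for the next process in the chain.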
This commit reduces the number of concepts in the implementation of `route_in` and `route_out`. It removes the additional use of booleans in favor of solely using `StoreOnly`. It also replaces tuple indices with semantically named variables.
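As a rough illustration of the routing idea, the following sketch sends only part of each record through a processing step and re-joins the results afterwards. It is a simplified model under stated assumptions, not the code from this PR: `route_out_sketch`, `route_in_sketch`, the `deque` store, and the sample records are all hypothetical, and the processing step is assumed to yield exactly one result per input.

```python
from collections import deque


class StoreOnly:
    """Marker: nothing to process for this record, only data to store."""


def route_out_sketch(records, store, splitter):
    """Yield the to-be-processed part of each record, park the rest in `store`."""
    for record in records:
        to_process, to_store = splitter(record)
        store.append((to_process, to_store))
        if to_process is not StoreOnly:
            yield to_process


def route_in_sketch(results, store, joiner):
    """Re-join processed results with the stored data, in original order."""
    for result in results:
        routed, stored_data = store.popleft()
        # Records that bypassed processing come out first.
        while routed is StoreOnly:
            yield joiner(StoreOnly, stored_data)
            routed, stored_data = store.popleft()
        yield joiner(result, stored_data)
    # Store-only records after the last processed one.
    while store:
        _, stored_data = store.popleft()
        yield joiner(StoreOnly, stored_data)


def splitter(record):
    path, key = record
    return (key if key else StoreOnly), path


def joiner(result, path):
    return path, (None if result is StoreOnly else result)


records = [('a.txt', 'key-a'), ('b.txt', ''), ('c.txt', 'key-c')]
store = deque()
to_process = route_out_sketch(records, store, splitter)
# str.upper() stands in for a subprocess that examines each key.
processed = (key.upper() for key in to_process)
results = list(route_in_sketch(processed, store, joiner))
```

Using the single `StoreOnly` marker for both directions, and unpacking tuples into named variables, keeps the bookkeeping readable without extra boolean flags.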
This commit does:
- add a docstring for `iter_annexworktree`,
- include `iter_annexworktree`-documentation in the module documentation.
It is modeled after that of `iter_gitworktree()`, but aims to avoid duplication with it. The change also fixes various issues in the source documentation, discovered in this process.
Documents what is TODO.
2d34199 to 0fd61d6
Previously, it would only open annex objects. Now regular files (tracked or untracked) and symlink targets (via the symlink) are also opened, if they actually exist. The corresponding test is extended appropriately.
Minimal change, because we just pass it on to `iter_gitworktree()`. Still added a smoke test. This is now ready for use in Gooey. Ping datalad#323
aa1b4b3 to 7a6deed
This generalizes an approach from datalad#539. It is implemented in a way that enables reuse of the helpers in that PR too. With this change regular files (tracked or untracked) and symlink targets (via the symlink) are also opened, if they actually exist. Closes datalad#553
#555 brings helpers that can be used to remove duplication of
This commit fixes two issues with tests under Windows:
1. Test files were windows-1252 encoded
2. Line endings in saved test files do not match the line endings that were provided in `Path.write_text`, if executed under Windows
The issues are fixed by:
1. Specifying encoding='utf-8' in `Path.write_text()`
2. Not using line-endings in test-file content
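The two fixes can be shown with a small sketch. The file name and content are made up for this example; the point is only that an explicit encoding and content without line endings give the same bytes on every platform, whereas the default encoding depends on the locale and text mode translates `'\n'` to the platform line separator on write.

```python
import tempfile
from pathlib import Path

# Non-ASCII content without any line endings (hypothetical test content).
content = 'Köln'

with tempfile.TemporaryDirectory() as tmp:
    test_file = Path(tmp) / 'file.txt'
    # An explicit encoding makes the bytes on disk independent of the locale.
    test_file.write_text(content, encoding='utf-8')
    raw = test_file.read_bytes()

# The same text encoded as windows-1252 would yield different bytes.
as_utf8 = content.encode('utf-8')
as_cp1252 = content.encode('cp1252')
```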
This commit fixes a format string in a `git annex find` command. The documentation states that `\n`, i.e. two characters: a backslash and an `n`, instructs git annex to write a newline character. We had used the Python string '\n', i.e. a single character: newline. Although git annex seems to accept a literal newline and emits a newline, that is undocumented behavior and should therefore not be used.
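The difference between the two spellings is easy to miss in Python source, so here is a short sketch. The format string `${key}` is only an illustrative example and not necessarily the one used in this commit; no `git annex` call is made.

```python
# Ends with ONE character: an actual newline.
with_newline_char = '${key}\n'

# Ends with TWO characters: a backslash followed by 'n'. This is the escape
# sequence that git annex itself is documented to turn into a newline.
with_escape_sequence = '${key}\\n'

# A raw string literal is an equivalent way to write the two-character form.
with_raw_literal = r'${key}\n'

last_two_chars = with_escape_sequence[-2:]
length_difference = len(with_escape_sequence) - len(with_newline_char)
```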
This is done! A monumental effort. Thanks @christian-monch!
This is moving PR mih#3 here, to get it tested properly. Below is the original description by @christian-monch
It includes (sits on top of) #538 (merged now)
TODO
- `iter_gitworktree()`, plus fixes #552
- `keep_ends=True` `str.strip()` combo obsolete (and ease debugging why @mih cannot do it, ideally)
- `fp=True` currently only does meaningful things for annexed files, but it should also act properly for any other file. This aspect of the functionality is still undertested.

This PR adds an implementation of `iter_annexworktree` that is based on `iterable_subprocesses` and the ideas laid out in issue #537.
This PR includes a collection of data processors that are basically generator wrappers.
The PR also modifies `iter_gitworktree` to use `iterable_subprocesses` instead of the datalad-core runner.
The current implementation of `iter_annexworktree` iterates over a dataset with 33k annex files in less than 5 seconds on my machine.