Merge pull request #1 from iterative/master
Update
kaiogu authored Feb 23, 2020
2 parents 6dba652 + f673144 commit ffa8fe5
Showing 32 changed files with 726 additions and 340 deletions.
8 changes: 8 additions & 0 deletions .zenodo.json
@@ -0,0 +1,8 @@
{
"title": "DVC: Data Version Control - Git for Data & Models",
"keywords": [
"data-science", "data-version-control", "machine-learning", "git",
"developer-tools", "reproducibility", "collaboration", "ai", "python"],
"contributors": [
{"name": "DVC team", "type": "Other", "affiliation": "Iterative"}]
}
123 changes: 74 additions & 49 deletions README.rst
@@ -1,6 +1,4 @@
.. image:: https://dvc.org/static/img/logo-github-readme.png
:target: https://dvc.org
:alt: DVC logo
|Banner|

`Website <https://dvc.org>`_
• `Docs <https://dvc.org/doc>`_
@@ -10,33 +8,7 @@
• `Tutorial <https://dvc.org/doc/get-started>`_
• `Mailing List <https://sweedom.us10.list-manage.com/subscribe/post?u=a08bf93caae4063c4e6a351f6&id=24c0ecc49a>`_

.. image:: https://img.shields.io/badge/release-ok-brightgreen
:target: https://travis-ci.com/iterative/dvc
:alt: Release

.. image:: https://img.shields.io/travis/com/iterative/dvc/master?label=dev
:target: https://travis-ci.com/iterative/dvc
:alt: Travis dev branch

.. image:: https://codeclimate.com/github/iterative/dvc/badges/gpa.svg
:target: https://codeclimate.com/github/iterative/dvc
:alt: Code Climate

.. image:: https://codecov.io/gh/iterative/dvc/branch/master/graph/badge.svg
:target: https://codecov.io/gh/iterative/dvc
:alt: Codecov

.. image:: https://img.shields.io/badge/patreon-donate-green.svg
:target: https://www.patreon.com/DVCorg/overview
:alt: Donate

.. image:: https://anaconda.org/conda-forge/dvc/badges/version.svg
:target: https://anaconda.org/conda-forge/dvc
:alt: Conda-forge

.. image:: https://img.shields.io/badge/snap-install-82BEA0.svg?logo=snapcraft
:target: https://snapcraft.io/dvc
:alt: Snapcraft
|Release| |CI| |Maintainability| |Coverage| |Donate| |Conda| |Snap| |DOI|

|
@@ -79,9 +51,7 @@ to store data and model files seamlessly out of Git, while preserving almost the
were stored in Git itself. To store and share the data cache, DVC supports multiple remotes - any cloud (S3, Azure,
Google Cloud, etc) or any on-premise network storage (via SSH, for example).

.. image:: https://dvc.org/static/img/flow.gif
:target: https://dvc.org/static/img/flow.gif
:alt: how_dvc_works
|Flowchart|

The DVC pipelines (computational graph) feature connects code and data together. It is possible to explicitly
specify all steps required to produce a model: input dependencies including data, commands to run,
@@ -148,6 +118,8 @@ Homebrew
Conda (Anaconda)
----------------

|Conda|

.. code-block:: bash
conda install -c conda-forge dvc
@@ -157,6 +129,8 @@ Currently, this includes support for Python versions 2.7, 3.6 and 3.7.
Snap (Snapcraft)
----------------

|Snap|

.. code-block:: bash
snap install dvc --classic
@@ -206,40 +180,43 @@ Comparison to related technologies

Contributing
============

|Maintainability| |Donate|

Contributions are welcome! Please see our `Contributing Guide <https://dvc.org/doc/user-guide/contributing/core>`_ for more
details.

.. image:: https://sourcerer.io/fame/efiop/iterative/dvc/images/0
:target: https://sourcerer.io/fame/efiop/iterative/dvc/links/0
:alt: 0
:target: https://sourcerer.io/fame/efiop/iterative/dvc/links/0
:alt: 0

.. image:: https://sourcerer.io/fame/efiop/iterative/dvc/images/1
:target: https://sourcerer.io/fame/efiop/iterative/dvc/links/1
:alt: 1
:target: https://sourcerer.io/fame/efiop/iterative/dvc/links/1
:alt: 1

.. image:: https://sourcerer.io/fame/efiop/iterative/dvc/images/2
:target: https://sourcerer.io/fame/efiop/iterative/dvc/links/2
:alt: 2
:target: https://sourcerer.io/fame/efiop/iterative/dvc/links/2
:alt: 2

.. image:: https://sourcerer.io/fame/efiop/iterative/dvc/images/3
:target: https://sourcerer.io/fame/efiop/iterative/dvc/links/3
:alt: 3
:target: https://sourcerer.io/fame/efiop/iterative/dvc/links/3
:alt: 3

.. image:: https://sourcerer.io/fame/efiop/iterative/dvc/images/4
:target: https://sourcerer.io/fame/efiop/iterative/dvc/links/4
:alt: 4
:target: https://sourcerer.io/fame/efiop/iterative/dvc/links/4
:alt: 4

.. image:: https://sourcerer.io/fame/efiop/iterative/dvc/images/5
:target: https://sourcerer.io/fame/efiop/iterative/dvc/links/5
:alt: 5
:target: https://sourcerer.io/fame/efiop/iterative/dvc/links/5
:alt: 5

.. image:: https://sourcerer.io/fame/efiop/iterative/dvc/images/6
:target: https://sourcerer.io/fame/efiop/iterative/dvc/links/6
:alt: 6
:target: https://sourcerer.io/fame/efiop/iterative/dvc/links/6
:alt: 6

.. image:: https://sourcerer.io/fame/efiop/iterative/dvc/images/7
:target: https://sourcerer.io/fame/efiop/iterative/dvc/links/7
:alt: 7
:target: https://sourcerer.io/fame/efiop/iterative/dvc/links/7
:alt: 7

Mailing List
============
@@ -253,3 +230,51 @@ This project is distributed under the Apache license version 2.0 (see the LICENS

By submitting a pull request to this project, you agree to license your contribution under the Apache license version
2.0 to this project.

Citation
========

|DOI|

Iterative, *DVC: Data Version Control - Git for Data & Models* (2020)
`DOI:10.5281/zenodo.3677553 <https://doi.org/10.5281/zenodo.3677553>`_.

.. |Banner| image:: https://dvc.org/static/img/logo-github-readme.png
:target: https://dvc.org
:alt: DVC logo

.. |Release| image:: https://img.shields.io/badge/release-ok-brightgreen
:target: https://travis-ci.com/iterative/dvc
:alt: Release

.. |CI| image:: https://img.shields.io/travis/com/iterative/dvc/master?label=dev
:target: https://travis-ci.com/iterative/dvc
:alt: Travis dev branch

.. |Maintainability| image:: https://codeclimate.com/github/iterative/dvc/badges/gpa.svg
:target: https://codeclimate.com/github/iterative/dvc
:alt: Code Climate

.. |Coverage| image:: https://codecov.io/gh/iterative/dvc/branch/master/graph/badge.svg
:target: https://codecov.io/gh/iterative/dvc
:alt: Codecov

.. |Donate| image:: https://img.shields.io/badge/patreon-donate-green.svg
:target: https://www.patreon.com/DVCorg/overview
:alt: Donate

.. |Conda| image:: https://anaconda.org/conda-forge/dvc/badges/version.svg
:target: https://anaconda.org/conda-forge/dvc
:alt: Conda-forge

.. |Snap| image:: https://img.shields.io/badge/snap-install-82BEA0.svg?logo=snapcraft
:target: https://snapcraft.io/dvc
:alt: Snapcraft

.. |DOI| image:: https://img.shields.io/badge/DOI-10.5281/zenodo.3677553-blue.svg
:target: https://doi.org/10.5281/zenodo.3677553
:alt: DOI

.. |Flowchart| image:: https://dvc.org/static/img/flow.gif
:target: https://dvc.org/static/img/flow.gif
:alt: how_dvc_works
12 changes: 6 additions & 6 deletions dvc/api.py
@@ -18,10 +18,10 @@ def __init__(self, url):


def get_url(path, repo=None, rev=None, remote=None):
"""
Returns the full URL to the data artifact specified by its `path` in a
`repo`.
NOTE: There is no guarantee that the file actually exists in that location.
"""Returns URL to the storage location of a data artifact tracked
by DVC, specified by its path in a repo.
NOTE: There's no guarantee that the file actually exists in that location.
"""
with _make_repo(repo, rev=rev) as _repo:
_require_dvc(_repo)
@@ -31,7 +31,7 @@ def get_url(path, repo=None, rev=None, remote=None):


def open(path, repo=None, rev=None, remote=None, mode="r", encoding=None):
"""Context manager to open a file artifact as a file object."""
"""Context manager to open a tracked file as a file object."""
args = (path,)
kwargs = {
"repo": repo,
@@ -63,7 +63,7 @@ def _open(path, repo=None, rev=None, remote=None, mode="r", encoding=None):


def read(path, repo=None, rev=None, remote=None, mode="r", encoding=None):
"""Returns the contents of a file artifact."""
"""Returns the contents of a tracked file."""
with open(
path, repo=repo, rev=rev, remote=remote, mode=mode, encoding=encoding
) as fd:
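
The `read()` helper above is a thin wrapper over the `open()` context manager. A minimal, self-contained sketch of that pattern follows. `open_tracked` and `read_tracked` are hypothetical stand-ins that open a plain local file; they are not DVC's implementation, which first resolves `repo`, `rev`, and `remote`.

```python
import contextlib


@contextlib.contextmanager
def open_tracked(path, mode="r", encoding=None):
    """Hypothetical stand-in for the context manager in the diff."""
    # Assumption: a plain local file replaces DVC's repo/remote lookup.
    with open(path, mode=mode, encoding=encoding) as fd:
        yield fd


def read_tracked(path, mode="r", encoding=None):
    """Return the whole contents, mirroring the shape of read() above."""
    with open_tracked(path, mode=mode, encoding=encoding) as fd:
        return fd.read()
```

Keeping the reader on top of the context manager means the file handle is always closed, even if reading raises.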
10 changes: 5 additions & 5 deletions dvc/command/diff.py
@@ -28,7 +28,7 @@ def _format(diff):
dir/
dir/1
An example of a diff formatted when entries contain checksums:
An example of a diff formatted when entries contain hash:
Added:
d3b07384 foo
@@ -66,7 +66,7 @@ def _digest(checksum):

for entry in entries:
path = entry["path"]
checksum = entry.get("checksum")
checksum = entry.get("hash")
summary[state] += 1 if not path.endswith(os.sep) else 0
content.append(
"{space}{checksum}{separator}{path}".format(
@@ -100,10 +100,10 @@ def run(self):
if not any(diff.values()):
return 0

if not self.args.checksums:
if not self.args.show_hash:
for _, entries in diff.items():
for entry in entries:
del entry["checksum"]
del entry["hash"]

if self.args.show_json:
res = json.dumps(diff)
@@ -149,7 +149,7 @@ def add_parser(subparsers, parent_parser):
default=False,
)
diff_parser.add_argument(
"--checksums",
"--show-hash",
help="Display hash value for each entry",
action="store_true",
default=False,
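
The rename from `--checksums` to `--show-hash` changes both the flag and the attribute read in `run()`, because `argparse` derives `args.show_hash` from the option string. A rough sketch of that interaction, not the actual command class: `build_parser` and `render` are hypothetical names, and the `--show-json` spelling is assumed from `self.args.show_json`.

```python
import argparse
import json


def build_parser():
    parser = argparse.ArgumentParser(prog="diff-sketch")
    # Flag name assumed from `self.args.show_json` in the diff.
    parser.add_argument("--show-json", action="store_true", default=False)
    parser.add_argument(
        "--show-hash",
        help="Display hash value for each entry",
        action="store_true",
        default=False,
    )
    return parser


def render(diff, args):
    """Drop the "hash" key unless --show-hash was given, then dump JSON."""
    if not args.show_hash:
        for entries in diff.values():
            for entry in entries:
                # pop() tolerates entries without a hash; the diff uses `del`.
                entry.pop("hash", None)
    return json.dumps(diff, sort_keys=True)


args = build_parser().parse_args([])
print(render({"added": [{"path": "foo", "hash": "d3b07384"}]}, args))
```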
18 changes: 18 additions & 0 deletions dvc/command/gc.py
@@ -12,6 +12,16 @@

class CmdGC(CmdBase):
def run(self):
from dvc.repo.gc import _raise_error_if_all_disabled

_raise_error_if_all_disabled(
all_branches=self.args.all_branches,
all_tags=self.args.all_tags,
all_commits=self.args.all_commits,
workspace=self.args.workspace,
cloud=self.args.cloud,
)

msg = "This will remove all cache except items used in "

msg += "the working tree"
@@ -47,6 +57,7 @@ def run(self):
force=self.args.force,
jobs=self.args.jobs,
repos=self.args.repos,
workspace=self.args.workspace,
)
return 0

@@ -64,6 +75,13 @@ def add_parser(subparsers, parent_parser):
help=GC_HELP,
formatter_class=argparse.RawDescriptionHelpFormatter,
)
gc_parser.add_argument(
"-w",
"--workspace",
action="store_true",
default=False,
help="Keep data files used in the current workspace.",
)
gc_parser.add_argument(
"-a",
"--all-branches",
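
The new guard at the top of `CmdGC.run` rejects a `dvc gc` call in which no scope flag is enabled. The body of `_raise_error_if_all_disabled` is not part of the hunks shown here, so the following is only a guess at its shape. `raise_if_all_disabled` and the error text are hypothetical, and `InvalidArgumentError` is simplified to a plain `ValueError` subclass.

```python
class InvalidArgumentError(ValueError):
    """Simplified stand-in for the exception added in dvc/exceptions.py."""


def raise_if_all_disabled(**flags):
    """Raise unless at least one keyword flag is truthy (assumed behaviour)."""
    if not any(flags.values()):
        raise InvalidArgumentError(
            "Enable at least one of: {}".format(", ".join(sorted(flags)))
        )


# Passes silently: the workspace scope is enabled.
raise_if_all_disabled(workspace=True, all_branches=False, cloud=False)
```

Running the check before the confirmation prompt is built means a user who passes no scope flag gets an error instead of a prompt for a collection that has nothing to keep.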
11 changes: 9 additions & 2 deletions dvc/config.py
@@ -101,6 +101,13 @@ class RelPath(str):
"shared": All(Lower, Choices("group")),
Optional("slow_link_warning", default=True): Bool,
}
HTTP_COMMON = {
"auth": All(Lower, Choices("basic", "digest", "custom")),
"custom_auth_header": str,
"user": str,
"password": str,
"ask_password": Bool,
}
SCHEMA = {
"core": {
"remote": Lower,
@@ -169,8 +176,8 @@ class RelPath(str):
"gdrive_user_credentials_file": str,
**REMOTE_COMMON,
},
"http": REMOTE_COMMON,
"https": REMOTE_COMMON,
"http": {**HTTP_COMMON, **REMOTE_COMMON},
"https": {**HTTP_COMMON, **REMOTE_COMMON},
"remote": {str: object}, # Any of the above options are valid
}
)
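
The `http` and `https` sections are now built by unpacking two dictionaries into a new one. Below is a small sketch of how that merge behaves, using plain Python types in place of the validators in the diff. The single key in `REMOTE_COMMON` is a placeholder, since its real contents are not shown here, and `check_section` is a hypothetical helper.

```python
HTTP_COMMON = {
    "auth": str,
    "custom_auth_header": str,
    "user": str,
    "password": str,
    "ask_password": bool,
}
# Placeholder key: the real REMOTE_COMMON is outside the hunks shown.
REMOTE_COMMON = {"url": str}

# Unpacking builds a new dict; on a key clash the right-hand dict wins.
HTTP_SCHEMA = {**HTTP_COMMON, **REMOTE_COMMON}


def check_section(section, schema):
    """Reject unknown keys and values of the wrong type."""
    for key, value in section.items():
        if key not in schema:
            raise KeyError("unknown option: {}".format(key))
        if not isinstance(value, schema[key]):
            raise TypeError("bad type for option: {}".format(key))
    return section


check_section({"url": "https://example.com", "auth": "basic"}, HTTP_SCHEMA)
```

Neither source dictionary is mutated by the merge, so the HTTP-only options stay out of the schemas for other remote types.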
4 changes: 4 additions & 0 deletions dvc/exceptions.py
@@ -7,6 +7,10 @@ class DvcException(Exception):
"""Base class for all dvc exceptions."""


class InvalidArgumentError(ValueError, DvcException):
"""Thrown if arguments are invalid."""


class OutputDuplicationError(DvcException):
"""Thrown if a file/directory is specified as an output in more than one
stage.
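
Because `InvalidArgumentError` derives from both `ValueError` and `DvcException`, a handler for either base class catches it. Here is a short sketch with the two classes redeclared locally so that it runs without DVC installed; `caught_by` is a hypothetical helper.

```python
class DvcException(Exception):
    """Base class for all dvc exceptions."""


class InvalidArgumentError(ValueError, DvcException):
    """Thrown if arguments are invalid."""


def caught_by(handler_type):
    """Report whether a handler for handler_type catches the new error."""
    try:
        raise InvalidArgumentError("bad flags")
    except handler_type:
        return True
    except Exception:
        return False


print(caught_by(ValueError), caught_by(DvcException))
```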
21 changes: 16 additions & 5 deletions dvc/output/base.py
@@ -409,12 +409,23 @@ def get_used_cache(self, **kwargs):
cache.external[dep.repo_pair].add(dep.def_path)
return cache

if not self.info:
logger.warning(
"Output '{}'({}) is missing version info. Cache for it will "
"not be collected. Use `dvc repro` to get your pipeline up to "
"date.".format(self, self.stage)
if not self.checksum:
msg = (
"Output '{}'({}) is missing version info. "
"Cache for it will not be collected. "
"Use `dvc repro` to get your pipeline up to date.".format(
self, self.stage
)
)
if self.exists:
msg += (
"\n"
"You can also use `dvc commit {stage}` to associate "
"existing '{out}' with '{stage}'.".format(
out=self, stage=self.stage.relpath
)
)
logger.warning(msg)
return NamedCache()

ret = NamedCache.make(self.scheme, self.checksum, str(self))
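
The reworked warning is assembled in two steps: a base message, plus a `dvc commit` hint only when the output already exists on disk. A standalone sketch of that logic follows. `build_warning` and its parameters are hypothetical, standing in for the attributes of `self` used in the diff.

```python
def build_warning(out, stage, stage_relpath, exists):
    """Compose the missing-version-info warning text."""
    # Adjacent literals are joined first, so format() fills the whole string.
    msg = (
        "Output '{}'({}) is missing version info. "
        "Cache for it will not be collected. "
        "Use `dvc repro` to get your pipeline up to date.".format(out, stage)
    )
    if exists:
        # The hint only makes sense when there is a file to associate.
        msg += (
            "\n"
            "You can also use `dvc commit {stage}` to associate "
            "existing '{out}' with '{stage}'.".format(
                out=out, stage=stage_relpath
            )
        )
    return msg


print(build_warning("model.pkl", "train.dvc", "train.dvc", exists=True))
```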