api ref: better explanation on disc and memory usage for read/open #1037

Merged: 9 commits, Mar 9, 2020
api ref: more impros to desc and examples of open and read
jorgeorpinel committed Mar 9, 2020
commit 958939b724638a5373e0f6910ced2bee7942a925
18 changes: 9 additions & 9 deletions public/static/docs/api-reference/open.md
@@ -39,14 +39,14 @@ file can be tracked by DVC or by Git.
[context manager](https://www.python.org/dev/peps/pep-0343/#context-managers-in-the-standard-library)
(using the `with` keyword, as shown in the examples).

> Use `dvc.api.read()` to get the complete file contents in a single function
> call – no _context manager_ involved.

This function makes a direct connection to the
[remote storage](/doc/command-reference/remote/add#supported-storage-types)
(except for Google Drive), so the file contents can be streamed as they are
downloaded. No disc space and very little memory are needed to save the file
before making it accessible.
(except for Google Drive), so the file contents can be streamed. Your code can
process the data [buffer](https://docs.python.org/3/c-api/buffer.html) as it's
streamed, which optimizes memory usage.
👍


> Use `dvc.api.read()` to load the complete file contents in a single function
> call – no _context manager_ involved. Neither function utilizes disc space.
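
The memory point in the revised paragraph can be illustrated without DVC itself. The sketch below is a stand-in, not DVC's implementation: `io.BytesIO` plays the role of the file object that `dvc.api.open()` yields, and `count_lines_streaming` is a hypothetical helper that reads fixed-size chunks, so only one chunk is held in memory at a time.

```python
import io


def count_lines_streaming(fobj, chunk_size=4096):
    """Count newline bytes in a binary file object, one chunk at a time."""
    total = 0
    while True:
        chunk = fobj.read(chunk_size)  # at most chunk_size bytes held here
        if not chunk:  # an empty bytes object signals end of stream
            break
        total += chunk.count(b"\n")
    return total


# Stand-in for a remote stream. With DVC, this would be the object yielded by
# `with dvc.api.open(path, repo=..., mode="rb") as fd:` (assumed usage).
fake_stream = io.BytesIO(b"a,b\n1,2\n3,4\n")
line_count = count_lines_streaming(fake_stream, chunk_size=4)
print(line_count)
```

By contrast, a single `fobj.read()` call with no size argument loads the whole payload at once, which is the behaviour the note attributes to `dvc.api.read()`.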

## Parameters

@@ -90,9 +90,9 @@ before making it accessible.

## Example: Use data or models from DVC repositories

Any <abbr>data artifact</abbr> hosted online can be employed directly in your
Python app (no disc space needed) with this API. For example, an XML file
tracked in a public DVC repo on Github can be processed like this:
Any <abbr>data artifact</abbr> hosted online can be processed directly in your
Python app with this API. For example, an XML file tracked in a public DVC repo
on Github can be processed like this:

```py
from xml.sax import parse
14 changes: 6 additions & 8 deletions public/static/docs/api-reference/read.md
@@ -28,19 +28,17 @@ This function wraps [`dvc.api.open()`](/doc/api-reference/open), for a simple
way to return the complete contents of a file tracked in a <abbr>DVC
project</abbr>. The file can be tracked by DVC or by Git.

> This is similar to the `dvc get` command in our CLI.

The returned contents can be a
[string](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str)
or a [bytearray](https://docs.python.org/3/library/stdtypes.html#bytearray).
These are loaded to memory directly (without using any disc space).

> The type returned depends on the `mode` used. For more details, please refer
> to Python's [`open()`](https://docs.python.org/3/library/functions.html#open)
> built-in, which is used under the hood.
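
Since the note says the returned type follows `mode`, a short check against Python's built-in `open()` shows the behaviour it refers to. This sketch only touches a temporary local file and makes no DVC call. Note that the built-in returns `bytes` in binary mode, so that is presumably what a binary-mode read would hand back as well.

```python
import os
import tempfile

# Write a small file so it can be read back in both modes.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("hello")
    tmp_path = tmp.name

try:
    with open(tmp_path, mode="r", encoding="utf-8") as f:
        text = f.read()  # text mode returns str
    with open(tmp_path, mode="rb") as f:
        raw = f.read()  # binary mode returns bytes
finally:
    os.remove(tmp_path)  # clean up the temporary file

print(type(text).__name__, type(raw).__name__)
```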

> This is similar to the `dvc get` command in our CLI.

No disc space is needed to save the file before loading it to memory in order to
make the file accessible.

## Parameters

- **`path`** - location and file name of the file in `repo`, relative to the
@@ -83,9 +81,9 @@ make the file accessible.

## Example: Load data from a DVC repository

Any <abbr>data artifact</abbr> hosted online can be employed directly in your
Python app (no disc space needed) with this API. For example, let's say that you
want to load and unserialize a binary model from a repo on Github:
Any <abbr>data artifact</abbr> hosted online can be loaded directly in your
Python app with this API. For example, let's say that you want to load and
unserialize a binary model from a repo on Github:

```py
import pickle