
Update doc
PiperOrigin-RevId: 247115028
Conchylicultor authored and copybara-github committed May 7, 2019
1 parent fdbc06b commit 37d2774
Showing 50 changed files with 1,604 additions and 1,417 deletions.
4 changes: 4 additions & 0 deletions docs/api_docs/python/_toc.yaml
@@ -38,6 +38,10 @@ toc:
  path: /datasets/api_docs/python/tfds/core/get_tfds_path
- title: lazy_imports
  path: /datasets/api_docs/python/tfds/core/lazy_imports
+- title: Metadata
+  path: /datasets/api_docs/python/tfds/core/Metadata
+- title: MetadataDict
+  path: /datasets/api_docs/python/tfds/core/MetadataDict
- title: NamedSplit
  path: /datasets/api_docs/python/tfds/core/NamedSplit
- title: SplitBase
2 changes: 2 additions & 0 deletions docs/api_docs/python/index.md
@@ -11,6 +11,8 @@
* <a href="./tfds/core/DatasetBuilder.md"><code>tfds.core.DatasetBuilder</code></a>
* <a href="./tfds/core/DatasetInfo.md"><code>tfds.core.DatasetInfo</code></a>
* <a href="./tfds/core/GeneratorBasedBuilder.md"><code>tfds.core.GeneratorBasedBuilder</code></a>
+* <a href="./tfds/core/Metadata.md"><code>tfds.core.Metadata</code></a>
+* <a href="./tfds/core/MetadataDict.md"><code>tfds.core.MetadataDict</code></a>
* <a href="./tfds/core/NamedSplit.md"><code>tfds.core.NamedSplit</code></a>
* <a href="./tfds/core/SplitBase.md"><code>tfds.core.SplitBase</code></a>
* <a href="./tfds/core/SplitDict.md"><code>tfds.core.SplitDict</code></a>
5 changes: 0 additions & 5 deletions docs/api_docs/python/tfds.md
@@ -22,12 +22,7 @@ The main library entrypoints are:
* <a href="./tfds/load.md"><code>tfds.load</code></a>: convenience method to construct a builder, download the data, and
  create an input pipeline, returning a `tf.data.Dataset`.

-Documentation:

-* These API docs
-* [Available datasets](https://github.com/tensorflow/datasets/tree/master/docs/datasets.md)
-* [Colab tutorial](https://colab.research.google.com/github/tensorflow/datasets/blob/master/docs/overview.ipynb)
-* [Add a dataset](https://github.com/tensorflow/datasets/tree/master/docs/add_dataset.md)

## Modules

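For reference, a minimal sketch of the `tfds.load` entrypoint documented above, assuming the registered `mnist` dataset (illustrative only, not part of the commit):

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Construct the builder, download the data, and build the input
# pipeline in a single call.
train_ds = tfds.load("mnist", split=tfds.Split.TRAIN)
assert isinstance(train_ds, tf.data.Dataset)
```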
389 changes: 193 additions & 196 deletions docs/api_docs/python/tfds/_api_cache.json

Large diffs are not rendered by default.

5 changes: 2 additions & 3 deletions docs/api_docs/python/tfds/as_numpy.md
@@ -25,10 +25,9 @@ and `tf.Tensor`s to iterables of NumPy arrays and NumPy arrays, respectively.

#### Args:

-* <b>`dataset`</b>: a possibly nested structure of `tf.data.Dataset`s and/or
+* <b>`dataset`</b>: a possibly nested structure of `tf.data.Dataset`s and/or
  `tf.Tensor`s.
-* <b>`graph`</b>: `tf.Graph`, optional, explicitly set the graph to use.
+* <b>`graph`</b>: `tf.Graph`, optional, explicitly set the graph to use.

#### Returns:

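A short sketch of the conversion `tfds.as_numpy` performs, assuming the `mnist` dataset is available locally (illustrative only):

```python
import tensorflow_datasets as tfds

dataset = tfds.load("mnist", split=tfds.Split.TRAIN)

# tfds.as_numpy turns the tf.data.Dataset into an iterable of feature
# dictionaries whose values are NumPy arrays.
for example in tfds.as_numpy(dataset):
  image, label = example["image"], example["label"]
  break  # inspect a single example
```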
24 changes: 11 additions & 13 deletions docs/api_docs/python/tfds/builder.md
@@ -24,19 +24,17 @@ Defined in [`core/registered.py`](https://github.com/tensorflow/datasets/tree/ma

#### Args:

-* <b>`name`</b>: `str`, the registered name of the `DatasetBuilder` (the snake case
-  version of the class name). This can be either `"dataset_name"` or
-  `"dataset_name/config_name"` for datasets with `BuilderConfig`s.
-  As a convenience, this string may contain comma-separated keyword
-  arguments for the builder. For example `"foo_bar/a=True,b=3"` would use
-  the `FooBar` dataset passing the keyword arguments `a=True` and `b=3`
-  (for builders with configs, it would be `"foo_bar/zoo/a=True,b=3"` to
-  use the `"zoo"` config and pass to the builder keyword arguments `a=True`
-  and `b=3`).
-* <b>`**builder_init_kwargs`</b>: `dict` of keyword arguments passed to the
-  `DatasetBuilder`. These will override keyword arguments passed in `name`,
-  if any.

+* <b>`name`</b>: `str`, the registered name of the `DatasetBuilder` (the snake
+  case version of the class name). This can be either `"dataset_name"` or
+  `"dataset_name/config_name"` for datasets with `BuilderConfig`s. As a
+  convenience, this string may contain comma-separated keyword arguments for
+  the builder. For example `"foo_bar/a=True,b=3"` would use the `FooBar`
+  dataset passing the keyword arguments `a=True` and `b=3` (for builders with
+  configs, it would be `"foo_bar/zoo/a=True,b=3"` to use the `"zoo"` config
+  and pass to the builder keyword arguments `a=True` and `b=3`).
+* <b>`**builder_init_kwargs`</b>: `dict` of keyword arguments passed to the
+  `DatasetBuilder`. These will override keyword arguments passed in `name`, if
+  any.

#### Returns:

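A sketch of the two naming forms `tfds.builder` accepts; `foo_bar` is the docstring's own hypothetical example, not a registered dataset:

```python
import tensorflow_datasets as tfds

# Plain registered name (the snake_case version of the class name).
builder = tfds.builder("mnist")

# Keyword arguments embedded in the name string (hypothetical dataset):
# would construct the FooBar builder with a=True and b=3.
# builder = tfds.builder("foo_bar/a=True,b=3")
```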
7 changes: 7 additions & 0 deletions docs/api_docs/python/tfds/core.md
@@ -28,6 +28,13 @@ Defined in [`core/__init__.py`](https://github.com/tensorflow/datasets/tree/mast

[`class NamedSplit`](../tfds/core/NamedSplit.md): Descriptor corresponding to a named split (train, test, ...).

+[`class Metadata`](../tfds/core/Metadata.md): Abstract base class for
+DatasetInfo metadata container.
+
+[`class MetadataDict`](../tfds/core/MetadataDict.md): A
+<a href="../tfds/core/Metadata.md"><code>tfds.core.Metadata</code></a> object
+that acts as a `dict`.
+
[`class SplitBase`](../tfds/core/SplitBase.md): Abstract base class for Split compositionality.

[`class SplitDict`](../tfds/core/SplitDict.md): Split info object.
38 changes: 18 additions & 20 deletions docs/api_docs/python/tfds/core/BeamBasedBuilder.md
@@ -88,23 +88,21 @@ Callers must pass arguments as keyword arguments.

#### Args:

-* <b>`split`</b>: <a href="../../tfds/core/SplitBase.md"><code>tfds.core.SplitBase</code></a>, which subset(s) of the data to read. If None
-  (default), returns all splits in a dict
-  `<key: tfds.Split, value: tf.data.Dataset>`.
-* <b>`batch_size`</b>: `int`, batch size. Note that variable-length features will
-  be 0-padded if `batch_size > 1`. Users that want more custom behavior
-  should use `batch_size=1` and use the `tf.data` API to construct a
-  custom pipeline. If `batch_size == -1`, will return feature
-  dictionaries of the whole dataset with `tf.Tensor`s instead of a
-  `tf.data.Dataset`.
-* <b>`shuffle_files`</b>: `bool`, whether to shuffle the input files.
-  Defaults to `True` if `split == tfds.Split.TRAIN` and `False` otherwise.
-* <b>`as_supervised`</b>: `bool`, if `True`, the returned `tf.data.Dataset`
+* <b>`split`</b>:
+  <a href="../../tfds/core/SplitBase.md"><code>tfds.core.SplitBase</code></a>,
+  which subset(s) of the data to read. If None (default), returns all splits
+  in a dict `<key: tfds.Split, value: tf.data.Dataset>`.
+* <b>`batch_size`</b>: `int`, batch size. Note that variable-length features
+  will be 0-padded if `batch_size > 1`. Users that want more custom behavior
+  should use `batch_size=1` and use the `tf.data` API to construct a custom
+  pipeline. If `batch_size == -1`, will return feature dictionaries of the
+  whole dataset with `tf.Tensor`s instead of a `tf.data.Dataset`.
+* <b>`shuffle_files`</b>: `bool`, whether to shuffle the input files. Defaults
+  to `True` if `split == tfds.Split.TRAIN` and `False` otherwise.
+* <b>`as_supervised`</b>: `bool`, if `True`, the returned `tf.data.Dataset`
  will have a 2-tuple structure `(input, label)` according to
-  `builder.info.supervised_keys`. If `False`, the default,
-  the returned `tf.data.Dataset` will have a dictionary with all the
-  features.

+  `builder.info.supervised_keys`. If `False`, the default, the returned
+  `tf.data.Dataset` will have a dictionary with all the features.

#### Returns:

@@ -127,11 +125,11 @@ Downloads and prepares dataset for reading.

#### Args:

-* <b>`download_dir`</b>: `str`, directory where downloaded files are stored.
+* <b>`download_dir`</b>: `str`, directory where downloaded files are stored.
  Defaults to "~/tensorflow-datasets/downloads".
-* <b>`download_config`</b>: <a href="../../tfds/download/DownloadConfig.md"><code>tfds.download.DownloadConfig</code></a>, further configuration for
-  downloading and preparing dataset.

+* <b>`download_config`</b>:
+  <a href="../../tfds/download/DownloadConfig.md"><code>tfds.download.DownloadConfig</code></a>,
+  further configuration for downloading and preparing dataset.

#### Raises:

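`as_dataset` is inherited from `DatasetBuilder`, so the call pattern is the same for Beam-based datasets; a minimal sketch using `mnist` as a stand-in for any prepared builder:

```python
import tensorflow_datasets as tfds

builder = tfds.builder("mnist")  # stand-in for any prepared builder
builder.download_and_prepare()

# split=None (the default) returns a dict of tfds.Split -> tf.data.Dataset.
datasets = builder.as_dataset()
train_ds = datasets[tfds.Split.TRAIN]

# batch_size=-1 returns the whole split as a dict of tf.Tensors.
full_test = builder.as_dataset(split=tfds.Split.TEST, batch_size=-1)
```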
38 changes: 18 additions & 20 deletions docs/api_docs/python/tfds/core/DatasetBuilder.md
@@ -119,23 +119,21 @@ Callers must pass arguments as keyword arguments.

#### Args:

-* <b>`split`</b>: <a href="../../tfds/core/SplitBase.md"><code>tfds.core.SplitBase</code></a>, which subset(s) of the data to read. If None
-  (default), returns all splits in a dict
-  `<key: tfds.Split, value: tf.data.Dataset>`.
-* <b>`batch_size`</b>: `int`, batch size. Note that variable-length features will
-  be 0-padded if `batch_size > 1`. Users that want more custom behavior
-  should use `batch_size=1` and use the `tf.data` API to construct a
-  custom pipeline. If `batch_size == -1`, will return feature
-  dictionaries of the whole dataset with `tf.Tensor`s instead of a
-  `tf.data.Dataset`.
-* <b>`shuffle_files`</b>: `bool`, whether to shuffle the input files.
-  Defaults to `True` if `split == tfds.Split.TRAIN` and `False` otherwise.
-* <b>`as_supervised`</b>: `bool`, if `True`, the returned `tf.data.Dataset`
+* <b>`split`</b>:
+  <a href="../../tfds/core/SplitBase.md"><code>tfds.core.SplitBase</code></a>,
+  which subset(s) of the data to read. If None (default), returns all splits
+  in a dict `<key: tfds.Split, value: tf.data.Dataset>`.
+* <b>`batch_size`</b>: `int`, batch size. Note that variable-length features
+  will be 0-padded if `batch_size > 1`. Users that want more custom behavior
+  should use `batch_size=1` and use the `tf.data` API to construct a custom
+  pipeline. If `batch_size == -1`, will return feature dictionaries of the
+  whole dataset with `tf.Tensor`s instead of a `tf.data.Dataset`.
+* <b>`shuffle_files`</b>: `bool`, whether to shuffle the input files. Defaults
+  to `True` if `split == tfds.Split.TRAIN` and `False` otherwise.
+* <b>`as_supervised`</b>: `bool`, if `True`, the returned `tf.data.Dataset`
  will have a 2-tuple structure `(input, label)` according to
-  `builder.info.supervised_keys`. If `False`, the default,
-  the returned `tf.data.Dataset` will have a dictionary with all the
-  features.

+  `builder.info.supervised_keys`. If `False`, the default, the returned
+  `tf.data.Dataset` will have a dictionary with all the features.

#### Returns:

@@ -158,11 +156,11 @@ Downloads and prepares dataset for reading.

#### Args:

-* <b>`download_dir`</b>: `str`, directory where downloaded files are stored.
+* <b>`download_dir`</b>: `str`, directory where downloaded files are stored.
  Defaults to "~/tensorflow-datasets/downloads".
-* <b>`download_config`</b>: <a href="../../tfds/download/DownloadConfig.md"><code>tfds.download.DownloadConfig</code></a>, further configuration for
-  downloading and preparing dataset.

+* <b>`download_config`</b>:
+  <a href="../../tfds/download/DownloadConfig.md"><code>tfds.download.DownloadConfig</code></a>,
+  further configuration for downloading and preparing dataset.

#### Raises:

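A sketch of `download_and_prepare` with both keyword-only arguments spelled out; the directory shown is just the documented default:

```python
import tensorflow_datasets as tfds

builder = tfds.builder("mnist")
builder.download_and_prepare(
    download_dir="~/tensorflow-datasets/downloads",  # the documented default
    download_config=tfds.download.DownloadConfig(),  # default configuration
)
dataset = builder.as_dataset(split=tfds.Split.TRAIN, shuffle_files=True)
```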
43 changes: 25 additions & 18 deletions docs/api_docs/python/tfds/core/DatasetInfo.md
@@ -8,6 +8,7 @@
<meta itemprop="property" content="features"/>
<meta itemprop="property" content="full_name"/>
<meta itemprop="property" content="initialized"/>
<meta itemprop="property" content="metadata"/>
<meta itemprop="property" content="name"/>
<meta itemprop="property" content="redistribution_info"/>
<meta itemprop="property" content="size_in_bytes"/>
@@ -43,14 +44,15 @@ split is typically updated during data generation (i.e. on calling

<h2 id="__init__"><code>__init__</code></h2>

-``` python
+```python
__init__(
builder,
description=None,
features=None,
supervised_keys=None,
urls=None,
citation=None,
+    metadata=None,
redistribution_info=None
)
```
@@ -59,21 +61,24 @@ Constructs DatasetInfo.

#### Args:

-* <b>`builder`</b>: `DatasetBuilder`, dataset builder for this info.
-* <b>`description`</b>: `str`, description of this dataset.
-* <b>`features`</b>: <a href="../../tfds/features/FeaturesDict.md"><code>tfds.features.FeaturesDict</code></a>, Information on the feature dict
-  of the `tf.data.Dataset()` object from the `builder.as_dataset()`
-  method.
-* <b>`supervised_keys`</b>: `tuple`, Specifies the input feature and the label for
-  supervised learning, if applicable for the dataset.
-* <b>`urls`</b>: `list(str)`, optional, the homepage(s) for this dataset.
-* <b>`citation`</b>: `str`, optional, the citation to use for this dataset.
-* <b>`redistribution_info`</b>: `dict`, optional, information needed for
-  redistribution, as specified in `dataset_info_pb2.RedistributionInfo`.
-  The content of the `license` subfield will automatically be written to a
-  LICENSE file stored with the dataset.

+* <b>`builder`</b>: `DatasetBuilder`, dataset builder for this info.
+* <b>`description`</b>: `str`, description of this dataset.
+* <b>`features`</b>:
+  <a href="../../tfds/features/FeaturesDict.md"><code>tfds.features.FeaturesDict</code></a>,
+  Information on the feature dict of the `tf.data.Dataset()` object from the
+  `builder.as_dataset()` method.
+* <b>`supervised_keys`</b>: `tuple`, Specifies the input feature and the label
+  for supervised learning, if applicable for the dataset.
+* <b>`urls`</b>: `list(str)`, optional, the homepage(s) for this dataset.
+* <b>`citation`</b>: `str`, optional, the citation to use for this dataset.
+* <b>`metadata`</b>:
+  <a href="../../tfds/core/Metadata.md"><code>tfds.core.Metadata</code></a>,
+  additional object which will be stored/restored with the dataset. This allows
+  for storing additional information with the dataset.
+* <b>`redistribution_info`</b>: `dict`, optional, information needed for
+  redistribution, as specified in `dataset_info_pb2.RedistributionInfo`. The
+  content of the `license` subfield will automatically be written to a LICENSE
+  file stored with the dataset.

## Properties

@@ -105,6 +110,8 @@ Full canonical name: (<dataset_name>/<config_name>/<version>).

Whether DatasetInfo has been fully initialized.

<h3 id="metadata"><code>metadata</code></h3>

<h3 id="name"><code>name</code></h3>


@@ -168,8 +175,8 @@ This will overwrite all previous metadata.

#### Args:

-* <b>`dataset_info_dir`</b>: `str` The directory containing the metadata file. This
-  should be the root directory of a specific dataset version.
+* <b>`dataset_info_dir`</b>: `str` The directory containing the metadata file.
+  This should be the root directory of a specific dataset version.

<h3 id="update_splits_if_different"><code>update_splits_if_different</code></h3>

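A sketch of how a builder's `_info` method might construct this `DatasetInfo`, including the new `metadata` argument; the dataset, features, and metadata values are hypothetical, and the other abstract builder methods are omitted:

```python
import tensorflow_datasets as tfds

class MyDataset(tfds.core.GeneratorBasedBuilder):
  """Hypothetical dataset; _split_generators/_generate_examples omitted."""

  VERSION = tfds.core.Version("0.1.0")

  def _info(self):
    return tfds.core.DatasetInfo(
        builder=self,
        description="A toy image classification dataset.",
        features=tfds.features.FeaturesDict({
            "image": tfds.features.Image(),
            "label": tfds.features.ClassLabel(num_classes=10),
        }),
        supervised_keys=("image", "label"),
        urls=["https://example.com/my_dataset"],
        # Stored with the dataset on disk and restored on later loads.
        metadata=tfds.core.MetadataDict(pixel_mean=0.13),
    )
```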
38 changes: 18 additions & 20 deletions docs/api_docs/python/tfds/core/GeneratorBasedBuilder.md
@@ -97,23 +97,21 @@ Callers must pass arguments as keyword arguments.

#### Args:

-* <b>`split`</b>: <a href="../../tfds/core/SplitBase.md"><code>tfds.core.SplitBase</code></a>, which subset(s) of the data to read. If None
-  (default), returns all splits in a dict
-  `<key: tfds.Split, value: tf.data.Dataset>`.
-* <b>`batch_size`</b>: `int`, batch size. Note that variable-length features will
-  be 0-padded if `batch_size > 1`. Users that want more custom behavior
-  should use `batch_size=1` and use the `tf.data` API to construct a
-  custom pipeline. If `batch_size == -1`, will return feature
-  dictionaries of the whole dataset with `tf.Tensor`s instead of a
-  `tf.data.Dataset`.
-* <b>`shuffle_files`</b>: `bool`, whether to shuffle the input files.
-  Defaults to `True` if `split == tfds.Split.TRAIN` and `False` otherwise.
-* <b>`as_supervised`</b>: `bool`, if `True`, the returned `tf.data.Dataset`
+* <b>`split`</b>:
+  <a href="../../tfds/core/SplitBase.md"><code>tfds.core.SplitBase</code></a>,
+  which subset(s) of the data to read. If None (default), returns all splits
+  in a dict `<key: tfds.Split, value: tf.data.Dataset>`.
+* <b>`batch_size`</b>: `int`, batch size. Note that variable-length features
+  will be 0-padded if `batch_size > 1`. Users that want more custom behavior
+  should use `batch_size=1` and use the `tf.data` API to construct a custom
+  pipeline. If `batch_size == -1`, will return feature dictionaries of the
+  whole dataset with `tf.Tensor`s instead of a `tf.data.Dataset`.
+* <b>`shuffle_files`</b>: `bool`, whether to shuffle the input files. Defaults
+  to `True` if `split == tfds.Split.TRAIN` and `False` otherwise.
+* <b>`as_supervised`</b>: `bool`, if `True`, the returned `tf.data.Dataset`
  will have a 2-tuple structure `(input, label)` according to
-  `builder.info.supervised_keys`. If `False`, the default,
-  the returned `tf.data.Dataset` will have a dictionary with all the
-  features.

+  `builder.info.supervised_keys`. If `False`, the default, the returned
+  `tf.data.Dataset` will have a dictionary with all the features.

#### Returns:

@@ -136,11 +134,11 @@ Downloads and prepares dataset for reading.

#### Args:

-* <b>`download_dir`</b>: `str`, directory where downloaded files are stored.
+* <b>`download_dir`</b>: `str`, directory where downloaded files are stored.
  Defaults to "~/tensorflow-datasets/downloads".
-* <b>`download_config`</b>: <a href="../../tfds/download/DownloadConfig.md"><code>tfds.download.DownloadConfig</code></a>, further configuration for
-  downloading and preparing dataset.

+* <b>`download_config`</b>:
+  <a href="../../tfds/download/DownloadConfig.md"><code>tfds.download.DownloadConfig</code></a>,
+  further configuration for downloading and preparing dataset.

#### Raises:

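A sketch of the `as_supervised` behavior described above, again with `mnist` standing in for any prepared builder:

```python
import tensorflow_datasets as tfds

builder = tfds.builder("mnist")
builder.download_and_prepare()

# as_supervised=True yields (input, label) 2-tuples taken from
# builder.info.supervised_keys instead of a feature dictionary.
train_ds = builder.as_dataset(split=tfds.Split.TRAIN, as_supervised=True)
train_ds = train_ds.shuffle(1024).batch(32)
```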
46 changes: 46 additions & 0 deletions docs/api_docs/python/tfds/core/Metadata.md
@@ -0,0 +1,46 @@
<div itemscope itemtype="http://developers.google.com/ReferenceObject">
<meta itemprop="name" content="tfds.core.Metadata" />
<meta itemprop="path" content="Stable" />
<meta itemprop="property" content="load_metadata"/>
<meta itemprop="property" content="save_metadata"/>
</div>

# tfds.core.Metadata

## Class `Metadata`

Abstract base class for DatasetInfo metadata container.

Defined in
[`core/dataset_info.py`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/core/dataset_info.py).

<!-- Placeholder for "Used in" -->

`builder.info.metadata` allows the dataset to expose additional general
information about the dataset that is not specific to a feature or individual
example.

To implement the interface, override `save_metadata` and `load_metadata`.

See
<a href="../../tfds/core/MetadataDict.md"><code>tfds.core.MetadataDict</code></a>
for a simple implementation that acts as a dict that saves data to/from a JSON
file.

## Methods

<h3 id="load_metadata"><code>load_metadata</code></h3>

```python
load_metadata(data_dir)
```

Restore the metadata.

<h3 id="save_metadata"><code>save_metadata</code></h3>

```python
save_metadata(data_dir)
```

Save the metadata.
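
A sketch of implementing the interface; it essentially mirrors what `tfds.core.MetadataDict` already provides, and the class name and file name are illustrative:

```python
import json
import os

import tensorflow_datasets as tfds

class JsonMetadata(tfds.core.Metadata, dict):
  """Hypothetical Metadata implementation backed by a JSON file."""

  def save_metadata(self, data_dir):
    # Called when the dataset is built: persist the dict next to the data.
    with open(os.path.join(data_dir, "my_metadata.json"), "w") as f:
      json.dump(dict(self), f)

  def load_metadata(self, data_dir):
    # Called when the dataset info is read back: restore the saved dict.
    self.clear()
    with open(os.path.join(data_dir, "my_metadata.json")) as f:
      self.update(json.load(f))
```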
