
Update doc
PiperOrigin-RevId: 247115028
Conchylicultor authored and copybara-github committed May 7, 2019
1 parent fdbc06b commit 37d2774
Showing 50 changed files with 1,604 additions and 1,417 deletions.
4 changes: 4 additions & 0 deletions docs/api_docs/python/_toc.yaml
@@ -38,6 +38,10 @@ toc:
  path: /datasets/api_docs/python/tfds/core/get_tfds_path
- title: lazy_imports
  path: /datasets/api_docs/python/tfds/core/lazy_imports
+- title: Metadata
+  path: /datasets/api_docs/python/tfds/core/Metadata
+- title: MetadataDict
+  path: /datasets/api_docs/python/tfds/core/MetadataDict
- title: NamedSplit
  path: /datasets/api_docs/python/tfds/core/NamedSplit
- title: SplitBase
2 changes: 2 additions & 0 deletions docs/api_docs/python/index.md
@@ -11,6 +11,8 @@
* <a href="./tfds/core/DatasetBuilder.md"><code>tfds.core.DatasetBuilder</code></a>
* <a href="./tfds/core/DatasetInfo.md"><code>tfds.core.DatasetInfo</code></a>
* <a href="./tfds/core/GeneratorBasedBuilder.md"><code>tfds.core.GeneratorBasedBuilder</code></a>
+* <a href="./tfds/core/Metadata.md"><code>tfds.core.Metadata</code></a>
+* <a href="./tfds/core/MetadataDict.md"><code>tfds.core.MetadataDict</code></a>
* <a href="./tfds/core/NamedSplit.md"><code>tfds.core.NamedSplit</code></a>
* <a href="./tfds/core/SplitBase.md"><code>tfds.core.SplitBase</code></a>
* <a href="./tfds/core/SplitDict.md"><code>tfds.core.SplitDict</code></a>
5 changes: 0 additions & 5 deletions docs/api_docs/python/tfds.md
@@ -22,12 +22,7 @@ The main library entrypoints are:
* <a href="./tfds/load.md"><code>tfds.load</code></a>: convenience method to construct a builder, download the data, and
  create an input pipeline, returning a `tf.data.Dataset`.

-Documentation:

-* These API docs
-* [Available datasets](https://github.com/tensorflow/datasets/tree/master/docs/datasets.md)
-* [Colab tutorial](https://colab.research.google.com/github/tensorflow/datasets/blob/master/docs/overview.ipynb)
-* [Add a dataset](https://github.com/tensorflow/datasets/tree/master/docs/add_dataset.md)

## Modules

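For reference, a minimal sketch of the `tfds.load` entrypoint documented above, assuming the registered `mnist` dataset (illustrative only, not part of the commit):

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Construct the builder, download the data, and build the input
# pipeline in a single call.
train_ds = tfds.load("mnist", split=tfds.Split.TRAIN)
assert isinstance(train_ds, tf.data.Dataset)
```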
389 changes: 193 additions & 196 deletions docs/api_docs/python/tfds/_api_cache.json

Large diffs are not rendered by default.

5 changes: 2 additions & 3 deletions docs/api_docs/python/tfds/as_numpy.md
@@ -25,10 +25,9 @@ and `tf.Tensor`s to iterables of NumPy arrays and NumPy arrays, respectively.

#### Args:

-* <b>`dataset`</b>: a possibly nested structure of `tf.data.Dataset`s and/or
+* <b>`dataset`</b>: a possibly nested structure of `tf.data.Dataset`s and/or
  `tf.Tensor`s.
-* <b>`graph`</b>: `tf.Graph`, optional, explicitly set the graph to use.
+* <b>`graph`</b>: `tf.Graph`, optional, explicitly set the graph to use.

#### Returns:

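A short sketch of the conversion `tfds.as_numpy` performs, assuming the `mnist` dataset is available locally (illustrative only):

```python
import tensorflow_datasets as tfds

dataset = tfds.load("mnist", split=tfds.Split.TRAIN)

# tfds.as_numpy turns the tf.data.Dataset into an iterable of feature
# dictionaries whose values are NumPy arrays.
for example in tfds.as_numpy(dataset):
  image, label = example["image"], example["label"]
  break  # inspect a single example
```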
24 changes: 11 additions & 13 deletions docs/api_docs/python/tfds/builder.md
@@ -24,19 +24,17 @@ Defined in [`core/registered.py`](https://github.com/tensorflow/datasets/tree/ma

#### Args:

-* <b>`name`</b>: `str`, the registered name of the `DatasetBuilder` (the snake case
-  version of the class name). This can be either `"dataset_name"` or
-  `"dataset_name/config_name"` for datasets with `BuilderConfig`s.
-  As a convenience, this string may contain comma-separated keyword
-  arguments for the builder. For example `"foo_bar/a=True,b=3"` would use
-  the `FooBar` dataset passing the keyword arguments `a=True` and `b=3`
-  (for builders with configs, it would be `"foo_bar/zoo/a=True,b=3"` to
-  use the `"zoo"` config and pass to the builder keyword arguments `a=True`
-  and `b=3`).
-* <b>`**builder_init_kwargs`</b>: `dict` of keyword arguments passed to the
-  `DatasetBuilder`. These will override keyword arguments passed in `name`,
-  if any.

+* <b>`name`</b>: `str`, the registered name of the `DatasetBuilder` (the snake
+  case version of the class name). This can be either `"dataset_name"` or
+  `"dataset_name/config_name"` for datasets with `BuilderConfig`s. As a
+  convenience, this string may contain comma-separated keyword arguments for
+  the builder. For example `"foo_bar/a=True,b=3"` would use the `FooBar`
+  dataset passing the keyword arguments `a=True` and `b=3` (for builders with
+  configs, it would be `"foo_bar/zoo/a=True,b=3"` to use the `"zoo"` config
+  and pass to the builder keyword arguments `a=True` and `b=3`).
+* <b>`**builder_init_kwargs`</b>: `dict` of keyword arguments passed to the
+  `DatasetBuilder`. These will override keyword arguments passed in `name`, if
+  any.

#### Returns:

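A sketch of the two naming forms `tfds.builder` accepts; `foo_bar` is the docstring's own hypothetical example, not a registered dataset:

```python
import tensorflow_datasets as tfds

# Plain registered name (the snake_case version of the class name).
builder = tfds.builder("mnist")

# Keyword arguments embedded in the name string (hypothetical dataset):
# would construct the FooBar builder with a=True and b=3.
# builder = tfds.builder("foo_bar/a=True,b=3")
```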
7 changes: 7 additions & 0 deletions docs/api_docs/python/tfds/core.md
@@ -28,6 +28,13 @@ Defined in [`core/__init__.py`](https://github.com/tensorflow/datasets/tree/mast

[`class NamedSplit`](../tfds/core/NamedSplit.md): Descriptor corresponding to a named split (train, test, ...).

+[`class Metadata`](../tfds/core/Metadata.md): Abstract base class for
+DatasetInfo metadata container.
+
+[`class MetadataDict`](../tfds/core/MetadataDict.md): A
+<a href="../tfds/core/Metadata.md"><code>tfds.core.Metadata</code></a> object
+that acts as a `dict`.
+
[`class SplitBase`](../tfds/core/SplitBase.md): Abstract base class for Split compositionality.

[`class SplitDict`](../tfds/core/SplitDict.md): Split info object.
38 changes: 18 additions & 20 deletions docs/api_docs/python/tfds/core/BeamBasedBuilder.md
@@ -88,23 +88,21 @@ Callers must pass arguments as keyword arguments.

#### Args:

-* <b>`split`</b>: <a href="../../tfds/core/SplitBase.md"><code>tfds.core.SplitBase</code></a>, which subset(s) of the data to read. If None
-  (default), returns all splits in a dict
-  `<key: tfds.Split, value: tf.data.Dataset>`.
-* <b>`batch_size`</b>: `int`, batch size. Note that variable-length features will
-  be 0-padded if `batch_size > 1`. Users that want more custom behavior
-  should use `batch_size=1` and use the `tf.data` API to construct a
-  custom pipeline. If `batch_size == -1`, will return feature
-  dictionaries of the whole dataset with `tf.Tensor`s instead of a
-  `tf.data.Dataset`.
-* <b>`shuffle_files`</b>: `bool`, whether to shuffle the input files.
-  Defaults to `True` if `split == tfds.Split.TRAIN` and `False` otherwise.
-* <b>`as_supervised`</b>: `bool`, if `True`, the returned `tf.data.Dataset`
+* <b>`split`</b>:
+  <a href="../../tfds/core/SplitBase.md"><code>tfds.core.SplitBase</code></a>,
+  which subset(s) of the data to read. If None (default), returns all splits
+  in a dict `<key: tfds.Split, value: tf.data.Dataset>`.
+* <b>`batch_size`</b>: `int`, batch size. Note that variable-length features
+  will be 0-padded if `batch_size > 1`. Users that want more custom behavior
+  should use `batch_size=1` and use the `tf.data` API to construct a custom
+  pipeline. If `batch_size == -1`, will return feature dictionaries of the
+  whole dataset with `tf.Tensor`s instead of a `tf.data.Dataset`.
+* <b>`shuffle_files`</b>: `bool`, whether to shuffle the input files. Defaults
+  to `True` if `split == tfds.Split.TRAIN` and `False` otherwise.
+* <b>`as_supervised`</b>: `bool`, if `True`, the returned `tf.data.Dataset`
  will have a 2-tuple structure `(input, label)` according to
-  `builder.info.supervised_keys`. If `False`, the default,
-  the returned `tf.data.Dataset` will have a dictionary with all the
-  features.

+  `builder.info.supervised_keys`. If `False`, the default, the returned
+  `tf.data.Dataset` will have a dictionary with all the features.

#### Returns:

@@ -127,11 +125,11 @@ Downloads and prepares dataset for reading.

#### Args:

-* <b>`download_dir`</b>: `str`, directory where downloaded files are stored.
+* <b>`download_dir`</b>: `str`, directory where downloaded files are stored.
  Defaults to "~/tensorflow-datasets/downloads".
-* <b>`download_config`</b>: <a href="../../tfds/download/DownloadConfig.md"><code>tfds.download.DownloadConfig</code></a>, further configuration for
-  downloading and preparing dataset.

+* <b>`download_config`</b>:
+  <a href="../../tfds/download/DownloadConfig.md"><code>tfds.download.DownloadConfig</code></a>,
+  further configuration for downloading and preparing dataset.

#### Raises:

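`as_dataset` is inherited from `DatasetBuilder`, so the call pattern is the same for Beam-based datasets; a minimal sketch using `mnist` as a stand-in for any prepared builder:

```python
import tensorflow_datasets as tfds

builder = tfds.builder("mnist")  # stand-in for any prepared builder
builder.download_and_prepare()

# split=None (the default) returns a dict of tfds.Split -> tf.data.Dataset.
datasets = builder.as_dataset()
train_ds = datasets[tfds.Split.TRAIN]

# batch_size=-1 returns the whole split as a dict of tf.Tensors.
full_test = builder.as_dataset(split=tfds.Split.TEST, batch_size=-1)
```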
38 changes: 18 additions & 20 deletions docs/api_docs/python/tfds/core/DatasetBuilder.md
@@ -119,23 +119,21 @@ Callers must pass arguments as keyword arguments.

#### Args:

-* <b>`split`</b>: <a href="../../tfds/core/SplitBase.md"><code>tfds.core.SplitBase</code></a>, which subset(s) of the data to read. If None
-  (default), returns all splits in a dict
-  `<key: tfds.Split, value: tf.data.Dataset>`.
-* <b>`batch_size`</b>: `int`, batch size. Note that variable-length features will
-  be 0-padded if `batch_size > 1`. Users that want more custom behavior
-  should use `batch_size=1` and use the `tf.data` API to construct a
-  custom pipeline. If `batch_size == -1`, will return feature
-  dictionaries of the whole dataset with `tf.Tensor`s instead of a
-  `tf.data.Dataset`.
-* <b>`shuffle_files`</b>: `bool`, whether to shuffle the input files.
-  Defaults to `True` if `split == tfds.Split.TRAIN` and `False` otherwise.
-* <b>`as_supervised`</b>: `bool`, if `True`, the returned `tf.data.Dataset`
+* <b>`split`</b>:
+  <a href="../../tfds/core/SplitBase.md"><code>tfds.core.SplitBase</code></a>,
+  which subset(s) of the data to read. If None (default), returns all splits
+  in a dict `<key: tfds.Split, value: tf.data.Dataset>`.
+* <b>`batch_size`</b>: `int`, batch size. Note that variable-length features
+  will be 0-padded if `batch_size > 1`. Users that want more custom behavior
+  should use `batch_size=1` and use the `tf.data` API to construct a custom
+  pipeline. If `batch_size == -1`, will return feature dictionaries of the
+  whole dataset with `tf.Tensor`s instead of a `tf.data.Dataset`.
+* <b>`shuffle_files`</b>: `bool`, whether to shuffle the input files. Defaults
+  to `True` if `split == tfds.Split.TRAIN` and `False` otherwise.
+* <b>`as_supervised`</b>: `bool`, if `True`, the returned `tf.data.Dataset`
  will have a 2-tuple structure `(input, label)` according to
-  `builder.info.supervised_keys`. If `False`, the default,
-  the returned `tf.data.Dataset` will have a dictionary with all the
-  features.

+  `builder.info.supervised_keys`. If `False`, the default, the returned
+  `tf.data.Dataset` will have a dictionary with all the features.

#### Returns:

@@ -158,11 +156,11 @@ Downloads and prepares dataset for reading.

#### Args:

-* <b>`download_dir`</b>: `str`, directory where downloaded files are stored.
+* <b>`download_dir`</b>: `str`, directory where downloaded files are stored.
  Defaults to "~/tensorflow-datasets/downloads".
-* <b>`download_config`</b>: <a href="../../tfds/download/DownloadConfig.md"><code>tfds.download.DownloadConfig</code></a>, further configuration for
-  downloading and preparing dataset.

+* <b>`download_config`</b>:
+  <a href="../../tfds/download/DownloadConfig.md"><code>tfds.download.DownloadConfig</code></a>,
+  further configuration for downloading and preparing dataset.

#### Raises:

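A sketch of `download_and_prepare` with both keyword-only arguments spelled out; the directory shown is just the documented default:

```python
import tensorflow_datasets as tfds

builder = tfds.builder("mnist")
builder.download_and_prepare(
    download_dir="~/tensorflow-datasets/downloads",  # the documented default
    download_config=tfds.download.DownloadConfig(),  # default configuration
)
dataset = builder.as_dataset(split=tfds.Split.TRAIN, shuffle_files=True)
```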
43 changes: 25 additions & 18 deletions docs/api_docs/python/tfds/core/DatasetInfo.md
@@ -8,6 +8,7 @@
<meta itemprop="property" content="features"/>
<meta itemprop="property" content="full_name"/>
<meta itemprop="property" content="initialized"/>
<meta itemprop="property" content="metadata"/>
<meta itemprop="property" content="name"/>
<meta itemprop="property" content="redistribution_info"/>
<meta itemprop="property" content="size_in_bytes"/>
@@ -43,14 +44,15 @@ split is typically updated during data generation (i.e. on calling

<h2 id="__init__"><code>__init__</code></h2>

-``` python
+```python
__init__(
builder,
description=None,
features=None,
supervised_keys=None,
urls=None,
citation=None,
+    metadata=None,
redistribution_info=None
)
```
@@ -59,21 +61,24 @@ Constructs DatasetInfo.

#### Args:

-* <b>`builder`</b>: `DatasetBuilder`, dataset builder for this info.
-* <b>`description`</b>: `str`, description of this dataset.
-* <b>`features`</b>: <a href="../../tfds/features/FeaturesDict.md"><code>tfds.features.FeaturesDict</code></a>, Information on the feature dict
-  of the `tf.data.Dataset()` object from the `builder.as_dataset()`
-  method.
-* <b>`supervised_keys`</b>: `tuple`, Specifies the input feature and the label for
-  supervised learning, if applicable for the dataset.
-* <b>`urls`</b>: `list(str)`, optional, the homepage(s) for this dataset.
-* <b>`citation`</b>: `str`, optional, the citation to use for this dataset.
-* <b>`redistribution_info`</b>: `dict`, optional, information needed for
-  redistribution, as specified in `dataset_info_pb2.RedistributionInfo`.
-  The content of the `license` subfield will automatically be written to a
-  LICENSE file stored with the dataset.

+* <b>`builder`</b>: `DatasetBuilder`, dataset builder for this info.
+* <b>`description`</b>: `str`, description of this dataset.
+* <b>`features`</b>:
+  <a href="../../tfds/features/FeaturesDict.md"><code>tfds.features.FeaturesDict</code></a>,
+  Information on the feature dict of the `tf.data.Dataset()` object from the
+  `builder.as_dataset()` method.
+* <b>`supervised_keys`</b>: `tuple`, Specifies the input feature and the label
+  for supervised learning, if applicable for the dataset.
+* <b>`urls`</b>: `list(str)`, optional, the homepage(s) for this dataset.
+* <b>`citation`</b>: `str`, optional, the citation to use for this dataset.
+* <b>`metadata`</b>:
+  <a href="../../tfds/core/Metadata.md"><code>tfds.core.Metadata</code></a>,
+  additional object which will be stored/restored with the dataset. This allows
+  for storing additional information with the dataset.
+* <b>`redistribution_info`</b>: `dict`, optional, information needed for
+  redistribution, as specified in `dataset_info_pb2.RedistributionInfo`. The
+  content of the `license` subfield will automatically be written to a LICENSE
+  file stored with the dataset.

## Properties

@@ -105,6 +110,8 @@ Full canonical name: (<dataset_name>/<config_name>/<version>).

Whether DatasetInfo has been fully initialized.

<h3 id="metadata"><code>metadata</code></h3>

<h3 id="name"><code>name</code></h3>


@@ -168,8 +175,8 @@ This will overwrite all previous metadata.

#### Args:

-* <b>`dataset_info_dir`</b>: `str` The directory containing the metadata file. This
-  should be the root directory of a specific dataset version.
+* <b>`dataset_info_dir`</b>: `str` The directory containing the metadata file.
+  This should be the root directory of a specific dataset version.

<h3 id="update_splits_if_different"><code>update_splits_if_different</code></h3>

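A sketch of how a builder's `_info` method might construct this `DatasetInfo`, including the new `metadata` argument; the dataset, features, and metadata values are hypothetical, and the other abstract builder methods are omitted:

```python
import tensorflow_datasets as tfds

class MyDataset(tfds.core.GeneratorBasedBuilder):
  """Hypothetical dataset; _split_generators/_generate_examples omitted."""

  VERSION = tfds.core.Version("0.1.0")

  def _info(self):
    return tfds.core.DatasetInfo(
        builder=self,
        description="A toy image classification dataset.",
        features=tfds.features.FeaturesDict({
            "image": tfds.features.Image(),
            "label": tfds.features.ClassLabel(num_classes=10),
        }),
        supervised_keys=("image", "label"),
        urls=["https://example.com/my_dataset"],
        # Stored with the dataset on disk and restored on later loads.
        metadata=tfds.core.MetadataDict(pixel_mean=0.13),
    )
```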
38 changes: 18 additions & 20 deletions docs/api_docs/python/tfds/core/GeneratorBasedBuilder.md
@@ -97,23 +97,21 @@ Callers must pass arguments as keyword arguments.

#### Args:

-* <b>`split`</b>: <a href="../../tfds/core/SplitBase.md"><code>tfds.core.SplitBase</code></a>, which subset(s) of the data to read. If None
-  (default), returns all splits in a dict
-  `<key: tfds.Split, value: tf.data.Dataset>`.
-* <b>`batch_size`</b>: `int`, batch size. Note that variable-length features will
-  be 0-padded if `batch_size > 1`. Users that want more custom behavior
-  should use `batch_size=1` and use the `tf.data` API to construct a
-  custom pipeline. If `batch_size == -1`, will return feature
-  dictionaries of the whole dataset with `tf.Tensor`s instead of a
-  `tf.data.Dataset`.
-* <b>`shuffle_files`</b>: `bool`, whether to shuffle the input files.
-  Defaults to `True` if `split == tfds.Split.TRAIN` and `False` otherwise.
-* <b>`as_supervised`</b>: `bool`, if `True`, the returned `tf.data.Dataset`
+* <b>`split`</b>:
+  <a href="../../tfds/core/SplitBase.md"><code>tfds.core.SplitBase</code></a>,
+  which subset(s) of the data to read. If None (default), returns all splits
+  in a dict `<key: tfds.Split, value: tf.data.Dataset>`.
+* <b>`batch_size`</b>: `int`, batch size. Note that variable-length features
+  will be 0-padded if `batch_size > 1`. Users that want more custom behavior
+  should use `batch_size=1` and use the `tf.data` API to construct a custom
+  pipeline. If `batch_size == -1`, will return feature dictionaries of the
+  whole dataset with `tf.Tensor`s instead of a `tf.data.Dataset`.
+* <b>`shuffle_files`</b>: `bool`, whether to shuffle the input files. Defaults
+  to `True` if `split == tfds.Split.TRAIN` and `False` otherwise.
+* <b>`as_supervised`</b>: `bool`, if `True`, the returned `tf.data.Dataset`
  will have a 2-tuple structure `(input, label)` according to
-  `builder.info.supervised_keys`. If `False`, the default,
-  the returned `tf.data.Dataset` will have a dictionary with all the
-  features.

+  `builder.info.supervised_keys`. If `False`, the default, the returned
+  `tf.data.Dataset` will have a dictionary with all the features.

#### Returns:

@@ -136,11 +134,11 @@ Downloads and prepares dataset for reading.

#### Args:

-* <b>`download_dir`</b>: `str`, directory where downloaded files are stored.
+* <b>`download_dir`</b>: `str`, directory where downloaded files are stored.
  Defaults to "~/tensorflow-datasets/downloads".
-* <b>`download_config`</b>: <a href="../../tfds/download/DownloadConfig.md"><code>tfds.download.DownloadConfig</code></a>, further configuration for
-  downloading and preparing dataset.

+* <b>`download_config`</b>:
+  <a href="../../tfds/download/DownloadConfig.md"><code>tfds.download.DownloadConfig</code></a>,
+  further configuration for downloading and preparing dataset.

#### Raises:

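A sketch of the `as_supervised` behavior described above, again with `mnist` standing in for any prepared builder:

```python
import tensorflow_datasets as tfds

builder = tfds.builder("mnist")
builder.download_and_prepare()

# as_supervised=True yields (input, label) 2-tuples taken from
# builder.info.supervised_keys instead of a feature dictionary.
train_ds = builder.as_dataset(split=tfds.Split.TRAIN, as_supervised=True)
train_ds = train_ds.shuffle(1024).batch(32)
```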
46 changes: 46 additions & 0 deletions docs/api_docs/python/tfds/core/Metadata.md
@@ -0,0 +1,46 @@
<div itemscope itemtype="http://developers.google.com/ReferenceObject">
<meta itemprop="name" content="tfds.core.Metadata" />
<meta itemprop="path" content="Stable" />
<meta itemprop="property" content="load_metadata"/>
<meta itemprop="property" content="save_metadata"/>
</div>

# tfds.core.Metadata

## Class `Metadata`

Abstract base class for DatasetInfo metadata container.

Defined in
[`core/dataset_info.py`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/core/dataset_info.py).

<!-- Placeholder for "Used in" -->

`builder.info.metadata` allows the dataset to expose additional general
information about the dataset that is not specific to a feature or individual
example.

To implement the interface, override `save_metadata` and `load_metadata`.

See
<a href="../../tfds/core/MetadataDict.md"><code>tfds.core.MetadataDict</code></a>
for a simple implementation that acts as a dict that saves data to/from a JSON
file.

## Methods

<h3 id="load_metadata"><code>load_metadata</code></h3>

```python
load_metadata(data_dir)
```

Restore the metadata.

<h3 id="save_metadata"><code>save_metadata</code></h3>

```python
save_metadata(data_dir)
```

Save the metadata.
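
A sketch of implementing the interface; it essentially mirrors what `tfds.core.MetadataDict` already provides, and the class name and file name are illustrative:

```python
import json
import os

import tensorflow_datasets as tfds

class JsonMetadata(tfds.core.Metadata, dict):
  """Hypothetical Metadata implementation backed by a JSON file."""

  def save_metadata(self, data_dir):
    # Called when the dataset is built: persist the dict next to the data.
    with open(os.path.join(data_dir, "my_metadata.json"), "w") as f:
      json.dump(dict(self), f)

  def load_metadata(self, data_dir):
    # Called when the dataset info is read back: restore the saved dict.
    self.clear()
    with open(os.path.join(data_dir, "my_metadata.json")) as f:
      self.update(json.load(f))
```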
