Update documentation for display on tensorflow.org/hub. Link to the new
pages from the README.md to prevent duplication.

PiperOrigin-RevId: 189780676
TensorFlow Hub Authors authored and andresusanopinto committed Mar 21, 2018
1 parent 13fbd9d commit 0c4102d
Showing 16 changed files with 323 additions and 271 deletions.
271 changes: 34 additions & 237 deletions README.md
@@ -16,243 +16,40 @@ limitations under the License.
# TensorFlow Hub

TensorFlow Hub is a library to foster the publication, discovery, and
consumption of reusable parts of machine learning models. A **module** is a
self-contained piece of a TensorFlow graph, along with its weights and assets,
that can be reused across different tasks.

Typically, modules contain variables that have been pre-trained for a task using
a large dataset. By reusing a module on a related or similar task, a user can
train a model with a smaller dataset, improve generalization, or simply speed up
training.

A module can be instantiated from a URL or filesystem path while a TensorFlow
graph is being constructed. It can then be *applied* like an ordinary Python
function to build part of the graph. For example:

```python
import tensorflow as tf
import tensorflow_hub as hub

with tf.Graph().as_default():
  # Download a 128-dimension English embedding.
  embed = hub.Module("https://storage.googleapis.com/tfhub-test-modules/google/text/nnlm-en-dim128-with-normalization/1.tar.gz")

  # Use the module to map an array of strings to their embeddings.
  embeddings = embed([
      "A long sentence.",
      "single-word",
      "http://example-url.com"])

  with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.tables_initializer())

    print(sess.run(embeddings))
```

Each module has a defined interface that allows it to be used in a replaceable
way, with little or no knowledge of its internals. Once exported to disk, a
module is self-contained and can be used by others without access to the code
and data used to create and train it.


## Installation

Currently, TensorFlow Hub depends on bug fixes and enhancements not present in a
stable TensorFlow release. For now, please
[install or upgrade](https://www.tensorflow.org/install/)
your TensorFlow package to version 1.7.0rc0 or later. For instance:

```bash
$ pip install --upgrade "tensorflow>=1.7.0rc0"
$ pip install --upgrade tensorflow-hub
```

This section will be updated to include a specific TensorFlow version
requirement when a compatible release is made available.


## Status

Although we hope to prevent breaking changes, this project is still under active
development and is not yet guaranteed to have a stable API or module format.


## Security

Since they contain arbitrary TensorFlow graphs, modules can be thought of as
programs. [Using TensorFlow Securely](https://github.com/tensorflow/tensorflow/blob/master/SECURITY.md)
describes the security implications of referencing a module from an untrusted
source.


## Modules

TensorFlow Hub modules, pre-trained on public datasets, are available for a
variety of tasks. They are listed at
[tensorflow.org/modules](https://tensorflow.org/modules).


## Key Concepts

### Instantiating a Module

A TensorFlow Hub module is imported into a TensorFlow program by
creating a `Module` object from a string with its URL or filesystem path,
such as:

```python
m = hub.Module("path/to/a/module_dir")
```

This adds the module's variables to the current TensorFlow graph.
Running their initializers will read their pre-trained values from disk.
Likewise, tables and other state are added to the graph.

#### Caching Modules

When creating a module from a URL, the module content is downloaded
and cached in the local system's temporary directory. The location where
modules are cached can be overridden with the `TFHUB_CACHE_DIR` environment
variable.

For example, setting `TFHUB_CACHE_DIR` to `/my_module_cache`:

```shell
$ export TFHUB_CACHE_DIR=/my_module_cache
```

and then creating a module from a URL:

```python
m = hub.Module("https://storage.googleapis.com/tfhub-test-modules/google/test/half-plus-two/1.tar.gz")
```

results in the module being downloaded and unpacked into
`/my_module_cache`.


### Applying a Module

Once instantiated, a Module `m` can be called zero or more times
like a Python function from tensor inputs to tensor outputs:

```python
y = m(x)
```

Each such call adds operations to the current TensorFlow graph to compute
`y` from `x`. If this involves variables with trained weights, these are
shared between all applications.

Modules can define multiple named *signatures* so that they can be applied
in more than one way. A module's documentation should describe the available
signatures. The call above applies the signature named `"default"`. Other
signature names can be specified with the optional `signature=` argument.

If a signature has multiple inputs, they must be passed as a dict,
with the keys defined by the signature. Likewise, if a signature has
multiple outputs, these can be retrieved as a dict by passing `as_dict=True`,
under the keys defined by the signature. (The key `"default"` is for the
single output returned if `as_dict=False`.)
So the most general form of applying a Module looks like:

```python
outputs = m(dict(apples=x1, oranges=x2), signature="my_method", as_dict=True)
y1 = outputs["cats"]
y2 = outputs["dogs"]
```

A caller must supply all inputs defined by a signature, but there is no
requirement to use all of a module's outputs.
TensorFlow will run only those parts of the module that end up
as dependencies of a target in `tf.Session.run()`. Indeed, module publishers may
choose to provide various outputs for advanced uses (like activations of
intermediate layers) along with the main outputs. Module consumers should
handle additional outputs gracefully.

### Creating a New Module

To define a new module, a publisher calls `hub.create_module_spec()` with a
function `module_fn`. This function constructs a graph representing the module's
internal structure, using `tf.placeholder()` for inputs to be supplied by
the caller. Then it defines signatures by calling
`hub.add_signature(name, inputs, outputs)` one or more times.

For example:

```python
def module_fn():
  inputs = tf.placeholder(dtype=tf.float32, shape=[None, 50])
  layer1 = tf.layers.dense(inputs, 200, activation=tf.nn.relu)
  layer2 = tf.layers.dense(layer1, 100, activation=tf.nn.relu)
  outputs = dict(default=layer2, hidden_activations=layer1)
  # Add default signature.
  hub.add_signature(inputs=inputs, outputs=outputs)

...
spec = hub.create_module_spec(module_fn)

The result of `hub.create_module_spec()` can be used, instead of a path,
to instantiate a module object within a particular TensorFlow graph. In
that case, there is no checkpoint, and the module instance will use the
variables' initializers instead.

Any module instance can be serialized to disk via its `export(path, session)`
method. Exporting a module serializes its definition together with the current
state of its variables in `session` into the given path. This can be used
when exporting a module for the first time, as well as when exporting a
fine-tuned module.
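
For instance, a module created from the result of `hub.create_module_spec()`
can be trained and exported in one session (a minimal sketch; the export path
and the elided training details are illustrative):

```python
with tf.Graph().as_default():
  m = hub.Module(spec)  # No checkpoint yet: variables use their initializers.
  # ... build a training objective on top of m and create a train_op ...
  with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... run training steps ...
    m.export("/tmp/my_module", sess)  # Writes definition and variable state.
```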

Additionally, for compatibility with TensorFlow Estimators, the `hub` library
provides a `LatestModuleExporter`.
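
As a hedged sketch of how that might be wired up (assuming
`LatestModuleExporter` takes an export name and a `serving_input_fn`, following
the `tf.estimator.Exporter` convention; consult the API reference for the exact
signature):

```python
def serving_input_fn():
  # Hypothetical serving inputs; the feature name and shape are illustrative.
  features = {"x": tf.placeholder(tf.float32, shape=[None, 50])}
  return tf.estimator.export.ServingInputReceiver(features, features)

# Assumed constructor arguments: an export name and a serving_input_fn.
exporter = hub.LatestModuleExporter("tf_hub", serving_input_fn)
# The exporter can then be passed to tf.estimator.EvalSpec(exporters=[...]).
```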

Module publishers should implement a [common
signature](docs/common_signatures/index.md)
when possible, so that consumers can easily exchange modules and find the best
one for their problem.

### Fine Tuning

Training the variables of a consumer model, including those of an imported
module, is called *fine-tuning*. Fine-tuning can result in better quality, but
adds new complications. We advise consumers to look into fine-tuning only after
exploring simpler quality tweaks.

#### For Consumers

To enable fine-tuning, instantiate the module with
`hub.Module(..., trainable=True)` to make its variables trainable, and
include TensorFlow's `REGULARIZATION_LOSSES` in your training loss. If the
module has multiple graph variants, make sure to pick the one appropriate
for training. Usually, that's the one with tags `{"train"}`.

Choose a training regime that does not ruin the pre-trained weights,
for example, a lower learning rate than for training from scratch.
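
Put together, a consumer-side setup might look like this (a minimal sketch for
a text-embedding module; the module path, tensor shapes, and learning rate are
illustrative):

```python
import tensorflow as tf
import tensorflow_hub as hub

sentences = tf.placeholder(tf.string, shape=[None])
labels = tf.placeholder(tf.int64, shape=[None])

# Pick the training graph variant and make the module's variables trainable.
embed = hub.Module("path/to/a/module_dir", trainable=True, tags={"train"})
logits = tf.layers.dense(embed(sentences), 2)
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
# Include the module's REGULARIZATION_LOSSES in the training objective.
loss += tf.losses.get_regularization_loss()
# A lower learning rate than for training from scratch helps preserve
# the pre-trained weights.
train_op = tf.train.AdamOptimizer(learning_rate=1e-5).minimize(loss)
```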

#### For Publishers

To make fine-tuning easier for consumers, please be mindful of the following:

* Fine-tuning needs regularization. Your module is exported with the
`REGULARIZATION_LOSSES` collection, which carries your choice of
`tf.layers.dense(..., kernel_regularizer=...)` etc. into what the consumer
gets from `tf.losses.get_regularization_losses()`. Prefer this way of
defining L1/L2 regularization losses (see the sketch after this list).

* In the publisher model, avoid defining L1/L2 regularization via the `l1_`
and `l2_regularization_strength` parameters of `tf.train.FtrlOptimizer`,
`tf.train.ProximalGradientDescentOptimizer`, and other proximal
optimizers. These are not exported alongside the module, and setting
regularization strengths globally may not be appropriate for the
consumer. Except for L1 regularization in wide (i.e. sparse linear) or wide
& deep models, it should be possible to use individual regularization losses
instead.

* If you use dropout, batch normalization, or similar training techniques, set
the dropout rate and other hyperparameters to values that make sense across many
expected uses.
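
For the regularization point above, a minimal sketch of the preferred style
(the layer size and regularization scale are illustrative):

```python
import tensorflow as tf

inputs = tf.placeholder(tf.float32, shape=[None, 50])
# Attach a per-layer regularizer so it travels with the module in the
# REGULARIZATION_LOSSES collection, instead of configuring a proximal optimizer.
hidden = tf.layers.dense(inputs, 200, activation=tf.nn.relu,
                         kernel_regularizer=tf.contrib.layers.l2_regularizer(1e-4))
```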
consumption of reusable parts of machine learning models. In particular,
it provides **modules**, which are pre-trained pieces of TensorFlow models
that can be reused on new tasks.


## Getting Started

* [Introduction](docs/index.md)
* [Installation](docs/installation.md)
* Tutorials:
* [Image Retraining](docs/tutorials/image_retraining.md)
* [Text Classification](docs/tutorials/text_classification.md)
* [All Tutorials](docs/tutorials/index.md)
* Key Concepts:
* [Using a Module](docs/basics.md)
* [Creating a New Module](docs/creating.md)
* [Fine-Tuning a Module](docs/fine_tuning.md)
* Modules:
* [Available Modules](docs/modules/index.md) -- quick links:
[image](docs/modules/image.md), [text](docs/modules/text.md),
[other](docs/modules/other.md)
* [Common Signatures for Modules](docs/common_signatures/index.md)


## Contributing

If you'd like to contribute to TensorFlow Hub, be sure to review the
[contribution guidelines](CONTRIBUTING.md). This project adheres to TensorFlow's
[code of
conduct](https://github.com/tensorflow/tensorflow/blob/master/CODE_OF_CONDUCT.md). By
participating, you are expected to uphold this code.

We use [GitHub issues](https://github.com/tensorflow/hub/issues) for tracking
requests and bugs.


## License
77 changes: 77 additions & 0 deletions docs/basics.md
@@ -0,0 +1,77 @@
# Using a Module

## Instantiating a Module

A TensorFlow Hub module is imported into a TensorFlow program by
creating a `Module` object from a string with its URL or filesystem path,
such as:

```python
m = hub.Module("path/to/a/module_dir")
```

This adds the module's variables to the current TensorFlow graph.
Running their initializers will read their pre-trained values from disk.
Likewise, tables and other state are added to the graph.
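
For example (a minimal sketch; the module path is illustrative):

```python
import tensorflow as tf
import tensorflow_hub as hub

with tf.Graph().as_default():
  m = hub.Module("path/to/a/module_dir")
  with tf.Session() as sess:
    # Restores the module's pre-trained variables and initializes its tables.
    sess.run(tf.global_variables_initializer())
    sess.run(tf.tables_initializer())
```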

## Caching Modules

When creating a module from a URL, the module content is downloaded
and cached in the local system's temporary directory. The location where
modules are cached can be overridden with the `TFHUB_CACHE_DIR` environment
variable.

For example, setting `TFHUB_CACHE_DIR` to `/my_module_cache`:

```shell
$ export TFHUB_CACHE_DIR=/my_module_cache
```

and then creating a module from a URL:

```python
m = hub.Module("https://storage.googleapis.com/tfhub-test-modules/google/test/half-plus-two/1.tar.gz")
```

results in the module being downloaded and unpacked into
`/my_module_cache`.
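
The cache location can also be set from Python, which may be convenient in
notebooks; a small sketch:

```python
import os

# Must be set before hub.Module() first resolves a URL in this process.
os.environ["TFHUB_CACHE_DIR"] = "/my_module_cache"
```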


## Applying a Module

Once instantiated, a module `m` can be called zero or more times like a Python
function from tensor inputs to tensor outputs:

```python
y = m(x)
```

Each such call adds operations to the current TensorFlow graph to compute
`y` from `x`. If this involves variables with trained weights, these are
shared between all applications.

Modules can define multiple named *signatures* so that they can be applied
in more than one way. A module's documentation should describe the available
signatures. The call above applies the signature named `"default"`. Other
signature names can be specified with the optional `signature=` argument.

If a signature has multiple inputs, they must be passed as a dict,
with the keys defined by the signature. Likewise, if a signature has
multiple outputs, these can be retrieved as a dict by passing `as_dict=True`,
under the keys defined by the signature. (The key `"default"` is for the
single output returned if `as_dict=False`.)
So the most general form of applying a Module looks like:

```python
outputs = m(dict(apples=x1, oranges=x2), signature="my_method", as_dict=True)
y1 = outputs["cats"]
y2 = outputs["dogs"]
```

A caller must supply all inputs defined by a signature, but there is no
requirement to use all of a module's outputs.
TensorFlow will run only those parts of the module that end up
as dependencies of a target in `tf.Session.run()`. Indeed, module publishers may
choose to provide various outputs for advanced uses (like activations of
intermediate layers) along with the main outputs. Module consumers should
handle additional outputs gracefully.
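
Continuing the example above, fetching only one output runs just the
operations it depends on (a minimal sketch):

```python
with tf.Session() as sess:
  sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
  # Executes only the subgraph needed for the "cats" output; the unused
  # "dogs" output costs nothing here.
  y1_value = sess.run(y1)
```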
4 changes: 2 additions & 2 deletions docs/common_signatures/images.md
@@ -1,7 +1,7 @@
# Common Signatures for Images

This page describes common signatures for [TensorFlow Hub
Modules](../../README.md) for tasks that involve images.
This page describes common signatures that should be implemented by modules
for image-related tasks.

Some modules can be used for more than one task (e.g., image classification
modules tend to do some feature extraction on the way). Therefore, each module
15 changes: 11 additions & 4 deletions docs/common_signatures/index.md
@@ -1,8 +1,9 @@
# Common Signatures for TensorFlow Hub Modules
# Common Signatures for Modules

[TensorFlow Hub Modules](../../README.md) for the same task should implement a
common signature, so that module consumers can easily exchange them and find the
best one for their problem.
## Introduction

Modules for the same task should implement a common signature, so that module
consumers can easily exchange them and find the best one for their problem.

This directory collects specifications of common signatures. We expect it
to grow over time, as modules are created for a wider variety of tasks.
@@ -15,3 +16,9 @@ document them along the signature.

In any case, the goal is to make exchanging different modules for the same task
as simple as switching a string-valued hyperparameter.


## Signatures

* [Image Signatures](images.md)
* [Text Signatures](text.md)
4 changes: 2 additions & 2 deletions docs/common_signatures/text.md
@@ -1,7 +1,7 @@
# Common Signatures for Text

This page describes common signatures for [TensorFlow Hub
Modules](../../README.md) for tasks that accept text inputs.
This page describes common signatures that should be implemented by modules
for tasks that accept text inputs.

## Text feature vector

