Update documentation for display on tensorflow.org/hub. Link to the new
pages from the README.md to prevent duplication.

PiperOrigin-RevId: 189780676
TensorFlow Hub Authors authored and andresusanopinto committed Mar 21, 2018
1 parent 13fbd9d commit 0c4102d
Showing 16 changed files with 323 additions and 271 deletions.
271 changes: 34 additions & 237 deletions README.md
@@ -16,243 +16,40 @@ limitations under the License.
# TensorFlow Hub

TensorFlow Hub is a library to foster the publication, discovery, and
consumption of reusable parts of machine learning models. A **module** is a
self-contained piece of a TensorFlow graph, along with its weights and assets,
that can be reused across different tasks.

Typically, modules contain variables that have been pre-trained for a task using
a large dataset. By reusing a module on a related or similar task, a user can
train a model with a smaller dataset, improve generalization, or simply speed up
training.

A module can be instantiated from a URL or filesystem path while a TensorFlow
graph is being constructed. It can then be *applied* like an ordinary Python
function to build part of the graph. For example:

```python
import tensorflow as tf
import tensorflow_hub as hub

with tf.Graph().as_default():
  # Download a 128-dimension English embedding.
  embed = hub.Module("https://storage.googleapis.com/tfhub-test-modules/google/text/nnlm-en-dim128-with-normalization/1.tar.gz")

  # Use the module to map an array of strings to their embeddings.
  embeddings = embed([
      "A long sentence.",
      "single-word",
      "http://example-url.com"])

  with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.tables_initializer())

    print(sess.run(embeddings))
```

Each module has a defined interface that allows it to be used in a replaceable
way, with little or no knowledge of its internals. Once exported to disk, a
module is self-contained and can be used by others without access to the code
and data used to create and train it.


## Installation

Currently, TensorFlow Hub depends on bug fixes and enhancements not present in a
stable TensorFlow release. For now, please
[install or upgrade](https://www.tensorflow.org/install/)
your TensorFlow package to version 1.7.0rc0 or later. For instance:

```bash
$ pip install --upgrade "tensorflow>=1.7.0rc0"
$ pip install --upgrade tensorflow-hub
```

This section will be updated to include a specific TensorFlow version
requirement when a compatible release is made available.


## Status

Although we hope to prevent breaking changes, this project is still under active
development and is not yet guaranteed to have a stable API or module format.


## Security

Since they contain arbitrary TensorFlow graphs, modules can be thought of as
programs. [Using TensorFlow Securely](https://github.com/tensorflow/tensorflow/blob/master/SECURITY.md)
describes the security implications of referencing a module from an untrusted
source.


## Modules

TensorFlow Hub modules, pre-trained on public datasets, are available for a
variety of tasks. They are listed at
[tensorflow.org/modules](https://tensorflow.org/modules).


## Key Concepts

### Instantiating a Module

A TensorFlow Hub module is imported into a TensorFlow program by
creating a `Module` object from a string with its URL or filesystem path,
such as:

```python
m = hub.Module("path/to/a/module_dir")
```

This adds the module's variables to the current TensorFlow graph.
Running their initializers will read their pre-trained values from disk.
Likewise, tables and other state are added to the graph.

#### Caching Modules

When creating a module from a URL, the module content is downloaded
and cached in the local system's temporary directory. The location where
modules are cached can be overridden with the `TFHUB_CACHE_DIR` environment
variable.

For example, setting `TFHUB_CACHE_DIR` to `/my_module_cache`:

```shell
$ export TFHUB_CACHE_DIR=/my_module_cache
```

and then creating a module from a URL:

```python
m = hub.Module("https://storage.googleapis.com/tfhub-test-modules/google/test/half-plus-two/1.tar.gz")
```

results in the module being downloaded and unpacked into
`/my_module_cache`.


### Applying a Module

Once instantiated, a Module `m` can be called zero or more times
like a Python function from tensor inputs to tensor outputs:

```python
y = m(x)
```

Each such call adds operations to the current TensorFlow graph to compute
`y` from `x`. If this involves variables with trained weights, these are
shared between all applications.

Modules can define multiple named *signatures* so that they can be applied
in more than one way. A module's documentation should describe the available
signatures. The call above applies the signature named `"default"`. Other
signature names can be specified with the optional `signature=` argument.

If a signature has multiple inputs, they must be passed as a dict,
with the keys defined by the signature. Likewise, if a signature has
multiple outputs, these can be retrieved as a dict by passing `as_dict=True`,
under the keys defined by the signature. (The key `"default"` is for the
single output returned if `as_dict=False`.)
So the most general form of applying a Module looks like:

```python
outputs = m(dict(apples=x1, oranges=x2), signature="my_method", as_dict=True)
y1 = outputs["cats"]
y2 = outputs["dogs"]
```

A caller must supply all inputs defined by a signature, but there is no
requirement to use all of a module's outputs.
TensorFlow will run only those parts of the module that end up
as dependencies of a target in `tf.Session.run()`. Indeed, module publishers may
choose to provide various outputs for advanced uses (like activations of
intermediate layers) along with the main outputs. Module consumers should
handle additional outputs gracefully.

### Creating a New Module

To define a new module, a publisher calls `hub.create_module_spec()` with a
function `module_fn`. This function constructs a graph representing the module's
internal structure, using `tf.placeholder()` for inputs to be supplied by
the caller. Then it defines signatures by calling
`hub.add_signature(name, inputs, outputs)` one or more times.

For example:

```python
def module_fn():
  inputs = tf.placeholder(dtype=tf.float32, shape=[None, 50])
  layer1 = tf.layers.dense(inputs, 200, activation=tf.nn.relu)
  layer2 = tf.layers.dense(layer1, 100, activation=tf.nn.relu)
  outputs = dict(default=layer2, hidden_activations=layer1)
  # Add default signature.
  hub.add_signature(inputs=inputs, outputs=outputs)

...
spec = hub.create_module_spec(module_fn)

The result of `hub.create_module_spec()` can be used, instead of a path,
to instantiate a module object within a particular TensorFlow graph. In
that case, there is no checkpoint, and the module instance will use the
variables' initializers instead.

Any module instance can be serialized to disk via its `export(path, session)`
method. Exporting a module serializes its definition together with the current
state of its variables in `session` into the given path. This can be used
when exporting a module for the first time, as well as when exporting a
fine-tuned module.
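
For instance, a module created from the result of `hub.create_module_spec()`
can be trained and exported in one session (a minimal sketch; the export path
and the elided training details are illustrative):

```python
with tf.Graph().as_default():
  m = hub.Module(spec)  # No checkpoint yet: variables use their initializers.
  # ... build a training objective on top of m and create a train_op ...
  with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... run training steps ...
    m.export("/tmp/my_module", sess)  # Writes definition and variable state.
```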

Additionally, for compatibility with TensorFlow Estimators, the `hub` library
provides a `LatestModuleExporter`.
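
As a hedged sketch of how that might be wired up (assuming
`LatestModuleExporter` takes an export name and a `serving_input_fn`, following
the `tf.estimator.Exporter` convention; consult the API reference for the exact
signature):

```python
def serving_input_fn():
  # Hypothetical serving inputs; the feature name and shape are illustrative.
  features = {"x": tf.placeholder(tf.float32, shape=[None, 50])}
  return tf.estimator.export.ServingInputReceiver(features, features)

# Assumed constructor arguments: an export name and a serving_input_fn.
exporter = hub.LatestModuleExporter("tf_hub", serving_input_fn)
# The exporter can then be passed to tf.estimator.EvalSpec(exporters=[...]).
```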

Module publishers should implement a [common
signature](docs/common_signatures/index.md)
when possible, so that consumers can easily exchange modules and find the best
one for their problem.

### Fine Tuning

Training the variables of a consumer model, including those of an imported
module, is called *fine-tuning*. Fine-tuning can result in better quality, but
adds new complications. We advise consumers to look into fine-tuning only after
exploring simpler quality tweaks.

#### For Consumers

To enable fine-tuning, instantiate the module with
`hub.Module(..., trainable=True)` to make its variables trainable, and
include TensorFlow's `REGULARIZATION_LOSSES` in your training loss. If the
module has multiple graph variants, make sure to pick the one appropriate
for training. Usually, that's the one with tags `{"train"}`.

Choose a training regime that does not ruin the pre-trained weights,
for example, a lower learning rate than for training from scratch.
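
Put together, a consumer-side setup might look like this (a minimal sketch for
a text-embedding module; the module path, tensor shapes, and learning rate are
illustrative):

```python
import tensorflow as tf
import tensorflow_hub as hub

sentences = tf.placeholder(tf.string, shape=[None])
labels = tf.placeholder(tf.int64, shape=[None])

# Pick the training graph variant and make the module's variables trainable.
embed = hub.Module("path/to/a/module_dir", trainable=True, tags={"train"})
logits = tf.layers.dense(embed(sentences), 2)
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
# Include the module's REGULARIZATION_LOSSES in the training objective.
loss += tf.losses.get_regularization_loss()
# A lower learning rate than for training from scratch helps preserve
# the pre-trained weights.
train_op = tf.train.AdamOptimizer(learning_rate=1e-5).minimize(loss)
```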

#### For Publishers

To make fine-tuning easier for consumers, please be mindful of the following:

* Fine-tuning needs regularization. Your module is exported with the
`REGULARIZATION_LOSSES` collection, which carries your choice of
`tf.layers.dense(..., kernel_regularizer=...)` etc. into what the consumer
gets from `tf.losses.get_regularization_losses()`. Prefer this way of
defining L1/L2 regularization losses (see the sketch after this list).

* In the publisher model, avoid defining L1/L2 regularization via the `l1_`
and `l2_regularization_strength` parameters of `tf.train.FtrlOptimizer`,
`tf.train.ProximalGradientDescentOptimizer`, and other proximal
optimizers. These are not exported alongside the module, and setting
regularization strengths globally may not be appropriate for the
consumer. Except for L1 regularization in wide (i.e. sparse linear) or wide
& deep models, it should be possible to use individual regularization losses
instead.

* If you use dropout, batch normalization, or similar training techniques, set
the dropout rate and other hyperparameters to values that make sense across many
expected uses.
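
For the regularization point above, a minimal sketch of the preferred style
(the layer size and regularization scale are illustrative):

```python
import tensorflow as tf

inputs = tf.placeholder(tf.float32, shape=[None, 50])
# Attach a per-layer regularizer so it travels with the module in the
# REGULARIZATION_LOSSES collection, instead of configuring a proximal optimizer.
hidden = tf.layers.dense(inputs, 200, activation=tf.nn.relu,
                         kernel_regularizer=tf.contrib.layers.l2_regularizer(1e-4))
```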
consumption of reusable parts of machine learning models. In particular,
it provides **modules**, which are pre-trained pieces of TensorFlow models
that can be reused on new tasks.


## Getting Started

* [Introduction](docs/index.md)
* [Installation](docs/installation.md)
* Tutorials:
* [Image Retraining](docs/tutorials/image_retraining.md)
* [Text Classification](docs/tutorials/text_classification.md)
* [All Tutorials](docs/tutorials/index.md)
* Key Concepts:
* [Using a Module](docs/basics.md)
* [Creating a New Module](docs/creating.md)
* [Fine-Tuning a Module](docs/fine_tuning.md)
* Modules:
* [Available Modules](docs/modules/index.md) -- quick links:
[image](docs/modules/image.md), [text](docs/modules/text.md),
[other](docs/modules/other.md)
* [Common Signatures for Modules](docs/common_signatures/index.md)


## Contributing

If you'd like to contribute to TensorFlow Hub, be sure to review the
[contribution guidelines](CONTRIBUTING.md). This project adheres to TensorFlow's
[code of
conduct](https://github.com/tensorflow/tensorflow/blob/master/CODE_OF_CONDUCT.md). By
participating, you are expected to uphold this code.

We use [GitHub issues](https://github.com/tensorflow/hub/issues) for tracking
requests and bugs.


## License
77 changes: 77 additions & 0 deletions docs/basics.md
@@ -0,0 +1,77 @@
# Using a Module

## Instantiating a Module

A TensorFlow Hub module is imported into a TensorFlow program by
creating a `Module` object from a string with its URL or filesystem path,
such as:

```python
m = hub.Module("path/to/a/module_dir")
```

This adds the module's variables to the current TensorFlow graph.
Running their initializers will read their pre-trained values from disk.
Likewise, tables and other state are added to the graph.
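
For example (a minimal sketch; the module path is illustrative):

```python
import tensorflow as tf
import tensorflow_hub as hub

with tf.Graph().as_default():
  m = hub.Module("path/to/a/module_dir")
  with tf.Session() as sess:
    # Restores the module's pre-trained variables and initializes its tables.
    sess.run(tf.global_variables_initializer())
    sess.run(tf.tables_initializer())
```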

## Caching Modules

When creating a module from a URL, the module content is downloaded
and cached in the local system's temporary directory. The location where
modules are cached can be overridden with the `TFHUB_CACHE_DIR` environment
variable.

For example, setting `TFHUB_CACHE_DIR` to `/my_module_cache`:

```shell
$ export TFHUB_CACHE_DIR=/my_module_cache
```

and then creating a module from a URL:

```python
m = hub.Module("https://storage.googleapis.com/tfhub-test-modules/google/test/half-plus-two/1.tar.gz")
```

results in the module being downloaded and unpacked into
`/my_module_cache`.
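
The cache location can also be set from Python, which may be convenient in
notebooks; a small sketch:

```python
import os

# Must be set before hub.Module() first resolves a URL in this process.
os.environ["TFHUB_CACHE_DIR"] = "/my_module_cache"
```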


## Applying a Module

Once instantiated, a module `m` can be called zero or more times like a Python
function from tensor inputs to tensor outputs:

```python
y = m(x)
```

Each such call adds operations to the current TensorFlow graph to compute
`y` from `x`. If this involves variables with trained weights, these are
shared between all applications.

Modules can define multiple named *signatures* so that they can be applied
in more than one way. A module's documentation should describe the available
signatures. The call above applies the signature named `"default"`. Other
signature names can be specified with the optional `signature=` argument.

If a signature has multiple inputs, they must be passed as a dict,
with the keys defined by the signature. Likewise, if a signature has
multiple outputs, these can be retrieved as a dict by passing `as_dict=True`,
under the keys defined by the signature. (The key `"default"` is for the
single output returned if `as_dict=False`.)
So the most general form of applying a Module looks like:

```python
outputs = m(dict(apples=x1, oranges=x2), signature="my_method", as_dict=True)
y1 = outputs["cats"]
y2 = outputs["dogs"]
```

A caller must supply all inputs defined by a signature, but there is no
requirement to use all of a module's outputs.
TensorFlow will run only those parts of the module that end up
as dependencies of a target in `tf.Session.run()`. Indeed, module publishers may
choose to provide various outputs for advanced uses (like activations of
intermediate layers) along with the main outputs. Module consumers should
handle additional outputs gracefully.
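
Continuing the example above, fetching only one output runs just the
operations it depends on (a minimal sketch):

```python
with tf.Session() as sess:
  sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
  # Executes only the subgraph needed for the "cats" output; the unused
  # "dogs" output costs nothing here.
  y1_value = sess.run(y1)
```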
4 changes: 2 additions & 2 deletions docs/common_signatures/images.md
@@ -1,7 +1,7 @@
# Common Signatures for Images

This page describes common signatures for [TensorFlow Hub
Modules](../../README.md) for tasks that involve images.
This page describes common signatures that should be implemented by modules
for image-related tasks.

Some modules can be used for more than one task (e.g., image classification
modules tend to do some feature extraction on the way). Therefore, each module
15 changes: 11 additions & 4 deletions docs/common_signatures/index.md
@@ -1,8 +1,9 @@
# Common Signatures for TensorFlow Hub Modules
# Common Signatures for Modules

[TensorFlow Hub Modules](../../README.md) for the same task should implement a
common signature, so that module consumers can easily exchange them and find the
best one for their problem.
## Introduction

Modules for the same task should implement a common signature, so that module
consumers can easily exchange them and find the best one for their problem.

This directory collects specifications of common signatures. We expect it
to grow over time, as modules are created for a wider variety of tasks.
@@ -15,3 +16,9 @@ document them along the signature.

In any case, the goal is to make exchanging different modules for the same task
as simple as switching a string-valued hyperparameter.


## Signatures

* [Image Signatures](images.md)
* [Text Signatures](text.md)
4 changes: 2 additions & 2 deletions docs/common_signatures/text.md
@@ -1,7 +1,7 @@
# Common Signatures for Text

This page describes common signatures for [TensorFlow Hub
Modules](../../README.md) for tasks that accept text inputs.
This page describes common signatures that should be implemented by modules
for tasks that accept text inputs.

## Text feature vector

