TensorFlow Hub is a library to foster the publication, discovery, and consumption of reusable parts of machine learning models. A module is a self-contained piece of a TensorFlow graph, along with its weights and assets, that can be reused across different tasks in a process known as transfer learning.
Modules contain variables that have been pre-trained for a task using a large dataset. By reusing a module on a related task, you can:
- train a model with a smaller dataset,
- improve generalization, or
- significantly speed up training.
Here's an example that uses an English embedding module to map an array of strings to their embeddings:
import tensorflow as tf
import tensorflow_hub as hub

with tf.Graph().as_default():
  # Load the pre-trained English text embedding module.
  embed = hub.Module("https://tfhub.dev/google/nnlm-en-dim128-with-normalization/1")
  # Apply the module to a batch of strings to get their embeddings.
  embeddings = embed(["A long sentence.", "single-word", "http://example.com"])

  with tf.Session() as sess:
    # Initialize the module's variables and lookup tables before use.
    sess.run(tf.global_variables_initializer())
    sess.run(tf.tables_initializer())

    print(sess.run(embeddings))
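To make the reuse idea concrete, here is a minimal sketch in plain Python of training a small classifier on top of frozen, pre-trained embeddings. It does not use TensorFlow Hub itself: the `PRETRAINED` table, its two-dimensional vectors, and the helpers `embed`, `train_head`, and `predict` are all hypothetical stand-ins for a real module and a real training loop.

```python
import math

# Stand-in for a frozen, pre-trained embedding module. The vectors are
# made up for illustration; a real module would return learned values.
PRETRAINED = {
    "good": [0.9, 0.1],
    "great": [0.8, 0.2],
    "bad": [-0.9, 0.1],
    "awful": [-0.8, 0.3],
}


def embed(word):
    """Look up the frozen embedding; unknown words map to zeros."""
    return PRETRAINED.get(word, [0.0, 0.0])


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(examples, epochs=200, lr=0.5):
    """Fit a logistic-regression head; the embeddings are never updated."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for word, label in examples:
            x = embed(word)
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = pred - label  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(word, weights, bias):
    """Return the head's probability that the word is positive."""
    x = embed(word)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Train on just two labeled words, then classify words the head never saw.
weights, bias = train_head([("good", 1), ("bad", 0)])
print(predict("great", weights, bias) > 0.5)
print(predict("awful", weights, bias) < 0.5)
```

The head sees only two labeled examples, yet it can classify the unseen words because the (made-up) embeddings already place similar words close together. This is the sense in which a pre-trained module lets you train with a smaller dataset.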
- Installation
- Tutorials:
- Key Concepts:
- Modules:
  - Available Modules -- quick links: image, text, other
  - Common Signatures for Modules
  - Hosting a Module
As in all of machine learning, fairness is an important consideration. Modules are typically pre-trained on large datasets. When reusing such a module, it’s important to be mindful of what data it was trained on (and whether that data contains any existing biases), and how these might impact your downstream experiments.
Although we hope to prevent breaking changes, this project is still under active development and is not yet guaranteed to have a stable API or module format.
Since they contain arbitrary TensorFlow graphs, modules can be thought of as programs. The guide "Using TensorFlow Securely" describes the security implications of referencing a module from an untrusted source.
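One simple precaution is to check where a module comes from before loading it. The sketch below is an illustration only, not part of TensorFlow Hub: the `TRUSTED_HOSTS` allowlist and the `is_trusted_module_url` helper are hypothetical, and which hosts you trust is a decision for your own project.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; adjust it to the sources your project trusts.
TRUSTED_HOSTS = {"tfhub.dev"}


def is_trusted_module_url(url):
    """Return True only for HTTPS URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in TRUSTED_HOSTS


print(is_trusted_module_url(
    "https://tfhub.dev/google/nnlm-en-dim128-with-normalization/1"))
print(is_trusted_module_url("http://example.com/some-module"))
```

A check like this only filters by origin. It does not inspect the module's contents, so it complements the guidance in the security guide and does not replace it.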
The source code is available on GitHub. Use GitHub issues for feature requests and bugs. Please see the TensorFlow Hub mailing list for general questions and discussion.