Document how to run inference for TF2 models.
Context: tensorflow#479
PiperOrigin-RevId: 290569934
TensorFlow Hub Authors authored and vbardiovskyg committed Jan 20, 2020
1 parent 67a5e78 commit 7e217af
Showing 1 changed file with 27 additions and 3 deletions: docs/common_issues.md
@@ -61,14 +61,38 @@ $ python

## Running inference on a pre-initialized module

If you are applying a module over data multiple times (e.g., to serve user
requests), you should use TensorFlow Session.run to avoid the overhead of
constructing and initializing parts of the graph multiple times.
If you are writing a Python program that applies a module many times on input
data, you can apply the following recipes. (Note: For serving requests in
production services, consider go/servo or other scalable, Python-free
solutions.)

Assuming your use case is **initialization** followed by subsequent
**requests** (for example, with Django, Flask, a custom HTTP server, etc.),
you can set up the serving as follows:

### TF2 SavedModels

*   In the initialization part:
    *   Load the TF2 SavedModel.

```python
import tensorflow_hub as hub

embedding_fn = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
```

*   In the request part:
    *   Use the embedding function to run inference.

```python
embedding_fn(["Hello world"])
```

This call invokes a tf.function that is optimized for performance; see the
[tf.function guide](https://www.tensorflow.org/guide/function).
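
For instance, a minimal Flask sketch could wire the two parts together as
shown below. (This is an illustrative example, not part of the documented
API; the app structure and route name are hypothetical, and Flask is assumed
to be installed.)

```python
import tensorflow_hub as hub
from flask import Flask, jsonify, request

app = Flask(__name__)

# Initialization part: load the SavedModel once, at server startup.
embedding_fn = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

@app.route("/embed", methods=["POST"])
def embed():
    # Request part: reuse the already-loaded module on every request.
    sentences = request.get_json()["sentences"]
    embeddings = embedding_fn(sentences)
    return jsonify(embeddings.numpy().tolist())

if __name__ == "__main__":
    app.run()
```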

### TF1 Hub modules

*   In the initialization part:
    *   Build the graph with a **placeholder**, the entry point into the
        graph.
    *   Initialize the session.
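
A sketch of this pattern, assuming the standard TF1 tensorflow_hub API (the
module handle is illustrative), might look like:

```python
import tensorflow as tf
import tensorflow_hub as hub

# Initialization part: build the graph once and start a session.
graph = tf.Graph()
with graph.as_default():
    # The placeholder is the entry point for feeding request data.
    text_input = tf.placeholder(dtype=tf.string, shape=[None])
    embed = hub.Module("https://tfhub.dev/google/universal-sentence-encoder/2")
    embeddings = embed(text_input)
    init_op = tf.group(tf.global_variables_initializer(),
                       tf.tables_initializer())
graph.finalize()

session = tf.Session(graph=graph)
session.run(init_op)

# Request part: only Session.run is executed per request.
result = session.run(embeddings, feed_dict={text_input: ["Hello world"]})
```

Keeping the graph and session alive across requests avoids re-running the
expensive graph construction and initializers for every request.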
