Clarify why TextVectorization works on CPU #913

Merged
2 commits merged on Jun 15, 2022
34 changes: 19 additions & 15 deletions guides/ipynb/preprocessing_layers.ipynb
@@ -190,7 +190,7 @@
"exist, those can be loaded directly into the lookup tables by passing a path to the\n",
"vocabulary file in the layer's constructor arguments.\n",
"\n",
"Here's an example where we instantiate a `StringLookup` layer with precomputed vocabulary:"
"Here's an example where you instantiate a `StringLookup` layer with precomputed vocabulary:"
]
},
{
@@ -229,7 +229,7 @@
"\n",
"With this option, preprocessing will happen on device, synchronously with the rest of the\n",
"model execution, meaning that it will benefit from GPU acceleration.\n",
"If you're training on GPU, this is the best option for the `Normalization` layer, and for\n",
"If you're training on a GPU, this is the best option for the `Normalization` layer, and for\n",
"all image preprocessing and data augmentation layers.\n",
"\n",
"**Option 2:** apply it to your `tf.data.Dataset`, so as to obtain a dataset that yields\n",
@@ -239,7 +239,7 @@
"dataset = dataset.map(lambda x, y: (preprocessing_layer(x), y))\n",
"```\n",
"\n",
"With this option, your preprocessing will happen on CPU, asynchronously, and will be\n",
"With this option, your preprocessing will happen on a CPU, asynchronously, and will be\n",
"buffered before going into the model.\n",
"In addition, if you call `dataset.prefetch(tf.data.AUTOTUNE)` on your dataset,\n",
"the preprocessing will happen efficiently in parallel with training:\n",
@@ -251,11 +251,15 @@
"```\n",
"\n",
"This is the best option for `TextVectorization`, and all structured data preprocessing\n",
"layers. It can also be a good option if you're training on CPU\n",
"and you use image preprocessing layers.\n",
"layers. It can also be a good option if you're training on a CPU and you use image preprocessing\n",
"layers.\n",
"\n",
"**When running on TPU, you should always place preprocessing layers in the `tf.data` pipeline**\n",
"(with the exception of `Normalization` and `Rescaling`, which run fine on TPU and are commonly\n",
"Note that the `TextVectorization` layer can only be executed on a CPU, as it is mostly a\n",
"dictionary lookup operation. Therefore, if you are training your model on a GPU or a TPU,\n",
"you should put the `TextVectorization` layer in the `tf.data` pipeline to get the best performance.\n",
"\n",
"**When running on a TPU, you should always place preprocessing layers in the `tf.data` pipeline**\n",
"(with the exception of `Normalization` and `Rescaling`, which run fine on a TPU and are commonly\n",
"used as the first layer is an image model)."
]
},
@@ -307,7 +311,7 @@
"[tf.distribute](https://www.tensorflow.org/api_docs/python/tf/distribute) API\n",
"for running training across multiple machines.\n",
"\n",
"In general, preprocessing layers should be placed inside a `strategy.scope()`\n",
"In general, preprocessing layers should be placed inside a `tf.distribute.Strategy.scope()`\n",
"and called either inside or before the model as discussed above.\n",
"\n",
"```python\n",
@@ -317,9 +321,9 @@
" dense_layer = tf.keras.layers.Dense(16)\n",
"```\n",
"\n",
"For more details, refer to the\n",
"[preprocessing section](https://www.tensorflow.org/tutorials/distribute/input#data_preprocessing)\n",
"of the distributed input guide."
"For more details, refer to the _Data preprocessing_ section\n",
"of the [Distributed input](https://www.tensorflow.org/tutorials/distribute/input)\n",
"tutorial."
]
},
{
@@ -642,7 +646,7 @@
"colab_type": "text"
},
"source": [
"### Encoding text as a dense matrix of ngrams with multi-hot encoding\n",
"### Encoding text as a dense matrix of N-grams with multi-hot encoding\n",
"\n",
"This is how you should preprocess text to be passed to a `Dense` layer."
]
@@ -712,7 +716,7 @@
"colab_type": "text"
},
"source": [
"### Encoding text as a dense matrix of ngrams with TF-IDF weighting\n",
"### Encoding text as a dense matrix of N-grams with TF-IDF weighting\n",
"\n",
"This is an alternative way of preprocessing text before passing it to a `Dense` layer."
]
@@ -790,11 +794,11 @@
"You may find yourself working with a very large vocabulary in a `TextVectorization`, a `StringLookup` layer,\n",
"or an `IntegerLookup` layer. Typically, a vocabulary larger than 500MB would be considered \"very large\".\n",
"\n",
"In such case, for best performance, you should avoid using `adapt()`.\n",
"In such a case, for best performance, you should avoid using `adapt()`.\n",
"Instead, pre-compute your vocabulary in advance\n",
"(you could use Apache Beam or TF Transform for this)\n",
"and store it in a file. Then load the vocabulary into the layer at construction\n",
"time by passing the filepath as the `vocabulary` argument.\n",
"time by passing the file path as the `vocabulary` argument.\n",
"\n",
"\n",
"### Using lookup layers on a TPU pod or with `ParameterServerStrategy`.\n",
76 changes: 51 additions & 25 deletions guides/md/preprocessing_layers.md
@@ -116,6 +116,14 @@ print("Features std: %.2f" % (normalized_data.numpy().std()))

<div class="k-default-codeblock">
```
2022-06-15 15:02:07.223345: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory

@8bitmp3 (Contributor, Author) commented on Jun 15, 2022:
@mattdangerw @fchollet Note the new messages in the output after regenerating the notebook and Markdown files

2022-06-15 15:02:07.223381: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-06-15 15:02:20.304033: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-06-15 15:02:20.304073: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-06-15 15:02:20.304097: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (codespaces-c67928): /proc/driver/nvidia/version does not exist
2022-06-15 15:02:20.304650: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

Features mean: -0.00
Features std: 1.00

@@ -164,7 +172,7 @@ files for the `TextVectorization`, `StringLookup`, or `IntegerLookup` layers alr
exist, those can be loaded directly into the lookup tables by passing a path to the
vocabulary file in the layer's constructor arguments.

Here's an example where we instantiate a `StringLookup` layer with precomputed vocabulary:
Here's an example where you instantiate a `StringLookup` layer with precomputed vocabulary:


```python
@@ -199,7 +207,7 @@ model = keras.Model(inputs, outputs)

With this option, preprocessing will happen on device, synchronously with the rest of the
model execution, meaning that it will benefit from GPU acceleration.
If you're training on GPU, this is the best option for the `Normalization` layer, and for
If you're training on a GPU, this is the best option for the `Normalization` layer, and for
all image preprocessing and data augmentation layers.
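
(The guide's Option 1 code is collapsed in this diff; here is a generic sketch of the pattern, with arbitrary layer choices rather than the guide's actual example.)

```python
# Sketch only: illustrative augmentation layers placed inside the model,
# so preprocessing runs on the accelerator with the rest of the forward pass.
import tensorflow as tf
from tensorflow import keras

inputs = keras.Input(shape=(32, 32, 3))
x = tf.keras.layers.RandomFlip("horizontal")(inputs)
x = tf.keras.layers.RandomRotation(0.1)(x)
x = tf.keras.layers.Rescaling(1.0 / 255)(x)
outputs = tf.keras.layers.Conv2D(8, 3, activation="relu")(x)
model = keras.Model(inputs, outputs)
```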

**Option 2:** apply it to your `tf.data.Dataset`, so as to obtain a dataset that yields
@@ -209,7 +217,7 @@ batches of preprocessed data, like this:
dataset = dataset.map(lambda x, y: (preprocessing_layer(x), y))
```

With this option, your preprocessing will happen on CPU, asynchronously, and will be
With this option, your preprocessing will happen on a CPU, asynchronously, and will be
buffered before going into the model.
In addition, if you call `dataset.prefetch(tf.data.AUTOTUNE)` on your dataset,
the preprocessing will happen efficiently in parallel with training:
@@ -221,11 +229,15 @@ model.fit(dataset, ...)
```

This is the best option for `TextVectorization`, and all structured data preprocessing
layers. It can also be a good option if you're training on CPU
and you use image preprocessing layers.
layers. It can also be a good option if you're training on a CPU and you use image preprocessing
layers.

Note that the `TextVectorization` layer can only be executed on a CPU, as it is mostly a
dictionary lookup operation. Therefore, if you are training your model on a GPU or a TPU,
you should put the `TextVectorization` layer in the `tf.data` pipeline to get the best performance.
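
As a rough sketch of that recommendation (the toy strings and labels below are invented for illustration, not taken from the guide):

```python
# Sketch only: adapt() and vectorization both run on the CPU inside tf.data,
# keeping the GPU/TPU free for the model itself.
import tensorflow as tf

text_vectorizer = tf.keras.layers.TextVectorization(output_mode="int")

texts = tf.data.Dataset.from_tensor_slices(["the brown fox", "the lazy dog"])
labels = tf.data.Dataset.from_tensor_slices([0, 1])

text_vectorizer.adapt(texts)  # builds the vocabulary on the CPU

train_dataset = (
    tf.data.Dataset.zip((texts, labels))
    .batch(2)
    .map(lambda x, y: (text_vectorizer(x), y))  # vectorize inside the pipeline
    .prefetch(tf.data.AUTOTUNE)                 # overlap preprocessing with training
)
```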

**When running on TPU, you should always place preprocessing layers in the `tf.data` pipeline**
(with the exception of `Normalization` and `Rescaling`, which run fine on TPU and are commonly
**When running on a TPU, you should always place preprocessing layers in the `tf.data` pipeline**
(with the exception of `Normalization` and `Rescaling`, which run fine on a TPU and are commonly
used as the first layer in an image model).

---
@@ -265,7 +277,7 @@ Preprocessing layers are compatible with the
[tf.distribute](https://www.tensorflow.org/api_docs/python/tf/distribute) API
for running training across multiple machines.

In general, preprocessing layers should be placed inside a `strategy.scope()`
In general, preprocessing layers should be placed inside a `tf.distribute.Strategy.scope()`
and called either inside or before the model as discussed above.

```python
@@ -275,9 +287,9 @@ with strategy.scope():
dense_layer = tf.keras.layers.Dense(16)
```
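
(A minimal, self-contained sketch that fills in the collapsed lines with placeholders; `MirroredStrategy` and the `Normalization` layer are assumptions, not necessarily the guide's actual choices.)

```python
# Sketch only: any tf.distribute strategy works the same way.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Create preprocessing layers and model layers inside the same scope.
    normalizer = tf.keras.layers.Normalization()
    dense_layer = tf.keras.layers.Dense(16)
```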

For more details, refer to the
[preprocessing section](https://www.tensorflow.org/tutorials/distribute/input#data_preprocessing)
of the distributed input guide.
For more details, refer to the _Data preprocessing_ section
of the [Distributed input](https://www.tensorflow.org/tutorials/distribute/input)
tutorial.

---
## Quick recipes
@@ -324,9 +336,9 @@ model.fit(train_dataset, steps_per_epoch=5)

<div class="k-default-codeblock">
```
5/5 [==============================] - 10s 415ms/step - loss: 8.7501
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 [==============================] - 14s 0us/step

2022-06-15 15:02:40.512792: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 153600000 exceeds 10% of free system memory.

@8bitmp3 (Contributor, Author) commented:
@mattdangerw @fchollet Maybe we should make the output in cells less verbose.

2022-06-15 15:02:42.635033: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 153600000 exceeds 10% of free system memory.

1/5 [=====>........................] - ETA: 46s - loss: 4.4839

2022-06-15 15:02:54.422388: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 15040512 exceeds 10% of free system memory.
2022-06-15 15:02:54.422493: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 15040512 exceeds 10% of free system memory.
2022-06-15 15:02:54.429803: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 15040512 exceeds 10% of free system memory.

5/5 [==============================] - 14s 712ms/step - loss: 8.8112

<keras.callbacks.History at 0x1277aa790>
<keras.callbacks.History at 0x7f80ec476620>

```
</div>
@@ -360,9 +384,9 @@ model.fit(x_train, y_train)

<div class="k-default-codeblock">
```
1563/1563 [==============================] - 2s 1ms/step - loss: 2.1209
1563/1563 [==============================] - 3s 2ms/step - loss: 2.1300

<keras.callbacks.History at 0x1288e7d90>
<keras.callbacks.History at 0x7f80e5f0a320>

```
</div>
@@ -537,14 +561,14 @@ Encoded text:
<div class="k-default-codeblock">
```
Training model...
1/1 [==============================] - 1s 1s/step - loss: 0.4862
1/1 [==============================] - 2s 2s/step - loss: 0.4970
```
</div>

<div class="k-default-codeblock">
```
Calling end-to-end model on test string...
Model output: tf.Tensor([[0.0396869]], shape=(1, 1), dtype=float32)
Model output: tf.Tensor([[0.03878693]], shape=(1, 1), dtype=float32)

```
</div>
@@ -555,7 +579,7 @@ in the example
Note that when training such a model, for best performance, you should always
use the `TextVectorization` layer as part of the input pipeline.

### Encoding text as a dense matrix of ngrams with multi-hot encoding
### Encoding text as a dense matrix of N-grams with multi-hot encoding

This is how you should preprocess text to be passed to a `Dense` layer.

@@ -614,6 +638,7 @@ print("Model output:", test_output)

<div class="k-default-codeblock">
```
WARNING:tensorflow:5 out of the last 1567 calls to <function PreprocessingLayer.make_adapt_function.<locals>.adapt_step at 0x7f80ec464a60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.

@8bitmp3 (Contributor, Author) commented on Jun 15, 2022:
@mattdangerw @fchollet Note the warnings in the output after regenerating .md and Jupyter files

Encoded text:
[[1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 0. 0. 0. 0. 0.
0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0.]]
@@ -623,18 +648,18 @@ Encoded text:
<div class="k-default-codeblock">
```
Training model...
1/1 [==============================] - 0s 192ms/step - loss: 2.7082
1/1 [==============================] - 0s 252ms/step - loss: 1.7566
```
</div>

<div class="k-default-codeblock">
```
Calling end-to-end model on test string...
Model output: tf.Tensor([[-0.58801]], shape=(1, 1), dtype=float32)
Model output: tf.Tensor([[-0.01154183]], shape=(1, 1), dtype=float32)

```
</div>
### Encoding text as a dense matrix of ngrams with TF-IDF weighting
### Encoding text as a dense matrix of N-grams with TF-IDF weighting

This is an alternative way of preprocessing text before passing it to a `Dense` layer.

@@ -694,6 +719,7 @@ print("Model output:", test_output)

<div class="k-default-codeblock">
```
WARNING:tensorflow:6 out of the last 1568 calls to <function PreprocessingLayer.make_adapt_function.<locals>.adapt_step at 0x7f80ec466b90> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
Encoded text:
[[5.461647 1.6945957 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0.
@@ -707,14 +733,14 @@ Encoded text:
<div class="k-default-codeblock">
```
Training model...
1/1 [==============================] - 0s 192ms/step - loss: 1.3662
1/1 [==============================] - 0s 260ms/step - loss: 6.3598
```
</div>

<div class="k-default-codeblock">
```
Calling end-to-end model on test string...
Model output: tf.Tensor([[1.6707027]], shape=(1, 1), dtype=float32)
Model output: tf.Tensor([[-0.33832753]], shape=(1, 1), dtype=float32)

```
</div>
@@ -726,11 +752,11 @@ Model output: tf.Tensor([[1.6707027]], shape=(1, 1), dtype=float32)
You may find yourself working with a very large vocabulary in a `TextVectorization`, a `StringLookup` layer,
or an `IntegerLookup` layer. Typically, a vocabulary larger than 500MB would be considered "very large".

In such case, for best performance, you should avoid using `adapt()`.
In such a case, for best performance, you should avoid using `adapt()`.
Instead, pre-compute your vocabulary in advance
(you could use Apache Beam or TF Transform for this)
and store it in a file. Then load the vocabulary into the layer at construction
time by passing the filepath as the `vocabulary` argument.
time by passing the file path as the `vocabulary` argument.
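
As a rough illustration of that construction-time approach (the file name and tokens below are hypothetical):

```python
# Sketch only: "vocabulary.txt" is a hypothetical file with one token per line,
# precomputed offline (e.g. with Apache Beam or TF Transform).
import tensorflow as tf

lookup = tf.keras.layers.StringLookup(vocabulary="vocabulary.txt")

# No adapt() call is needed; the lookup table is built from the file.
ids = lookup(tf.constant([["pear"], ["mango"]]))
```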


### Using lookup layers on a TPU pod or with `ParameterServerStrategy`.