Add a story for "Recipes generation".

trekhleb · trekhleb · commit 7422c7541629 · 2020-06-25T07:40:12.000+02:00
diff --git a/assets/recipes_generation.ru.md b/assets/recipes_generation.ru.md
@@ -796,11 +796,13 @@ _<small>➔ вывод:</small>_
 > ▪︎ Cover, and cook for 5 to 6 hours on High. About 30 minutes before serving, place the torn biscuit dough in the slow cooker. Cook until the dough is no longer raw in the center.
 > ```    
 
-### Add padding to sequences
+### Подгонка последовательностей к одной длине
 
-We need all recipes to have the same length for training. To do that we'll use [tf.keras.preprocessing.sequence.pad_sequences](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/sequence/pad_sequences) utility to add a stop word to the end of each recipe and to make them have the same length.
+ℹ️ _Как синоним подгонки по длине мы также будем использовать слово "паддинг"_
 
-Let's check the recipes lengths:
+Все рецепты должны иметь одинаковую длину перед тренировкой модели. Для паддинга рецептов мы воспользуемся утилитой [tf.keras.preprocessing.sequence.pad_sequences](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/sequence/pad_sequences). С помощью этой функции мы добавим стоп-символ в конец каждого рецепта так, что все рецепты будут одинаковой длины.
+
+На данный момент длины первых 10-и рецептов выглядят следующим образом:
 
 ```python
 for recipe_index, recipe in enumerate(dataset_vectorized[:10]):
@@ -822,17 +824,17 @@ _<small>➔ вывод:</small>_
 > Recipe #10 length: 854
 > ```
 
-Let's pad all recipes with a `STOP_SIGN`:
+Добавим `STOP_SIGN` в конец каждого рецепта:
 
 ```python
 dataset_vectorized_padded_without_stops = tf.keras.preprocessing.sequence.pad_sequences(
     dataset_vectorized,
     padding='post',
     truncating='post',
-    # We use -1 here and +1 in the next step to make sure
-    # that all recipes will have at least 1 stops sign at the end,
-    # since each sequence will be shifted and truncated afterwards
-    # (to generate X and Y sequences).
+    # Используем -1 здесь и +1 ниже, чтобы каждый рецепт имел как минимум
+    # один стоп-символ в конце, поскольку в дальнейшем каждый текст будет
+    # обрезан на один символ с конца строки для формирования последовательностей X и Y.
+    # (см. ниже по тексту)
     maxlen=MAX_RECIPE_LENGTH-1,
     value=tokenizer.texts_to_sequences([STOP_SIGN])[0]
 )
@@ -864,9 +866,11 @@ _<small>➔ вывод:</small>_
 > Recipe #9 length: 2001
 > ```
 
-After the padding all recipes in the dataset now have the same length and RNN will also be able to learn where each recipe stops (by observing the presence of a `STOP_SIGN`).
+После подгонки каждый рецепт в наборе данных имеет одинаковую длину и стоп-символ в конце.
+
+Длина рецепта на данный момент на один символ больше запланированной (`2001` вместо `2000`). Это сделано по той причине, что ниже из каждого рецепта мы будем формировать входную последовательность `X` (длиной в `2000`) и выходную последовательность `Y` (длиной в `2000`), которые будут сдвинуты друг относительно друга на `1` символ.
 
-Here is an example of how a first recipe looks like after the padding.
+Вот как выглядит первый рецепт после паддинга:
 
 ```python
 recipe_sequence_to_string(dataset_vectorized_padded[0])
@@ -892,7 +896,7 @@ _<small>➔ вывод:</small>_
 > ␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
 > ```
 
-All recipes now end with one or many `␣` signs. We expect our LSTM model to learn that whenever it sees the `␣` stop-character it means that the recipe is ended. Once the network will learn this concept it will put stop-character at the end of every newly generated recipe. 
+Все рецепты сейчас заканчиваются одним или несколькими символами `␣`. Ожидается, что наша LSTM модель научится предлагать этот символ как следующий рекомендуемый, если она будет считать, что текст рецепта уже закончен.
 
 ### Create TensorFlow dataset