Colab demonstrating basic usage of multilingual universal encoder (un…

…iversal-encoder-xling-many) module. PiperOrigin-RevId: 233990556
miws · Feb 26, 2019 · 61113ef · 61113ef
1 parent d9e6efb
commit 61113ef
Showing 1 changed file with 335 additions and 0 deletions.
diff --git a/examples/colab/cross_lingual_similarity_with_tf_hub_multilingual_universal_encoder.ipynb b/examples/colab/cross_lingual_similarity_with_tf_hub_multilingual_universal_encoder.ipynb
@@ -0,0 +1,335 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "RUymE2l9GZfO"
+      },
+      "source": [
+        "##### Copyright 2019 The TensorFlow Hub Authors.\n",
+        "\n",
+        "Licensed under the Apache License, Version 2.0 (the \"License\");"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 0,
+      "metadata": {
+        "cellView": "code",
+        "colab": {},
+        "colab_type": "code",
+        "id": "JMyTNwSJGGWg"
+      },
+      "outputs": [],
+      "source": [
+        "# Copyright 2019 The TensorFlow Hub Authors. All Rights Reserved.\n",
+        "#\n",
+        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+        "# you may not use this file except in compliance with the License.\n",
+        "# You may obtain a copy of the License at\n",
+        "#\n",
+        "#     http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing, software\n",
+        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+        "# See the License for the specific language governing permissions and\n",
+        "# limitations under the License.\n",
+        "# =============================================================================="
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "co7MV6sX7Xto"
+      },
+      "source": [
+        "# Multilingual Universal Sentence Encoder\n",
+        "\n",
+        "\n",
+        "\u003ctable align=\"left\"\u003e\u003ctd\u003e\n",
+        "  \u003ca target=\"_blank\"  href=\"https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/cross_lingual_similarity_with_tf_hub_multilingual_universal_encoder.ipynb\"\u003e\n",
+        "    \u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\n",
+        "  \u003c/a\u003e\n",
+        "\u003c/td\u003e\u003ctd\u003e\n",
+        "  \u003ca target=\"_blank\"  href=\"https://github.com/tensorflow/hub/blob/master/examples/colab/cross_lingual_similarity_with_tf_hub_multilingual_universal_encoder.ipynb\"\u003e\n",
+        "    \u003cimg width=32px src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n",
+        "\u003c/td\u003e\u003c/table\u003e\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "eAVQGidpL8v5"
+      },
+      "source": [
+        "This notebook illustrates how to access the Multilingual Universal Sentence Encoder module and use it for sentence similarity across multiple languages. This module is an extension of the [original Universal Encoder module](https://tfhub.dev/google/universal-sentence-encoder/2)."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "pOTzp8O36CyQ"
+      },
+      "source": [
+        "# Getting Started\n",
+        "\n",
+        "This section sets up the environment for access to the Multilingual Universal Sentence Encoder Module and also prepares a set of English sentences and their translations. In the following sections, the multilingual module will be used to compute similarity *across languages*."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 0,
+      "metadata": {
+        "colab": {},
+        "colab_type": "code",
+        "id": "lVjNK8shFKOC"
+      },
+      "outputs": [],
+      "source": [
+        "# Install the latest TensorFlow version compatible with tf-sentencepiece.\n",
+        "!pip3 install --quiet tensorflow==1.12.0\n",
+        "# Install TF-Hub.\n",
+        "!pip3 install --quiet tensorflow-hub\n",
+        "!pip3 install --quiet seaborn\n",
+        "# Install Sentencepiece.\n",
+        "!pip3 install --quiet tf-sentencepiece"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "63Pd3nJnTl-i"
+      },
+      "source": [
+        "More detailed information about installing Tensorflow can be found at [https://www.tensorflow.org/install/](https://www.tensorflow.org/install/)."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 0,
+      "metadata": {
+        "colab": {},
+        "colab_type": "code",
+        "id": "MSeY-MUQo2Ha"
+      },
+      "outputs": [],
+      "source": [
+        "import tensorflow as tf\n",
+        "import tensorflow_hub as hub\n",
+        "import numpy as np\n",
+        "import seaborn as sns\n",
+        "import tf_sentencepiece"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 0,
+      "metadata": {
+        "colab": {},
+        "colab_type": "code",
+        "id": "Q8F4LNGFqOiq"
+      },
+      "outputs": [],
+      "source": [
+        "# Some texts of different lengths in different languages.\n",
+        "english_sentences = [\"dog\", \"Puppies are nice.\", \"I enjoy taking long walks along the beach with my dog.\"]\n",
+        "spanish_sentences = [\"perro\", \"Los cachorros son agradables.\", \"Disfruto de dar largos paseos por la playa con mi perro.\"]\n",
+        "german_sentences = [\"Hund\", \"Welpen sind nett.\", \"Ich genieße lange Spaziergänge am Strand entlang mit meinem Hund.\"]\n",
+        "french_sentences = [\"chien\", \"Les chiots sont gentils.\", \"J'aime faire de longues promenades sur la plage avec mon chien.\"]\n",
+        "italian_sentences = [\"cane\", \"I cuccioli sono carini.\", \"Mi piace fare lunghe passeggiate lungo la spiaggia con il mio cane.\"]\n",
+        "chinese_sentences = [\"狗\", \"小狗很好。\", \"我喜欢和我的狗一起沿着海滩散步。\"]\n",
+        "korean_sentences = [\"개\", \"강아지가 좋다.\", \"나는 나의 산책을 해변을 따라 길게 산책하는 것을 즐긴다.\"]\n",
+        "japanese_sentences = [\"犬\", \"子犬はいいです\", \"私は犬と一緒にビーチを散歩するのが好きです\"]"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "8xdAogbxJDTD"
+      },
+      "source": [
+        "## Computing Text Embeddings\n",
+        "We first precompute the embeddings for all of our sentences."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 0,
+      "metadata": {
+        "colab": {},
+        "colab_type": "code",
+        "id": "weXZqLtTJY9b"
+      },
+      "outputs": [],
+      "source": [
+        "# The 8-language multilingual module. There are also en-es, en-de, and en-fr bilingual modules.\n",
+        "module_url = \"https://tfhub.dev/google/universal-sentence-encoder-xling-many/1\"\n",
+        "\n",
+        "# Set up graph.\n",
+        "g = tf.Graph()\n",
+        "with g.as_default():\n",
+        "  text_input = tf.placeholder(dtype=tf.string, shape=[None])\n",
+        "  xling_8_embed = hub.Module(module_url)\n",
+        "  embedded_text = xling_8_embed(text_input)\n",
+        "  init_op = tf.group([tf.global_variables_initializer(), tf.tables_initializer()])\n",
+        "g.finalize()\n",
+        "\n",
+        "# Initialize session.\n",
+        "session = tf.Session(graph=g)\n",
+        "session.run(init_op)\n",
+        "\n",
+        "# Compute embeddings.\n",
+        "en_result = session.run(embedded_text, feed_dict={text_input: english_sentences})\n",
+        "es_result = session.run(embedded_text, feed_dict={text_input: spanish_sentences})\n",
+        "de_result = session.run(embedded_text, feed_dict={text_input: german_sentences})\n",
+        "fr_result = session.run(embedded_text, feed_dict={text_input: french_sentences})\n",
+        "it_result = session.run(embedded_text, feed_dict={text_input: italian_sentences})\n",
+        "zh_result = session.run(embedded_text, feed_dict={text_input: chinese_sentences})\n",
+        "ko_result = session.run(embedded_text, feed_dict={text_input: korean_sentences})\n",
+        "ja_result = session.run(embedded_text, feed_dict={text_input: japanese_sentences})"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "jhLPq6AROyFk"
+      },
+      "source": [
+        "## Visualize Embedding Similarity\n",
+        "With the sentence embeddings now in hand, we can visualize semantic similarity across different languages."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 0,
+      "metadata": {
+        "colab": {},
+        "colab_type": "code",
+        "id": "3Peiq_iSNJNX"
+      },
+      "outputs": [],
+      "source": [
+        "def visualize_similarity(embeddings_1, embeddings_2, labels_1, labels_2, plot_title):\n",
+        "  corr = np.inner(embeddings_1, embeddings_2)\n",
+        "  g = sns.heatmap(corr,\n",
+        "                  xticklabels=labels_1,\n",
+        "                  yticklabels=labels_2,\n",
+        "                  vmin=0,\n",
+        "                  vmax=1,\n",
+        "                  cmap=\"YlOrRd\")\n",
+        "  g.set_yticklabels(g.get_yticklabels(), rotation=0)\n",
+        "  g.set_title(plot_title)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "imn28LCiQO7d"
+      },
+      "source": [
+        "### English-Italian Similarity"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 0,
+      "metadata": {
+        "colab": {},
+        "colab_type": "code",
+        "id": "X9uD3DirPIGd"
+      },
+      "outputs": [],
+      "source": [
+        "visualize_similarity(en_result, it_result, english_sentences, italian_sentences, \"English-Italian Similarity\")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "BJkL6Az0QXNN"
+      },
+      "source": [
+        "### English-Japanese Similarity"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 0,
+      "metadata": {
+        "colab": {},
+        "colab_type": "code",
+        "id": "CH_BXVGhQ0GL"
+      },
+      "outputs": [],
+      "source": [
+        "visualize_similarity(en_result, ja_result, english_sentences, japanese_sentences, \"English-Japanese Similarity\")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "m6ySvEGbQaTM"
+      },
+      "source": [
+        "### Italian-Japanese Similarity"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 0,
+      "metadata": {
+        "colab": {},
+        "colab_type": "code",
+        "id": "irfwIeitQ7V6"
+      },
+      "outputs": [],
+      "source": [
+        "visualize_similarity(it_result, ja_result, italian_sentences, japanese_sentences, \"Italian-Japanese Similarity\")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "rRabHHQYQfLr"
+      },
+      "source": [
+        "### And more...\n",
+        "\n",
+        "The above examples can be extended to any language pair from **English, Spanish, German, French, Italian, Chinese, Korean, and Japanese**. Happy coding!"
+      ]
+    }
+  ],
+  "metadata": {
+    "colab": {
+      "collapsed_sections": [
+        "RUymE2l9GZfO"
+      ],
+      "last_runtime": {
+        "build_target": "",
+        "kind": "local"
+      },
+      "name": "Cross-Lingual Similarity with TF-Hub Multilingual Universal Encoder",
+      "provenance": [],
+      "version": "0.3.2"
+    },
+    "kernelspec": {
+      "display_name": "Python 3",
+      "language": "python",
+      "name": "python3"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+}