Skip to content

Commit

Permalink
Colab demonstrating basic usage of multilingual universal encoder (un…
Browse files Browse the repository at this point in the history
…iversal-encoder-xling-many) module.

PiperOrigin-RevId: 233990556
  • Loading branch information
TensorFlow Hub Authors authored and vbardiovskyg committed Feb 26, 2019
1 parent d9e6efb commit 61113ef
Showing 1 changed file with 335 additions and 0 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,335 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "RUymE2l9GZfO"
},
"source": [
"##### Copyright 2019 The TensorFlow Hub Authors.\n",
"\n",
"Licensed under the Apache License, Version 2.0 (the \"License\");"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"cellView": "code",
"colab": {},
"colab_type": "code",
"id": "JMyTNwSJGGWg"
},
"outputs": [],
"source": [
"# Copyright 2019 The TensorFlow Hub Authors. All Rights Reserved.\n",
"#\n",
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# http://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License.\n",
"# =============================================================================="
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "co7MV6sX7Xto"
},
"source": [
"# Multilingual Universal Sentence Encoder\n",
"\n",
"\n",
"\u003ctable align=\"left\"\u003e\u003ctd\u003e\n",
" \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/cross_lingual_similarity_with_tf_hub_multilingual_universal_encoder.ipynb\"\u003e\n",
" \u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\n",
" \u003c/a\u003e\n",
"\u003c/td\u003e\u003ctd\u003e\n",
" \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/hub/blob/master/examples/colab/cross_lingual_similarity_with_tf_hub_multilingual_universal_encoder.ipynb\"\u003e\n",
" \u003cimg width=32px src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n",
"\u003c/td\u003e\u003c/table\u003e\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "eAVQGidpL8v5"
},
"source": [
"This notebook illustrates how to access the Multilingual Universal Sentence Encoder module and use it for sentence similarity across multiple languages. This module is an extension of the [original Universal Encoder module](https://tfhub.dev/google/universal-sentence-encoder/2)."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "pOTzp8O36CyQ"
},
"source": [
"# Getting Started\n",
"\n",
"This section sets up the environment for access to the Multilingual Universal Sentence Encoder Module and also prepares a set of English sentences and their translations. In the following sections, the multilingual module will be used to compute similarity *across languages*."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "lVjNK8shFKOC"
},
"outputs": [],
"source": [
"# Install the latest TensorFlow version compatible with tf-sentencepiece.\n",
"!pip3 install --quiet tensorflow==1.12.0\n",
"# Install TF-Hub.\n",
"!pip3 install --quiet tensorflow-hub\n",
"!pip3 install --quiet seaborn\n",
"# Install Sentencepiece.\n",
"!pip3 install --quiet tf-sentencepiece"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "63Pd3nJnTl-i"
},
"source": [
"More detailed information about installing Tensorflow can be found at [https://www.tensorflow.org/install/](https://www.tensorflow.org/install/)."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "MSeY-MUQo2Ha"
},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"import tensorflow_hub as hub\n",
"import numpy as np\n",
"import seaborn as sns\n",
"import tf_sentencepiece"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "Q8F4LNGFqOiq"
},
"outputs": [],
"source": [
"# Some texts of different lengths in different languages.\n",
"english_sentences = [\"dog\", \"Puppies are nice.\", \"I enjoy taking long walks along the beach with my dog.\"]\n",
"spanish_sentences = [\"perro\", \"Los cachorros son agradables.\", \"Disfruto de dar largos paseos por la playa con mi perro.\"]\n",
"german_sentences = [\"Hund\", \"Welpen sind nett.\", \"Ich genieße lange Spaziergänge am Strand entlang mit meinem Hund.\"]\n",
"french_sentences = [\"chien\", \"Les chiots sont gentils.\", \"J'aime faire de longues promenades sur la plage avec mon chien.\"]\n",
"italian_sentences = [\"cane\", \"I cuccioli sono carini.\", \"Mi piace fare lunghe passeggiate lungo la spiaggia con il mio cane.\"]\n",
"chinese_sentences = [\"\", \"小狗很好。\", \"我喜欢和我的狗一起沿着海滩散步。\"]\n",
"korean_sentences = [\"\", \"강아지가 좋다.\", \"나는 나의 산책을 해변을 따라 길게 산책하는 것을 즐긴다.\"]\n",
"japanese_sentences = [\"\", \"子犬はいいです\", \"私は犬と一緒にビーチを散歩するのが好きです\"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "8xdAogbxJDTD"
},
"source": [
"## Computing Text Embeddings\n",
"We first precompute the embeddings for all of our sentences."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "weXZqLtTJY9b"
},
"outputs": [],
"source": [
"# The 8-language multilingual module. There are also en-es, en-de, and en-fr bilingual modules.\n",
"module_url = \"https://tfhub.dev/google/universal-sentence-encoder-xling-many/1\"\n",
"\n",
"# Set up graph.\n",
"g = tf.Graph()\n",
"with g.as_default():\n",
" text_input = tf.placeholder(dtype=tf.string, shape=[None])\n",
" xling_8_embed = hub.Module(module_url)\n",
" embedded_text = xling_8_embed(text_input)\n",
" init_op = tf.group([tf.global_variables_initializer(), tf.tables_initializer()])\n",
"g.finalize()\n",
"\n",
"# Initialize session.\n",
"session = tf.Session(graph=g)\n",
"session.run(init_op)\n",
"\n",
"# Compute embeddings.\n",
"en_result = session.run(embedded_text, feed_dict={text_input: english_sentences})\n",
"es_result = session.run(embedded_text, feed_dict={text_input: spanish_sentences})\n",
"de_result = session.run(embedded_text, feed_dict={text_input: german_sentences})\n",
"fr_result = session.run(embedded_text, feed_dict={text_input: french_sentences})\n",
"it_result = session.run(embedded_text, feed_dict={text_input: italian_sentences})\n",
"zh_result = session.run(embedded_text, feed_dict={text_input: chinese_sentences})\n",
"ko_result = session.run(embedded_text, feed_dict={text_input: korean_sentences})\n",
"ja_result = session.run(embedded_text, feed_dict={text_input: japanese_sentences})"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "jhLPq6AROyFk"
},
"source": [
"## Visualize Embedding Similarity\n",
"With the sentence embeddings now in hand, we can visualize semantic similarity across different languages."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "3Peiq_iSNJNX"
},
"outputs": [],
"source": [
"def visualize_similarity(embeddings_1, embeddings_2, labels_1, labels_2, plot_title):\n",
" corr = np.inner(embeddings_1, embeddings_2)\n",
" g = sns.heatmap(corr,\n",
" xticklabels=labels_1,\n",
" yticklabels=labels_2,\n",
" vmin=0,\n",
" vmax=1,\n",
" cmap=\"YlOrRd\")\n",
" g.set_yticklabels(g.get_yticklabels(), rotation=0)\n",
" g.set_title(plot_title)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "imn28LCiQO7d"
},
"source": [
"### English-Italian Similarity"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "X9uD3DirPIGd"
},
"outputs": [],
"source": [
"visualize_similarity(en_result, it_result, english_sentences, italian_sentences, \"English-Italian Similarity\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "BJkL6Az0QXNN"
},
"source": [
"### English-Japanese Similarity"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "CH_BXVGhQ0GL"
},
"outputs": [],
"source": [
"visualize_similarity(en_result, ja_result, english_sentences, japanese_sentences, \"English-Japanese Similarity\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "m6ySvEGbQaTM"
},
"source": [
"### Italian-Japanese Similarity"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "irfwIeitQ7V6"
},
"outputs": [],
"source": [
"visualize_similarity(it_result, ja_result, italian_sentences, japanese_sentences, \"Italian-Japanese Similarity\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "rRabHHQYQfLr"
},
"source": [
"### And more...\n",
"\n",
"The above examples can be extended to any language pair from **English, Spanish, German, French, Italian, Chinese, Korean, and Japanese**. Happy coding!"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [
"RUymE2l9GZfO"
],
"last_runtime": {
"build_target": "",
"kind": "local"
},
"name": "Cross-Lingual Similarity with TF-Hub Multilingual Universal Encoder",
"provenance": [],
"version": "0.3.2"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

0 comments on commit 61113ef

Please sign in to comment.