JohnSnowLabs · maziyarpanahi · Aug 24, 2023 · Jul 27, 2023 · Aug 2, 2023 · Aug 3, 2023
diff --git a/examples/util/OpenAICompletion.ipynb b/examples/util/OpenAICompletion.ipynb
@@ -0,0 +1,300 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "fTyNlgUztaq9",
+    "outputId": "e36bd296-66e8-4153-e0ff-494028f92fda"
+   },
+   "source": [
+    "![JohnSnowLabs](https://sparknlp.org/assets/images/logo.png)\n",
+    "\n",
+    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/util/OpenAICompletion.ipynb)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "E81LqDJzJIW7",
+    "outputId": "8670b8f9-22e1-4782-f5b0-7c77d0308eca"
+   },
+   "source": [
+    "## OpenAICompletion in SparkNLP"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "ZXU_LZZUJI6V"
+   },
+   "source": [
+    "In this notebook, we'll explore the process of utilizing OpenAICompletition within SparkNLP's framework.\n",
+    "\n",
+    "Spark NLP offers a seamless integration with various OpenAI APIs, presenting a powerful synergy. With the introduction of Spark NLP 5.1.0, leveraging the OpenAICompletition and OpenAIEmbeddings transformers becomes achievable. This integration not only ensures the utilization of OpenAI's capabilities but also capitalizes on Spark's inherent scalability advantages."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "k3HjBjIC9b2W"
+   },
+   "source": [
+    "## Spark NLP Settings"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/",
+     "height": 73
+    },
+    "id": "d3o4GxH-9IQk",
+    "outputId": "5f9e0ae7-3b44-40fc-f106-b1018f18b994"
+   },
+   "source": [
+    "All you need to do is to setup your [OpenAI API Key](https://platform.openai.com/docs/api-reference/authentication) and add it to Spark properties"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(\"Enter your OPENAI API Key:\")\n",
+    "OPENAI_API_KEY = input()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "XLNO3Z9r6HgR"
+   },
+   "outputs": [],
+   "source": [
+    "from sparknlp.annotator import *\n",
+    "from pyspark.ml import Pipeline\n",
+    "from sparknlp.base import LightPipeline"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import sparknlp\n",
+    "# let's start Spark with Spark NLP\n",
+    "openai_params = {\"spark.jsl.settings.openai.api.key\": OPENAI_API_KEY}\n",
+    "spark = sparknlp.start(params=openai_params)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "_eB72Yzg8_Jx"
+   },
+   "outputs": [],
+   "source": [
+    "document_assembler = DocumentAssembler() \\\n",
+    "        .setInputCol(\"text\") \\\n",
+    "        .setOutputCol(\"document\")\n",
+    "\n",
+    "openai_completion = OpenAICompletion() \\\n",
+    "       .setInputCols(\"document\") \\\n",
+    "       .setOutputCol(\"completion\") \\\n",
+    "       .setModel(\"text-davinci-003\") \\\n",
+    "       .setMaxTokens(50)\n",
+    "\n",
+    "# Define the pipeline\n",
+    "pipeline = Pipeline(stages=[\n",
+    "    document_assembler, openai_completion\n",
+    "])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "tRyju8D-6XJ1",
+    "outputId": "f10dc46d-6d51-4044-a2cd-42acb401df0d"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "+--------------------+\n",
+      "|                text|\n",
+      "+--------------------+\n",
+      "|Generate a restau...|\n",
+      "|Write a review fo...|\n",
+      "|Create a JSON wit...|\n",
+      "+--------------------+\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "empty_df = spark.createDataFrame([[\"\"]], [\"text\"])\n",
+    "sample_text= [[\"Generate a restaurant review.\"], [\"Write a review for a local eatery.\"], [\"Create a JSON with a review of a dining experience.\"]]\n",
+    "sample_df= spark.createDataFrame(sample_text).toDF(\"text\")\n",
+    "sample_df.show()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "X5G4_BXwOYtC"
+   },
+   "outputs": [],
+   "source": [
+    "pipeline_model = pipeline.fit(empty_df)\n",
+    "completion_df = pipeline_model.transform(sample_df)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "wz7nZpIY4aYh",
+    "outputId": "249517f5-b2e5-4246-b300-a4ff273d4853"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n",
+      "|completion                                                                                                                                                                                                                                                                                      |\n",
+      "+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n",
+      "|[{document, 0, 256, \\n\\nI had the pleasure of eating at [name of restaurant] recently and I was thoroughly impressed. The atmosphere was welcoming and warm, making it a great spot for dinner with friends or family. The service was fantastic - our waiter was incredibly attentive, {}, []}]|\n",
+      "|[{document, 0, 243, \\n\\nI recently visited Lao Restaurant for lunch and had a wonderful experience. As soon as we walked in, we were greeted with smiles and warm welcomes from the staff. We decided to try some of the traditional Lao dishes and were not disappointed, {}, []}]             |\n",
+      "|[{document, 0, 158, \\n\\n{ \\n    restaurant: \"Little India\", \\n    foodRating: 5, \\n    serviceRating: 4, \\n    review: \"This was my first time trying Indian food and I'm so glad I did!, {}, []}]                                                                                              |\n",
+      "+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "completion_df.select(\"completion\").show(truncate=False)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "5jAaTlBN_lNu"
+   },
+   "source": [
+    "LightPipeline"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "roBEvzkz_oQJ"
+   },
+   "outputs": [],
+   "source": [
+    "light_pipeline_openai = LightPipeline(pipeline_model)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "2Oq7PEFWe5B7",
+    "outputId": "c824e475-131e-4547-ceb6-e158784994ca"
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[{'document': [Annotation(document, 0, 36, Generate a negative review of a movie, {}, [])],\n",
+       "  'completion': [Annotation(document, 0, 235, \n",
+       "   \n",
+       "   This movie was terrible! The plot was extremely contrived and the acting was terrible. It was just such a waste of time and money. The dialogue was especially bad and the special effects were really cheap. Even if you try to watch it, {}, [])]}]"
+      ]
+     },
+     "execution_count": 16,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "light_pipeline_openai.fullAnnotate(\"Generate a negative review of a movie\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "FhKPEMb09w6a",
+    "outputId": "6c6bf1ab-c38c-4350-84bb-4e52260648a1"
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'document': ['Generate a negative review of a movie'],\n",
+       " 'completion': ['\\n\\nThis movie was terrible! Despite the star-studded cast, the story was a jumbled mess that made no sense. The characters had no motivation and the dialogue was pretty stiff and unbelievable. The special effects were way over the top']}"
+      ]
+     },
+     "execution_count": 17,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "light_pipeline_openai.annotate(\"Generate a negative review of a movie\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "colab": {
+   "provenance": []
+  },
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}