diff --git a/notebook/Logs Notebook-cp4d.ipynb b/notebook/Logs Notebook-cp4d.ipynb
new file mode 100644
index 0000000..ce268b1
--- /dev/null
+++ b/notebook/Logs Notebook-cp4d.ipynb
@@ -0,0 +1,469 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Waston Assistant Logs Notebook\n",
+ "### IBM Cloud Pak for Data version"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Introduction\n",
+ "This notebook demonstrates how to download Watson Assistant user-generated logs based on different criteria.\n",
+ "\n",
+ "### Programming language and environment\n",
+ "Some familiarity with Python is recommended. This notebook runs on Python 3.7+"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n",
+ "## 1. Configuration and Setup\n",
+ "\n",
+ "In this section, we add data and workspace access credentials, import required libraries and functions.\n",
+ "\n",
+ "### 1.1 Install Assistant Improve Toolkit"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "!pip install --user --upgrade \"assistant-improve-toolkit\";"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 1.2 Import functions used in the notebook"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Import Watson Assistant related functions\n",
+ "from ibm_cloud_sdk_core.authenticators import IAMAuthenticator\n",
+ "from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator\n",
+ "import pandas as pd\n",
+ "import json\n",
+ "from ibm_watson import AssistantV1, AssistantV2\n",
+ "\n",
+ "from src.assistant_improve_toolkit.watson_assistant_func import get_logs\n",
+ "from src.assistant_improve_toolkit.watson_assistant_func import get_assistant_definition\n",
+ "from src.assistant_improve_toolkit.watson_assistant_func import load_logs_from_file\n",
+ "from src.assistant_improve_toolkit.watson_assistant_func import export_csv_for_intent_recommendation"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2. Load and format data "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "pycharm": {
+ "name": "#%% md\n"
+ }
+ },
+ "source": [
+ "### 2.1 Add Watson Assistant configuration\n",
+ "\n",
+ "The notebook uses `CloudPakForDataAuthenticator` to authenticate the APIs.\n",
+ "\n",
+ "- Replace `username` and `password` with your Cloud Pak for Data credentials\n",
+ "- `base_url` is the base URL of your instance. It is in the format of `https://{cpd_cluster_host}{:port}/icp4d-api`\n",
+ "- The string to set for version is a date in the format version=YYYY-MM-DD. The version date string determines which version of the Watson Assistant v1/v2 API will be called. For more information about version, see [Versioning](https://cloud.ibm.com/apidocs/assistant-data-v1?code=python#versioning)\n",
+ "- The string to pass into `assistant.set_service_url` is the service URL of your Watson Assistant. The URL follows this pattern: `https://{cpd_cluster_host}{:port}/assistant/{release}/instances/{instance_id}/api`. To find this URL, view the details for the service instance from the Cloud Pak for Data web client. For more information, see [Service Endpoint](https://cloud.ibm.com/apidocs/assistant-data-v1?code=python#service-endpoint)\n",
+ "\n",
+ "The notebook requires initializing both v1 API instance `sdk_v1_object` and v2 API instance `sdk_v2_object`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Provide credentials to connect to assistant\n",
+ "# Set disable_ssl_verification=True for self-signed certificate\n",
+ "authenticator = CloudPakForDataAuthenticator(\n",
+ " username='username',\n",
+ " password='password',\n",
+ " url='base_url',\n",
+ " disable_ssl_verification=False\n",
+ ")\n",
+ "\n",
+ "# Initialize v1 API instance\n",
+ "sdk_v1_object = AssistantV1(version='2020-04-01', authenticator = authenticator)\n",
+ "sdk_v1_object.set_service_url('service_url')\n",
+ "\n",
+ "# Initialize v2 API instance\n",
+ "sdk_v2_object = AssistantV2(version='2020-09-24', authenticator = authenticator)\n",
+ "sdk_v2_object.set_service_url('service_url')\n",
+ "\n",
+ "# Set set_disable_ssl_verification to True for self-signed certificate\n",
+ "# sdk_v1_object.set_disable_ssl_verification(True)\n",
+ "# sdk_v2_object.set_disable_ssl_verification(True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Add the information of your assistant. To load the skill of an assistant in the next section, you need to provide either Workspace ID or Skill ID. To locate your assistant ID, open the assistant settings and click __API Details__. To location your workspace ID or skill ID, go to the Skills page and select __View API Details__ from the menu of a skill tile. If you are using versioning in Watson Assistant, this ID represents the Development version of your skill definition."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "assistant_information = {'workspace_id' : '',\n",
+ " 'skill_id' : '',\n",
+ " 'assistant_id' : ''}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 2.2 Fetch and load logs\n",
+ "\n",
+ "- `num_logs`: number of logs to fetch\n",
+ "- Use `filename` to specify if logs are saved as a JSON file (default: `None`)\n",
+ "- Apply `filters` while fetching logs (default: `[]`), e.g.,\n",
+ " - removing empty input: `meta.summary.input_text_length_i>0`\n",
+ " - fetching logs generated after a timestamp: `response_timestamp>=2018-09-18`\n",
+ " \n",
+ " Refer to [Filter query reference](https://cloud.ibm.com/docs/services/assistant?topic=assistant-filter-reference) for\n",
+ " more information.\n",
+ "- Use `project` to specify project when using Watson Studio (default: `None`)\n",
+ "- Use `overwrite` to overwrite if `filename` exists (default: `False`)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "__A. Download all logs for a period of time (and save as a JSON file for Measure notebook)__"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Add filter queries\n",
+ "filters = ['language::en', # Logs in English\n",
+ " 'meta.summary.input_text_length_i>0', # Logs with non empty input \n",
+ " 'response_timestamp>=2020-03-01'] # Logs with response timestamp later or equal to 2020-03-01\n",
+ "\n",
+ "# Query 20,000 logs\n",
+ "filename = 'logs.json'\n",
+ "\n",
+ "# Fetch 20,000 logs, set `overwrite` to True to reload logs, set version=2 to use v2 log apis\n",
+ "logs = get_logs(sdk_v1_object=sdk_v1_object,\n",
+ " sdk_v2_object=sdk_v2_object,\n",
+ " assistant_info=assistant_information,\n",
+ " num_logs=20000,\n",
+ " filename=filename,\n",
+ " filters=filters,\n",
+ " overwrite=True,\n",
+ " project=None,\n",
+ " version=2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "__B. Download and export logs for intent recommendation__\n",
+ "\n",
+ "For intent recommendation, by default, an utterance is considered only when:\n",
+ "- It is the first user utterance in each conversation\n",
+ "- its confidence `response.intents::confidence` is between 0.1 and 0.6 (exclusive),\n",
+ "- its token count is between 3 and 20 (exclusive), and\n",
+ "- it is not a duplicate of the other utterances in the logs.\n",
+ "\n",
+ "This example adds confidence filters when calling `get_logs`, and then exports the utterances to a CSV file by calling\n",
+ "`export_csv_for_intent_recommendation` with token count filter and dedeplication applied.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "pycharm": {
+ "name": "#%%\n"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Add filter queries\n",
+ "filters = ['language::en', # Logs in English\n",
+ " 'request.context.system.dialog_turn_counter::1', # The first user utterance in each conversation\n",
+ " 'response.intents:confidence<0.6', # filter out high intent confidence utterance\n",
+ " 'response.intents:confidence>0.1', # filter out low intent confidnce utterance\n",
+ " ]\n",
+ "\n",
+ "# Query 20,000 logs using filename 'log_first_utterances.json'\n",
+ "logs = get_logs(sdk_v1_object=sdk_v1_object,\n",
+ " sdk_v2_object=sdk_v2_object,\n",
+ " assistant_info=assistant_information,\n",
+ " num_logs=20000,\n",
+ " filename='log_for_intent_recommendation.json',\n",
+ " filters=filters,\n",
+ " overwrite=True,\n",
+ " version=2)\n",
+ "\n",
+ "# Or, load previously saved logs.\n",
+ "logs = load_logs_from_file(filename='log_for_intent_recommendation.json')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Export logs to a CSV file for intent recommendation\n",
+ "\n",
+ "- `logs`: the logs object from `get_logs` or `load_logs_from_file`\n",
+ "- `filename`: the CSV output filename\n",
+ "- Use `deduplicate` to specify if duplicate messages should be removed (default: `True`)\n",
+ "- Use `project` to specify project when using Watson Studio (default: `None`)\n",
+ "- Use `overwrite` to overwrite if `filename` exists (default: `False`)\n",
+ "- Use `min_length` to filter out utterances that are less than certain number of tokens (exclusive, default: `3`)\n",
+ "- Use `max_length` to filter out utterances that are more than certain number of tokens (exclusive, default: `20`)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "pycharm": {
+ "name": "#%%\n"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "export_csv_for_intent_recommendation(logs,\n",
+ " filename='log_for_intent_recommendation.csv',\n",
+ " deduplicate=True,\n",
+ " min_length=3,\n",
+ " max_length=20,\n",
+ " overwrite=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "__C. More examples__\n",
+ "\n",
+ "Download logs of the first user utterance in each conversation for a period of time"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Add filter queries\n",
+ "filters = ['language::en', # Logs in English \n",
+ " 'request.context.system.dialog_turn_counter::1', # The first user utterance in each conversation\n",
+ " 'response_timestamp>=2020-03-01'] # Logs with response timestamp later or equal to 2020-03-01\n",
+ "\n",
+ "# Query 20,000 logs using filename 'log_first_utterances.json'\n",
+ "logs = get_logs(sdk_v1_object=sdk_v1_object,\n",
+ " sdk_v2_object=sdk_v2_object,\n",
+ " assistant_info=assistant_information,\n",
+ " num_logs=20000,\n",
+ " filename='log_first_utterances.json',\n",
+ " filters=filters,\n",
+ " overwrite=True,\n",
+ " version=2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Download logs containing specific input text"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "pycharm": {
+ "name": "#%%\n"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Add filter queries\n",
+ "filters = ['language::en', # Logs in English\n",
+ " 'request.input.text::\"Is there an article on how to make cherry pie?\"'] # Logs with input text: \"Is there an article on how to make cherry pie?\"\n",
+ "\n",
+ "# Query 20,000 logs using filename 'log_input.json'\n",
+ "logs = get_logs(sdk_v1_object=sdk_v1_object,\n",
+ " sdk_v2_object=sdk_v2_object,\n",
+ " assistant_info=assistant_information,\n",
+ " num_logs=20000,\n",
+ " filename='log_input.json',\n",
+ " filters=filters,\n",
+ " overwrite=True,\n",
+ " version=2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Download logs trigging specific intent"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Add filter queries\n",
+ "filters = ['language::en', # Logs in English\n",
+ " 'response.intents:intent::\"article_food\"'] # Intent been triggered: article_food\n",
+ "# Query 20,000 logs using filename log_intent.json\n",
+ "logs = get_logs(sdk_v1_object=sdk_v1_object,\n",
+ " sdk_v2_object=sdk_v2_object,\n",
+ " assistant_info=assistant_information,\n",
+ " num_logs=20000,\n",
+ " filename='log_intent.json',\n",
+ " filters=filters,\n",
+ " overwrite=True,\n",
+ " version=2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Download logs trigging specific intent with a confidence range"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Add filter queries\n",
+ "filters = ['language::en', # Logs in English\n",
+ " 'response.intents:(intent:article_food,confidence<0.25)'] # Intent been triggered: article_food with confidence below 0.25\n",
+ "# Query 20,000 logs using filename log_intent.json\n",
+ "logs = get_logs(sdk_v1_object=sdk_v1_object,\n",
+ " sdk_v2_object=sdk_v2_object,\n",
+ " assistant_info=assistant_information,\n",
+ " num_logs=20000,\n",
+ " filename='log_intent_confidence.json',\n",
+ " filters=filters,\n",
+ " overwrite=True,\n",
+ " version=2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Download logs visited specific node"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Fetch assistant definition and save to a JSON file\n",
+ "df_assistant = get_assistant_definition(sdk_v1_object, assistant_information, filename='assistant_definition.json')\n",
+ "\n",
+ "# Get all intents\n",
+ "assistant_intents = [intent['intent'] for intent in df_assistant['intents'].values[0]] \n",
+ "\n",
+ "# Get all dialog nodes\n",
+ "assistant_nodes = pd.DataFrame(df_assistant['dialog_nodes'].values[0])\n",
+ "\n",
+ "# Find mappings betweeen node name and node id\n",
+ "node_title_map = dict()\n",
+ "for idx, node in assistant_nodes.iterrows():\n",
+ " if str(node['title']) != 'nan':\n",
+ " node_title_map[node['title']] = node['dialog_node']\n",
+ "node_df = pd.DataFrame(node_title_map.items())\n",
+ "node_df.columns = {'node_name', 'node_id'}\n",
+ "\n",
+ "# Add filter queries\n",
+ "intent_name = 'book_short_dialog'\n",
+ "if intent_name in node_title_map:\n",
+ " filters = ['language::en', # Logs in English\n",
+ " 'response.output:nodes_visited::[{}]'.format(node_title_map[intent_name])] # Visited node: book_short_dialog\n",
+ " # Query 20,000 logs using filename log_node.json\n",
+ " logs = get_logs(sdk_v1_object=sdk_v1_object,\n",
+ " sdk_v2_object=sdk_v2_object,\n",
+ " assistant_info=assistant_information,\n",
+ " num_logs=20000,\n",
+ " filename='log_node.json',\n",
+ " filters=filters,\n",
+ " overwrite=True,\n",
+ " version=2)\n",
+ "else:\n",
+ " print('Cannot find {} in skill definition.'.format(intent_name))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Copyright © 2021 IBM. This notebook and its source code are released under the terms of the MIT License."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.10"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}