[MLNN-1046] Example app with TruLens (truera#500)
* add langchain multi-retrieval agents + chroma vector mgr example
* basic one feedback function working e2e
* fix deps version
* add response length custom feedback func
* update notebook with markdown + more feedback functions + deferred mode; add markdown and colab widget; remove ckp
1 parent f473da1 · commit 1b880b4
Showing 1 changed file with 377 additions and 0 deletions.
trulens_eval/examples/quickstart/langchain_retrieval_agent.ipynb
@@ -0,0 +1,377 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
"# Langchain retrieval agent \n", | ||
"In this notebook, we are building a Langchain agent to take in user input and figure out the best tool(s) to use via chain of thought (CoT) reasoning. \n", | ||
"\n", | ||
"Given we have more than one distinct tasks defined in the tools for our agent, one being summarization and another one, which generates multiple choice questions and corresponding answers, being more similar to traditional Natural Language Understanding (NLU), we are also leveraging deferred evaluation. By creating different options for context evaluation, we can use deferred evaluations to try both and use the one that matches the structure of the serialized record. This is especially true when not all of the feedback functions defined are applicable to the selected tool(s) by the agent. In the below notebook, you can see we also run the evaluations at a later time. \n", | ||
"\n", | ||
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/trulens_eval/examples/quickstart/langchain_retrieval_agent.ipynb)" | ||
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#! pip install trulens_eval==0.15.3 langchain==0.0.315 unstructured==0.10.23 chromadb==0.4.14"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
"import os\n", | ||
"from langchain.document_loaders import WebBaseLoader\n", | ||
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n", | ||
"from langchain.prompts import PromptTemplate\n", | ||
"\n", | ||
"\n", | ||
"from langchain.chat_models import ChatOpenAI\n", | ||
"from langchain.chains import RetrievalQA\n", | ||
"\n", | ||
"from langchain import OpenAI\n", | ||
"\n", | ||
"from langchain.agents import Tool\n", | ||
"from langchain.agents import initialize_agent\n", | ||
"from langchain.memory import ConversationSummaryBufferMemory\n", | ||
"from langchain.embeddings import OpenAIEmbeddings\n", | ||
"from langchain.vectorstores import Chroma\n", | ||
"\n", | ||
"import openai\n", | ||
"\n", | ||
"\n", | ||
"from trulens_eval import TruChain, Feedback, Tru, feedback, Select, FeedbackMode\n", | ||
"from trulens_eval.feedback import OpenAI as fOpenAI\n", | ||
"\n", | ||
"os.environ[\"OPENAI_API_KEY\"] = \"...\"\n", | ||
"tru = Tru()\n" | ||
] | ||
}, | ||
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
"### Define custom class that loads dcouments into local vector store.\n", | ||
"We are using Chroma, one of the open-source embedding database offerings, in the following example" | ||
] | ||
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class VectorstoreManager:\n",
    "    def __init__(self):\n",
    "        self.vectorstore = None  # Vectorstore for the current conversation\n",
    "        self.all_document_splits = []  # List to hold all document splits added during a conversation\n",
    "\n",
    "    def initialize_vectorstore(self):\n",
    "        \"\"\"Initialize an empty vectorstore for the current conversation.\"\"\"\n",
" self.vectorstore = Chroma(\n", | ||
" embedding_function=OpenAIEmbeddings(), \n", | ||
" )\n", | ||
" self.all_document_splits = [] # Reset the documents list for the new conversation\n", | ||
" return self.vectorstore\n", | ||
"\n", | ||
" def add_documents_to_vectorstore(self, url_lst: list):\n", | ||
" \"\"\"Example assumes loading new documents from websites to the vectorstore during a conversation.\"\"\"\n", | ||
" for doc_url in url_lst:\n", | ||
" document_splits = self.load_and_split_document(doc_url)\n", | ||
" self.all_document_splits.extend(document_splits)\n", | ||
" \n", | ||
" # Create a new Chroma instance with all the documents\n", | ||
" self.vectorstore = Chroma.from_documents(\n", | ||
" documents=self.all_document_splits, \n", | ||
" embedding=OpenAIEmbeddings(), \n", | ||
" )\n", | ||
"\n", | ||
" return self.vectorstore\n", | ||
"\n", | ||
" def get_vectorstore(self):\n", | ||
" \"\"\"Provide the initialized vectorstore for the current conversation. If not initialized, do it first.\"\"\"\n", | ||
" if self.vectorstore is None:\n", | ||
" raise ValueError(\"Vectorstore is not initialized. Please initialize it first.\")\n", | ||
" return self.vectorstore\n", | ||
"\n", | ||
" @staticmethod\n", | ||
" def load_and_split_document(url: str, chunk_size=1000, chunk_overlap=0): \n", | ||
" \"\"\"Load and split a document into chunks.\"\"\"\n", | ||
" loader = WebBaseLoader(url)\n", | ||
" splits = loader.load_and_split(RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap))\n", | ||
" return splits" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"\n", | ||
"DOC_URL = \"http://paulgraham.com/worked.html\"\n", | ||
"\n", | ||
"vectorstore_manager = VectorstoreManager()\n", | ||
"vec_store = vectorstore_manager.add_documents_to_vectorstore([DOC_URL])" | ||
] | ||
}, | ||
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
"### Set up conversational agent with multiple tools.\n", | ||
"The tools are then selected based on the match between their names/descriptions and the user input, for document retrieval, summarization, and generation of question-answering pairs." | ||
] | ||
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = ChatOpenAI(\n",
    "    model_name='gpt-3.5-turbo-16k',\n",
    "    temperature=0.0\n",
    ")\n",
    "\n",
"conversational_memory = ConversationSummaryBufferMemory(\n", | ||
" k=4,\n", | ||
" max_token_limit=64,\n", | ||
" llm=llm,\n", | ||
" memory_key = \"chat_history\",\n", | ||
" return_messages=True\n", | ||
")\n", | ||
"\n", | ||
"\n", | ||
"retrieval_summarization_template = \"\"\"\n", | ||
"System: Follow these instructions below in all your responses:\n", | ||
"System: always try to retrieve documents as knowledge base or external data source from retriever (vector DB). \n", | ||
"System: If performing summarization, you will try to be as accurate and informational as possible.\n", | ||
"System: If providing a summary/key takeaways/highlights, make sure the output is numbered as bullet points.\n", | ||
"If you don't understand the source document or cannot find sufficient relevant context, be sure to ask me for more context information.\n", | ||
"{context}\n", | ||
"Question: {question}\n", | ||
"Action:\n", | ||
"\"\"\"\n", | ||
"question_generation_template = \"\"\"\n", | ||
"System: Based on the summarized context, you are expected to generate a specified number of multiple choice questions and their answers from the context to ensure understanding. Each question, unless specified otherwise, is expected to have 4 options and only correct answer.\n", | ||
"System: Questions should be in the format of numbered list.\n", | ||
"{context}\n", | ||
"Question: {question}\n", | ||
"Action:\n", | ||
"\"\"\"\n", | ||
"\n", | ||
"\n", | ||
"summarization_prompt = PromptTemplate(template=retrieval_summarization_template, input_variables=[\"question\", \"context\"])\n", | ||
"question_generator_prompt = PromptTemplate(template=question_generation_template, input_variables=[\"question\", \"context\"])\n", | ||
"\n", | ||
"\n", | ||
"\n", | ||
"# retrieval qa chain\n", | ||
"summarization_chain = RetrievalQA.from_chain_type(\n", | ||
" llm=llm,\n", | ||
" chain_type=\"stuff\",\n", | ||
" retriever=vec_store.as_retriever(),\n", | ||
" chain_type_kwargs={'prompt': summarization_prompt}\n", | ||
")\n", | ||
"\n", | ||
"question_answering_chain = RetrievalQA.from_chain_type(llm=llm,\n", | ||
" chain_type=\"stuff\",\n", | ||
" retriever=vec_store.as_retriever(),\n", | ||
" chain_type_kwargs={'prompt': question_generator_prompt}\n", | ||
" )\n", | ||
"\n", | ||
"\n", | ||
"tools = [\n", | ||
" Tool(\n", | ||
" name=\"Knowledge Base / retrieval from documents\",\n", | ||
" func=summarization_chain.run,\n", | ||
" description=\"useful for when you need to answer questions about the source document(s).\",\n", | ||
" ),\n", | ||
" \n", | ||
" Tool(\n", | ||
" name=\"Conversational agent to generate multiple choice questions and their answers about the summary of the source document(s)\",\n", | ||
" func=question_answering_chain.run,\n", | ||
" description=\"useful for when you need to have a conversation with a human and hold the memory of the current / previous conversation.\",\n", | ||
" ),\n", | ||
"]\n", | ||
"agent = initialize_agent(\n", | ||
" agent='chat-conversational-react-description',\n", | ||
" tools=tools,\n", | ||
" llm=llm,\n", | ||
" memory=conversational_memory\n", | ||
" )\n" | ||
] | ||
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set up Evaluation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
"class OpenAI_custom(fOpenAI):\n", | ||
" def no_answer_feedback(self, question: str, response: str) -> float:\n", | ||
" return float(openai.ChatCompletion.create(\n", | ||
" model=\"gpt-3.5-turbo\",\n", | ||
" messages=[\n", | ||
" {\"role\": \"system\", \"content\": \"Does the RESPONSE provide an answer to the QUESTION? Rate on a scale of 1 to 10. Respond with the number only.\"},\n", | ||
" {\"role\": \"user\", \"content\": f\"QUESTION: {question}; RESPONSE: {response}\"}\n", | ||
" ]\n", | ||
" )[\"choices\"][0][\"message\"][\"content\"]) / 10\n", | ||
"\n", | ||
" def response_length_feedback(self, question: str, response: str) -> float:\n", | ||
" return float(openai.ChatCompletion.create( \n", | ||
" model=\"gpt-3.5-turbo\",\n", | ||
" messages=[\n", | ||
" {\"role\": \"system\", \"content\": \"Is the RESPONSE too long or too short based on the user's specification? Rate on a scale of 1 to 10. Respond with the number only. If the response is for summarization or key takeaways, anything over 100 words is considered too long.\"},\n", | ||
" {\"role\": \"user\", \"content\": f\"QUESTION: {question}; RESPONSE: {response}\"}\n", | ||
" ]\n", | ||
" )[\"choices\"][0][\"message\"][\"content\"]) / 10\n", | ||
"\n", | ||
" def query_translation_score(self, question1: str, question2: str) -> float:\n", | ||
" return float(openai.ChatCompletion.create(\n", | ||
" model=\"gpt-3.5-turbo\",\n", | ||
" messages=[\n", | ||
" {\"role\": \"system\", \"content\": \"Your job is to rate how similar two quesitons are on a scale of 1 to 10. Respond with the number only.\"},\n", | ||
" {\"role\": \"user\", \"content\": f\"QUESTION 1: {question1}; QUESTION 2: {question2}\"}\n", | ||
" ]\n", | ||
" )[\"choices\"][0][\"message\"][\"content\"]) / 10\n", | ||
"\n", | ||
" def qa_generation_score(self, last_context: str) -> float:\n", | ||
" return float(openai.ChatCompletion.create(\n", | ||
" model=\"gpt-3.5-turbo\",\n", | ||
" messages=[\n", | ||
" {\"role\": \"system\", \"content\": \"Your job is to respond with a '1' if the following statement contains multiple choice questions and each question has one correct answer, and a '0' if not.\"},\n", | ||
" {\"role\": \"user\", \"content\": f\"STATEMENT: {last_context}\"}\n", | ||
" ]\n", | ||
" )[\"choices\"][0][\"message\"][\"content\"])\n", | ||
" \n", | ||
"custom = OpenAI_custom()\n", | ||
"\n", | ||
"# No answer feedback (custom)\n", | ||
"f_no_answer = Feedback(custom.no_answer_feedback).on_input_output()\n", | ||
"\n", | ||
"# Query translation feedback (custom) to evaluate the similarity between user's original question and the question genenrated by the agent after paraphrasing.\n", | ||
"f_query_translation = Feedback(\n", | ||
" custom.query_translation_score,name=\"Query Translation\").on_input().on(Select.Record.app.query[0].args.str_or_query_bundle)\n", | ||
"\n", | ||
"f_qa_generation = Feedback(\n", | ||
" custom.qa_generation_score,\n", | ||
" name=\"Question-answer Pairs Generation\").on_input()\n", | ||
"\n", | ||
"# Groundedness between each context chunk and the response.\n", | ||
"grounded = feedback.Groundedness()\n", | ||
"f_groundedness = feedback.Feedback(grounded.groundedness_measure_with_cot_reasons).on_input().on_output().aggregate(grounded.grounded_statements_aggregator)\n", | ||
"\n", | ||
"# Response / transcript length feedback (custom)\n", | ||
"f_transcript_length = Feedback(custom.response_length_feedback).on_input_output()" | ||
] | ||
}, | ||
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
"tru_agent = TruChain(agent, app_id = \"Conversational_Agent\", feedbacks = [f_no_answer, f_query_translation, f_qa_generation, f_groundedness, f_transcript_length], feedback_mode=FeedbackMode.DEFERRED)" | ||
] | ||
}, | ||
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_prompts = [\n",
    "    \"Please summarize the document to a short summary under 100 words\",\n",
    "    # \"Give me 5 questions in multiple choice format based on the previous summary and give me their answers\"\n",
    "]\n",
    "\n",
    "with tru_agent as recording:\n",
    "    for prompt in user_prompts:\n",
    "        print(agent(prompt))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run the TruLens dashboard"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
"tru.run_dashboard()" | ||
] | ||
}, | ||
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
"tru.start_evaluator()" | ||
] | ||
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
} |