Pattern to extract conversation thread for embeddings: #26

Open
irthomasthomas opened this issue Sep 5, 2023 · 1 comment
Labels
embeddings (vector embeddings and related tools), idea (Just a seed of an idea), llm (Large Language Models)

Comments

@irthomasthomas
Owner

Algorithm to extract conversation thread and send to OpenAI API for embeddings:

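import sqlite3
from openai import OpenAI  # official client, openai>=1.0

def fetch_conversation(conversation_id):
    """Return every (prompt, response) row for one conversation."""
    # Create a DB connection
    conn = sqlite3.connect('chatgpt_conversation_db.db')
    try:
        # SQL query to retrieve a conversation based on id
        query = """SELECT responses.prompt, responses.response
                   FROM responses
                   WHERE responses.conversation_id = ?"""
        # Execute the query and fetch all matching rows
        return conn.execute(query, (conversation_id,)).fetchall()
    finally:
        # Close the connection even if the query fails
        conn.close()

def send_to_openai_api(conversation):
    """Embed the whole conversation as a single transcript."""
    # Flatten the turns into one newline-separated transcript
    convo_text = "\n".join(
        f"User: {prompt}\nChatGPT: {response}"
        for prompt, response in conversation
    )

    client = OpenAI(api_key="your-api-key")

    # text-embedding-ada-002 is one possible model choice
    result = client.embeddings.create(
        model="text-embedding-ada-002",
        input=convo_text,
    )

    # One input string yields one embedding: a list of floats
    return result.data[0].embedding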

Potential usages for embeddings and chat DB:

  1. Conversation Classification: We can use the embeddings to train machine learning models that classify the conversations by their content or sentiment.

  2. Topic Modeling: The embeddings can be used to conduct topic modeling to understand the main topics discussed during the conversation.

  3. Information Retrieval: The chat database could be utilized to build a retrieval-based chatbot that fetches relevant information based on context.

  4. User Behavior Understanding: Analyzing chat logs can help in understanding user behavior, preferences, and interaction patterns.
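As a rough illustration of the information retrieval use case, the sketch below ranks stored conversations by cosine similarity to a query embedding. The `stored_embeddings` dict, the `rank_conversations` helper, and the toy 3-dimensional vectors are assumptions made for the example; real embeddings would come from an embeddings API and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as unrelated
        return 0.0
    return dot / (norm_a * norm_b)

def rank_conversations(query_embedding, stored_embeddings):
    """Return (conversation_id, score) pairs, most similar first.

    stored_embeddings is a hypothetical dict mapping a conversation id
    to its embedding vector.
    """
    scored = [
        (conversation_id, cosine_similarity(query_embedding, vector))
        for conversation_id, vector in stored_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for real embeddings (assumption for the demo)
stored = {
    "conv-a": [1.0, 0.0, 0.0],
    "conv-b": [0.0, 1.0, 0.0],
    "conv-c": [0.7, 0.7, 0.0],
}
ranking = rank_conversations([1.0, 0.1, 0.0], stored)
for conversation_id, score in ranking:
    print(conversation_id, round(score, 3))
```

A linear scan like this is probably fine for a personal chat log; a dedicated vector index would only start to matter at much larger scale.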

Approaches to enrich the database:

  1. Adding metadata: Information like user demographics, time of conversation, etc., can add value to the analyses.

  2. Adding conversation context: Recording the context in which a conversation took place can make it easier to retrieve and interpret later.

Creating topic system:

We can introduce a "topics" table with "id" and "name" columns, and add a "topic_id" column to the "conversations" table. A topic modeling algorithm such as LDA (Latent Dirichlet Allocation), run over the conversation text, can then find the main topics so that each conversation can be linked to one of them.

CREATE TABLE [topics] (
   [id] INTEGER PRIMARY KEY,
   [name] TEXT
);
ALTER TABLE [conversations] 
ADD COLUMN [topic_id] INTEGER REFERENCES [topics]([id]);

Then to retrieve chats based on their topic, we can query:

SELECT *
FROM conversations
JOIN responses ON responses.conversation_id = conversations.id
JOIN topics ON topics.id = conversations.topic_id
WHERE topics.name = ?;
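
To try the topic schema and query end to end, here is a minimal sketch using an in-memory SQLite database. The simplified layout of the "conversations" and "responses" tables, the sample rows, and the `responses_for_topic` helper are assumptions for illustration; topics are assigned by hand here rather than by a topic model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-ins for the existing tables (assumed layout),
# followed by the topics table and the new topic_id column.
conn.executescript("""
CREATE TABLE conversations (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE responses (
    id TEXT PRIMARY KEY,
    conversation_id TEXT REFERENCES conversations(id),
    prompt TEXT,
    response TEXT
);
CREATE TABLE topics (id INTEGER PRIMARY KEY, name TEXT);
ALTER TABLE conversations ADD COLUMN topic_id INTEGER REFERENCES topics(id);
""")

# Sample data; topic labels are hand-assigned for the demo
conn.executemany("INSERT INTO topics (id, name) VALUES (?, ?)",
                 [(1, "sqlite"), (2, "cooking")])
conn.executemany("INSERT INTO conversations (id, name, topic_id) VALUES (?, ?, ?)",
                 [("c1", "Schema help", 1), ("c2", "Dinner ideas", 2)])
conn.executemany("INSERT INTO responses VALUES (?, ?, ?, ?)", [
    ("r1", "c1", "How do I add a column?", "Use ALTER TABLE."),
    ("r2", "c1", "And an index?", "Use CREATE INDEX."),
    ("r3", "c2", "What goes with rice?", "Try a curry."),
])

def responses_for_topic(conn, topic_name):
    """Return (prompt, response) rows for every conversation on a topic."""
    query = """SELECT responses.prompt, responses.response
               FROM responses
               JOIN conversations ON conversations.id = responses.conversation_id
               JOIN topics ON topics.id = conversations.topic_id
               WHERE topics.name = ?
               ORDER BY responses.id"""
    return conn.execute(query, (topic_name,)).fetchall()

rows = responses_for_topic(conn, "sqlite")
for prompt, response in rows:
    print(f"User: {prompt}\nChatGPT: {response}")
```

Selecting only the prompt and response columns, instead of `SELECT *`, keeps the result in the same shape that the transcript-building step expects.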
irthomasthomas added the embeddings and llm labels Sep 5, 2023
irthomasthomas added the idea label Sep 5, 2023
@irthomasthomas
Owner Author

irthomasthomas commented Sep 5, 2023

Since this was co-written, simonw has added an embeddings feature to the llm CLI.
It supports local models and the OpenAI ada-002 embeddings API.

https://simonwillison.net/2023/Sep/4/llm-embeddings/

irthomasthomas changed the title from "Algorithm to extract conversation thread for embeddings:" to "Pattern to extract conversation thread for embeddings:" on Mar 21, 2024