Skip to content

Commit

Permalink
Merge pull request #9 from pragunbhutani/staging
Browse files Browse the repository at this point in the history
Feat: Added the DocumentGenerator Module
  • Loading branch information
pragunbhutani authored Mar 30, 2024
2 parents 2b71f20 + 0df9424 commit 39c939f
Show file tree
Hide file tree
Showing 14 changed files with 780 additions and 109 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/python-publish.yml
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ name: Upload Python Package

on:
push:
branches: [main, development]
branches: [main, staging]

permissions:
contents: read
Expand Down
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -199,6 +199,7 @@ pyrightconfig.json
### Vector Store Instances ###
chroma.db
test_chroma.db
.database/*

### Mac
.DS_Store
72 changes: 51 additions & 21 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,11 @@
# Ragstar - LLM Tools for DBT Projects

# Ragstar
Ragstar (inspired by `RAG & select *`) is set of LLM powered tools to elevate your dbt projects and supercharge your data team.

Ragstar (inspired by `RAG & select *`) is a tool that enables you to ask ChatGPT questions about your dbt project.
These tools include:

- Chatbot: ask questions about data and get answers based on your dbt model documentation
- Documentation Generator: generate documentation for dbt models based on model and upstream model definition.

## Get Started

Expand All @@ -13,9 +17,9 @@ Ragstar can be installed via pip.
pip install ragstar
```

### Basic Usage
## Basic Usage - Chatbot

How to multiply one number by another with this lib:
How to load your dbt project into the Chatbot and ask questions about your data.

```Python
from ragstar import Chatbot
Expand All @@ -31,21 +35,14 @@ chatbot.load_models()

# Step 2. Ask the chatbot a question
response = chatbot.ask_question(
'How can I obtain the number of customers who upgraded to a paid plan in the last 3 months?'
'How can I obtain the number of customers who upgraded to a paid plan in the last 3 months?'
)
print(response)

# Step 3. Clear your local database (Optional).
# You only need to do this if you would like to load a different project into your db
# or restart from scratch for whatever reason.

# If you make any changes to your existing models and load them again, they get upserted into the database.
chatbot.reset_model_db()
```

**Note**: Ragstar currently only supports OpenAI ChatGPT models for generating embeddings and responses to queries.

## How it works
### How it works

Ragstar is based on the concept of Retrieval Augmented Generation and basically works as follows:

Expand All @@ -55,17 +52,40 @@ Ragstar is based on the concept of Retrieval Augmented Generation and basically
- These models are then fed into ChatGPT as a prompt, along with some basic instructions and your question.
- The response is returned to you as a string.

## Basic Usage - Documentation Generator

How to load your dbt project into the Documentation Generator and have it write documentation for your models.

```Python
from ragstar import DocumentationGenerator

# Instantiate a Documentation Generator object
doc_gen = DocumentationGenerator(
dbt_project_root="YOUR_DBT_PROJECT_PATH",
openai_api_key="YOUR_OPENAI_API_KEY",
)

# Generate documentation for a model and all its upstream models
doc_gen.generate_documentation(
model_name='dbt_model_name',
write_documentation_to_yaml=False
)
```

## Advanced Usage

You can control the behaviour of some of the class member functions in more detail, or inspect the underlying classes for more functionality.

The Chatbot is composed of two classes:

- Vector Store
- DBT Project
- Composed of DBT Model

Here are the classes and methods they expose:

### Chatbot

A class representing a chatbot that allows users to ask questions about dbt models.

Attributes:
Expand All @@ -83,7 +103,8 @@ A class representing a chatbot that allows users to ask questions about dbt mode

### Methods

#### __init__
#### **init**

Initializes a chatbot object along with a default set of instructions.

Args:
Expand All @@ -94,15 +115,16 @@ Initializes a chatbot object along with a default set of instructions.
Defaults to "text-embedding-3-large".

chatbot_model (str, optional): The name of the OpenAI chatbot model to be used.
Defaults to "gpt-4-turbo-preview".
Defaults to "gpt-4-turbo-preview".

db_persist_path (str, optional): The path to the persistent database file.
Defaults to "./chroma.db".
db_persist_path (str, optional): The path to the persistent database file.
Defaults to "./chroma.db".

Returns:
None

#### load_models

Upsert the set of models that will be available to your chatbot into a vector store. The chatbot will only be able to use these models to answer questions and nothing else.

The default behavior is to load all models in the dbt project, but you can specify a subset of models, included folders or excluded folders to customize the set of models that will be available to the chatbot.
Expand Down Expand Up @@ -137,29 +159,34 @@ This will reset and remove all the models from the vector store. You'll need to
None

#### get_instructions

Get the instructions being used to tune the chatbot.

Returns:
list[str]: A list of instructions being used to tune the chatbot.

#### set_instructions

Set the instructions for the chatbot.

Args:
instructions (list[str]): A list of instructions for the chatbot.

Returns:
None

#### set_embedding_model

Set the embedding model for the vector store.

Args:
model (str): The name of the OpenAI embedding model to be used.

Returns:
None

#### set_chatbot_model

Set the chatbot model for the chatbot.

Args:
Expand All @@ -169,9 +196,10 @@ Set the chatbot model for the chatbot.
None

## Appendices

These are the underlying classes that are used to compose the functionality of the chatbot.

### Vector Store
### Vector Store

A class representing a vector store for dbt models.

Expand All @@ -181,16 +209,18 @@ A class representing a vector store for dbt models.
reset_collection: Clear the collection of all documents.

### DBT Project
A class representing a DBT project yaml parser.

A class representing a DBT project yaml parser.

Attributes:
project_root (str): Absolute path to the root of the dbt project being parsed

### DBT Model

A class representing a dbt model.

Attributes:
name (str): The name of the model.
description (str, optional): The description of the model.
columns (list[DbtModelColumn], optional): A list of columns contained in the model.
May or may not be exhaustive.
May or may not be exhaustive.
13 changes: 13 additions & 0 deletions ragstar/__init__.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,17 @@
from ragstar.types import (
PromptMessage,
ParsedSearchResult,
DbtModelDict,
DbtModelDirectoryEntry,
)

from ragstar.instructions import (
INTERPRET_MODEL_INSTRUCTIONS,
ANSWER_QUESTION_INSTRUCTIONS,
)

from ragstar.dbt_model import DbtModel
from ragstar.dbt_project import DbtProject
from ragstar.vector_store import VectorStore
from ragstar.chatbot import Chatbot
from ragstar.documentation_generator import DocumentationGenerator
41 changes: 14 additions & 27 deletions ragstar/chatbot.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

from ragstar.types import PromptMessage, ParsedSearchResult

from ragstar.instructions import ANSWER_QUESTION_INSTRUCTIONS
from ragstar.dbt_project import DbtProject
from ragstar.vector_store import VectorStore

Expand Down Expand Up @@ -30,7 +31,8 @@ def __init__(
openai_api_key: str,
embedding_model: str = "text-embedding-3-large",
chatbot_model: str = "gpt-4-turbo-preview",
db_persist_path: str = "./chroma.db",
vector_db_path: str = "./database/chroma.db",
database_path: str = "./database/directory.json",
) -> None:
"""
Initializes a chatbot object along with a default set of instructions.
Expand All @@ -53,30 +55,17 @@ def __init__(
self.__chatbot_model: str = chatbot_model
self.__openai_api_key: str = openai_api_key

self.project: DbtProject = DbtProject(dbt_project_root)
self.project: DbtProject = DbtProject(
dbt_project_root=dbt_project_root, database_path=database_path
)

self.store: VectorStore = VectorStore(
openai_api_key, embedding_model, db_persist_path
openai_api_key, embedding_model, vector_db_path
)

self.__instructions: list[str] = [
"You are a data analyst working with a data warehouse.",
"You should provide the user with the information they need to answer their question.",
"You should only provide information that you are confident is correct.",
"When you are not sure about the answer, you should let the user know.",
"If you are able to construct a SQL query that would answer the user's question, you should do so.",
"However please refrain from doing so if the user's question is ambiguous or unclear.",
"When writing a SQL query, you should only use column values if these values have been explicitly"
+ " provided to you in the information you have been given.",
"Do not write a SQL query if you are unsure about the correctness of the query or"
+ " about the values contained in the columns.",
"Only write a SQL query if you are confident that the query is exhaustive"
+ " and that it will return the correct results.",
"If it is not possible to write a SQL that fulfils these conditions, you should instead respond"
+ " with the names of the tables or columns that you think are relevant to the user's question.",
"You should also refrain from providing any information that is not directly related to the"
+ " user's question or that which cannot be inferred from the information you have been given.",
"The following information about tables and columns is available to you:",
]
self.client = OpenAI(api_key=self.__openai_api_key)

self.__instructions: list[str] = [ANSWER_QUESTION_INSTRUCTIONS]

def __prepare_prompt(
self, closest_models: list[ParsedSearchResult], query: str
Expand Down Expand Up @@ -186,7 +175,7 @@ def reset_model_db(self) -> None:
"""
self.store.reset_collection()

def ask_question(self, query: str, get_models_name_only: bool = False) -> str:
def ask_question(self, query: str, get_model_names_only: bool = False) -> str:
"""
Ask the chatbot a question about your dbt models and get a response.
The chatbot looks the dbt models most similar to the user query and uses them to answer the question.
Expand All @@ -204,18 +193,16 @@ def ask_question(self, query: str, get_models_name_only: bool = False) -> str:
closest_models = self.store.query_collection(query)
model_names = ", ".join(map(lambda x: x["id"], closest_models))

if get_models_name_only:
if get_model_names_only:
return model_names

print("Closest models found:", model_names)

print("\nPreparing prompt...")
prompt = self.__prepare_prompt(closest_models, query)

client = OpenAI(api_key=self.__openai_api_key)

print("\nCalculating response...")
completion = client.chat.completions.create(
completion = self.client.chat.completions.create(
model=self.__chatbot_model,
messages=prompt,
)
Expand Down
Loading

0 comments on commit 39c939f

Please sign in to comment.