DH-4441/adding the docs for context, vector, evals #91

Merged
merged 6 commits into from
Aug 15, 2023
Changes from 4 commits
1 change: 1 addition & 0 deletions .gitignore
@@ -70,6 +70,7 @@ instance/

# Sphinx documentation
docs/_build/
docs/.DS_Store

# PyBuilder
.pybuilder/
Binary file modified docs/Architecture.png
File renamed without changes.
30 changes: 27 additions & 3 deletions docs/context_store.rst
@@ -1,4 +1,28 @@
Context store
=====
Context Store Module
====================

Foo
The Context Store module manages a vector database that holds the information needed to generate accurate SQL queries from natural language prompts.

Context Retrieval
------------------

.. py:function:: retrieve_context_for_question(nl_question: str) -> str

   Given a natural language question, this function returns a single string describing the data stores, tables, and columns needed to build the SQL query. The string includes example questions, their corresponding SQL queries, and metadata about the tables (e.g., categorical columns). It is then passed to the text-to-SQL generator.

   :param nl_question: The natural language question.
   :type nl_question: str
   :return: A string containing context information for generating SQL.
   :rtype: str
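
The retrieval step can be illustrated with a minimal sketch. It assumes stored examples are simple question/SQL records and ranks them by word overlap instead of vector similarity; ``GoldenExample``, ``build_context_string``, and ``top_k`` are hypothetical names for this illustration, not part of the module.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GoldenExample:
    """Hypothetical record: one stored question with its verified SQL."""
    question: str
    sql: str


def _overlap(a: str, b: str) -> int:
    # Toy relevance score: number of shared lowercase words.
    # A real context store would compare embeddings instead.
    return len(set(a.lower().split()) & set(b.lower().split()))


def build_context_string(nl_question: str, examples: List[GoldenExample], top_k: int = 2) -> str:
    """Return the most relevant examples formatted as one string."""
    ranked = sorted(examples, key=lambda ex: _overlap(nl_question, ex.question), reverse=True)
    lines = []
    for ex in ranked[:top_k]:
        lines.append(f"Question: {ex.question}")
        lines.append(f"SQL: {ex.sql}")
    return "\n".join(lines)
```

The single returned string mirrors the documented return type, so it could be placed directly into a prompt.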

Context Addition
-----------------

The Context Store also provides methods for adding different types of context to the store. The first supported context type is NL<>SQL pairs, which are used as examples in prompts to the language model.

.. py:function:: add_context_store_golden_sql(nl_2_sql_list: List)

   Adds NL<>SQL pairs to the context store. These pairs serve as examples and are included in prompts to the language model.

   :param nl_2_sql_list: List of NL<>SQL pairs.
   :type nl_2_sql_list: List
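
The text does not specify the shape of each NL<>SQL pair. The sketch below assumes each pair is a dictionary with ``nl_question`` and ``sql`` keys (both key names are assumptions) and shows how such a list might be validated before being added; ``normalize_golden_pairs`` is a hypothetical helper.

```python
from typing import Dict, List


def normalize_golden_pairs(nl_2_sql_list: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Validate and tidy NL<>SQL pairs before they are stored."""
    cleaned = []
    for index, pair in enumerate(nl_2_sql_list):
        # The key names below are assumed for illustration only.
        question = pair.get("nl_question", "").strip()
        sql = pair.get("sql", "").strip()
        if not question or not sql:
            raise ValueError(f"pair {index} needs both 'nl_question' and 'sql'")
        # Drop a trailing semicolon so stored examples look uniform.
        cleaned.append({"nl_question": question, "sql": sql.rstrip(";")})
    return cleaned
```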
86 changes: 83 additions & 3 deletions docs/evaluator.rst
@@ -1,4 +1,84 @@
Evaluator
=====================
Evaluation Component
====================

Foo
The Evaluation component is a post-processing step that runs after SQL query generation. It produces a confidence score between 0 and 1 that indicates how certain the agent is about the generated SQL query. The component takes the question and the model's response as its parameters.

EvaluationAgent
-------------------------

The "EvaluationAgent" method uses an agent that interacts with the following tools to generate evaluation scores:

1. **InfoSQLDatabaseTool**: This tool takes a list of tables as input and returns the schema along with sample rows for the given tables.

2. **QuerySQLDataBaseTool**: This tool runs a provided query on the database.

3. **EntityFinder**: This tool checks the existence of a given entity within a column.

Because the agent has to call these tools, evaluating a single query takes around 40 to 50 seconds.
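
To make the three tools concrete, here is a rough sketch of equivalent operations against a SQLite database using only the standard library. The function names (``table_info``, ``run_query``, ``entity_exists``) are hypothetical stand-ins and do not reflect the tools' actual implementations.

```python
import sqlite3


def table_info(conn: sqlite3.Connection, tables, sample_rows: int = 2) -> dict:
    """Schema plus a few sample rows per table (similar in spirit to InfoSQLDatabaseTool)."""
    info = {}
    for table in tables:
        schema = conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
        ).fetchone()[0]
        # Table names cannot be bound as parameters; only pass trusted names here.
        rows = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (sample_rows,)).fetchall()
        info[table] = {"schema": schema, "sample_rows": rows}
    return info


def run_query(conn: sqlite3.Connection, query: str) -> list:
    """Run a query and return all rows (similar in spirit to QuerySQLDataBaseTool)."""
    return conn.execute(query).fetchall()


def entity_exists(conn: sqlite3.Connection, table: str, column: str, value) -> bool:
    """Check whether a value occurs in a column (similar in spirit to EntityFinder)."""
    row = conn.execute(
        f'SELECT 1 FROM "{table}" WHERE "{column}" = ? LIMIT 1', (value,)
    ).fetchone()
    return row is not None
```

Each call is a round trip to the database, and an agent typically makes several of them per evaluation, which is consistent with the slower runtime described above.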

SimpleEvaluator
--------------------------

The "SimpleEvaluator" method is implemented using LLMs and is designed to be much faster than the "EvaluationAgent" method. In this method:

1. A list of common problems associated with SQL queries is provided to the model.
2. The model checks the generated query against all of these common issues.
3. The evaluation process doesn't require interactions with external tools.

This method is preferred when speed is crucial, as it doesn't involve tool interactions.
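
A minimal sketch of this checklist approach follows. The issue list, the prompt wording, and the ``Score: <number>`` reply format are all assumptions made for illustration, and the LLM is passed in as a plain callable so the example runs without any external service.

```python
import re
from typing import Callable

# Illustrative checklist; the actual list of common problems is not given in the text.
COMMON_SQL_ISSUES = [
    "References a table or column that is not in the schema",
    "Joins tables on the wrong key",
    "Uses an aggregate without the matching GROUP BY",
    "Filters on a value that does not match the column's format",
]


def build_prompt(question: str, sql: str) -> str:
    """Assemble one prompt containing the question, the SQL, and the checklist."""
    checklist = "\n".join(f"- {issue}" for issue in COMMON_SQL_ISSUES)
    return (
        f"Question: {question}\n"
        f"SQL: {sql}\n"
        f"Check the SQL against each issue below:\n{checklist}\n"
        "Reply with 'Score: <number between 0 and 1>'."
    )


def parse_score(reply: str) -> float:
    """Extract the score from the reply and clamp it to [0, 1]; 0.0 if none is found."""
    match = re.search(r"score\s*[:=]\s*([0-9]*\.?[0-9]+)", reply, re.IGNORECASE)
    if match is None:
        return 0.0
    return min(1.0, max(0.0, float(match.group(1))))


def simple_evaluate(question: str, sql: str, llm: Callable[[str], str]) -> float:
    """One LLM call, no tool interactions."""
    return parse_score(llm(build_prompt(question, sql)))
```

Because the whole evaluation is a single model call, latency is bounded by one request rather than a sequence of tool invocations.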

Both methods produce evaluation scores that reflect the quality and correctness of the generated SQL queries, which helps assess how reliable the generated responses are.


Methods
-------

.. py:class:: Evaluation

   Represents an evaluation result.

   :param id: The evaluation's ID.
   :type id: str
   :param question_id: The associated question's ID.
   :type question_id: str
   :param answer_id: The associated answer's ID.
   :type answer_id: str
   :param score: The confidence score, ranging from 0 to 1.
   :type score: float

.. py:class:: Evaluator(Component, ABC)

   An abstract base class for evaluators.

   :param database: The SQLDatabase instance for evaluation.
   :type database: SQLDatabase
   :param acceptance_threshold: The threshold for accepting generated responses.
   :type acceptance_threshold: float
   :param system: The system containing the evaluator.
   :type system: System

   .. py:method:: get_confidence_score(question, generated_answer, database_connection)

      Returns the confidence score for a response generated by the engine. The score can be compared against the ACCEPTANCE_THRESHOLD to decide whether the response is acceptable.

      :param question: The natural language question.
      :type question: NLQuery
      :param generated_answer: The generated SQL query response.
      :type generated_answer: NLQueryResponse
      :param database_connection: The database connection.
      :type database_connection: DatabaseConnection
      :return: The confidence score.
      :rtype: float

   .. py:method:: evaluate(question, generated_answer, database_connection)

      Abstract method that evaluates a question together with its generated SQL response. Subclasses must implement this method.

      :param question: The natural language question.
      :type question: NLQuery
      :param generated_answer: The generated SQL query response.
      :type generated_answer: NLQueryResponse
      :param database_connection: The database connection.
      :type database_connection: DatabaseConnection
      :return: An Evaluation instance.
      :rtype: Evaluation
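
The relationship between ``evaluate``, ``get_confidence_score``, and the acceptance threshold can be sketched as follows. This is a simplified stand-in, not the project's code: it uses plain strings instead of ``NLQuery`` and ``NLQueryResponse``, omits the database connection, and adds a hypothetical ``is_acceptable`` helper and a toy ``SelectOnlyEvaluator`` subclass.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Evaluation:
    id: str
    question_id: str
    answer_id: str
    score: float  # confidence in [0, 1]


class Evaluator(ABC):
    def __init__(self, acceptance_threshold: float = 0.8):
        # 0.8 is an arbitrary default chosen for this sketch.
        self.acceptance_threshold = acceptance_threshold

    @abstractmethod
    def evaluate(self, question: str, generated_sql: str) -> Evaluation:
        """Subclasses score the question/SQL pair."""

    def get_confidence_score(self, question: str, generated_sql: str) -> float:
        # Delegates to the subclass and exposes only the numeric score.
        return self.evaluate(question, generated_sql).score

    def is_acceptable(self, question: str, generated_sql: str) -> bool:
        # Hypothetical helper, not part of the documented interface.
        return self.get_confidence_score(question, generated_sql) >= self.acceptance_threshold


class SelectOnlyEvaluator(Evaluator):
    """Toy rule: SELECT statements score 1.0, anything else 0.0."""

    def evaluate(self, question: str, generated_sql: str) -> Evaluation:
        is_select = generated_sql.lstrip().upper().startswith("SELECT")
        return Evaluation(id="eval-1", question_id="q-1", answer_id="a-1",
                          score=1.0 if is_select else 0.0)
```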
18 changes: 18 additions & 0 deletions docs/text_to_sql_engine.rst
@@ -37,6 +37,24 @@ The Dataherald_sqlagent is equipped with a range of tools that enhance its capab

7. **GetFewShotExamples**: Allows the agent to request relevant Question/SQL pairs dynamically. The agent can ask for more examples based on question complexity, fostering adaptive learning.

Method Details
--------------

:class:`SQLGenerator`
^^^^^^^^^^^^^^^^^^^^^

The base class from which all SQL generation classes inherit. It provides common methods for generating SQL responses.

.. method:: create_sql_query_status(db, query, response)

   Creates a SQL query status using provided parameters.

.. method:: generate_response(user_question, database_connection, context=None)

   Generates a response to a user question using the given database connection and optional context.
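
As a rough illustration of the inheritance pattern and the optional ``context`` argument, the sketch below defines a simplified base class and a toy subclass. It is not the project's implementation: the return type is a plain string rather than a response object, and ``TemplateGenerator`` is a hypothetical class that only builds the prompt an LLM would receive.

```python
from abc import ABC, abstractmethod
from typing import Optional


class SQLGenerator(ABC):
    """Simplified stand-in for the documented base class."""

    @abstractmethod
    def generate_response(self, user_question: str, database_connection: object,
                          context: Optional[str] = None) -> str:
        """Subclasses turn a question (and optional context) into a response."""


class TemplateGenerator(SQLGenerator):
    """Toy subclass: returns the prompt text instead of calling a model."""

    def generate_response(self, user_question, database_connection, context=None):
        parts = []
        if context:
            # Context is optional, so it is only included when provided.
            parts.append(f"Context:\n{context}")
        parts.append(f"Question: {user_question}")
        parts.append("SQL:")
        return "\n\n".join(parts)
```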

For detailed implementation guidelines and further assistance, consult our official documentation or reach out to our dedicated support team.

Conclusion
----------

62 changes: 59 additions & 3 deletions docs/vector_store.rst
@@ -1,4 +1,60 @@
Vector store
=====
Vector Store Options
====================

Foo
The system currently supports two commonly used vector stores: ChromaDB and Pinecone. Each offers distinct features that suit different use cases.

ChromaDB
--------

ChromaDB is a powerful vector database that provides efficient storage and retrieval of high-dimensional vectors. Its advantages include:

- **Local Storage**: ChromaDB stores vectors in a local database, ensuring fast access and low-latency queries.
- **Flexible Schema**: ChromaDB allows for flexible schema designs, making it suitable for a variety of vector types and applications.
- **Efficient Indexing**: ChromaDB employs advanced indexing techniques to accelerate vector similarity searches, enabling quick and accurate retrieval of similar vectors.

Pinecone
--------

Pinecone is a cloud-based vector search engine that specializes in delivering high-performance vector search capabilities. Its advantages include:

- **Scalability**: Pinecone offers seamless scalability, enabling you to handle large-scale vector datasets and dynamic workloads.
- **Real-time Search**: Pinecone is optimized for real-time vector search, making it suitable for applications that require low-latency retrieval of similar vectors.
- **API-Driven**: Pinecone provides a comprehensive API that allows you to integrate vector search capabilities into your applications with ease.

Abstract Vector Store Class
---------------------------

Both ChromaDB and Pinecone are implemented as subclasses of the abstract `VectorStore` class. This abstract class provides a unified interface for working with different vector store implementations.

:class:`VectorStore`
^^^^^^^^^^^^^^^^^^^^^

This abstract class defines the common methods that both ChromaDB and Pinecone vector stores should implement.

.. method:: __init__(self, system: System)

   Initializes the vector store instance.

.. method:: query(self, query_texts: List[str], db_alias: str, collection: str, num_results: int) -> list

   Queries a collection and returns the records most similar to the query texts.

.. method:: create_collection(self, collection: str)

   Creates a new collection within the vector store.

.. method:: add_record(self, documents: str, collection: str, metadata: Any, ids: List = None)

   Adds a document, along with its metadata, to the specified collection.

.. method:: delete_record(self, collection: str, id: str)

   Deletes a record from a collection.

.. method:: delete_collection(self, collection: str)

   Deletes a collection from the vector store.

Because both stores implement the `VectorStore` abstract class, you can switch between vector store implementations without changing the code that calls them.
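
The interface can be sketched with a small in-memory implementation. The method signatures follow the list above (the ``__init__(system)`` argument is omitted), but everything else is assumed for illustration: ``InMemoryStore`` is hypothetical, metadata is taken to be a dictionary with a ``db_alias`` key, and word overlap stands in for real vector similarity.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class VectorStore(ABC):
    @abstractmethod
    def query(self, query_texts: List[str], db_alias: str, collection: str, num_results: int) -> list: ...

    @abstractmethod
    def create_collection(self, collection: str) -> None: ...

    @abstractmethod
    def add_record(self, documents: str, collection: str, metadata: Any, ids: Optional[List] = None) -> None: ...

    @abstractmethod
    def delete_record(self, collection: str, id: str) -> None: ...

    @abstractmethod
    def delete_collection(self, collection: str) -> None: ...


class InMemoryStore(VectorStore):
    """Hypothetical backend that keeps records in dictionaries."""

    def __init__(self):
        self._collections: Dict[str, Dict[str, dict]] = {}
        self._next_id = 0  # used when the caller supplies no ids

    def create_collection(self, collection):
        self._collections.setdefault(collection, {})

    def add_record(self, documents, collection, metadata, ids=None):
        records = self._collections.setdefault(collection, {})
        if ids:
            record_id = ids[0]
        else:
            record_id = f"rec-{self._next_id}"
            self._next_id += 1
        records[record_id] = {"document": documents, "metadata": metadata}

    def query(self, query_texts, db_alias, collection, num_results):
        wanted = set(" ".join(query_texts).lower().split())
        scored = [
            (len(wanted & set(rec["document"].lower().split())), rec_id, rec)
            for rec_id, rec in self._collections.get(collection, {}).items()
            if rec["metadata"].get("db_alias") == db_alias  # assumed metadata key
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [{"id": rec_id, **rec} for _, rec_id, rec in scored[:num_results]]

    def delete_record(self, collection, id):
        self._collections.get(collection, {}).pop(id, None)

    def delete_collection(self, collection):
        self._collections.pop(collection, None)
```

Calling code that depends only on ``VectorStore`` would work unchanged with this backend or any other subclass, which is the point of the abstraction.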

For detailed implementation guidelines and further assistance, consult our official documentation or reach out to our dedicated support team.