
RAGENTools: Retrieval-Augmented Generation (RAG) and AGENT tools.



GitHub: source

Motivation

  1. Extended LLM call

    • Built on the official Gemini and OpenAI APIs, extended with more useful functions:

    | Chat API | Async | Retry | Get token/price | Formatted response | Img Input |
    | --- | --- | --- | --- | --- | --- |
    | Official | ⚠️ Wrapper | ❌ Not supported | ⚠️ Hassle | ✅ | Server-side (strong) |
    | LangChain | ⚠️ Wrapper | ⚠️ Conn. only | ⚠️ Hassle | ⚠️ | Client-side (medium) |
    | Ours | ✅ Call | ✅ Conn. & format | ✅ .get_price() | ✅ | Server-side (strong) |
    • Auto batching for the embedding API as well:

    | Emb API | Async | Retry | Get token/price | Batching |
    | --- | --- | --- | --- | --- |
    | Official | ⚠️ Wrapper | ❌ Not supported | ⚠️ Hassle | ⚠️ Overflow error |
    | Ours | ✅ Call | ✅ Connection | ✅ .get_price() | ✅ Overflow handled automatically |

    See the implementation details for Gemini and GPT.

    • Related reusable components: async_executer, concurrency_wrapper
  2. Agents

    • Based on the extended LLM call and LangChain Runnable, build complex agents efficiently with LangGraph.

    | Method | Node | Pattern |
    | --- | --- | --- |
    | Traditional | Functions | Messy and hard to scale up |
    | LangGraph | Extended LLM call + LangChain Runnable | Clean code thanks to the Blackboard design pattern |
    • Structure design (flow diagram)

    • Example: Text2Chart agent

  3. RAG

    • BaseRAG is the high-level class, consisting of:

      • RagEngine: implements indexing and (single-term) retrieval
        • TwoLevelRAGEngine: coarse-to-fine approach
          • fine level: chunks
          • coarse level: file summaries (recursive)
        • MSGraphRAGEngine: Microsoft GraphRAG wrapped in Python
      • Reranker:
        • Naive reranker: ranks chunks by score and concatenates them
        • LLM reranker: ranks chunks with an LLM, filters out irrelevant ones, then concatenates
    • Based on BaseRAG

      • Naive RAG = BaseRAG

      • Agentic RAG (retrieval only) depends only on BaseRAGEngine (agentic_rag diagram)

    • Related components

      • Scalability and flexibility for various types of parsers, RAGEngines, VectorStores, and Retrievers
      • Example: Embedding for LangChain (langchain_emb diagram)
    • Parsers

      • Tuning chunk size (a small sketch follows this list)
        • Rule of thumb: chunk_size = 500-1500 characters, chunk_overlap = 10%-20%
        • Too fragmented -> low k@recall or low context recall -> increase chunk size
        • Too much irrelevant content -> low k@precision -> decrease chunk size
      • Supported Parsers
        • PDFParser
        • TextParser
    • Converter

      • PDF2Txt: for Microsoft GraphRAG
    • Evaluators

      • Supported evaluators
        • RAGAs
          ragas
        • Tuning strategy
| Metrics | Method | Used | Target | Meaning | Tuning |
| --- | --- | --- | --- | --- | --- |
| k@precision | LLM as judge | Y | Retrieved-Query | How many retrieved chunks are closely related to the query | Reduce chunk size |
| k@recall | Precompute | N | Retrieved-Query | How many of the chunks behind this QA pair are retrieved | - |
| context recall | LLM as judge | Y | Retrieved-GT | How many retrieved chunks are helpful to the GT | Increase chunk size, top-k |
| faithfulness | LLM as judge | Y | Retrieved-Response | How many retrieved chunks are helpful to the response | Low context recall: increase chunk size, overlap, top-k. High context recall: the LLM hallucinates or does not need RAG |
| relevancy | LLM as judge | Y | Query-Response | Whether the LLM understands the query | Enhance the LLM or the prompt |
| correctness | LLM as judge | Y | Response-GT | Overall score | - |
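
The chunk-size rule of thumb above can be expressed directly with the OverlapChunker used in Example 3. This is a minimal sketch; whether overlap_size is given in characters is an assumption.

from ragentools.parsers.chunkers import OverlapChunker

chunk_size = 1000                      # within the 500-1500 character rule of thumb
overlap_size = int(chunk_size * 0.15)  # 10%-20% of the chunk size (assumed to be in characters)
chunker = OverlapChunker(chunk_size=chunk_size, overlap_size=overlap_size)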

Installation

  • Build the Docker image as in test.sh or test.ps1

  • Run the container for testing:

bash env.sh
source ~/.bashrc
  • Or install directly:
pip install -e .

Example 1 - Call API

The following code demonstrates these features:

  • 1 Retry: based on tenacity; retry count and interval are customizable.
  • 2 Formatted response: a dictionary following the official Google configuration.
  • 3 Image input: follows the official Google configuration.
  • 4 Async: call async_executer(async_func: Callable, arg_list: List[Dict]) to easily harness async execution.
  • 5 Get price: simply call .get_price(). Update the price table here.
from ragentools.api_calls.google_gemini import GoogleGeminiChatAPI
from ragentools.common.async_funcs import async_executer, concurrency_wrapper
from ragentools.common.formatting import get_response_model

api = GoogleGeminiChatAPI(
    api_key="",  # SET API KEY HERE
    model_name="gemini-2.0-flash-lite",
    retry_times=3,  # 1 retry 
    retry_sec=5
)
response_format = {"description": {"type": "string"}}  # 2 formatted response

parts = [
    {"text": "What's in this picture?"},
    {"inline_data": {
        "mime_type": "image/jpeg",
        "data": open("/app/tests/api_calls/dog.jpg", "rb").read()
    }}
]  # 3 image input
api_run_limited = concurrency_wrapper(api.arun, 2)
results = async_executer(  # 4 async
    api_run_limited,
    [
        {
            "prompt": "What is next day of Friday?",
        },
        {
            "prompt": [{"role": "user", "parts": parts}],
            "response_format": response_format,
            "temperature": 0
        }
    ]
)

expect_response_format = get_response_model(response_format)
expect_response_format(**results[1])  # validate the structured result (the second prompt used response_format)
print(results)               # 2 formatted response
print(api.get_price())  # 5 get price

The outcome will be

[
    'The next day after Friday is **Saturday**.\n',
    {'description': 'A black and white border collie dog is sitting on a white surface.'}
]
0
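
The async helpers are reusable beyond the API classes. A minimal sketch on a plain coroutine, assuming async_executer passes each dict as keyword arguments and returns results in input order (as the printed output above suggests):

import asyncio

from ragentools.common.async_funcs import async_executer, concurrency_wrapper

async def square(x: int) -> int:
    await asyncio.sleep(0.1)  # stand-in for an I/O-bound call
    return x * x

square_limited = concurrency_wrapper(square, 2)  # at most 2 concurrent tasks
results = async_executer(square_limited, [{"x": i} for i in range(5)])
print(results)  # expected [0, 1, 4, 9, 16] if input order is preserved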

Example 2 - Text2Chart agent

  • code
    • Each node
      • inherits "LangChain Runnable" for graph scalability
      • has an "Extended LLM Call" attribute for the API benefits (see the node sketch after the graph below)
python agents/text2chart/v1/main.py
  • graph
    graph
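
The node pattern can be sketched as follows. This is a hedged illustration rather than the repo's actual node classes: each node subclasses LangChain's Runnable, reads from and writes to a shared state (the blackboard), and LangGraph wires the nodes into a graph; in the repo each node would also hold an extended LLM call object as an attribute.

from typing import TypedDict

from langchain_core.runnables import Runnable
from langgraph.graph import StateGraph, START, END


class Blackboard(TypedDict, total=False):
    query: str
    code: str


class GenNode(Runnable):
    # Illustrative only: a real node would keep an extended LLM call object
    # (e.g. GoogleGeminiChatAPI) as an attribute and call it here.
    def invoke(self, state: Blackboard, config=None) -> dict:
        return {"code": f"# matplotlib code for: {state['query']}"}


builder = StateGraph(Blackboard)
builder.add_node("gen", GenNode())
builder.add_edge(START, "gen")
builder.add_edge("gen", END)
graph = builder.compile()

print(graph.invoke({"query": "plot a sine wave"}))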

  • Prompts are here. For instance, the eval prompt:

prompt: |
  **Task:** You are an expert in evaluating a diagram generated from code written by a LLM, in response to a user's query.
  Your goal is to assess how accurately the diagram fulfills the user's intent.

  **Evaluation Scope:**
  Focus only on aspects that directly relate to the accuracy and informativeness of the diagram, as determined by the user's query.
  Ignore stylistic features such as color schemes, font styles, line thickness, or point markers, etc.

  **Evaluation Criteria:**
  Assess the diagram using the following four criteria. For each, select a score from the scale provided.
  1. Representative:
      Is the chosen diagram type (e.g., bar chart, line chart, scatter plot, pie chart) appropriate for visualizing the data and answering the user's query?
      **Scale**
      - 0 (Barely representative)
      - 1 (Partially representative)
      - 2 (Mostly representative)
  2. Data consistency:
      Do the data values shown in the diagram match what is implied or explicitly described in the user's query?
      If the query does not mention specific data values or ranges, consider it consistent.
      **Scale**
      - 0 (Barely consistent)
      - 1 (Partially consistent)
      - 2 (Mostly consistent)
  3. Scale correctness:
      Are the axes' scales (e.g. range, units, intervals) appropriate and correct based on the user's query?
      If the query does not specify scales or if the diagram does not require them (e.g. pie charts), consider it correct.
      - 0 (Barely correct)
      - 1 (Partially correct)
      - 2 (Mostly correct)
  4. Label accuracy:
      Are the diagram title, axes labels, legends, and other textual annotations accurate with respect to the variables or categories specified in the query?
      Are any key components missing?
      - 0 (Barely accurate)
      - 1 (Partially accurate)
      - 2 (Mostly accurate)

  **Query:**  {{ query }}

  **Response:** Provide a structured JSON.

default_replacements: {}

response_format:
  representative:
    type: integer
  data_consistency:
    type: integer
  scale_correctness:
    type: integer
  label_accuracy:
    type: integer
  explanation:
    type: string
api:
  api_key_path: /app/tests/api_keys.yaml
  api_key_env: GOOGLE_API_KEY
  model_name: gemini-2.0-flash-lite

mode: PLOT  # PLOT or RUN
data_path: /app/agents/text2chart/data/matplotbench_easy/data.json
save_folder: /app/agents/text2chart/v1/save/matplotbench_easy/

prompts:
  gen_path: /app/ragentools/prompts/text2chart/gen.yaml
  fix_path: /app/ragentools/prompts/text2chart/fix.yaml
  eval_path: /app/ragentools/prompts/text2chart/eval.yaml
  refine_path: /app/ragentools/prompts/text2chart/refine.yaml
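
How the {{ query }} placeholder gets filled is up to the prompt loader; a hedged sketch assuming Jinja2-style substitution (not necessarily the repo's own mechanism):

import yaml
from jinja2 import Template

eval_cfg = yaml.safe_load(open("/app/ragentools/prompts/text2chart/eval.yaml"))
prompt = Template(eval_cfg["prompt"]).render(query="Create a pie chart of fruit shares")
print(prompt[:200])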
  • The dataset is here.
[
    {
        "instruction": "Create a pie chart:\n\nThe pie chart represents the distribution of fruits in a basket, with the proportions being 35% apples, 45% oranges, and 20% bananas",
        "id": 5
    },
    {
        "instruction": "Generate a Python script using matplotlib to create a 4x4 inch figure that plots a line based on array 'x' from 0.0 to 10.0 (step 0.02) against 'y' which is sine(3pix). Set the x-axis limit from -2 to 10 and the y-axis limit from -6 to 6.",
        "id": 9
    },
    {
        "instruction": "Could you assist me in creating a Python script that generates a plot with the following specifications?\n\n1. The plot should contain three lines. The first line should represent the square of a numerical sequence ranging from 0.0 to 3.0 in increments of 0.02. The second line should represent the cosine of '3*pi' times the same sequence. The third line should represent the product of the square of the sequence and the cosine of '3*pi' times the sequence.\n\n2. The plot should have a legend, labeling the first line as 'square', second line as 'oscillatory' and the third line as 'damped'.\n\n3. The x-axis should be labeled as 'time' and the y-axis as 'amplitude'. The title of the plot should be 'Damped oscillation'.\n\nCould you help me with this?\"",
        "id": 10
    }
]
  • output folder: agents/text2chart/v1/save/matplotbench_easy
    • example of "id=5" data
      • plot:
        id5
      • eval:
{
    "representative": 2,
    "data_consistency": 2,
    "scale_correctness": 2,
    "label_accuracy": 2,
    "explanation": "The pie chart accurately represents the fruit distribution with correct proportions and labels."
}

Example 3 - RAG

  • Overview flow based on TwoLevelRAGEngine
    two_level_rag
  • Full example is here

Quick example

import glob
from typing import Iterator

import yaml

from ragentools.api_calls.google_gemini import (
    GoogleGeminiEmbeddingAPI,
    GoogleGeminiChatAPI,
)

from ragentools.parsers import Document
from ragentools.parsers.readers import PDFReader
from ragentools.parsers.chunkers import OverlapChunker
from ragentools.parsers.savers import PDFSaver
from ragentools.parsers.parsers import BaseParser

from langchain_community.vectorstores import FAISS
from ragentools.rags.utils.embedding import LangChainEmbedding
from ragentools.rags.rags import BaseRAG
from ragentools.rags.rag_engines import TwoLevelRAGEngine
from ragentools.rags.rerankers import BaseReranker


# inputs
cfg = yaml.safe_load(open("/app/rags/papers/v2/rags_papers_v2.yaml"))
cfg_api = cfg["api"]
cfg_par = cfg["parser"]
cfg_rag = cfg["rag"]


# init clients
api_key = yaml.safe_load(open(cfg_api["api_key_path"]))[cfg_api["api_key_env"]]
api_emb = GoogleGeminiEmbeddingAPI(api_key=api_key, model_name=cfg_api["emb_model_name"], retry_sec=65)
api_chat = GoogleGeminiChatAPI(api_key=api_key, model_name=cfg_api["chat_model_name"], retry_sec=65)
embed_model = LangChainEmbedding(api=api_emb, dim=3072)


# parser
parser = BaseParser(
    reader=PDFReader(pdf_paths=glob.glob(cfg_par["pdf_paths"])),
    chunker=OverlapChunker(
        chunk_size=cfg_par["chunk_size"],
        overlap_size=cfg_par["overlap_size"]
    ),
    saver=PDFSaver(save_folder=cfg_par["save_folder"])
)
parse_result: Iterator[Document] = parser.run(lazy=True)


# rag
rag = BaseRAG(
    rag_engine=TwoLevelRAGEngine(
        vector_store_cls=FAISS,
        embed_model=embed_model,
        api_chat=api_chat
    ),
    reranker=BaseReranker()
)
rag.index(
    docs=parse_result,
    coarse_key="source_path",
    save_folder=cfg_rag["save_folder"],
)
print(rag.retrieve("What are the key areas that medicine focuses on to ensure well-being?"))

Parsing result

  • Input: a list of PDF paths

  • Output:

    • one CSV per PDF. Example:

    | chunk | source_path | page |
    | --- | --- | --- |
    | Hi There! Nice to meet you | /path/to/doc.pdf | 7 |
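
A minimal sketch for inspecting one of the parsed CSVs (the file name under the save folder is hypothetical):

import pandas as pd

df = pd.read_csv("/path/to/save_folder/doc.csv")  # hypothetical: one CSV per parsed PDF
print(df[["chunk", "source_path", "page"]].head())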

Indexing result

  • Input: Iterator[Document]
  • Output:
    • *.faiss

Load and Retrieve

import yaml

from ragentools.api_calls.google_gemini import (
    GoogleGeminiEmbeddingAPI,
    GoogleGeminiChatAPI,
)

from langchain_community.vectorstores import FAISS
from ragentools.rags.utils.embedding import LangChainEmbedding
from ragentools.rags.rags import BaseRAG
from ragentools.rags.rag_engines import TwoLevelRAGEngine
from ragentools.rags.rerankers import LLMReranker


# inputs
cfg = yaml.safe_load(open("/app/rags/papers/v2/rags_papers_v2.yaml"))
cfg_api = cfg["api"]
cfg_par = cfg["parser"]
cfg_rag = cfg["rag"]


# init clients
api_key = yaml.safe_load(open(cfg_api["api_key_path"]))[cfg_api["api_key_env"]]
api_emb = GoogleGeminiEmbeddingAPI(api_key=api_key, model_name=cfg_api["emb_model_name"], retry_sec=65)
api_chat = GoogleGeminiChatAPI(api_key=api_key, model_name="gemini-2.0-flash", retry_sec=65)
embed_model = LangChainEmbedding(api=api_emb, dim=3072)


# rag
rag_engine = TwoLevelRAGEngine(
    vector_store_cls=FAISS,
    embed_model=embed_model,
    api_chat=api_chat
)
rag_engine.load(load_folder=cfg_rag["save_folder"])
rag = BaseRAG(
    rag_engine=rag_engine,
    reranker=LLMReranker(
        api=api_chat,
        prompt_path="/app/ragentools/prompts/reranker.yaml"
    )
)

print(rag.retrieve("What are the key areas that medicine focuses on to ensure well-being?"))

The outcome looks like:

Chunk 1 with score 1.0:
cts on the body.​
 
6.​ Medical Research: Involves investigating the underlying mechanisms of diseases, 
testing new treatments, and developing innovative therapies.​
 
Medical Practices and Tools:​
 

==========
Chunk 2 with score 1.0:
a wide range of 
healthcare practices developed to preserve and restore human and animal health through 
prevention, diagnosis, treatment, and rehabilitation. 
History:​
 The practice of medicine date
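
Putting everything together adds a generation step on top of retrieval. A hedged sketch of that step, reusing rag and api_chat from the code above (the prompt wording is illustrative; the full pipeline also runs the evaluation shown below):

from ragentools.common.async_funcs import async_executer

question = "What are the key areas that medicine focuses on to ensure well-being?"
retrieved_text = rag.retrieve(question)

prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{retrieved_text}\n\n"
    f"Question: {question}"
)
answer = async_executer(api_chat.arun, [{"prompt": prompt}])[0]
print(answer)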

Output after putting everything together (retrieval, generation, and evaluation):

[
    {
        "question": "What are the key areas that medicine focuses on to ensure well-being?",
        "answer": "Medicine focuses on diagnosing, treating, and preventing disease and injury, as well as maintaining and promoting overall health.",
        "source_path": "/app/rags/papers/data/medicine.pdf",
        "page": 1,
        "llm_response": "Medicine focuses on diagnosing, treating, and preventing disease and injury, as well as maintaining and promoting overall health. It aims to improve the quality and longevity of life and alleviate suffering through continuous learning, research, and clinical practice. Key areas include clinical medicine, preventive medicine, pharmacology, surgery, and pathology.\n",
        "retrieved_text": ...,
        "eval": {
            "answer_correctness": {
                "score": 5,
                "reason": "The response is fully correct and semantically equivalent to the ground truth. The additional information is consistent and does not contradict the ground truth."
            },
            "answer_relevancy": {
                "score": 5,
                "reason": "The response directly answers the question by listing key areas of medicine that ensure well-being, such as diagnosing, treating, and preventing disease."
            },
            "context_precision": {
                "score": 5,
                "reason": "The retrieved text focuses specifically on the key areas of medicine related to ensuring well-being, such as disease prevention, treatment, and health promotion."
            },
            "context_recall": {
                "score": 5,
                "reason": "The retrieved text fully encompasses the ground truth answer, covering diagnosis, treatment, prevention, and health maintenance."
            },
            "faithfulness": {
                "score": 5,
                "reason": "The response accurately summarizes the retrieved text, focusing on the definition, goals, and key areas of medicine without introducing any unsupported information or contradictions."
            }
        }
    },
    ...
]

and aggregated over all data:

{
    "answer_correctness": 5.0,
    "answer_relevancy": 5.0,
    "context_precision": 5.0,
    "context_recall": 3.0,
    "faithfulness": 5.0
}
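
The aggregate above can be reproduced from the per-question results. A minimal sketch assuming a simple mean over the per-question eval scores (the repo's evaluator may aggregate differently):

from collections import defaultdict

def aggregate_scores(results: list) -> dict:
    totals = defaultdict(float)
    counts = defaultdict(int)
    for item in results:  # items shaped like the JSON example above
        for metric, detail in item["eval"].items():
            totals[metric] += detail["score"]
            counts[metric] += 1
    return {metric: totals[metric] / counts[metric] for metric in totals}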
