
complete addon_params with all prompt templates #1401


Open
drahnreb wants to merge 9 commits into main from drahnreb/complete-addon-params

Conversation

drahnreb
Contributor

@drahnreb drahnreb commented Apr 17, 2025

Description

Today you can only control very broad settings: the language and the entity types.

This PR exposes all prompt template keys by adding them to addon_params, which allows easy customization of the extraction prompts.
This gives more fine-grained control over prompts: for example, you could instantiate different objects with specific addon_params for certain types of text, each with more suitable, domain-relevant few-shot examples (see the sketch below).

It also opens the way to impose more structure, e.g. via an ontology or (causal) relations.
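
For illustration, here is a minimal sketch of the kind of customization this enables. Treat it as an assumption-laden sketch, not the final interface: the import paths depend on the LightRAG version, and the addon_params key names are assumed to mirror the PROMPTS keys listed under "Changes Made" (minus the DEFAULT_ prefix).

from lightrag import LightRAG
from lightrag.llm.openai import gpt_4o_mini_complete  # import path may differ by version

# Two instances tuned for different text types; only the prompt-related
# addon_params differ. Key names assume the "PROMPTS key minus DEFAULT_"
# convention described under "Changes Made".
patent_rag = LightRAG(
    working_dir="./rag_patents",
    llm_model_func=gpt_4o_mini_complete,
    addon_params={
        "language": "English",
        "entity_types": ["invention", "claim", "assignee", "filing_date"],
        "entity_extraction_examples": ["<few-shot example written for patent text>"],
    },
)

news_rag = LightRAG(
    working_dir="./rag_news",
    llm_model_func=gpt_4o_mini_complete,
    addon_params={
        "language": "English",
        "entity_types": ["organization", "person", "location", "event"],
        "entity_extraction_examples": ["<few-shot example written for news articles>"],
    },
)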

Related Issues

None

Changes Made

Added the following keys (exposed without the DEFAULT_ prefix):

PROMPTS["DEFAULT_TUPLE_DELIMITER"]
PROMPTS["DEFAULT_RECORD_DELIMITER"]
PROMPTS["DEFAULT_COMPLETION_DELIMITER"]

PROMPTS["summarize_entity_descriptions"]
PROMPTS["entity_extraction_examples"]
PROMPTS["entity_extraction"]
PROMPTS["entity_continue_extraction"]
PROMPTS["entity_if_loop_extraction"]
PROMPTS["keywords_extraction_examples"]
PROMPTS["keywords_extraction"]

PROMPTS["mix_rag_response"]
PROMPTS["naive_rag_response"]

PROMPTS["similarity_check"]

Checklist

  • Changes tested locally
  • Code reviewed
  • Documentation updated (if necessary)
  • Unit tests added (if applicable)

Additional Notes

@danielaskdd Please review. I kept it simple and just added the keys to addon_params, but this could also be grouped into prompts.

@drahnreb drahnreb changed the title complete addon params complete addon_params with all prompt templates Apr 17, 2025
@drahnreb drahnreb force-pushed the drahnreb/complete-addon-params branch from f96f050 to 61b6b19 Compare April 19, 2025 10:26
@drahnreb
Contributor Author

@danielaskdd ready to merge if you want.

Collaborator

@danielaskdd danielaskdd left a comment


Allowing users to configure prompts is a great idea. In terms of implementation, we hope to make it more thorough and convenient. Issue #1353 proposed a potentially better approach: writing multiple sets of different prompts in the prompt directory, enabling users to freely choose which prompt to use for document indexing or queries. Could you propose an interface design under this concept that would make it even more user-friendly?

@drahnreb
Contributor Author

drahnreb commented Apr 20, 2025

This is the intention.
Once all prompt templates are exposed with this PR, you could do this:

Directory structure:

my_docs/
 ├── books/
 │   ├── book1.txt
 │   └── book2.txt
 └── articles/
     ├── article1.txt
     ├── article2.txt
     └── insert_template_prompts.json
my_queries/
 └── articles/
     └── query_template_prompts.json

import asyncio
import json
from typing import Optional

from lightrag import LightRAG, QueryParam
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.llm.openai import gpt_4o_mini_complete  # import path may differ by version

WORKING_DIR = "./rag_storage"

async def initialize_rag(addon_params: Optional[dict] = None):
    rag_kwargs = {
        "working_dir": WORKING_DIR,
        "llm_model_func": gpt_4o_mini_complete,
    }
    # Only add addon_params to kwargs if it's provided by the caller,
    # otherwise it would override default_factory (should be fine still, default language is pulled from PROMPTS)
    if addon_params is not None:
        rag_kwargs["addon_params"] = addon_params

    rag = LightRAG(**rag_kwargs)

    await rag.initialize_storages()
    await initialize_pipeline_status()

    return rag

# Create the file-based example prompt templates
with open('./my_docs/articles/insert_template_prompts.json', 'w') as f:
    json.dump({
        "entity_extraction_examples": ["device", "make", "model", "publication", "date"]
    }, f)
with open('./my_queries/articles/query_template_prompts.json', 'w') as f:
    json.dump({
        "rag_response": "System prompt specific to articles..."
    }, f)

docs = {
    "books": {
        "file_paths": ["./books/book1.txt", "./books/book2.txt"],
        "addon_params": {
            "entity_extraction_examples": ["organization", "person", "location"],
        },
        "system_prompts": {
            "rag_response": "KG mode system prompt specific to books...",
            "naive_rag_response": "Naive mode system prompt specific to books...",
            "mix_rag_response": "Mix mode system prompt specific to books...",
        },
    },
    "articles": {
        "file_paths": ["./articles/article1.txt", "./articles/article2.txt"],
        "addon_params": json.load(open('./my_docs/articles/insert_template_prompts.json', 'r')),
        "system_prompts": json.load(open('./my_queries/articles/query_template_prompts.json', 'r')),
    },
}

def get_content(file_paths):
    contents = []
    for fp in file_paths:
        with open(fp, "r", encoding="utf-8") as f:
            contents.append(f.read())
    return contents

# Insert differently per doc type
for doc_type, doc_info in docs.items():
    file_paths = doc_info["file_paths"]
    addon_params = doc_info["addon_params"]

    # Initialize the RAG instance for each document type
    print(f"Initializing RAG for {doc_type}")
    rag = asyncio.run(initialize_rag(addon_params))

    contents = get_content(file_paths)
    rag.insert(contents, file_paths=file_paths)

# Perform a hybrid search for queries specific to the `books` type
print(
    rag.query(
        "What are the top themes in this story?",
        param=QueryParam(mode="hybrid"),
        system_prompt=docs["books"][
            "system_prompts"
        ]["rag_response"],  # Use the hybrid mode specific system prompt for books type data
    )
)

Of course, you could write convenience functions for template handling, for template checks (e.g. whether the required placeholders are present), or for the correct query-template associations (e.g. for local, global, and hybrid you could specify rag_response, while mix would use mix_rag_response and naive would use naive_rag_response, keeping it aligned with the current prompts.py, and pass any of them to system_prompt as illustrated in the last example)...

We could open a new PR for the handling and for checks that warn users when placeholders are missing. Since no prompt would hard-fail for now, this could serve as an illustrative example in the meantime. We could also add the information to the README together with an example.
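
To make the checks idea concrete, here is a hedged sketch of such a helper; the function name and the required-placeholder map are hypothetical and would need to be aligned with the actual templates in prompts.py.

import re
import warnings

# Hypothetical map from prompt key to the format placeholders its template
# is expected to contain; align this with the real templates in prompts.py.
REQUIRED_PLACEHOLDERS = {
    "entity_extraction": {"entity_types", "input_text", "language"},
    "keywords_extraction": {"query"},
    "similarity_check": {"original_prompt", "cached_prompt"},
}

def check_prompt_placeholders(key: str, template: str) -> bool:
    """Warn (rather than fail) when a custom template is missing placeholders."""
    expected = REQUIRED_PLACEHOLDERS.get(key, set())
    found = set(re.findall(r"\{(\w+)\}", template))
    missing = expected - found
    if missing:
        warnings.warn(
            f"Custom prompt '{key}' is missing placeholders: {sorted(missing)}"
        )
    return not missing

A helper like this could run whenever addon_params overrides one of the template keys, keeping the warn-rather-than-fail behavior described above.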

@drahnreb drahnreb force-pushed the drahnreb/complete-addon-params branch from 61b6b19 to 885b480 Compare April 20, 2025 21:07
Move *_responses from addon_params to query_param if not given as system_prompt; add an optional system_prompt arg to query_with_keywords to customize context building and the final response.
@drahnreb drahnreb requested a review from danielaskdd April 21, 2025 01:11
@drahnreb
Contributor Author

drahnreb commented Apr 21, 2025

  • cleaned up and separated query from insert prompts d71ceb9
  • added checks to prevent possible problems when customizing critical prompt templates 3d7b1df
  • added exhaustive examples to illustrate usage 01aee34

This should address the core items. @danielaskdd PTAL when convenient.

@drahnreb
Contributor Author

Just in case I missed it: @danielaskdd, do you still need anything to approve this PR?
