Conversation
@codecov-ai-reviewer review
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough: This PR pins the pycgraph dependency to 3.2.2 and removes the external Git source for pycgraph; in the RAG demo, …

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant UI as RAG UI
    participant Stream as rag_answer_streaming
    participant Detector as IntentDetectorSingleton
    participant LLM
    participant Scheduler as schedule_stream_flow
    participant Flow as RAG Flow
    User->>UI: Toggle Auto Mode / submit query
    UI->>Stream: call(text, auto_mode=..., other params...)
    alt auto_mode = true
        Stream->>Detector: detect(text, flow_list)
        Detector->>LLM: select flow (INTENT_DETECTOR_PROMPT)
        LLM-->>Detector: return selected flow name
        Detector->>LLM: extract parameters (PARAMETER_EXTRACTOR_PROMPT)
        LLM-->>Detector: return parameter JSON
        Detector-->>Stream: {tool_name, parameters, flags}
        Stream->>Scheduler: schedule_stream_flow(flow_key, parameters)
    else auto_mode = false
        Stream->>Stream: decide flow_key & parameters from manual options
        Stream->>Scheduler: schedule_stream_flow(flow_key, parameters)
    end
    Scheduler->>Flow: start streaming execution
    Flow-->>Stream: streamed result chunks
    Stream-->>UI: push results via async iteration
    UI-->>User: display answer
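The branching the diagram describes can be sketched in plain Python; `detect`, `route`, and the flow-name strings below are simplified stand-ins for the project's actual intent-detector and scheduler APIs, not their real signatures.

```python
# Sketch of the auto/manual routing from the sequence diagram.
# `detect` stands in for IntentDetectorSingleton.detect; flow names are strings.
def route(text, auto_mode, detect, manual_flow_key):
    if auto_mode:
        result = detect(text)  # e.g. {"tool_name": ..., "parameters": {...}}
        tool_name = result.get("tool_name")
        if tool_name is None or tool_name == "none":
            raise RuntimeError("No suitable flow found")
        return tool_name, result.get("parameters", {})
    # Manual mode: the UI checkboxes already determined the flow key.
    return manual_flow_key, {}

fake_detect = lambda t: {"tool_name": "rag_graph_only", "parameters": {"gremlin_tmpl_num": 3}}
flow, params = route("Who is connected to Alice?", True, fake_detect, "rag_raw")
print(flow, params)  # rag_graph_only {'gremlin_tmpl_num': 3}
```

Either way, the selected `flow_key` and its parameters end up in the same `schedule_stream_flow` call, which is what makes the later refactoring suggestion possible.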
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes. Key review items:
Possibly related PRs

Pre-merge checks and finishing touches: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed)
🪛 GitHub Actions: Pylint — hugegraph-llm/src/hugegraph_llm/flows/intent_detector.py

- [warning] line 174: Bad indentation. Found 10 spaces, expected 12 (bad-indentation)
- [warning] line 137: R1711: Useless return at end of function or method (useless-return)

🔇 Additional comments (4)
Summary of Changes

Hello @weijinglin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the RAG system by implementing an intelligent, LLM-driven automatic flow routing and parameter extraction mechanism. This allows the system to dynamically select the optimal RAG strategy and graph query approach based on the user's natural-language query, greatly improving usability and adaptability by reducing the need for manual configuration.

Highlights
license-eye has checked 372 files.
| Valid | Invalid | Ignored | Fixed |
|---|---|---|---|
| 304 | 1 | 67 | 0 |
Invalid file list:
- hugegraph-llm/src/hugegraph_llm/flows/intent_detector.py
Use this command to fix any missing license headers:

```bash
docker run -it --rm -v $(pwd):/github/workspace apache/skywalking-eyes header fix
```
@@ -0,0 +1,224 @@

The new hugegraph-llm/src/hugegraph_llm/flows/intent_detector.py starts directly with `import threading`. Suggested change (add the Apache license header before it):

```python
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#     http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import threading
```
Code Review
This pull request introduces an excellent new feature: an 'auto mode' for RAG use cases, driven by an LLM-based intent detector. This significantly improves usability by automatically selecting the appropriate RAG flow. The implementation is comprehensive, covering the intent detection logic, UI updates in Gradio, and necessary refactoring of flow definitions. My review focuses on improving code maintainability, robustness, and fixing a few minor issues. I've pointed out areas with code duplication and verbose logic in rag_block.py that could be refactored, a critical bug in the prompt formatting in intent_detector.py, and some opportunities to make the code more robust and consistent.
```python
            if flow in self.flow_message:
                tool_descs.append(self.flow_message[flow]["desc"])
        tools_str = "\n\n".join(tool_descs)
        prompt = INTENT_DETECTOR_PROMPT.replace("{{tool_list}}", tools_str)
```

There's a typo in the placeholder name. The prompt INTENT_DETECTOR_PROMPT defines `{{flow_list}}`, but the code tries to replace `{{tool_list}}`; since `str.replace` is a no-op when the substring is absent, the prompt is never formatted and the LLM receives it without the list of available flows. Suggested change:

```python
        prompt = INTENT_DETECTOR_PROMPT.replace("{{flow_list}}", tools_str)
```
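The bug is silent precisely because `str.replace` returns the string unchanged when the substring is absent. A minimal, self-contained illustration (the template below is a stand-in for INTENT_DETECTOR_PROMPT, not its real content):

```python
template = "Available flows:\n{{flow_list}}\nPick exactly one flow name."
tools_str = "rag_raw\n\nrag_vector_only"

buggy = template.replace("{{tool_list}}", tools_str)  # wrong name: no-op
fixed = template.replace("{{flow_list}}", tools_str)  # matches the template

print("{{flow_list}}" in buggy)    # True: the placeholder is still unfilled
print("rag_vector_only" in fixed)  # True: the flow list was substituted
```

A quick assertion like `assert "{{" not in prompt` after formatting would have caught this class of mistake early.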
```python
        tool_result = await self.llm_client.agenerate(prompt=prompt)
        tool_result = tool_result.strip()
        # expected tool_result to be one of the 4 kinds of Flow
        detail = None if self.flow_message[tool_result] is None else self.flow_message[tool_result]["detail"]
```

This line uses direct dictionary access `self.flow_message[tool_result]`, which raises a KeyError if the `tool_result` returned by the LLM is not a valid flow name (e.g., if it returns 'none' or an unexpected value). It's safer to use `.get()` so a missing key yields `None` instead of crashing the application. Suggested change:

```python
        detail = self.flow_message.get(tool_result, {}).get("detail")
```
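The difference is easy to demonstrate with a toy `flow_message` mapping (contents below are illustrative, not the real flow metadata):

```python
flow_message = {
    "rag_raw": {"desc": "direct LLM answer", "detail": {"required_params": ["query"]}},
}

# Direct indexing crashes on an unexpected LLM reply such as "none":
try:
    _ = flow_message["none"]["detail"]
except KeyError as exc:
    print("KeyError:", exc)

# Chained .get() degrades gracefully to None instead:
detail = flow_message.get("none", {}).get("detail")
print(detail)  # None
print(flow_message.get("rag_raw", {}).get("detail"))  # {'required_params': ['query']}
```

The `None` result then flows into the existing `if detail is None: raise ValueError(...)` check, which gives a controlled error instead of an unhandled KeyError.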
```python
from hugegraph_llm.flows.scheduler import SchedulerSingleton
from hugegraph_llm.flows.intent_detector import IntentDetectorSingleton
import pandas as pd
import gradio as gr
```
```python
    if auto_mode:
        intent_detector = IntentDetectorSingleton.get_instance()
        result = await intent_detector.detect(
            text,
            [FlowName.RAG_RAW, FlowName.RAG_VECTOR_ONLY, FlowName.RAG_GRAPH_ONLY, FlowName.RAG_GRAPH_VECTOR],
        )
        if result["tool_name"] is None or result["tool_name"] == "none":
            raise RuntimeError("No suitable flow found")
        elif result["tool_name"] in [
            FlowName.RAG_RAW,
            FlowName.RAG_VECTOR_ONLY,
            FlowName.RAG_GRAPH_ONLY,
            FlowName.RAG_GRAPH_VECTOR,
        ]:
            flow_key = result["tool_name"]
        else:
            raise RuntimeError("Unsupported flow type")
        async for res in scheduler.schedule_stream_flow(
            flow_key,
            query=text,
            vector_search=result["parameters"].get("vector_search", vector_search)
            if "parameters" in result
            else vector_search,
            graph_search=result["parameters"].get("graph_search", graph_search)
            if "parameters" in result
            else graph_search,
            raw_answer=result["parameters"].get("raw_answer", False)
            if "parameters" in result
            else False,
            vector_only_answer=result["parameters"].get("vector_only_answer", False)
            if "parameters" in result
            else False,
            graph_only_answer=result["parameters"].get("graph_only_answer", False)
            if "parameters" in result
            else False,
            graph_vector_answer=result["parameters"].get("graph_vector_answer", False)
            if "parameters" in result
            else False,
            graph_ratio=result["parameters"].get("graph_ratio", graph_ratio)
            if "parameters" in result
            else graph_ratio,
            rerank_method=result["parameters"].get("rerank_method", rerank_method)
            if "parameters" in result
            else rerank_method,
            near_neighbor_first=result["parameters"].get("near_neighbor_first", near_neighbor_first)
            if "parameters" in result
            else near_neighbor_first,
            custom_related_information=result["parameters"].get("custom_related_information", custom_related_information)
            if "parameters" in result
            else custom_related_information,
            answer_prompt=result["parameters"].get("answer_prompt", answer_prompt)
            if "parameters" in result
            else answer_prompt,
            keywords_extract_prompt=result["parameters"].get("keywords_extract_prompt", keywords_extract_prompt)
            if "parameters" in result
            else keywords_extract_prompt,
            gremlin_tmpl_num=result["parameters"].get("gremlin_tmpl_num", gremlin_tmpl_num)
            if "parameters" in result
            else gremlin_tmpl_num,
            gremlin_prompt=result["parameters"].get("gremlin_prompt", gremlin_prompt)
            if "parameters" in result
            else gremlin_prompt,
        ):
            if res.get("switch_to_bleu"):
                gr.Warning(
                    "Online reranker fails, automatically switches to local bleu rerank."
                )
            yield (
                res.get("raw_answer", ""),
                res.get("vector_only_answer", ""),
                res.get("graph_only_answer", ""),
                res.get("graph_vector_answer", ""),
            )
    else:
        if graph_vector_answer or (graph_only_answer and vector_only_answer):
            flow_key = FlowName.RAG_GRAPH_VECTOR
        elif vector_only_answer:
            flow_key = FlowName.RAG_VECTOR_ONLY
        elif graph_only_answer:
            flow_key = FlowName.RAG_GRAPH_ONLY
        elif raw_answer:
            flow_key = FlowName.RAG_RAW
        else:
            raise RuntimeError("Unsupported flow type")
        async for res in scheduler.schedule_stream_flow(
            flow_key,
            query=text,
            vector_search=vector_search,
            graph_search=graph_search,
            raw_answer=raw_answer,
            vector_only_answer=vector_only_answer,
            graph_only_answer=graph_only_answer,
            graph_vector_answer=graph_vector_answer,
            graph_ratio=graph_ratio,
            rerank_method=rerank_method,
            near_neighbor_first=near_neighbor_first,
            custom_related_information=custom_related_information,
            answer_prompt=answer_prompt,
            keywords_extract_prompt=keywords_extract_prompt,
            gremlin_tmpl_num=gremlin_tmpl_num,
            gremlin_prompt=gremlin_prompt,
        ):
            if res.get("switch_to_bleu"):
                gr.Warning(
                    "Online reranker fails, automatically switches to local bleu rerank."
                )
            yield (
                res.get("raw_answer", ""),
                res.get("vector_only_answer", ""),
                res.get("graph_only_answer", ""),
                res.get("graph_vector_answer", ""),
            )
```
The logic within this try block can be refactored to improve readability and reduce code duplication:

- Verbose parameter passing: the way parameters are passed to `scheduler.schedule_stream_flow` in `auto_mode` is very verbose, with a redundant `if "parameters" in result else ...` check for every parameter. This can be simplified significantly.
- Code duplication: the `async for res in scheduler.schedule_stream_flow(...)` loop and its body are duplicated in both the `if auto_mode:` and `else:` branches.

Consider refactoring to first determine the flow_key and a flow_params dictionary, and then use a single schedule_stream_flow call and loop. This will make the code much cleaner and easier to maintain.
Here is an example of how it could be refactored:
```python
scheduler = SchedulerSingleton.get_instance()
flow_params = {}
flow_key = ""
if auto_mode:
    intent_detector = IntentDetectorSingleton.get_instance()
    result = await intent_detector.detect(
        text,
        [FlowName.RAG_RAW, FlowName.RAG_VECTOR_ONLY, FlowName.RAG_GRAPH_ONLY, FlowName.RAG_GRAPH_VECTOR],
    )
    tool_name = result.get("tool_name")
    if tool_name is None or tool_name == "none":
        raise RuntimeError("No suitable flow found")
    flow_key = tool_name
    if flow_key not in [FlowName.RAG_RAW, FlowName.RAG_VECTOR_ONLY, FlowName.RAG_GRAPH_ONLY, FlowName.RAG_GRAPH_VECTOR]:
        raise RuntimeError("Unsupported flow type")
    params = result.get("parameters", {})
    flow_params = {
        "query": text,
        "vector_search": params.get("vector_search", vector_search),
        "graph_search": params.get("graph_search", graph_search),
        # ... other params
    }
else:
    if graph_vector_answer or (graph_only_answer and vector_only_answer):
        flow_key = FlowName.RAG_GRAPH_VECTOR
    # ... other manual flow selections
    else:
        raise RuntimeError("Unsupported flow type")
    flow_params = {
        "query": text,
        "vector_search": vector_search,
        "graph_search": graph_search,
        # ... other params
    }

async for res in scheduler.schedule_stream_flow(flow_key, **flow_params):
    if res.get("switch_to_bleu"):
        gr.Warning(
            "Online reranker fails, automatically switches to local bleu rerank."
        )
    yield (
        res.get("raw_answer", ""),
        res.get("vector_only_answer", ""),
        res.get("graph_only_answer", ""),
        res.get("graph_vector_answer", ""),
    )
```

```python
            "desc": RAGGRAPHVECTOR_FLOW_DESC,
            "detail": RAGGRAPHVECTOR_FLOW_DETAIL,
        }
        return
```
```python
        # expected tool_result to be one of the 4 kinds of Flow
        detail = None if self.flow_message[tool_result] is None else self.flow_message[tool_result]["detail"]
        if detail is None:
            raise ValueError("LLM返回的flow类型不在支持的RAGFlow范围内!")
```

The error message is in Chinese ("the flow type returned by the LLM is not within the supported RAGFlow range"). For consistency with the rest of the codebase and to make it accessible to a wider audience, it's better to use English for error messages. Suggested change:

```python
            raise ValueError("The flow type returned by the LLM is not in the supported RAGFlows.")
```
```python
RAGGRAPHONLY_FLOW_DESC = """
{
    "name": "rag_graph_only",
    "desc": "Graph-only retrieval augmented generation workflow. Answers are generated based solely on graph search results, without vector-based augmentation.",
}
"""

RAGGRAPHONLY_FLOW_DETAIL = """
{
    "required_params": [
        {"name": "query", "type": "str", "desc": "User question"},
        {"name": "gremlin_tmpl_num", "type": "int", "desc": "Number of Gremlin templates to use. Set to 3 if the query contains clear graph query semantics that can be translated to Gremlin (such as finding relationships, paths, nodes, or graph traversal patterns). Set to -1 if the query semantics are ambiguous or cannot be clearly mapped to graph operations"},
    ]
}
"""

RAGGRAPHVECTOR_FLOW_DESC = """
{
    "name": "rag_graph_vector",
    "desc": "Hybrid graph and vector retrieval augmented generation workflow. Answers are generated by combining both graph and vector search results."
}
"""

RAGGRAPHVECTOR_FLOW_DETAIL = """
{
    "required_params": [
        {"name": "query", "type": "str", "desc": "User question"},
        {"name": "gremlin_tmpl_num", "type": "int", "desc": "Number of Gremlin templates to use. Set to 3 if the query contains clear graph query semantics that can be translated to Gremlin (such as finding relationships, paths, nodes, or graph traversal patterns). Set to -1 if the query semantics are ambiguous or cannot be clearly mapped to graph operations"},
    ]
}
"""

RAGRAW_FLOW_DESC = """
{
    "name": "rag_raw",
    "desc": "Direct LLM-based question answering without external knowledge augmentation. Suitable for pure LLM scenarios."
}
"""

RAGRAW_FLOW_DETAIL = """
{
    "required_params": [
        {"name": "query", "type": "str", "desc": "User question"},
    ]
}
"""

RAGVECTORONLY_FLOW_DESC = """
{
    "name": "rag_vector_only",
    "desc": "Vector-only retrieval augmented generation workflow. Answers are generated based solely on vector search results, without graph-based augmentation."
}
"""

RAGVECTORONLY_FLOW_DETAIL = """
{
    "required_params": [
        {"name": "query", "type": "str", "desc": "User question"},
    ]
}
"""
```
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)

hugegraph-llm/src/hugegraph_llm/demo/rag_demo/rag_block.py (1)

160-195: The "select at least one mode" validation could be clearer under auto_mode.

Before entering the auto-mode branch, rag_answer_streaming first checks:

```python
if raw_answer is False and not vector_search and not graph_search:
    gr.Warning("Please select at least one generate mode.")
    yield "", "", "", ""
    return
```

In the auto_mode scenario, users intuitively should not need to care about these individual options anymore; but if a user had previously set all four modes to False and then enables auto_mode, they get blocked here, which makes auto mode look broken.

Suggestions:

- Skip this validation entirely in auto mode, or
- Reword the warning to fit auto mode, e.g.: "In manual mode, please select at least one generate mode; auto mode chooses one automatically, no setup needed."

Example adjustment:

```diff
- if raw_answer is False and not vector_search and not graph_search:
+ if (not auto_mode) and raw_answer is False and not vector_search and not graph_search:
      gr.Warning("Please select at least one generate mode.")
      yield "", "", "", ""
      return
```

This avoids the confusing "please select a mode" prompt in auto mode.
🧹 Nitpick comments (5)
hugegraph-llm/src/hugegraph_llm/nodes/llm_node/schema_build.py (1)

27-91: Consider adding type annotations to improve mypy compatibility. The methods node_init() and operator_schedule() lack parameter and return-type annotations. Per the coding guidelines, code should be type-checked with mypy. Suggested method signatures:

```python
def node_init(self) -> CStatus: ...

def operator_schedule(self, data_json: dict) -> dict: ...
```

hugegraph-llm/src/hugegraph_llm/nodes/common_node/merge_rerank_node.py (1)

57-59: Consider moving the import to module level for consistency. The CStatus import sits inside an exception-handling block. It works, but moving it to the module-level imports at the top of the file keeps it consistent with the rest of the codebase.

Add the import at the top of the file:

```diff
 from typing import Dict, Any
+from pycgraph import CStatus
 from hugegraph_llm.nodes.base_node import BaseNode
```

Then remove the import from the exception-handling block:

```diff
 except ValueError as e:
     log.error("Failed to initialize MergeRerankNode: %s", e)
-    from pycgraph import CStatus
     return CStatus(-1, f"MergeRerankNode initialization failed: {e}")
```

hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.py (1)

19-19: Consider dropping the extra comma in the RAGGRAPHONLY description string to keep it valid JSON (optional). RAGGRAPHONLY_FLOW_DESC is a JSON-shaped string, but there is a trailing comma after the "desc" field; any future code that tries json.loads on it will fail outright. Today it is only used in an LLM prompt, so the impact is small, but fixing it now keeps it consistent with the other *_FLOW_DESC strings and avoids a future pitfall.

```diff
 RAGGRAPHONLY_FLOW_DESC = """
 {
     "name": "rag_graph_only",
-    "desc": "Graph-only retrieval augmented generation workflow. Answers are generated based solely on graph search results, without vector-based augmentation.",
-}
+    "desc": "Graph-only retrieval augmented generation workflow. Answers are generated based solely on graph search results, without vector-based augmentation."
+}
 """
```

Also applies to: 32-46

hugegraph-llm/src/hugegraph_llm/demo/rag_demo/rag_block.py (2)

20-27: Duplicate imports can be simplified to avoid style-tool warnings. The same objects are currently imported more than once in this module, for example:

- `import gradio as gr` is repeated on lines 25 and 29.
- `from hugegraph_llm.flows.scheduler import SchedulerSingleton` appears on lines 26 and 33.

This does not affect runtime behavior, but tools like ruff/pylint will flag the redundant imports; keep a single occurrence of each to keep the import block tidy.

Also applies to: 28-34

199-273: The overall routing approach in the auto_mode branch is sound; the parameter-merging logic could be tightened slightly (optional). In the auto-mode branch, tool_name and parameters are obtained via IntentDetectorSingleton, and the boolean flags and config values in parameters are passed through to schedule_stream_flow. The design fits the "detect + flow routing" goal. Two non-blocking points for future iteration:

1. The existence check on result["parameters"] can be simplified. detect() always returns "parameters", so writing every argument as

       result["parameters"].get(..., default) if "parameters" in result else default

   is redundant; a plain .get suffices.

2. Consider letting the LLM override only the switches strongly tied to the flow itself. Flags such as vector_search / graph_search / graph_*_answer are bound to the flow type, so letting the LLM decide them is reasonable; graph_ratio, rerank_method, and custom_related_information are closer to UI configuration, and handing them entirely to the LLM may reduce predictability. One option is to merge only the boolean flags from flow_flags and keep the user's UI choices for everything else.

Neither point affects correctness; both are experience and maintainability improvements that can be iterated on later.
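Point 1 above can be sketched as a single comprehension over the UI defaults; the names and values here are illustrative stand-ins, not the real `schedule_stream_flow` signature.

```python
# UI defaults as set in Gradio (illustrative values).
ui_defaults = {"vector_search": True, "graph_search": False, "graph_ratio": 0.5}

# What detect() returned; "parameters" may override some of the defaults.
result = {"tool_name": "rag_vector_only", "parameters": {"graph_search": True}}

params = result.get("parameters") or {}
# Only keys the UI already knows are accepted, so the LLM cannot inject
# unexpected keyword arguments into the scheduler call.
flow_params = {key: params.get(key, default) for key, default in ui_defaults.items()}
print(flow_params)  # {'vector_search': True, 'graph_search': True, 'graph_ratio': 0.5}
```

Restricting the comprehension to a whitelist of flow-related keys (point 2) is then a one-line change to the dict being iterated.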
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (31)

- hugegraph-llm/pyproject.toml (1 hunks)
- hugegraph-llm/src/hugegraph_llm/demo/rag_demo/rag_block.py (6 hunks)
- hugegraph-llm/src/hugegraph_llm/flows/build_example_index.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/flows/build_schema.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/flows/build_vector_index.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/flows/get_graph_index_info.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/flows/graph_extract.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/flows/import_graph_data.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/flows/intent_detector.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/flows/prompt_generate.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.py (2 hunks)
- hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_vector.py (2 hunks)
- hugegraph-llm/src/hugegraph_llm/flows/rag_flow_raw.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/flows/rag_flow_vector_only.py (2 hunks)
- hugegraph-llm/src/hugegraph_llm/flows/scheduler.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/flows/text2gremlin.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/flows/update_vid_embeddings.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/nodes/base_node.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/nodes/common_node/merge_rerank_node.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/nodes/document_node/chunk_split.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/gremlin_execute.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/schema.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/nodes/index_node/build_gremlin_example_index.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/nodes/index_node/gremlin_example_index_query.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/nodes/index_node/semantic_id_query_node.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/nodes/index_node/vector_query_node.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/nodes/llm_node/schema_build.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/nodes/util.py (1 hunks)
- hugegraph-llm/src/hugegraph_llm/state/ai_state.py (1 hunks)
🧰 Additional context used

📓 Path-based instructions (2)

hugegraph-llm/**/*.py — 📄 CodeRabbit inference engine (hugegraph-llm/AGENTS.md):

- Adhere to ruff code style for Python code
- Type-check Python code with mypy
- Keep each Python file under 600 lines for maintainability

Files: all changed Python files listed above.

hugegraph-llm/src/hugegraph_llm/demo/rag_demo/**/*.py — 📄 CodeRabbit inference engine (hugegraph-llm/AGENTS.md):

- Implement the Gradio UI application under src/hugegraph_llm/demo/rag_demo/

Files:

- hugegraph-llm/src/hugegraph_llm/demo/rag_demo/rag_block.py
🧠 Learnings (18)
📓 Common learnings
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/operators/graph_rag_task.py : Maintain the Graph RAG pipeline in src/hugegraph_llm/operators/graph_rag_task.py
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/demo/rag_demo/**/*.py : Implement the Gradio UI application under src/hugegraph_llm/demo/rag_demo/
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/indices/**/*.py : Store vector and graph indexing code under src/hugegraph_llm/indices/
Applied to files:
hugegraph-llm/src/hugegraph_llm/nodes/llm_node/schema_build.pyhugegraph-llm/src/hugegraph_llm/flows/graph_extract.pyhugegraph-llm/src/hugegraph_llm/nodes/index_node/vector_query_node.pyhugegraph-llm/src/hugegraph_llm/flows/build_vector_index.pyhugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/schema.pyhugegraph-llm/src/hugegraph_llm/flows/build_schema.pyhugegraph-llm/src/hugegraph_llm/flows/import_graph_data.pyhugegraph-llm/src/hugegraph_llm/nodes/util.pyhugegraph-llm/src/hugegraph_llm/flows/scheduler.pyhugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/gremlin_execute.pyhugegraph-llm/src/hugegraph_llm/nodes/base_node.pyhugegraph-llm/src/hugegraph_llm/nodes/document_node/chunk_split.pyhugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.pyhugegraph-llm/src/hugegraph_llm/flows/update_vid_embeddings.pyhugegraph-llm/src/hugegraph_llm/flows/get_graph_index_info.pyhugegraph-llm/src/hugegraph_llm/nodes/index_node/build_gremlin_example_index.pyhugegraph-llm/src/hugegraph_llm/nodes/index_node/gremlin_example_index_query.pyhugegraph-llm/src/hugegraph_llm/flows/build_example_index.pyhugegraph-llm/pyproject.toml
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/operators/**/*.py : Put core processing pipelines under src/hugegraph_llm/operators/
Applied to files:
hugegraph-llm/src/hugegraph_llm/nodes/llm_node/schema_build.pyhugegraph-llm/src/hugegraph_llm/flows/graph_extract.pyhugegraph-llm/src/hugegraph_llm/flows/build_vector_index.pyhugegraph-llm/src/hugegraph_llm/flows/prompt_generate.pyhugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/schema.pyhugegraph-llm/src/hugegraph_llm/flows/build_schema.pyhugegraph-llm/src/hugegraph_llm/flows/text2gremlin.pyhugegraph-llm/src/hugegraph_llm/flows/import_graph_data.pyhugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.pyhugegraph-llm/src/hugegraph_llm/flows/scheduler.pyhugegraph-llm/src/hugegraph_llm/nodes/base_node.pyhugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.pyhugegraph-llm/src/hugegraph_llm/flows/update_vid_embeddings.pyhugegraph-llm/src/hugegraph_llm/flows/get_graph_index_info.pyhugegraph-llm/src/hugegraph_llm/flows/build_example_index.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/operators/graph_rag_task.py : Maintain the Graph RAG pipeline in src/hugegraph_llm/operators/graph_rag_task.py
Applied to files:
hugegraph-llm/src/hugegraph_llm/flows/graph_extract.pyhugegraph-llm/src/hugegraph_llm/flows/build_vector_index.pyhugegraph-llm/src/hugegraph_llm/flows/prompt_generate.pyhugegraph-llm/src/hugegraph_llm/flows/build_schema.pyhugegraph-llm/src/hugegraph_llm/flows/text2gremlin.pyhugegraph-llm/src/hugegraph_llm/flows/import_graph_data.pyhugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.pyhugegraph-llm/src/hugegraph_llm/flows/rag_flow_raw.pyhugegraph-llm/src/hugegraph_llm/flows/scheduler.pyhugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/gremlin_execute.pyhugegraph-llm/src/hugegraph_llm/nodes/document_node/chunk_split.pyhugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_vector.pyhugegraph-llm/src/hugegraph_llm/demo/rag_demo/rag_block.pyhugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.pyhugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.pyhugegraph-llm/src/hugegraph_llm/nodes/common_node/merge_rerank_node.pyhugegraph-llm/src/hugegraph_llm/flows/update_vid_embeddings.pyhugegraph-llm/src/hugegraph_llm/flows/get_graph_index_info.pyhugegraph-llm/src/hugegraph_llm/flows/rag_flow_vector_only.pyhugegraph-llm/src/hugegraph_llm/flows/build_example_index.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/operators/gremlin_generate_task.py : Maintain the Text2Gremlin pipeline in src/hugegraph_llm/operators/gremlin_generate_task.py
Applied to files:
hugegraph-llm/src/hugegraph_llm/flows/graph_extract.pyhugegraph-llm/src/hugegraph_llm/flows/build_vector_index.pyhugegraph-llm/src/hugegraph_llm/flows/prompt_generate.pyhugegraph-llm/src/hugegraph_llm/flows/build_schema.pyhugegraph-llm/src/hugegraph_llm/flows/text2gremlin.pyhugegraph-llm/src/hugegraph_llm/flows/import_graph_data.pyhugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.pyhugegraph-llm/src/hugegraph_llm/flows/scheduler.pyhugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/gremlin_execute.pyhugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_vector.pyhugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.pyhugegraph-llm/src/hugegraph_llm/flows/update_vid_embeddings.pyhugegraph-llm/src/hugegraph_llm/flows/get_graph_index_info.pyhugegraph-llm/src/hugegraph_llm/nodes/index_node/build_gremlin_example_index.pyhugegraph-llm/src/hugegraph_llm/nodes/index_node/gremlin_example_index_query.pyhugegraph-llm/src/hugegraph_llm/flows/build_example_index.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/**/*.py : Type-check Python code with mypy
Applied to files:
hugegraph-llm/src/hugegraph_llm/flows/graph_extract.py
hugegraph-llm/src/hugegraph_llm/flows/prompt_generate.py
hugegraph-llm/src/hugegraph_llm/flows/build_schema.py
hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/gremlin_execute.py
hugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.py
hugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.py
hugegraph-llm/src/hugegraph_llm/flows/get_graph_index_info.py
hugegraph-llm/src/hugegraph_llm/flows/build_example_index.py
hugegraph-llm/pyproject.toml
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/operators/kg_construction_task.py : Maintain the KG Construction pipeline in src/hugegraph_llm/operators/kg_construction_task.py
Applied to files:
hugegraph-llm/src/hugegraph_llm/flows/graph_extract.py
hugegraph-llm/src/hugegraph_llm/flows/build_vector_index.py
hugegraph-llm/src/hugegraph_llm/flows/prompt_generate.py
hugegraph-llm/src/hugegraph_llm/flows/build_schema.py
hugegraph-llm/src/hugegraph_llm/flows/text2gremlin.py
hugegraph-llm/src/hugegraph_llm/flows/import_graph_data.py
hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.py
hugegraph-llm/src/hugegraph_llm/flows/scheduler.py
hugegraph-llm/src/hugegraph_llm/nodes/document_node/chunk_split.py
hugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.py
hugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.py
hugegraph-llm/src/hugegraph_llm/flows/update_vid_embeddings.py
hugegraph-llm/src/hugegraph_llm/flows/get_graph_index_info.py
hugegraph-llm/src/hugegraph_llm/flows/build_example_index.py
📚 Learning: 2025-10-21T07:20:54.516Z
Learnt from: weijinglin
Repo: hugegraph/hugegraph-ai PR: 54
File: hugegraph-llm/src/hugegraph_llm/flows/rag_flow_raw.py:55-55
Timestamp: 2025-10-21T07:20:54.516Z
Learning: In hugegraph-llm flows, the `prepared_input.schema` field in RAG flows (rag_flow_raw.py, rag_flow_vector_only.py, rag_flow_graph_vector.py, rag_flow_graph_only.py) is intentionally assigned `huge_settings.graph_name` (a string graph name) instead of using `prepared_input.graph_name`. This is legacy design where the underlying Operator's schema field is polymorphic and accepts either JSON schema objects or graph name strings, branching internally based on content type. This pattern should not be flagged as incorrect.
Applied to files:
hugegraph-llm/src/hugegraph_llm/flows/graph_extract.py
hugegraph-llm/src/hugegraph_llm/flows/build_schema.py
hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.py
hugegraph-llm/src/hugegraph_llm/flows/rag_flow_raw.py
hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_vector.py
hugegraph-llm/src/hugegraph_llm/flows/rag_flow_vector_only.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/resources/demo/config_prompt.yaml : Keep prompt configuration in src/hugegraph_llm/resources/demo/config_prompt.yaml
Applied to files:
hugegraph-llm/src/hugegraph_llm/flows/prompt_generate.py
hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.py
hugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.py
📚 Learning: 2025-06-25T09:50:06.213Z
Learnt from: day0n
Repo: hugegraph/hugegraph-ai PR: 16
File: hugegraph-llm/src/hugegraph_llm/config/models/base_prompt_config.py:124-137
Timestamp: 2025-06-25T09:50:06.213Z
Learning: Language-specific prompt attributes (answer_prompt_CN, answer_prompt_EN, extract_graph_prompt_CN, extract_graph_prompt_EN, gremlin_generate_prompt_CN, gremlin_generate_prompt_EN, keywords_extract_prompt_CN, keywords_extract_prompt_EN, doc_input_text_CN, doc_input_text_EN) are defined in the PromptConfig class in hugegraph-llm/src/hugegraph_llm/config/prompt_config.py, which inherits from BasePromptConfig, making these attributes accessible in the parent class methods.
Applied to files:
hugegraph-llm/src/hugegraph_llm/flows/prompt_generate.py
hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.py
hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_vector.py
hugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/api/**/*.py : Place FastAPI endpoint modules under src/hugegraph_llm/api/
Applied to files:
hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/schema.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/config/**/*.py : Keep configuration management code under src/hugegraph_llm/config/
Applied to files:
hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/schema.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/utils/**/*.py : Place utilities, logging, and decorators under src/hugegraph_llm/utils/
Applied to files:
hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/schema.py
hugegraph-llm/src/hugegraph_llm/nodes/util.py
hugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/demo/rag_demo/**/*.py : Implement the Gradio UI application under src/hugegraph_llm/demo/rag_demo/
Applied to files:
hugegraph-llm/src/hugegraph_llm/flows/import_graph_data.py
hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.py
hugegraph-llm/src/hugegraph_llm/flows/rag_flow_raw.py
hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_vector.py
hugegraph-llm/src/hugegraph_llm/demo/rag_demo/rag_block.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/**/*.py : Adhere to ruff code style for Python code
Applied to files:
hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.py
hugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.py
hugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/models/**/*.py : Implement LLM, embedding, and reranker models under src/hugegraph_llm/models/
Applied to files:
hugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.py
hugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.py
hugegraph-llm/src/hugegraph_llm/nodes/common_node/merge_rerank_node.py
hugegraph-llm/src/hugegraph_llm/nodes/index_node/build_gremlin_example_index.py
📚 Learning: 2025-06-25T09:45:10.751Z
Learnt from: day0n
Repo: hugegraph/hugegraph-ai PR: 16
File: hugegraph-llm/src/hugegraph_llm/config/models/base_prompt_config.py:100-116
Timestamp: 2025-06-25T09:45:10.751Z
Learning: In hugegraph-llm BasePromptConfig class, llm_settings is a runtime property that is loaded from config through dependency injection during object initialization, not a static class attribute. Static analysis tools may flag this as missing but it's intentional design.
Applied to files:
hugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.py
hugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.py
📚 Learning: 2025-05-27T06:55:13.779Z
Learnt from: cgwer
Repo: hugegraph/hugegraph-ai PR: 10
File: hugegraph-python-client/pyproject.toml:0-0
Timestamp: 2025-05-27T06:55:13.779Z
Learning: The hugegraph-python-client is a component within the hugegraph-ai project repository (apache/incubator-hugegraph-ai), not a standalone repository. When reviewing project URLs in pyproject.toml files within this project, they should point to the main hugegraph-ai repository.
Applied to files:
hugegraph-llm/pyproject.toml
🧬 Code graph analysis (2)
hugegraph-llm/src/hugegraph_llm/demo/rag_demo/rag_block.py (3)
hugegraph-llm/src/hugegraph_llm/flows/scheduler.py (3)
SchedulerSingleton (179-189), get_instance (184-189), schedule_stream_flow (141-176)
hugegraph-llm/src/hugegraph_llm/flows/intent_detector.py (3)
IntentDetectorSingleton (214-224), get_instance (219-224), detect (146-212)
hugegraph-llm/src/hugegraph_llm/flows/__init__.py (1)
FlowName(21-34)
hugegraph-llm/src/hugegraph_llm/flows/intent_detector.py (2)
hugegraph-llm/src/hugegraph_llm/flows/__init__.py (1)
FlowName (21-34)
hugegraph-llm/src/hugegraph_llm/models/llms/base.py (1)
BaseLLM(22-74)
🪛 GitHub Actions: License header & 3rd-party check
hugegraph-llm/src/hugegraph_llm/flows/intent_detector.py
[error] 1-1: license-eye header check failed: File does not have a valid license header. Command: 'license-eye -v info -c .licenserc.yaml header check'. Process exited with code 1.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build (3.11)
- GitHub Check: build (3.10)
🔇 Additional comments (30)
hugegraph-llm/src/hugegraph_llm/nodes/llm_node/schema_build.py (1)
18-18: The import path update is correct. Changing the import from PyCGraph to pycgraph is consistent with the PR objective and follows Python module naming conventions.
hugegraph-llm/src/hugegraph_llm/nodes/document_node/chunk_split.py (1)
16-16: LGTM! The import path fix is correct. Changing this import from PyCGraph to pycgraph matches the PR objective and is part of the codebase-wide standardization. The change does not affect functional logic.
hugegraph-llm/src/hugegraph_llm/nodes/util.py (1)
18-18: The import path update is correct. This change updates the import from PyCGraph to pycgraph, consistent with the codebase-wide refactor. It is straightforward, with no logic changes.
hugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.py (1)
16-16: The import correction has been verified. Verification shows:
- Line 16 of the file has been correctly changed to from pycgraph import CStatus
- All 28 files in the codebase now use the lowercase pycgraph import consistently, with no leftover PyCGraph references
- CStatus is correctly imported in prompt_generate.py and used at line 41
- pycgraph is declared as a project dependency in pyproject.toml
This change is correct and complete; no further modification is needed.
hugegraph-llm/src/hugegraph_llm/nodes/index_node/vector_query_node.py (1)
49-49: The import path fix has been verified. Changing the import from PyCGraph to pycgraph matches this PR's repository-wide refactoring goal. Dependency verification confirms that pycgraph==3.2.2 is correctly declared and pinned in ./hugegraph-llm/pyproject.toml, so the import path fully matches the dependency configuration. The code change is correct.
hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/gremlin_execute.py (1)
20-20: The import path fix is correct and verified. Script results confirm that the pycgraph module correctly exports CStatus. At least 18 files across the codebase already import it via from pycgraph import CStatus and run normally. This change is part of the consistent codebase-wide PyCGraph → pycgraph migration and conforms to project standards.
hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/schema.py (1)
18-18: The import change has been verified with no remaining issues. Script execution confirms:
- No leftover legacy PyCGraph imports; all imports have been fully updated to lowercase pycgraph
- CStatus imports and usage are fully consistent across the codebase (25+ files all use from pycgraph import CStatus)
- The import at line 18 of schema.py matches the other files exactly, and the usages at lines 50 and 58 are verified
- The file is 69 lines long in total, within the 600-line maintainability requirement
The code change is verified complete; no further modification is needed.
hugegraph-llm/src/hugegraph_llm/nodes/index_node/gremlin_example_index_query.py (2)
51-55: Confirm the CStatus constructor signature. Line 55 uses CStatus(-1, str(err)) to create a status object. Please ensure that pycgraph.CStatus has the same constructor signature as the original PyCGraph.CStatus, accepting a status code and an error message as parameters.
20-20: Confirm the consistency of the import-path refactor and the availability of the pycgraph==3.2.2 dependency. Based on the script execution results:
Import-path consistency: the project has fully migrated, with no residual from PyCGraph imports; all 26+ files have been updated to the from pycgraph form, including the from pycgraph import CStatus change at line 20 of this file.
Dependency declaration: hugegraph-llm/pyproject.toml explicitly declares "pycgraph==3.2.2" as a project dependency.
Package-availability risk: a web search could not locate a PyPI release record for pycgraph 3.2.2. Please confirm:
- whether this version is an internal/private package specific to the Apache HugeGraph project
- whether the version number 3.2.2 is accurate and valid
- whether the package has been properly published to the project's dependency source
Recommendation: before merging this change, manually verify that dependency resolution succeeds, and confirm the actual version availability of the pycgraph package.
hugegraph-llm/src/hugegraph_llm/nodes/base_node.py (1)
17-17: Verify the availability and API compatibility of the pycgraph dependency. CGraph provides a Python version, PyCGraph; updating the import path from PyCGraph to pycgraph appears to be a package-name normalization. Although the GNode and CStatus classes are confirmed to exist, public sources cannot verify the specific availability of pycgraph 3.2.2, its API compatibility, or whether it is a simple rename. Manually verify the package's actual availability and its compatibility with the old version to make sure the codebase runs correctly.
hugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.py (1)
16-16: The import path correction is verified and conforms to project conventions. This change corrects the import from PyCGraph to pycgraph, consistent with hugegraph-llm's project-level import normalization. CStatus is a core class in the CGraph framework used for function return-status handling; it is used for error handling at lines 36 and 45 of this file, and the logic is unaffected. Historical code imported via from PyCGraph import CStatus; this PR normalizes it to the lowercase module name, which follows Python naming conventions.
hugegraph-llm/src/hugegraph_llm/nodes/index_node/semantic_id_query_node.py (1)
18-18: The import path correction is correct, and the whole project has been systematically migrated. Verification results:
- The import statement from pycgraph import CStatus at line 18 is correct
- CStatus is used correctly in the file (as a status return value at lines 44 and 83)
- 28 files in the project have systematically completed the PyCGraph → pycgraph migration, with nothing missed
- Imports are consistent across all files, indicating an intentional, complete package-name correction
Note: a web search could not find a public release record for pycgraph 3.2.2; the pycgraph package may be an internal dependency of the HugeGraph project. The code change itself is fine, but please confirm that your development environment has the pycgraph 3.2.2 dependency correctly configured.
hugegraph-llm/src/hugegraph_llm/flows/scheduler.py (1)
18-18: LGTM! The import path fix is correct. The PyCGraph → pycgraph import fix matches this PR's overall refactoring goal and is consistent with pinning pycgraph to 3.2.2 in pyproject.toml.
hugegraph-llm/src/hugegraph_llm/flows/prompt_generate.py (1)
16-16: LGTM! The import path fix is correct. The import path was corrected from PyCGraph to pycgraph, consistent with the codebase-wide refactor.
hugegraph-llm/src/hugegraph_llm/state/ai_state.py (1)
17-17: LGTM! The import path fix is correct. The GParam and CStatus imports were corrected from PyCGraph to pycgraph, consistent with the project's dependency update.
hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_vector.py (2)
19-19: LGTM! The import path fix is correct. The import path has been corrected from PyCGraph to pycgraph.
33-47: LGTM! The flow description constants are well placed. The new RAGGRAPHVECTOR_FLOW_DESC and RAGGRAPHVECTOR_FLOW_DETAIL constants provide clear metadata for the hybrid graph-vector retrieval flow. They are likely consumed by the new intent detector for flow routing and parameter extraction in auto mode.
The description of the gremlin_tmpl_num parameter details when to use the value 3 (clear graph-query semantics) versus -1 (ambiguous semantics), which helps the accuracy of automatic parameter extraction.
hugegraph-llm/src/hugegraph_llm/flows/rag_flow_raw.py (2)
19-19: LGTM! The import path fix is correct. The import path has been corrected from PyCGraph to pycgraph.
27-40: LGTM! The flow description constants are clearly defined. The new RAGRAW_FLOW_DESC and RAGRAW_FLOW_DETAIL constants accurately describe the direct LLM question-answering flow, which uses no external knowledge augmentation. required_params requires only the query parameter, as expected for a pure-LLM scenario.
hugegraph-llm/src/hugegraph_llm/flows/rag_flow_vector_only.py (2)
19-19: LGTM! The import path fix is correct. The import path has been corrected from PyCGraph to pycgraph.
29-42: LGTM! The flow description constants are accurately defined. The new RAGVECTORONLY_FLOW_DESC and RAGVECTORONLY_FLOW_DETAIL constants clearly describe the vector-retrieval-only flow, without graph augmentation. required_params requires only the query parameter, as expected for a pure vector-retrieval scenario.
hugegraph-llm/pyproject.toml (1)
65-65: Verification of pycgraph 3.2.2 is complete, with no issues found. The verification confirms:
- pycgraph 3.2.2 exists on PyPI and is the latest release
- No known security vulnerabilities
Pinning this dependency version is appropriate and ensures reproducible builds. No code change is needed.
hugegraph-llm/src/hugegraph_llm/flows/build_vector_index.py (1)
18-18: Switching the GPipeline import path to pycgraph looks fine. This changes the GPipeline import from PyCGraph to pycgraph, consistent with the other flow files in this repository; the build and runtime logic is unchanged, so this is acceptable.
hugegraph-llm/src/hugegraph_llm/flows/text2gremlin.py (1)
18-18: Having Text2GremlinFlow use pycgraph.GPipeline is reasonable. Only the import source of GPipeline was changed to pycgraph, consistent with the unified change in the other flows; the text-to-Gremlin pipeline itself is unaffected. No issues here.
hugegraph-llm/src/hugegraph_llm/flows/update_vid_embeddings.py (1)
16-16: The import path correction is correct and consistent with the project-wide dependency adjustment from PyCGraph to pycgraph.
hugegraph-llm/src/hugegraph_llm/flows/build_example_index.py (1)
19-19: The import path update conforms to project standards and is consistent with the codebase-wide pycgraph import standardization.
hugegraph-llm/src/hugegraph_llm/flows/get_graph_index_info.py (1)
18-18: The import change is correct. The pycgraph import path has been updated properly.
hugegraph-llm/src/hugegraph_llm/flows/build_schema.py (1)
18-18: The import path adjustment is correct and consistent with the project dependency update.
hugegraph-llm/src/hugegraph_llm/flows/graph_extract.py (1)
17-17: The import path update is correct, completing the PyCGraph → pycgraph standardization.
hugegraph-llm/src/hugegraph_llm/flows/import_graph_data.py (1)
19-19: The import path has been correctly verified. pycgraph 3.2.2 is confirmed to exist on PyPI, and the project dependency pins exactly pycgraph==3.2.2. The import statement from pycgraph import GPipeline is correct and consistent with 20+ other files in the codebase. The GPipeline class is correctly exported, and its use at line 49 is correct.
INTENT_DETECTOR_PROMPT = """
# ROLE
You are an expert AI assistant that functions as a flow router. Your primary responsibility is to analyze a user's query and select the most appropriate flow from a provided list to handle the request.

# INSTRUCTIONS
1. Carefully examine the user's query to understand their underlying intent.
2. Review the list of `AVAILABLE_FLOWS`. For each flow, pay close attention to its `desc` (description).
3. Select the single best flow based on query characteristics:
   - **Graph-only queries**: Use when the query focuses on relationships, connections, paths, network analysis, or graph traversal (e.g., "How are A and B connected?", "What's the shortest path between X and Y?", "Show me the network of relationships around Z")
   - **Vector-only queries**: Use when the query seeks factual information, definitions, descriptions, or content similarity (e.g., "What kind of person is X?", "Tell me about Y", "Describe the characteristics of Z")
   - **Hybrid queries**: Use when the query combines both relationship exploration AND factual retrieval, or when context from both graph structure and content semantics would enhance the answer
4. If no flow is suitable for the query, you MUST choose "none".
5. Your final output MUST be a single flow name string. Do not add any explanation or conversational text.

# AVAILABLE_FLOWS
Here is the list of flows you can choose from:
{{flow_list}}
The intent-detection flow has a placeholder typo and an unhandled "none" result, which can cause runtime exceptions
The current implementation has several issues that directly affect the stability of auto mode:

- Prompt placeholder name mismatch (flow_list / tool_list)
  - The template uses {{flow_list}} (see lines 29-31), but detect() performs the substitution via INTENT_DETECTOR_PROMPT.replace("{{tool_list}}", tools_str) (line 154).
  - As a result, {{flow_list}} is never replaced, the LLM never receives the descriptions of the available flows, and routing quality degrades noticeably or becomes effectively random.

- A KeyError is raised when the LLM returns "none"
  - The prompt explicitly requires the model to output none when no flow is suitable, but the logic directly accesses self.flow_message[tool_result] (line 159); when tool_result == "none", this raises a KeyError instead of gracefully reporting "no suitable flow".
  - Downstream, rag_block.rag_answer_streaming checks tool_name == "none" to report "No suitable flow found", but the current code throws before that point is reached.

- The flow-flags dictionary and return types could be more consistent (non-blocking)
  - flow_flags is keyed by FlowName while tool_result is a string; although FlowName(str, Enum) is runtime-compatible, for mypy and readability it is better to use FlowName consistently, have detect() return tool_name as a FlowName, and let callers compare against it.

Suggested combined fix, as an illustrative diff:
@@
- tool_descs = []
- for flow in flow_list:
- if flow in self.flow_message:
- tool_descs.append(self.flow_message[flow]["desc"])
- tools_str = "\n\n".join(tool_descs)
- prompt = INTENT_DETECTOR_PROMPT.replace("{{tool_list}}", tools_str)
+ tool_descs: list[str] = []
+ for flow in flow_list:
+ if flow in self.flow_message:
+ tool_descs.append(self.flow_message[flow]["desc"])
+ tools_str = "\n\n".join(tool_descs)
+ prompt = INTENT_DETECTOR_PROMPT.replace("{{flow_list}}", tools_str)
@@
- tool_result = await self.llm_client.agenerate(prompt=prompt)
- tool_result = tool_result.strip()
- # expected tool_result belong to [4 kinds of Flow]
- detail = None if self.flow_message[tool_result] is None else self.flow_message[tool_result]["detail"]
- if detail is None:
- raise ValueError("LLM返回的flow类型不在支持的RAGFlow范围内!")
+ tool_result = (await self.llm_client.agenerate(prompt=prompt)).strip()
+
+ # The LLM may return "none" when no flow is suitable; short-circuit and return
+ if tool_result == "none":
+ return {"tool_name": "none", "parameters": {}}
+
+ if tool_result not in self.flow_message:
+ raise ValueError("LLM返回的 flow 类型不在支持的 RAGFlow 范围内!")
+
+ # Still access with the string key here, consistent with the flow_message definition
+ detail = self.flow_message[tool_result]["detail"]
@@
- flow_flags = {
- FlowName.RAG_RAW: {
+ flow_flags: dict[str, dict[str, Any]] = {
+ FlowName.RAG_RAW: {
@@
- if tool_result in flow_flags:
- result["parameters"].update(flow_flags[tool_result])
+ if tool_result in flow_flags:
+ result["parameters"].update(flow_flags[tool_result])同时,结合实际调用处(rag_block.py 使用的是 FlowName.RAG_RAW 等),也可以考虑把 detect() 的签名改为 flow_list: list[FlowName],以及让 tool_name 返回 FlowName 类型,类型层面会更清晰。
Also applies to: 146-212
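The two fixes above (substituting the placeholder the template actually declares, and guarding the "none" result before indexing) can be sketched in isolation. This is a minimal illustrative sketch, not the project's implementation: the flow_message contents, the prompt text, and the llm_answer parameter (standing in for the LLM response) are all hypothetical.

```python
# Minimal sketch of a robust flow router. flow_message entries and the prompt
# template are hypothetical stand-ins for the ones in intent_detector.py.
INTENT_DETECTOR_PROMPT = "Pick one flow:\n{{flow_list}}\nQuery: {{query}}"

flow_message = {
    "rag_raw": {"desc": "Pure LLM answer", "detail": {"required_params": ["query"]}},
    "rag_vector_only": {"desc": "Vector retrieval", "detail": {"required_params": ["query"]}},
}

def detect(query: str, llm_answer: str) -> dict:
    # Replace the *same* placeholder name the template declares ({{flow_list}}).
    descs = "\n".join(f["desc"] for f in flow_message.values())
    prompt = INTENT_DETECTOR_PROMPT.replace("{{flow_list}}", descs).replace("{{query}}", query)
    assert "{{flow_list}}" not in prompt  # the original bug left this unreplaced

    tool_result = llm_answer.strip()
    # Short-circuit on "none" before indexing into flow_message (avoids KeyError).
    if tool_result == "none":
        return {"tool_name": "none", "parameters": {}}
    if tool_result not in flow_message:
        raise ValueError(f"unsupported flow: {tool_result}")
    return {"tool_name": tool_result, "parameters": {"query": query}}

print(detect("hello", "none"))     # {'tool_name': 'none', 'parameters': {}}
print(detect("hello", "rag_raw"))  # routed normally
```

The same guard ordering (sentinel check, then membership check, then lookup) is what keeps the downstream tool_name == "none" branch in rag_answer_streaming reachable.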
🤖 Prompt for AI Agents
In hugegraph-llm/src/hugegraph_llm/flows/intent_detector.py around lines 15-32
(and related logic in 146-212), the prompt placeholder and result handling are
wrong: update the prompt replacement to use "{{flow_list}}" (not
"{{tool_list}}") so the LLM receives the available flows; add explicit handling
for the LLM returning "none" (check if tool_result == "none" before indexing
into flow map and return a clear sentinel like None or FlowName.NONE or raise a
controlled exception instead of allowing a KeyError); and make the flow
flags/return values consistent by using the FlowName enum type end-to-end
(consider changing detect() signature to accept/return FlowName or
Optional[FlowName] to maintain type-safety and avoid string/enum mismatches).
# limitations under the License.

- from PyCGraph import CStatus
+ from pycgraph import CStatus
🧩 Analysis chain
The import path fix is correct.
Correcting the import from PyCGraph to pycgraph follows Python package naming conventions (lowercase). This change matches the consistency fixes across the PR.
Please verify that pycgraph version 3.2.2 exists and correctly exports CStatus:
🏁 Script executed:
#!/bin/bash
# Description: verify the pycgraph package version and the CStatus export
# Check the version pin in pyproject.toml
echo "=== Checking pycgraph version in pyproject.toml ==="
fd -e toml pyproject.toml --exec cat {} | grep -A 2 -B 2 pycgraph
# Verify all CStatus usages across the codebase
echo -e "\n=== Verifying CStatus usage across codebase ==="
rg -n --type py "CStatus" -C 2

Length of output: 20362
🌐 Web query:
pycgraph 3.2.2 pypi package
💡 Result:
I couldn't find a PyPI release pycgraph 3.2.2. The PyPI project shows latest release 3.1.2 (uploaded Sep 6, 2025). [1]
If you want to confirm or install the latest published version: pip index versions pycgraph or pip install pycgraph (or pip install pycgraph==3.1.2). [1]
Sources:
[1] PyPI — pycgraph project page.
Not installable: pycgraph version 3.2.2 does not exist on PyPI.
The import path fix itself is correct (pycgraph correctly exports CStatus), but the version pin in pyproject.toml is problematic: the latest published release on PyPI is 3.1.2 (September 6, 2025), and version 3.2.2 does not exist.
Needs fixing:
- Update pycgraph==3.2.2 in pyproject.toml to pycgraph==3.1.2 (or confirm whether 3.2.2 is a prerelease)
This fix is a prerequisite for the import path change; otherwise dependency installation will fail.
🤖 Prompt for AI Agents
In
hugegraph-llm/src/hugegraph_llm/nodes/index_node/build_gremlin_example_index.py
around line 16, the review notes that pycgraph==3.2.2 pinned in pyproject.toml
does not exist on PyPI (latest is 3.1.2), which will make dependency
installation fail before the import change; update pyproject.toml to pin
pycgraph==3.1.2 (or explicitly allow a prerelease if 3.2.2 is intentional) and
re-run dependency resolution to ensure the import change succeeds.
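The failure mode flagged here is mechanical: pip resolves an `==` pin only if that exact version string appears among the published releases. A small illustrative check (the `published` list below is a hypothetical snapshot, not live PyPI data):

```python
# Sketch: an "==" pin is only installable if the exact version is published.
# The `published` list is a hypothetical release snapshot, not queried from PyPI.
def pin_is_installable(pin: str, published: list[str]) -> bool:
    name, sep, version = pin.partition("==")
    return sep == "==" and version in published

published = ["3.0.0", "3.1.0", "3.1.2"]  # hypothetical release history
print(pin_is_installable("pycgraph==3.2.2", published))  # False -> install fails
print(pin_is_installable("pycgraph==3.1.2", published))  # True
```

In practice, `pip index versions pycgraph` (as the web-query result suggests) gives the authoritative list to pin against.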
Code Modifications Summary
Modified Files Overview
Feature 1: Automatic Flow Selection
Feature 2: Intelligent Parameter Extraction
Feature 3: Two-Stage LLM Prompting Architecture
Feature 4: UI Auto Mode Toggle
Query Processing Flow
sequenceDiagram
    participant User as User Interface
    participant RAGBlock as RAG Demo Block
    participant Detector as IntentDetector
    participant LLM as LLM Service
    participant Flows as RAG Flows
    participant KB as Knowledge Base
    User->>RAGBlock: Submit Query + Auto Mode Flag
    alt Auto Mode Enabled
        RAGBlock->>Detector: detect(query, flow_list)
        Detector->>Detector: Build Stage 1 Prompt<br/>(flow descriptions)
        Detector->>LLM: Request Flow Classification
        LLM-->>Detector: Return flow_name<br/>(e.g., "rag_graph_only")
        Detector->>Detector: Build Stage 2 Prompt<br/>(flow details + query)
        Detector->>LLM: Request Parameter Extraction
        LLM-->>Detector: Return parameters JSON<br/>(with gremlin_tmpl_num)
        Detector->>Detector: Add Flow Flags<br/>(vector_search, graph_search, etc.)
        Detector-->>RAGBlock: Return {tool_name, parameters}
    else Manual Mode
        User->>RAGBlock: Select flow manually
        RAGBlock->>RAGBlock: Use manual selection
    end
    RAGBlock->>Flows: schedule_stream_flow(flow_key, parameters)
    alt RAG_GRAPH_ONLY or RAG_GRAPH_VECTOR
        Flows->>KB: Query with gremlin_tmpl_num
        alt gremlin_tmpl_num == 3
            KB->>KB: Use Text-to-Gremlin
        else gremlin_tmpl_num == -1
            KB->>KB: Use Subquery Approach
        end
        KB-->>Flows: Graph Results
    end
    alt RAG_VECTOR_ONLY or RAG_GRAPH_VECTOR
        Flows->>KB: Vector Search
        KB-->>Flows: Vector Results
    end
    Flows->>Flows: Reranking (if needed)
    Flows->>LLM: Generate Answer with Context
    LLM-->>Flows: Final Answer
    Flows-->>RAGBlock: Stream Results
    RAGBlock-->>User: Display Answer
Decision Tree for Flow Selection
graph TD
    A["User Query Received"] --> B{Is Query About<br/>Relationships?<br/>Connections?<br/>Paths?}
    B -->|Yes| C{Need Factual<br/>Context Too?}
    C -->|Yes| D["RAG_GRAPH_VECTOR<br/>Hybrid Mode"]
    C -->|No| E["RAG_GRAPH_ONLY<br/>Pure Graph Mode"]
    B -->|No| F{Is Query About<br/>Factual Info?<br/>Descriptions?<br/>Definitions?}
    F -->|Yes| G["RAG_VECTOR_ONLY<br/>Vector Search Mode"]
    F -->|No| H{Can Query Be<br/>Answered by<br/>LLM Alone?}
    H -->|Yes| I["RAG_RAW<br/>Pure LLM Mode"]
    H -->|No| J["No Suitable Flow<br/>Return Error"]
    D --> K["Extract gremlin_tmpl_num"]
    E --> K
    K --> L{Is Graph Query<br/>Clear & Unambiguous?}
    L -->|Yes| M["gremlin_tmpl_num = 3<br/>Text-to-Gremlin"]
    L -->|No| N["gremlin_tmpl_num = -1<br/>Subquery Approach"]
    G --> O["Execute RAG Pipeline"]
    M --> O
    N --> O
    I --> O
Summary
These two commits implement a comprehensive Intelligent RAG Flow Routing System that transforms the HugeGraph-LLM system from manual configuration to fully automated, LLM-driven query processing. The system intelligently selects the optimal retrieval strategy and graph query algorithm based on query semantics, significantly improving usability and system adaptability.
Summary by CodeRabbit
Release Notes
New Features
Improvements