Skip to content
This repository was archived by the owner on Dec 28, 2025. It is now read-only.

feat: support auto mode for rag usecases#58

Open
weijinglin wants to merge 4 commits intomainfrom
auto-flow
Open

feat: support auto mode for rag usecases#58
weijinglin wants to merge 4 commits intomainfrom
auto-flow

Conversation

@weijinglin
Copy link
Collaborator

@weijinglin weijinglin commented Nov 15, 2025

Code Modifications Summary

Modified Files Overview

  1. rag_block.py
  • Purpose: UI component for RAG (Retrieval-Augmented Generation) demonstration
  • Changes:
    • Added auto_mode parameter to enable automatic flow selection
    • Integrated IntentDetectorSingleton for intelligent flow routing
    • Implemented conditional logic: when auto_mode=True, uses LLM-based intent detection; otherwise, uses manual selection
    • Added toggle_manual_options() callback to disable manual options when auto mode is active
  1. intent_detector.py
  • Purpose: LLM-based intent detection module for automatic RAG flow selection
  • Key Components:
    • IntentDetector class: Core logic for query analysis and flow selection
    • Two-stage LLM prompting system:
      1. Stage 1 (Flow Selection): Analyzes user query and selects appropriate flow (RAG_RAW, RAG_VECTOR_ONLY, RAG_GRAPH_ONLY, RAG_GRAPH_VECTOR)
      2. Stage 2 (Parameter Extraction): Extracts relevant parameters (especially gremlin_tmpl_num) based on query semantics
    • IntentDetectorSingleton: Thread-safe singleton pattern for instance management
    • Smart parameter extraction: Determines whether to use text-to-Gremlin (value=3) or subquery-based approach (value=-1)
  1. Flow Definition Files
  • Changes: Refactored prompt exports into two types:
    • *_FLOW_DESC: Brief description for flow router selection
    • *_FLOW_DETAIL: Detailed parameter specifications for parameter extraction
  • Purpose: Provides metadata for the two-stage intent detection process

Feature 1: Automatic Flow Selection

  • What: LLM intelligently routes queries to the most suitable RAG flow
  • Selection Logic:
    • Graph-only: Relationship, connection, path, network queries (e.g., "How are Tesla and Elon connected?")
    • Vector-only: Factual, definition, description queries (e.g., "What kind of person is Elon Musk?")
    • Hybrid (Graph+Vector): Combined relationship and factual queries
    • Raw: Fallback to pure LLM when no graph/vector search is suitable
  • Impact: Reduces manual configuration overhead; improves UX for non-technical users

Feature 2: Intelligent Parameter Extraction

  • What: Automatic extraction of critical parameters from natural language queries
  • Key Parameter: gremlin_tmpl_num (controls graph query generation strategy)
    • 3: Uses text-to-Gremlin approach (direct query translation)
    • -1: Uses subquery approach (for ambiguous queries)
  • Impact: Enables dynamic optimization of graph search algorithm selection

Feature 3: Two-Stage LLM Prompting Architecture

  • Stage 1: Flow type classification (simple output: flow name only)
  • Stage 2: Parameter extraction (detailed output: JSON with parameters)
  • Impact: Reduces token usage; improves reliability through separation of concerns

Feature 4: UI Auto Mode Toggle

  • What: New radio button in Gradio UI to switch between manual and automatic modes
  • Behavior:
    • When enabled: disables all manual flow selection options
    • When disabled: allows user to manually select flow options
  • Impact: Seamless mode switching without page reload

Query Processing Flow

sequenceDiagram
    participant User as User Interface
    participant RAGBlock as RAG Demo Block
    participant Detector as IntentDetector
    participant LLM as LLM Service
    participant Flows as RAG Flows
    participant KB as Knowledge Base
    
    User->>RAGBlock: Submit Query + Auto Mode Flag
    
    alt Auto Mode Enabled
        RAGBlock->>Detector: detect(query, flow_list)
        
        Detector->>Detector: Build Stage 1 Prompt<br/>(flow descriptions)
        Detector->>LLM: Request Flow Classification
        LLM-->>Detector: Return flow_name<br/>(e.g., "rag_graph_only")
        
        Detector->>Detector: Build Stage 2 Prompt<br/>(flow details + query)
        Detector->>LLM: Request Parameter Extraction
        LLM-->>Detector: Return parameters JSON<br/>(with gremlin_tmpl_num)
        
        Detector->>Detector: Add Flow Flags<br/>(vector_search, graph_search, etc.)
        Detector-->>RAGBlock: Return {tool_name, parameters}
    else Manual Mode
        User->>RAGBlock: Select flow manually
        RAGBlock->>RAGBlock: Use manual selection
    end
    
    RAGBlock->>Flows: schedule_stream_flow(flow_key, parameters)
    
    alt RAG_GRAPH_ONLY or RAG_GRAPH_VECTOR
        Flows->>KB: Query with gremlin_tmpl_num
        alt gremlin_tmpl_num == 3
            KB->>KB: Use Text-to-Gremlin
        else gremlin_tmpl_num == -1
            KB->>KB: Use Subquery Approach
        end
        KB-->>Flows: Graph Results
    end
    
    alt RAG_VECTOR_ONLY or RAG_GRAPH_VECTOR
        Flows->>KB: Vector Search
        KB-->>Flows: Vector Results
    end
    
    Flows->>Flows: Reranking (if needed)
    Flows->>LLM: Generate Answer with Context
    LLM-->>Flows: Final Answer
    
    Flows-->>RAGBlock: Stream Results
    RAGBlock-->>User: Display Answer
Loading

Decision Tree for Flow Selection

graph TD
    A["User Query Received"] --> B{Is Query About<br/>Relationships?<br/>Connections?<br/>Paths?}
    
    B -->|Yes| C{Need Factual<br/>Context Too?}
    C -->|Yes| D["RAG_GRAPH_VECTOR<br/>Hybrid Mode"]
    C -->|No| E["RAG_GRAPH_ONLY<br/>Pure Graph Mode"]
    
    B -->|No| F{Is Query About<br/>Factual Info?<br/>Descriptions?<br/>Definitions?}
    
    F -->|Yes| G["RAG_VECTOR_ONLY<br/>Vector Search Mode"]
    F -->|No| H{Can Query Be<br/>Answered by<br/>LLM Alone?}
    
    H -->|Yes| I["RAG_RAW<br/>Pure LLM Mode"]
    H -->|No| J["No Suitable Flow<br/>Return Error"]
    
    D --> K["Extract gremlin_tmpl_num"]
    E --> K
    
    K --> L{Is Graph Query<br/>Clear & Unambiguous?}
    
    L -->|Yes| M["gremlin_tmpl_num = 3<br/>Text-to-Gremlin"]
    L -->|No| N["gremlin_tmpl_num = -1<br/>Subquery Approach"]
    
    G --> O["Execute RAG Pipeline"]
    M --> O
    N --> O
    I --> O
Loading

Summary

These two commits implement a comprehensive Intelligent RAG Flow Routing System that transforms the HugeGraph-LLM system from manual configuration to fully automated, LLM-driven query processing. The system intelligently selects the optimal retrieval strategy and graph query algorithm based on query semantics, significantly improving usability and system adaptability.

Summary by CodeRabbit

发布说明

  • 新特性

    • 在 RAG 回答中新增“自动模式”,可基于输入自动选择并切换检索与融合流程
    • 在界面新增自动模式开关,切换时会禁用/启用手动选项并保持流式响应输出
  • 优化

    • 稳定依赖管理:固定 pycgraph 版本以提升构建和运行稳定性

@github-actions
Copy link

@codecov-ai-reviewer review

@coderabbitai
Copy link

coderabbitai bot commented Nov 15, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

该 PR 将 pycgraph 依赖锁定为 3.2.2,移除 pycgraph 的外部 Git 源;在 RAG 演示中为 rag_answer_streaming 添加了 auto_mode(并接入新建的 IntentDetector 自动路由及参数提取),并将大量导入路径从 PyCGraph 更正为 pycgraph,同时为多个 RAG 流程添加描述/参数常量。

Changes

内聚组 / 文件(s) 变更摘要
依赖管理
\pyproject.toml``
pycgraph 锁定到版本 3.2.2;移除 pycgraph 的外部 Git 源配置
RAG 自动模式逻辑 & UI
\src/hugegraph_llm/demo/rag_demo/rag_block.py``
rag_answer_streaming 添加 auto_mode: bool 参数;当启用时通过 IntentDetectorSingleton 自动检测并映射至 RAG 流程,使用 schedule_stream_flow 流式调度;保留手动选择路径;添加 UI 层 Auto Mode 开关及状态联动
意图检测器(新增模块)
\src/hugegraph_llm/flows/intent_detector.py``
新增 IntentDetector(基于 LLM 做流程选择与参数抽取)和线程安全单例 IntentDetectorSingleton;包含两个大段提示模板和参数解析逻辑
RAG 流程描述常量(新增)
\src/hugegraph_llm/flows/rag_flow_raw.py`, `.../rag_flow_vector_only.py`, `.../rag_flow_graph_only.py`, `.../rag_flow_graph_vector.py``(若干)
为各 RAG 流程新增 *_FLOW_DESC*_FLOW_DETAIL 常量(JSON 风格的元数据/参数说明)
流程模块:导入路径修正
\src/hugegraph_llm/flows/build_example_index.py`, `.../build_schema.py`, `.../build_vector_index.py`, `.../get_graph_index_info.py`, `.../graph_extract.py`, `.../import_graph_data.py`, `.../prompt_generate.py`, `.../scheduler.py`, `.../text2gremlin.py`, `.../update_vid_embeddings.py``
批量将 from PyCGraph import ... 替换为 from pycgraph import ...(如 GPipeline / GPipelineManager
节点模块:导入路径修正
\src/hugegraph_llm/nodes/.../*.py``(多文件,见变更)
批量修正导入来源:GNode/CStatus/GParam 等由 PyCGraphpycgraph
状态管理导入修正
\src/hugegraph_llm/state/ai_state.py``
修正导入:GParamCStatusPyCGraphpycgraph

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant UI as RAG UI
    participant Stream as rag_answer_streaming
    participant Detector as IntentDetectorSingleton
    participant LLM
    participant Scheduler as schedule_stream_flow
    participant Flow as RAG Flow

    User->>UI: 切换 Auto Mode / 提交查询
    UI->>Stream: 调用(text, auto_mode=..., 其他参数...)

    alt auto_mode = true
        Stream->>Detector: detect(text, flow_list)
        Detector->>LLM: 选择流(INTENT_DETECTOR_PROMPT)
        LLM-->>Detector: 返回选中流名
        Detector->>LLM: 提取参数(PARAMETER_EXTRACTOR_PROMPT)
        LLM-->>Detector: 返回参数 JSON
        Detector-->>Stream: {tool_name, parameters, flags}
        Stream->>Scheduler: schedule_stream_flow(flow_key, parameters)
    else auto_mode = false
        Stream->>Stream: 根据手动选项决定 flow_key & parameters
        Stream->>Scheduler: schedule_stream_flow(flow_key, parameters)
    end

    Scheduler->>Flow: 启动流式执行
    Flow-->>Stream: 流式结果片段
    Stream-->>UI: 异步迭代推送结果
    UI-->>User: 展示答案
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

重点审查项:

  • src/hugegraph_llm/demo/rag_demo/rag_block.py:自动/手动路径参数传递、错误分支与 reranker 切换警告、与 UI 的状态连通性。
  • src/hugegraph_llm/flows/intent_detector.py:LLM 提示构建、二次参数抽取与 JSON 解析的鲁棒性、异常处理与单例线程安全实现。
  • 新增/修改的 RAG 流程常量:确保描述与参数 schema 与调用方保持一致。
  • 大量导入替换:检查是否遗漏文件或导致命名/大小写相关导入错误。

Possibly related PRs

Poem

🐇 我是一只编程兔,跳进代码田野,
自动之门悄然开,意图随风辨方向。
pycgraph 名正言顺来,流程常量诉详详。
流式输出轻轻唱,开发路上更欢畅。 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 36.36% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed PR标题'feat: support auto mode for rag usecases'准确反映了主要改动——为RAG用例添加自动模式功能。
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch auto-flow

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 741c653 and 9ec3ee4.

📒 Files selected for processing (1)
  • hugegraph-llm/src/hugegraph_llm/flows/intent_detector.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
hugegraph-llm/**/*.py

📄 CodeRabbit inference engine (hugegraph-llm/AGENTS.md)

hugegraph-llm/**/*.py: Adhere to ruff code style for Python code
Type-check Python code with mypy
Keep each Python file under 600 lines for maintainability

Files:

  • hugegraph-llm/src/hugegraph_llm/flows/intent_detector.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/operators/graph_rag_task.py : Maintain the Graph RAG pipeline in src/hugegraph_llm/operators/graph_rag_task.py
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/demo/rag_demo/**/*.py : Implement the Gradio UI application under src/hugegraph_llm/demo/rag_demo/
🧬 Code graph analysis (1)
hugegraph-llm/src/hugegraph_llm/flows/intent_detector.py (2)
hugegraph-llm/src/hugegraph_llm/flows/__init__.py (1)
  • FlowName (21-34)
hugegraph-llm/src/hugegraph_llm/models/llms/base.py (1)
  • BaseLLM (22-74)
🪛 GitHub Actions: Pylint
hugegraph-llm/src/hugegraph_llm/flows/intent_detector.py

[warning] 174-174: Bad indentation. Found 10 spaces, expected 12 (bad-indentation)


[warning] 137-137: R1711: Useless return at end of function or method (useless-return)

🔇 Additional comments (4)
hugegraph-llm/src/hugegraph_llm/flows/intent_detector.py (4)

1-13: 许可证头部已正确添加

Apache License 头部现已正确添加,符合项目标准格式。


14-27: 导入语句正确

所有导入均为必需且结构合理。


28-130: 提示词模板设计合理

两阶段提示词设计清晰:第一阶段进行流程分类,第二阶段提取参数。模板包含明确的角色定义、指令和示例。


227-237: 单例模式实现正确

使用双重检查锁定模式实现的线程安全单例,确保在多线程环境下只创建一个 IntentDetector 实例。实现标准且正确。


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added the llm label Nov 15, 2025
@gemini-code-assist
Copy link

Summary of Changes

Hello @weijinglin, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the RAG system by implementing an intelligent, LLM-driven automatic flow routing and parameter extraction mechanism. This allows the system to dynamically select the optimal RAG strategy and graph query approach based on the user's natural language query, greatly improving usability and adaptability by reducing the need for manual configuration.

Highlights

  • Automatic RAG Flow Selection: Introduces an "auto mode" where an LLM intelligently routes user queries to the most suitable RAG (Retrieval-Augmented Generation) flow (e.g., graph-only, vector-only, hybrid, or raw LLM).
  • Intelligent Parameter Extraction: Implements LLM-based extraction of critical parameters, such as "gremlin_tmpl_num", from natural language queries to dynamically optimize graph search algorithms.
  • Two-Stage LLM Prompting Architecture: A new architecture for intent detection that first classifies the flow type and then extracts detailed parameters, improving reliability and token efficiency.
  • UI Auto Mode Toggle: Adds a new radio button in the Gradio UI for the RAG demo to switch between manual and automatic RAG flow selection, disabling manual options when auto mode is active.
  • Dependency Management Update: The "pycgraph" dependency is now explicitly pinned to version "3.2.2" and its direct Git repository reference has been removed from "pyproject.toml".
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has checked 372 files.

Valid Invalid Ignored Fixed
304 1 67 0
Click to see the invalid file list
  • hugegraph-llm/src/hugegraph_llm/flows/intent_detector.py
Use this command to fix any missing license headers
```bash

docker run -it --rm -v $(pwd):/github/workspace apache/skywalking-eyes header fix

</details>

@@ -0,0 +1,224 @@
import threading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import threading
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import threading

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an excellent new feature: an 'auto mode' for RAG use cases, driven by an LLM-based intent detector. This significantly improves usability by automatically selecting the appropriate RAG flow. The implementation is comprehensive, covering the intent detection logic, UI updates in Gradio, and necessary refactoring of flow definitions. My review focuses on improving code maintainability, robustness, and fixing a few minor issues. I've pointed out areas with code duplication and verbose logic in rag_block.py that could be refactored, a critical bug in the prompt formatting in intent_detector.py, and some opportunities to make the code more robust and consistent.

if flow in self.flow_message:
tool_descs.append(self.flow_message[flow]["desc"])
tools_str = "\n\n".join(tool_descs)
prompt = INTENT_DETECTOR_PROMPT.replace("{{tool_list}}", tools_str)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

There's a typo in the placeholder name. The prompt INTENT_DETECTOR_PROMPT defines {{flow_list}}, but the code is trying to replace {{tool_list}}. This will prevent the prompt from being formatted correctly, causing the LLM to receive an incomplete list of available flows.

Suggested change
prompt = INTENT_DETECTOR_PROMPT.replace("{{tool_list}}", tools_str)
prompt = INTENT_DETECTOR_PROMPT.replace("{{flow_list}}", tools_str)

tool_result = await self.llm_client.agenerate(prompt=prompt)
tool_result = tool_result.strip()
# expected tool_result belong to [4 kinds of Flow]
detail = None if self.flow_message[tool_result] is None else self.flow_message[tool_result]["detail"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This line uses direct dictionary access self.flow_message[tool_result], which will raise a KeyError if tool_result from the LLM is not a valid flow name (e.g., if it returns 'none' or an unexpected value). It's safer to use .get() to handle cases where the key might not exist, preventing the application from crashing.

Suggested change
detail = None if self.flow_message[tool_result] is None else self.flow_message[tool_result]["detail"]
detail = self.flow_message.get(tool_result, {}).get("detail")

from hugegraph_llm.flows.scheduler import SchedulerSingleton
from hugegraph_llm.flows.intent_detector import IntentDetectorSingleton
import pandas as pd
import gradio as gr

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This import gradio as gr is a duplicate of the import on line 25. Please remove it. Additionally, SchedulerSingleton is imported on both line 26 and line 33, which should also be consolidated to avoid redundancy.

Comment on lines +199 to 323
if auto_mode:
intent_detector = IntentDetectorSingleton.get_instance()
result = await intent_detector.detect(
text,
[FlowName.RAG_RAW, FlowName.RAG_VECTOR_ONLY, FlowName.RAG_GRAPH_ONLY, FlowName.RAG_GRAPH_VECTOR],
)
if result["tool_name"] is None or result["tool_name"] == "none":
raise RuntimeError("No suitable flow found")
elif result["tool_name"] in [
FlowName.RAG_RAW,
FlowName.RAG_VECTOR_ONLY,
FlowName.RAG_GRAPH_ONLY,
FlowName.RAG_GRAPH_VECTOR
]:
flow_key = result["tool_name"]
else:
raise RuntimeError("Unsupported flow type")
async for res in scheduler.schedule_stream_flow(
flow_key,
query=text,
vector_search=result["parameters"].get("vector_search", vector_search)
if "parameters" in result
else vector_search,
graph_search=result["parameters"].get("graph_search", graph_search)
if "parameters" in result
else graph_search,
raw_answer=result["parameters"].get("raw_answer", False)
if "parameters" in result
else False,
vector_only_answer=result["parameters"].get("vector_only_answer", False)
if "parameters" in result
else False,
graph_only_answer=result["parameters"].get("graph_only_answer", False)
if "parameters" in result
else False,
graph_vector_answer=result["parameters"].get(
"graph_vector_answer", False
)
if "parameters" in result
else False,
graph_ratio=result["parameters"].get("graph_ratio", graph_ratio)
if "parameters" in result
else graph_ratio,
rerank_method=result["parameters"].get("rerank_method", rerank_method)
if "parameters" in result
else rerank_method,
near_neighbor_first=result["parameters"].get(
"near_neighbor_first", near_neighbor_first
)
if "parameters" in result
else near_neighbor_first,
custom_related_information=result["parameters"].get(
"custom_related_information", custom_related_information
)
if "parameters" in result
else custom_related_information,
answer_prompt=result["parameters"].get("answer_prompt", answer_prompt)
if "parameters" in result
else answer_prompt,
keywords_extract_prompt=result["parameters"].get(
"keywords_extract_prompt", keywords_extract_prompt
)
if "parameters" in result
else keywords_extract_prompt,
gremlin_tmpl_num=result["parameters"].get(
"gremlin_tmpl_num", gremlin_tmpl_num
)
if "parameters" in result
else gremlin_tmpl_num,
gremlin_prompt=result["parameters"].get(
"gremlin_prompt", gremlin_prompt
)
if "parameters" in result
else gremlin_prompt,
):
if res.get("switch_to_bleu"):
gr.Warning(
"Online reranker fails, automatically switches to local bleu rerank."
)
yield (
res.get("raw_answer", ""),
res.get("vector_only_answer", ""),
res.get("graph_only_answer", ""),
res.get("graph_vector_answer", ""),
)
else:
raise RuntimeError("Unsupported flow type")
if graph_vector_answer or (graph_only_answer and vector_only_answer):
flow_key = FlowName.RAG_GRAPH_VECTOR
elif vector_only_answer:
flow_key = FlowName.RAG_VECTOR_ONLY
elif graph_only_answer:
flow_key = FlowName.RAG_GRAPH_ONLY
elif raw_answer:
flow_key = FlowName.RAG_RAW
else:
raise RuntimeError("Unsupported flow type")

async for res in scheduler.schedule_stream_flow(
flow_key,
query=text,
vector_search=vector_search,
graph_search=graph_search,
raw_answer=raw_answer,
vector_only_answer=vector_only_answer,
graph_only_answer=graph_only_answer,
graph_vector_answer=graph_vector_answer,
graph_ratio=graph_ratio,
rerank_method=rerank_method,
near_neighbor_first=near_neighbor_first,
custom_related_information=custom_related_information,
answer_prompt=answer_prompt,
keywords_extract_prompt=keywords_extract_prompt,
gremlin_tmpl_num=gremlin_tmpl_num,
gremlin_prompt=gremlin_prompt,
):
if res.get("switch_to_bleu"):
gr.Warning(
"Online reranker fails, automatically switches to local bleu rerank."
async for res in scheduler.schedule_stream_flow(
flow_key,
query=text,
vector_search=vector_search,
graph_search=graph_search,
raw_answer=raw_answer,
vector_only_answer=vector_only_answer,
graph_only_answer=graph_only_answer,
graph_vector_answer=graph_vector_answer,
graph_ratio=graph_ratio,
rerank_method=rerank_method,
near_neighbor_first=near_neighbor_first,
custom_related_information=custom_related_information,
answer_prompt=answer_prompt,
keywords_extract_prompt=keywords_extract_prompt,
gremlin_tmpl_num=gremlin_tmpl_num,
gremlin_prompt=gremlin_prompt,
):
if res.get("switch_to_bleu"):
gr.Warning(
"Online reranker fails, automatically switches to local bleu rerank."
)
yield (
res.get("raw_answer", ""),
res.get("vector_only_answer", ""),
res.get("graph_only_answer", ""),
res.get("graph_vector_answer", ""),
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic within this try block can be refactored to improve readability and reduce code duplication.

  1. Verbose Parameter Passing: The method of passing parameters to scheduler.schedule_stream_flow in auto_mode is very verbose, with a redundant if "parameters" in result else ... check for every parameter. This can be simplified significantly.
  2. Code Duplication: The async for res in scheduler.schedule_stream_flow(...) loop and its body are duplicated in both the if auto_mode: and else: branches.

Consider refactoring to first determine the flow_key and a flow_params dictionary, and then use a single schedule_stream_flow call and loop. This will make the code much cleaner and easier to maintain.

Here is an example of how it could be refactored:

scheduler = SchedulerSingleton.get_instance()
flow_params = {}
flow_key = ""

if auto_mode:
    intent_detector = IntentDetectorSingleton.get_instance()
    result = await intent_detector.detect(
        text,
        [FlowName.RAG_RAW, FlowName.RAG_VECTOR_ONLY, FlowName.RAG_GRAPH_ONLY, FlowName.RAG_GRAPH_VECTOR],
    )
    tool_name = result.get("tool_name")
    if tool_name is None or tool_name == "none":
        raise RuntimeError("No suitable flow found")
    
    flow_key = tool_name
    if flow_key not in [FlowName.RAG_RAW, FlowName.RAG_VECTOR_ONLY, FlowName.RAG_GRAPH_ONLY, FlowName.RAG_GRAPH_VECTOR]:
        raise RuntimeError("Unsupported flow type")

    params = result.get("parameters", {})
    flow_params = {
        "query": text,
        "vector_search": params.get("vector_search", vector_search),
        "graph_search": params.get("graph_search", graph_search),
        # ... other params
    }
else:
    if graph_vector_answer or (graph_only_answer and vector_only_answer):
        flow_key = FlowName.RAG_GRAPH_VECTOR
    # ... other manual flow selections
    else:
        raise RuntimeError("Unsupported flow type")
    
    flow_params = {
        "query": text,
        "vector_search": vector_search,
        "graph_search": graph_search,
        # ... other params
    }

async for res in scheduler.schedule_stream_flow(flow_key, **flow_params):
    if res.get("switch_to_bleu"):
        gr.Warning(
            "Online reranker fails, automatically switches to local bleu rerank."
        )
    yield (
        res.get("raw_answer", ""),
        res.get("vector_only_answer", ""),
        res.get("graph_only_answer", ""),
        res.get("graph_vector_answer", ""),
    )

"desc": RAGGRAPHVECTOR_FLOW_DESC,
"detail": RAGGRAPHVECTOR_FLOW_DETAIL,
}
return

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

An explicit return at the end of an __init__ method is unnecessary in Python and can be removed.

# expected tool_result belong to [4 kinds of Flow]
detail = None if self.flow_message[tool_result] is None else self.flow_message[tool_result]["detail"]
if detail is None:
raise ValueError("LLM返回的flow类型不在支持的RAGFlow范围内!")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The error message is in Chinese. For consistency with the rest of the codebase and to make it accessible to a wider audience, it's better to use English for error messages.

Suggested change
raise ValueError("LLM返回的flow类型不在支持的RAGFlow范围内!")
raise ValueError("The flow type returned by the LLM is not in the supported RAGFlows.")

Comment on lines +32 to +46
RAGGRAPHONLY_FLOW_DESC = """
{
"name": "rag_graph_only",
"desc": "Graph-only retrieval augmented generation workflow. Answers are generated based solely on graph search results, without vector-based augmentation.",
}
"""

RAGGRAPHONLY_FLOW_DETAIL = """
{
"required_params": [
{"name": "query", "type": "str", "desc": "User question"},
{"name": "gremlin_tmpl_num", "type": "int", "desc": "Number of Gremlin templates to use. Set to 3 if the query contains clear graph query semantics that can be translated to Gremlin (such as finding relationships, paths, nodes, or graph traversal patterns). Set to -1 if the query semantics are ambiguous or cannot be clearly mapped to graph operations"},
]
}
"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The JSON strings in RAGGRAPHONLY_FLOW_DESC and RAGGRAPHONLY_FLOW_DETAIL have trailing commas. While Python's json module might handle this, it is not valid in standard JSON. It's best practice to remove these trailing commas for correctness and compatibility with other JSON parsers.

Comment on lines +33 to +47
RAGGRAPHVECTOR_FLOW_DESC = """
{
"name": "rag_graph_vector",
"desc": "Hybrid graph and vector retrieval augmented generation workflow. Answers are generated by combining both graph and vector search results."
}
"""

RAGGRAPHVECTOR_FLOW_DETAIL = """
{
"required_params": [
{"name": "query", "type": "str", "desc": "User question"},
{"name": "gremlin_tmpl_num", "type": "int", "desc": "Number of Gremlin templates to use. Set to 3 if the query contains clear graph query semantics that can be translated to Gremlin (such as finding relationships, paths, nodes, or graph traversal patterns). Set to -1 if the query semantics are ambiguous or cannot be clearly mapped to graph operations"},
]
}
"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The JSON strings in RAGGRAPHVECTOR_FLOW_DESC and RAGGRAPHVECTOR_FLOW_DETAIL have trailing commas. While Python's json module might handle this, it is not valid in standard JSON. It's best practice to remove these trailing commas for correctness and compatibility with other JSON parsers.

Comment on lines +27 to +40
RAGRAW_FLOW_DESC = """
{
"name": "rag_raw",
"desc": "Direct LLM-based question answering without external knowledge augmentation. Suitable for pure LLM scenarios."
}
"""

RAGRAW_FLOW_DETAIL = """
{
"required_params": [
{"name": "query", "type": "str", "desc": "User question"},
]
}
"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The JSON strings in RAGRAW_FLOW_DESC and RAGRAW_FLOW_DETAIL have trailing commas. While Python's json module might handle this, it is not valid in standard JSON. It's best practice to remove these trailing commas for correctness and compatibility with other JSON parsers.

Comment on lines +29 to +42
RAGVECTORONLY_FLOW_DESC = """
{
"name": "rag_vector_only",
"desc": "Vector-only retrieval augmented generation workflow. Answers are generated based solely on vector search results, without graph-based augmentation."
}
"""

RAGVECTORONLY_FLOW_DETAIL = """
{
"required_params": [
{"name": "query", "type": "str", "desc": "User question"},
]
}
"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The JSON strings in RAGVECTORONLY_FLOW_DESC and RAGVECTORONLY_FLOW_DETAIL have trailing commas. While Python's json module might handle this, it is not valid in standard JSON. It's best practice to remove these trailing commas for correctness and compatibility with other JSON parsers.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
hugegraph-llm/src/hugegraph_llm/demo/rag_demo/rag_block.py (1)

160-195: auto_mode 下的“至少选择一种模式”校验逻辑可以更清晰

rag_answer_streaming 在进入 auto 模式分支之前先做:

if raw_answer is False and not vector_search and not graph_search:
    gr.Warning("Please select at least one generate mode.")
    yield "", "", "", ""
    return

在 auto_mode 场景下,用户直觉上不需要再关心这些单选项;但如果用户之前把四个模式都点成 False,再打开 auto_mode,就会在这里被挡住,看起来像是自动模式失效。

建议:

  • 在 auto 模式下直接跳过这段校验,或者
  • 把提示信息改成更贴合 auto 模式,例如:“手动模式下请至少选择一种生成模式;自动模式会自动选择,无需设置”。

示例调整:

-    if raw_answer is False and not vector_search and not graph_search:
+    if (not auto_mode) and raw_answer is False and not vector_search and not graph_search:
         gr.Warning("Please select at least one generate mode.")
         yield "", "", "", ""
         return

这样可以避免 auto 模式下出现“请选择模式”的困惑提示。

🧹 Nitpick comments (5)
hugegraph-llm/src/hugegraph_llm/nodes/llm_node/schema_build.py (1)

27-91: 考虑为方法添加类型注解以提高 mypy 兼容性。

方法 node_init()operator_schedule() 缺少参数和返回值的类型注解。根据编码指南,应使用 mypy 进行类型检查。

建议为方法签名添加类型提示,例如:

def node_init(self) -> CStatus:
    ...

def operator_schedule(self, data_json: dict) -> dict:
    ...
hugegraph-llm/src/hugegraph_llm/nodes/common_node/merge_rerank_node.py (1)

57-59: 建议将导入移至模块级别以提高一致性。

CStatus 的导入位于异常处理块内部。虽然这样可以正常工作,但建议将其移至文件顶部的模块级别导入,以与代码库中其他文件保持一致。

可以应用以下 diff:

在文件顶部添加导入:

 from typing import Dict, Any
+from pycgraph import CStatus
 from hugegraph_llm.nodes.base_node import BaseNode

然后移除异常处理块中的导入:

         except ValueError as e:
             log.error("Failed to initialize MergeRerankNode: %s", e)
-            from pycgraph import CStatus
-
             return CStatus(-1, f"MergeRerankNode initialization failed: {e}")
hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.py (1)

19-19: RAGGRAPHONLY 描述字符串建议去掉多余逗号,保持合法 JSON(可选)

RAGGRAPHONLY_FLOW_DESC 目前是 JSON 结构的字符串,但在 "desc" 字段后面多了一个逗号,如果后续有代码尝试对该字符串做 json.loads,会直接解析失败。现在仅用于 LLM 提示问题不大,但建议顺手改成合法 JSON,和其它 *_FLOW_DESC 保持一致,避免未来踩坑。

 RAGGRAPHONLY_FLOW_DESC = """
 {
   "name": "rag_graph_only",
-  "desc": "Graph-only retrieval augmented generation workflow. Answers are generated based solely on graph search results, without vector-based augmentation.",
-}
+  "desc": "Graph-only retrieval augmented generation workflow. Answers are generated based solely on graph search results, without vector-based augmentation."
+}
 """

Also applies to: 32-46

hugegraph-llm/src/hugegraph_llm/demo/rag_demo/rag_block.py (2)

20-27: 导入语句有重复,可简化以避免风格工具告警

当前同一模块内多次导入相同对象,例如:

  • import gradio as gr 在行 25 和行 29 重复。
  • from hugegraph_llm.flows.scheduler import SchedulerSingleton 在行 26 和行 33 重复。

虽然不会影响运行,但会被 ruff/pylint 之类工具标为多余导入,建议保留一处即可,保持 import 区块整洁。

Also applies to: 28-34


199-273: auto_mode 分支整体路由思路合理,可考虑小幅收紧参数合并逻辑(可选)

自动模式分支中,通过 IntentDetectorSingleton 获取 tool_nameparameters,然后把 parameters 中的多个布尔标志和配置项透传到 schedule_stream_flow。整体设计符合“检测 + 流路由”的目标,不过有两点可以作为后续优化参考(非阻塞):

  1. result["parameters"] 的存在判断可以简化
    detect() 始终会返回 "parameters",因此每个参数都写成
    result["parameters"].get(..., default) if "parameters" in result else default
    有些冗余,可以直接用 .get

  2. 可以只允许 LLM 覆盖与 flow 本身强相关的开关
    例如 vector_search / graph_search / graph_*_answer 这些和 flow 类型强绑定的开关,让 LLM 决定是合理的;而 graph_ratiorerank_methodcustom_related_information 更偏 UI 配置,完全交给 LLM 可能会降低可预测性。
    可以考虑只从 flow_flags 合并布尔标志,其他字段仍然完全沿用用户在 UI 中的选择。

这些都不影响功能正确性,属于后续可以迭代的体验和可维护性优化点。

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 101f10f and 741c653.

📒 Files selected for processing (31)
  • hugegraph-llm/pyproject.toml (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/demo/rag_demo/rag_block.py (6 hunks)
  • hugegraph-llm/src/hugegraph_llm/flows/build_example_index.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/flows/build_schema.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/flows/build_vector_index.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/flows/get_graph_index_info.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/flows/graph_extract.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/flows/import_graph_data.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/flows/intent_detector.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/flows/prompt_generate.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.py (2 hunks)
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_vector.py (2 hunks)
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_raw.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_vector_only.py (2 hunks)
  • hugegraph-llm/src/hugegraph_llm/flows/scheduler.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/flows/text2gremlin.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/flows/update_vid_embeddings.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/nodes/base_node.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/nodes/common_node/merge_rerank_node.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/nodes/document_node/chunk_split.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/gremlin_execute.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/schema.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/nodes/index_node/build_gremlin_example_index.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/nodes/index_node/gremlin_example_index_query.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/nodes/index_node/semantic_id_query_node.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/nodes/index_node/vector_query_node.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/schema_build.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/nodes/util.py (1 hunks)
  • hugegraph-llm/src/hugegraph_llm/state/ai_state.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
hugegraph-llm/**/*.py

📄 CodeRabbit inference engine (hugegraph-llm/AGENTS.md)

hugegraph-llm/**/*.py: Adhere to ruff code style for Python code
Type-check Python code with mypy
Keep each Python file under 600 lines for maintainability

Files:

  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/schema_build.py
  • hugegraph-llm/src/hugegraph_llm/flows/graph_extract.py
  • hugegraph-llm/src/hugegraph_llm/nodes/index_node/vector_query_node.py
  • hugegraph-llm/src/hugegraph_llm/flows/build_vector_index.py
  • hugegraph-llm/src/hugegraph_llm/flows/prompt_generate.py
  • hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/schema.py
  • hugegraph-llm/src/hugegraph_llm/flows/build_schema.py
  • hugegraph-llm/src/hugegraph_llm/flows/text2gremlin.py
  • hugegraph-llm/src/hugegraph_llm/nodes/index_node/semantic_id_query_node.py
  • hugegraph-llm/src/hugegraph_llm/flows/import_graph_data.py
  • hugegraph-llm/src/hugegraph_llm/state/ai_state.py
  • hugegraph-llm/src/hugegraph_llm/nodes/util.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_raw.py
  • hugegraph-llm/src/hugegraph_llm/flows/scheduler.py
  • hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/gremlin_execute.py
  • hugegraph-llm/src/hugegraph_llm/nodes/base_node.py
  • hugegraph-llm/src/hugegraph_llm/nodes/document_node/chunk_split.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_vector.py
  • hugegraph-llm/src/hugegraph_llm/demo/rag_demo/rag_block.py
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.py
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.py
  • hugegraph-llm/src/hugegraph_llm/nodes/common_node/merge_rerank_node.py
  • hugegraph-llm/src/hugegraph_llm/flows/update_vid_embeddings.py
  • hugegraph-llm/src/hugegraph_llm/flows/get_graph_index_info.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_vector_only.py
  • hugegraph-llm/src/hugegraph_llm/nodes/index_node/build_gremlin_example_index.py
  • hugegraph-llm/src/hugegraph_llm/nodes/index_node/gremlin_example_index_query.py
  • hugegraph-llm/src/hugegraph_llm/flows/intent_detector.py
  • hugegraph-llm/src/hugegraph_llm/flows/build_example_index.py
hugegraph-llm/src/hugegraph_llm/demo/rag_demo/**/*.py

📄 CodeRabbit inference engine (hugegraph-llm/AGENTS.md)

Implement the Gradio UI application under src/hugegraph_llm/demo/rag_demo/

Files:

  • hugegraph-llm/src/hugegraph_llm/demo/rag_demo/rag_block.py
🧠 Learnings (18)
📓 Common learnings
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/operators/graph_rag_task.py : Maintain the Graph RAG pipeline in src/hugegraph_llm/operators/graph_rag_task.py
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/demo/rag_demo/**/*.py : Implement the Gradio UI application under src/hugegraph_llm/demo/rag_demo/
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/indices/**/*.py : Store vector and graph indexing code under src/hugegraph_llm/indices/

Applied to files:

  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/schema_build.py
  • hugegraph-llm/src/hugegraph_llm/flows/graph_extract.py
  • hugegraph-llm/src/hugegraph_llm/nodes/index_node/vector_query_node.py
  • hugegraph-llm/src/hugegraph_llm/flows/build_vector_index.py
  • hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/schema.py
  • hugegraph-llm/src/hugegraph_llm/flows/build_schema.py
  • hugegraph-llm/src/hugegraph_llm/flows/import_graph_data.py
  • hugegraph-llm/src/hugegraph_llm/nodes/util.py
  • hugegraph-llm/src/hugegraph_llm/flows/scheduler.py
  • hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/gremlin_execute.py
  • hugegraph-llm/src/hugegraph_llm/nodes/base_node.py
  • hugegraph-llm/src/hugegraph_llm/nodes/document_node/chunk_split.py
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.py
  • hugegraph-llm/src/hugegraph_llm/flows/update_vid_embeddings.py
  • hugegraph-llm/src/hugegraph_llm/flows/get_graph_index_info.py
  • hugegraph-llm/src/hugegraph_llm/nodes/index_node/build_gremlin_example_index.py
  • hugegraph-llm/src/hugegraph_llm/nodes/index_node/gremlin_example_index_query.py
  • hugegraph-llm/src/hugegraph_llm/flows/build_example_index.py
  • hugegraph-llm/pyproject.toml
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/operators/**/*.py : Put core processing pipelines under src/hugegraph_llm/operators/

Applied to files:

  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/schema_build.py
  • hugegraph-llm/src/hugegraph_llm/flows/graph_extract.py
  • hugegraph-llm/src/hugegraph_llm/flows/build_vector_index.py
  • hugegraph-llm/src/hugegraph_llm/flows/prompt_generate.py
  • hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/schema.py
  • hugegraph-llm/src/hugegraph_llm/flows/build_schema.py
  • hugegraph-llm/src/hugegraph_llm/flows/text2gremlin.py
  • hugegraph-llm/src/hugegraph_llm/flows/import_graph_data.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.py
  • hugegraph-llm/src/hugegraph_llm/flows/scheduler.py
  • hugegraph-llm/src/hugegraph_llm/nodes/base_node.py
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.py
  • hugegraph-llm/src/hugegraph_llm/flows/update_vid_embeddings.py
  • hugegraph-llm/src/hugegraph_llm/flows/get_graph_index_info.py
  • hugegraph-llm/src/hugegraph_llm/flows/build_example_index.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/operators/graph_rag_task.py : Maintain the Graph RAG pipeline in src/hugegraph_llm/operators/graph_rag_task.py

Applied to files:

  • hugegraph-llm/src/hugegraph_llm/flows/graph_extract.py
  • hugegraph-llm/src/hugegraph_llm/flows/build_vector_index.py
  • hugegraph-llm/src/hugegraph_llm/flows/prompt_generate.py
  • hugegraph-llm/src/hugegraph_llm/flows/build_schema.py
  • hugegraph-llm/src/hugegraph_llm/flows/text2gremlin.py
  • hugegraph-llm/src/hugegraph_llm/flows/import_graph_data.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_raw.py
  • hugegraph-llm/src/hugegraph_llm/flows/scheduler.py
  • hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/gremlin_execute.py
  • hugegraph-llm/src/hugegraph_llm/nodes/document_node/chunk_split.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_vector.py
  • hugegraph-llm/src/hugegraph_llm/demo/rag_demo/rag_block.py
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.py
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.py
  • hugegraph-llm/src/hugegraph_llm/nodes/common_node/merge_rerank_node.py
  • hugegraph-llm/src/hugegraph_llm/flows/update_vid_embeddings.py
  • hugegraph-llm/src/hugegraph_llm/flows/get_graph_index_info.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_vector_only.py
  • hugegraph-llm/src/hugegraph_llm/flows/build_example_index.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/operators/gremlin_generate_task.py : Maintain the Text2Gremlin pipeline in src/hugegraph_llm/operators/gremlin_generate_task.py

Applied to files:

  • hugegraph-llm/src/hugegraph_llm/flows/graph_extract.py
  • hugegraph-llm/src/hugegraph_llm/flows/build_vector_index.py
  • hugegraph-llm/src/hugegraph_llm/flows/prompt_generate.py
  • hugegraph-llm/src/hugegraph_llm/flows/build_schema.py
  • hugegraph-llm/src/hugegraph_llm/flows/text2gremlin.py
  • hugegraph-llm/src/hugegraph_llm/flows/import_graph_data.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.py
  • hugegraph-llm/src/hugegraph_llm/flows/scheduler.py
  • hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/gremlin_execute.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_vector.py
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.py
  • hugegraph-llm/src/hugegraph_llm/flows/update_vid_embeddings.py
  • hugegraph-llm/src/hugegraph_llm/flows/get_graph_index_info.py
  • hugegraph-llm/src/hugegraph_llm/nodes/index_node/build_gremlin_example_index.py
  • hugegraph-llm/src/hugegraph_llm/nodes/index_node/gremlin_example_index_query.py
  • hugegraph-llm/src/hugegraph_llm/flows/build_example_index.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/**/*.py : Type-check Python code with mypy

Applied to files:

  • hugegraph-llm/src/hugegraph_llm/flows/graph_extract.py
  • hugegraph-llm/src/hugegraph_llm/flows/prompt_generate.py
  • hugegraph-llm/src/hugegraph_llm/flows/build_schema.py
  • hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/gremlin_execute.py
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.py
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.py
  • hugegraph-llm/src/hugegraph_llm/flows/get_graph_index_info.py
  • hugegraph-llm/src/hugegraph_llm/flows/build_example_index.py
  • hugegraph-llm/pyproject.toml
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/operators/kg_construction_task.py : Maintain the KG Construction pipeline in src/hugegraph_llm/operators/kg_construction_task.py

Applied to files:

  • hugegraph-llm/src/hugegraph_llm/flows/graph_extract.py
  • hugegraph-llm/src/hugegraph_llm/flows/build_vector_index.py
  • hugegraph-llm/src/hugegraph_llm/flows/prompt_generate.py
  • hugegraph-llm/src/hugegraph_llm/flows/build_schema.py
  • hugegraph-llm/src/hugegraph_llm/flows/text2gremlin.py
  • hugegraph-llm/src/hugegraph_llm/flows/import_graph_data.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.py
  • hugegraph-llm/src/hugegraph_llm/flows/scheduler.py
  • hugegraph-llm/src/hugegraph_llm/nodes/document_node/chunk_split.py
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.py
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.py
  • hugegraph-llm/src/hugegraph_llm/flows/update_vid_embeddings.py
  • hugegraph-llm/src/hugegraph_llm/flows/get_graph_index_info.py
  • hugegraph-llm/src/hugegraph_llm/flows/build_example_index.py
📚 Learning: 2025-10-21T07:20:54.516Z
Learnt from: weijinglin
Repo: hugegraph/hugegraph-ai PR: 54
File: hugegraph-llm/src/hugegraph_llm/flows/rag_flow_raw.py:55-55
Timestamp: 2025-10-21T07:20:54.516Z
Learning: In hugegraph-llm flows, the `prepared_input.schema` field in RAG flows (rag_flow_raw.py, rag_flow_vector_only.py, rag_flow_graph_vector.py, rag_flow_graph_only.py) is intentionally assigned `huge_settings.graph_name` (a string graph name) instead of using `prepared_input.graph_name`. This is legacy design where the underlying Operator's schema field is polymorphic and accepts either JSON schema objects or graph name strings, branching internally based on content type. This pattern should not be flagged as incorrect.

Applied to files:

  • hugegraph-llm/src/hugegraph_llm/flows/graph_extract.py
  • hugegraph-llm/src/hugegraph_llm/flows/build_schema.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_raw.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_vector.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_vector_only.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/resources/demo/config_prompt.yaml : Keep prompt configuration in src/hugegraph_llm/resources/demo/config_prompt.yaml

Applied to files:

  • hugegraph-llm/src/hugegraph_llm/flows/prompt_generate.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.py
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.py
📚 Learning: 2025-06-25T09:50:06.213Z
Learnt from: day0n
Repo: hugegraph/hugegraph-ai PR: 16
File: hugegraph-llm/src/hugegraph_llm/config/models/base_prompt_config.py:124-137
Timestamp: 2025-06-25T09:50:06.213Z
Learning: Language-specific prompt attributes (answer_prompt_CN, answer_prompt_EN, extract_graph_prompt_CN, extract_graph_prompt_EN, gremlin_generate_prompt_CN, gremlin_generate_prompt_EN, keywords_extract_prompt_CN, keywords_extract_prompt_EN, doc_input_text_CN, doc_input_text_EN) are defined in the PromptConfig class in hugegraph-llm/src/hugegraph_llm/config/prompt_config.py, which inherits from BasePromptConfig, making these attributes accessible in the parent class methods.

Applied to files:

  • hugegraph-llm/src/hugegraph_llm/flows/prompt_generate.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_vector.py
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/api/**/*.py : Place FastAPI endpoint modules under src/hugegraph_llm/api/

Applied to files:

  • hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/schema.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/config/**/*.py : Keep configuration management code under src/hugegraph_llm/config/

Applied to files:

  • hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/schema.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/utils/**/*.py : Place utilities, logging, and decorators under src/hugegraph_llm/utils/

Applied to files:

  • hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/schema.py
  • hugegraph-llm/src/hugegraph_llm/nodes/util.py
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/demo/rag_demo/**/*.py : Implement the Gradio UI application under src/hugegraph_llm/demo/rag_demo/

Applied to files:

  • hugegraph-llm/src/hugegraph_llm/flows/import_graph_data.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_raw.py
  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_vector.py
  • hugegraph-llm/src/hugegraph_llm/demo/rag_demo/rag_block.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/**/*.py : Adhere to ruff code style for Python code

Applied to files:

  • hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_only.py
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.py
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.py
📚 Learning: 2025-09-16T06:40:44.968Z
Learnt from: CR
Repo: hugegraph/hugegraph-ai PR: 0
File: hugegraph-llm/AGENTS.md:0-0
Timestamp: 2025-09-16T06:40:44.968Z
Learning: Applies to hugegraph-llm/src/hugegraph_llm/models/**/*.py : Implement LLM, embedding, and reranker models under src/hugegraph_llm/models/

Applied to files:

  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.py
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.py
  • hugegraph-llm/src/hugegraph_llm/nodes/common_node/merge_rerank_node.py
  • hugegraph-llm/src/hugegraph_llm/nodes/index_node/build_gremlin_example_index.py
📚 Learning: 2025-06-25T09:45:10.751Z
Learnt from: day0n
Repo: hugegraph/hugegraph-ai PR: 16
File: hugegraph-llm/src/hugegraph_llm/config/models/base_prompt_config.py:100-116
Timestamp: 2025-06-25T09:45:10.751Z
Learning: In hugegraph-llm BasePromptConfig class, llm_settings is a runtime property that is loaded from config through dependency injection during object initialization, not a static class attribute. Static analysis tools may flag this as missing but it's intentional design.

Applied to files:

  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.py
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.py
📚 Learning: 2025-05-27T06:55:13.779Z
Learnt from: cgwer
Repo: hugegraph/hugegraph-ai PR: 10
File: hugegraph-python-client/pyproject.toml:0-0
Timestamp: 2025-05-27T06:55:13.779Z
Learning: The hugegraph-python-client is a component within the hugegraph-ai project repository (apache/incubator-hugegraph-ai), not a standalone repository. When reviewing project URLs in pyproject.toml files within this project, they should point to the main hugegraph-ai repository.

Applied to files:

  • hugegraph-llm/pyproject.toml
🧬 Code graph analysis (2)
hugegraph-llm/src/hugegraph_llm/demo/rag_demo/rag_block.py (3)
hugegraph-llm/src/hugegraph_llm/flows/scheduler.py (3)
  • SchedulerSingleton (179-189)
  • get_instance (184-189)
  • schedule_stream_flow (141-176)
hugegraph-llm/src/hugegraph_llm/flows/intent_detector.py (3)
  • IntentDetectorSingleton (214-224)
  • get_instance (219-224)
  • detect (146-212)
hugegraph-llm/src/hugegraph_llm/flows/__init__.py (1)
  • FlowName (21-34)
hugegraph-llm/src/hugegraph_llm/flows/intent_detector.py (2)
hugegraph-llm/src/hugegraph_llm/flows/__init__.py (1)
  • FlowName (21-34)
hugegraph-llm/src/hugegraph_llm/models/llms/base.py (1)
  • BaseLLM (22-74)
🪛 GitHub Actions: License header & 3rd-party check
hugegraph-llm/src/hugegraph_llm/flows/intent_detector.py

[error] 1-1: license-eye header check failed: File does not have a valid license header. Command: 'license-eye -v info -c .licenserc.yaml header check'. Process exited with code 1.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.10)
🔇 Additional comments (30)
hugegraph-llm/src/hugegraph_llm/nodes/llm_node/schema_build.py (1)

18-18: 导入路径更新正确。

导入从 PyCGraph 改为 pycgraph 与 PR 目标一致,并符合 Python 模块命名约定。

hugegraph-llm/src/hugegraph_llm/nodes/document_node/chunk_split.py (1)

16-16: LGTM!导入路径修正正确。

此导入路径从 PyCGraph 更正为 pycgraph 符合 PR 目标,是代码库范围内的标准化修复的一部分。更改不影响功能逻辑。

hugegraph-llm/src/hugegraph_llm/nodes/util.py (1)

18-18: 导入路径更新正确

此更改将导入从 PyCGraph 更新为 pycgraph,与整个代码库的重构保持一致。变更简单直接,没有逻辑修改。

hugegraph-llm/src/hugegraph_llm/nodes/llm_node/prompt_generate.py (1)

16-16: 导入更正已验证无误。

验证结果显示:

  • 文件第 16 行已正确更改为 from pycgraph import CStatus
  • 代码库中所有 28 个文件的 pycgraph 导入均已统一使用小写形式,无遗留的 PyCGraph 引用
  • CStatus 在 prompt_generate.py 中正确导入并在第 41 行使用
  • pycgraph 已在 pyproject.toml 中声明为项目依赖

此项更改正确完整,无需进一步修改。

hugegraph-llm/src/hugegraph_llm/nodes/index_node/vector_query_node.py (1)

49-49: 导入路径修正已验证通过。

导入语句从 PyCGraph 修正为 pycgraph 符合本 PR 的全仓库范围重构目标。依赖验证确认:pycgraph==3.2.2 已在 ./hugegraph-llm/pyproject.toml 中正确声明和锁定,导入路径与依赖配置完全匹配。代码更改无误。

hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/gremlin_execute.py (1)

20-20: 导入路径修正无误,已验证。

脚本结果确认 pycgraph 模块正确导出了 CStatus。在整个代码库中至少 18 个文件已成功使用 from pycgraph import CStatus 进行导入,且均正常运行。此修改是代码库范围内从 PyCGraphpycgraph 的一致迁移,符合项目标准。

hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/schema.py (1)

18-18: 导入变更已验证无误,无遗漏问题。

根据脚本执行结果确认:

  1. 无任何遗留的旧 PyCGraph 导入——所有导入已完全更新为小写 pycgraph
  2. 整个代码库中 CStatus 的导入和使用完全一致(25+ 文件中的导入均为 from pycgraph import CStatus
  3. 文件 schema.py 在第 18 行的导入与其他文件完全一致,第 50、58 行的使用也验证无误
  4. 文件总行数 69 行,符合 600 行的代码可维护性要求

代码变更经验证已完成,无需进一步修改。

hugegraph-llm/src/hugegraph_llm/nodes/index_node/gremlin_example_index_query.py (2)

51-55: 确认 CStatus 的构造函数签名。

第 55 行使用 CStatus(-1, str(err)) 创建状态对象。请确保 pycgraph.CStatus 的构造函数签名与原 PyCGraph.CStatus 相同,接受状态码和错误消息作为参数。


20-20: 确认导入路径重构的一致性及 pycgraph==3.2.2 依赖项的可用性。

根据脚本执行结果验证:

  1. 导入路径重构一致性:项目内已完全迁移,无残留 from PyCGraph 导入,所有 26+ 个文件均已更新为 from pycgraph 格式,包括目标文件第 20 行的 from pycgraph import CStatus 变更。

  2. 依赖项声明hugegraph-llm/pyproject.toml 明确声明 "pycgraph==3.2.2" 作为项目依赖。

  3. 包可用性风险:Web 搜索未能定位 PyPI 上的 pycgraph 3.2.2 版本发布记录。需要确认:

    • 该版本是否为 Apache HugeGraph 项目专用的内部/私有包
    • 版本号 3.2.2 是否准确有效
    • 包是否已正确发布至项目依赖源

建议:在合并此变更前,请手动验证依赖项解析是否成功,并确认 pycgraph 包的实际版本可用性。

hugegraph-llm/src/hugegraph_llm/nodes/base_node.py (1)

17-17: 验证 pycgraph 依赖包的可用性和 API 兼容性。

CGraph 提供 Python 版本 PyCGraph,导入路径从 PyCGraph 更新为 pycgraph 看似是包名规范化。虽然 确认存在 GNode 和 CStatus 类,但无法通过公开资源验证 pycgraph 3.2.2 版本的具体可用性、API 兼容性或是否为简单重命名。

建议手动验证该包的实际可用性和与旧版本的兼容性,确保代码库运行正常。

hugegraph-llm/src/hugegraph_llm/nodes/llm_node/extract_info.py (1)

16-16: 导入路径更正已验证,此项更改符合项目规范。

此更改将导入从 PyCGraph 更正为 pycgraph,与 hugegraph-llm 的项目级导入规范化一致。CStatus 是 CGraph 框架中的核心类,用于函数返回状态处理,在文件的第 36 和 45 行用于错误处理,逻辑未受影响。

历史代码中使用 from PyCGraph import CStatus 导入,当前 PR 将其规范化为小写模块名称,符合 Python 命名约定。

hugegraph-llm/src/hugegraph_llm/nodes/index_node/semantic_id_query_node.py (1)

18-18: 导入路径更正正确,整个项目已系统性迁移。

验证结果:

  • 导入语句 from pycgraph import CStatus 在第18行正确
  • CStatus 在文件中正确使用(第44行、第83行作为状态返回值)
  • 项目内28个文件已系统性完成从 PyCGraphpycgraph 的迁移,无遗漏
  • 所有文件导入一致,说明这是有意的、完整的包名修正

需要注意的是: Web 搜索未能找到公开的 pycgraph 3.2.2 版本发布记录,pycgraph 包可能是 HugeGraph 项目的内部依赖。代码变更本身没有问题,但请确认你的开发环境已正确配置 pycgraph 3.2.2 依赖。

hugegraph-llm/src/hugegraph_llm/flows/scheduler.py (1)

18-18: LGTM!导入路径修正正确。

从 PyCGraph 到 pycgraph 的导入路径修正符合本 PR 的整体重构目标,与 pyproject.toml 中将 pycgraph 固定到 3.2.2 版本的更改一致。

hugegraph-llm/src/hugegraph_llm/flows/prompt_generate.py (1)

16-16: LGTM!导入路径修正正确。

导入路径从 PyCGraph 更正为 pycgraph,与整个代码库的重构保持一致。

hugegraph-llm/src/hugegraph_llm/state/ai_state.py (1)

17-17: LGTM!导入路径修正正确。

将 GParam 和 CStatus 的导入从 PyCGraph 更正为 pycgraph,与项目的依赖更新保持一致。

hugegraph-llm/src/hugegraph_llm/flows/rag_flow_graph_vector.py (2)

19-19: LGTM!导入路径修正正确。

导入路径已从 PyCGraph 更正为 pycgraph。


33-47: LGTM!流程描述常量添加得当。

新增的 RAGGRAPHVECTOR_FLOW_DESC 和 RAGGRAPHVECTOR_FLOW_DETAIL 常量为混合图向量检索流程提供了清晰的元数据描述。这些常量很可能被新的意图检测器用于自动模式的流程路由和参数提取。

gremlin_tmpl_num 参数的描述详细说明了何时使用值 3(清晰的图查询语义)和 -1(语义模糊),这有助于自动参数提取的准确性。

hugegraph-llm/src/hugegraph_llm/flows/rag_flow_raw.py (2)

19-19: LGTM!导入路径修正正确。

导入路径已从 PyCGraph 更正为 pycgraph。


27-40: LGTM!流程描述常量定义清晰。

新增的 RAGRAW_FLOW_DESC 和 RAGRAW_FLOW_DETAIL 常量准确描述了直接 LLM 问答流程,该流程不使用外部知识增强。required_params 中仅要求 query 参数,符合纯 LLM 场景的预期。

hugegraph-llm/src/hugegraph_llm/flows/rag_flow_vector_only.py (2)

19-19: LGTM!导入路径修正正确。

导入路径已从 PyCGraph 更正为 pycgraph。


29-42: LGTM!流程描述常量定义准确。

新增的 RAGVECTORONLY_FLOW_DESC 和 RAGVECTORONLY_FLOW_DETAIL 常量清晰描述了仅基于向量检索的流程,不使用图增强。required_params 中仅要求 query 参数,符合纯向量检索场景的预期。

hugegraph-llm/pyproject.toml (1)

65-65: pycgraph 3.2.2 版本验证完成,无问题发现。

验证结果确认:

  • pycgraph 3.2.2 版本存在于 PyPI 中,且为最新发行版本
  • 无已知安全漏洞

该依赖项版本固定方式恰当,确保了构建的可重现性。代码变更无需调整。

hugegraph-llm/src/hugegraph_llm/flows/build_vector_index.py (1)

18-18: GPipeline 导入路径切换为 pycgraph 看起来没问题

这里将 GPipeline 的导入从 PyCGraph 统一调整为 pycgraph,与本仓库其它 flow 文件保持一致,构建和运行逻辑未改变,这里可以接受。

hugegraph-llm/src/hugegraph_llm/flows/text2gremlin.py (1)

18-18: Text2GremlinFlow 使用 pycgraph.GPipeline 的修改是合理的

仅调整了 GPipeline 的导入来源为 pycgraph,与其它 flow 的统一改动一致,文本转 Gremlin 的流程本身未受影响,这里没有问题。

hugegraph-llm/src/hugegraph_llm/flows/update_vid_embeddings.py (1)

16-16: 导入路径更正正确。

与项目范围内的依赖调整保持一致,从 PyCGraph 更新为 pycgraph

hugegraph-llm/src/hugegraph_llm/flows/build_example_index.py (1)

19-19: 导入路径更新符合项目标准。

与整个代码库的 pycgraph 导入标准化保持一致。

hugegraph-llm/src/hugegraph_llm/flows/get_graph_index_info.py (1)

18-18: 导入更改正确。

pycgraph 导入路径已正确更新。

hugegraph-llm/src/hugegraph_llm/flows/build_schema.py (1)

18-18: 导入路径调整正确。

与项目依赖更新保持一致。

hugegraph-llm/src/hugegraph_llm/flows/graph_extract.py (1)

17-17: 导入路径更新正确。

完成从 PyCGraphpycgraph 的标准化更新。

hugegraph-llm/src/hugegraph_llm/flows/import_graph_data.py (1)

19-19: 导入路径已正确验证。

pycgraph 3.2.2 版本已确认在 PyPI 上存在,项目依赖中精确指定为 pycgraph==3.2.2。导入语句 from pycgraph import GPipeline 正确,与代码库中其他 20+ 个文件的导入方式保持一致。GPipeline 类正确导出,line 49 的使用无误。

Comment on lines +15 to +32
INTENT_DETECTOR_PROMPT = """
# ROLE
You are an expert AI assistant that functions as a flow router. Your primary responsibility is to analyze a user's query and select the most appropriate flow from a provided list to handle the request.

# INSTRUCTIONS
1. Carefully examine the user's query to understand their underlying intent.
2. Review the list of `AVAILABLE_FLOWS`. For each flow, pay close attention to its `desc` (description).
3. Select the single best flow based on query characteristics:
- **Graph-only queries**: Use when the query focuses on relationships, connections, paths, network analysis, or graph traversal (e.g., "How are A and B connected?", "What's the shortest path between X and Y?", "Show me the network of relationships around Z")
- **Vector-only queries**: Use when the query seeks factual information, definitions, descriptions, or content similarity (e.g., "What kind of person is X?", "Tell me about Y", "Describe the characteristics of Z")
- **Hybrid queries**: Use when the query combines both relationship exploration AND factual retrieval, or when context from both graph structure and content semantics would enhance the answer
4. If no flow is suitable for the query, you MUST choose "none".
5. Your final output MUST be a single flow name string. Do not add any explanation or conversational text.

# AVAILABLE_FLOWS
Here is the list of flows you can choose from:
{{flow_list}}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

intent 检测流程存在占位符拼写错误与 “none” 结果未处理,可能导致运行时异常

当前实现有几处会直接影响自动模式的稳定性:

  1. Prompt 占位符名称不匹配(flow_list/tool_list)

    • 模板中使用的是 {{flow_list}}(见行 29-31),但在 detect() 中通过 INTENT_DETECTOR_PROMPT.replace("{{tool_list}}", tools_str) 进行替换(行 154)。
    • 结果是 {{flow_list}} 根本没有被替换,LLM 收不到可用的 flow 列表描述,路由效果会明显变差甚至随机。
  2. LLM 返回 "none" 时会抛出 KeyError

    • Prompt 明确要求“不适合时必须输出 none”,但逻辑里直接访问 self.flow_message[tool_result](行 159),当 tool_result == "none" 时这里会触发 KeyError,而不会优雅地回传“无合适 flow”。
    • 后续 rag_block.rag_answer_streaming 是按 tool_name == "none" 来判断 “No suitable flow found” 的,当前代码会在这之前就异常。
  3. flow 标志字典与返回值类型可以更一致(非阻塞)

    • flow_flags 的 key 使用 FlowName,而 tool_result 为字符串;虽然 FlowName(str, Enum) 在运行时可兼容,但从 mypy 和可读性角度,建议统一都用 FlowNamedetect() 返回的 tool_name 也用 FlowName,调用方再去比较。

建议一并修复,示例 diff 如下(仅示意):

@@
-        tool_descs = []
-        for flow in flow_list:
-            if flow in self.flow_message:
-                tool_descs.append(self.flow_message[flow]["desc"])
-        tools_str = "\n\n".join(tool_descs)
-        prompt = INTENT_DETECTOR_PROMPT.replace("{{tool_list}}", tools_str)
+        tool_descs: list[str] = []
+        for flow in flow_list:
+            if flow in self.flow_message:
+                tool_descs.append(self.flow_message[flow]["desc"])
+        tools_str = "\n\n".join(tool_descs)
+        prompt = INTENT_DETECTOR_PROMPT.replace("{{flow_list}}", tools_str)
@@
-        tool_result = await self.llm_client.agenerate(prompt=prompt)
-        tool_result = tool_result.strip()
-        # expected tool_result belong to [4 kinds of Flow]
-        detail = None if self.flow_message[tool_result] is None else self.flow_message[tool_result]["detail"]
-        if detail is None:
-          raise ValueError("LLM返回的flow类型不在支持的RAGFlow范围内!")
+        tool_result = (await self.llm_client.agenerate(prompt=prompt)).strip()
+
+        # LLM 可能返回 "none" 表示无合适 flow,直接短路返回
+        if tool_result == "none":
+            return {"tool_name": "none", "parameters": {}}
+
+        if tool_result not in self.flow_message:
+            raise ValueError("LLM返回的 flow 类型不在支持的 RAGFlow 范围内!")
+
+        # 这里仍然用字符串 key 访问,保持与 flow_message 定义一致
+        detail = self.flow_message[tool_result]["detail"]
@@
-        flow_flags = {
-            FlowName.RAG_RAW: {
+        flow_flags: dict[str, dict[str, Any]] = {
+            FlowName.RAG_RAW: {
@@
-        if tool_result in flow_flags:
-            result["parameters"].update(flow_flags[tool_result])
+        if tool_result in flow_flags:
+            result["parameters"].update(flow_flags[tool_result])

同时,结合实际调用处(rag_block.py 使用的是 FlowName.RAG_RAW 等),也可以考虑把 detect() 的签名改为 flow_list: list[FlowName],以及让 tool_name 返回 FlowName 类型,类型层面会更清晰。

Also applies to: 146-212

🤖 Prompt for AI Agents
In hugegraph-llm/src/hugegraph_llm/flows/intent_detector.py around lines 15-32
(and related logic in 146-212), the prompt placeholder and result handling are
wrong: update the prompt replacement to use "{{flow_list}}" (not
"{{tool_list}}") so the LLM receives the available flows; add explicit handling
for the LLM returning "none" (check if tool_result == "none" before indexing
into flow map and return a clear sentinel like None or FlowName.NONE or raise a
controlled exception instead of allowing a KeyError); and make the flow
flags/return values consistent by using the FlowName enum type end-to-end
(consider changing detect() signature to accept/return FlowName or
Optional[FlowName] to maintain type-safety and avoid string/enum mismatches).

# limitations under the License.

from PyCGraph import CStatus
from pycgraph import CStatus
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

导入路径修正正确。

将导入从 PyCGraph 更正为 pycgraph 符合 Python 包命名约定(使用小写)。此更改与整个 PR 中的一致性修正相符。

请验证 pycgraph 版本 3.2.2 是否存在且正确导出 CStatus


🏁 Script executed:

#!/bin/bash
# 描述:验证 pycgraph 包版本和 CStatus 导出

# 检查 pyproject.toml 中的版本固定
echo "=== Checking pycgraph version in pyproject.toml ==="
fd -e toml pyproject.toml --exec cat {} | grep -A 2 -B 2 pycgraph

# 验证 CStatus 在代码库中的所有使用
echo -e "\n=== Verifying CStatus usage across codebase ==="
rg -n --type py "CStatus" -C 2

Length of output: 20362


🌐 Web query:

pycgraph 3.2.2 pypi package

💡 Result:

I couldn't find a PyPI release pycgraph 3.2.2. The PyPI project shows latest release 3.1.2 (uploaded Sep 6, 2025). [1]

If you want to confirm or install the latest published version: pip index versions pycgraph or pip install pycgraph (or pip install pycgraph==3.1.2). [1]

Sources:
[1] PyPI — pycgraph project page.


无法安装:pycgraph 版本 3.2.2 不存在于 PyPI。

导入路径修正本身是正确的(pycgraph 正确导出 CStatus),但 pyproject.toml 中的版本固定存在问题:PyPI 上最新发布版本是 3.1.2(2025年9月6日),不存在版本 3.2.2。

需要修正:

  • 更新 pyproject.toml 中 pycgraph==3.2.2pycgraph==3.1.2(或确认 3.2.2 是否为预发版本)

此修正是在修改导入路径之前的前置条件,否则依赖安装将失败。

🤖 Prompt for AI Agents
In
hugegraph-llm/src/hugegraph_llm/nodes/index_node/build_gremlin_example_index.py
around line 16, the review notes that pycgraph==3.2.2 pinned in pyproject.toml
does not exist on PyPI (latest is 3.1.2), which will make dependency
installation fail before the import change; update pyproject.toml to pin
pycgraph==3.1.2 (or explicitly allow a prerelease if 3.2.2 is intentional) and
re-run dependency resolution to ensure the import change succeeds.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant