feat: add document structure into GraphRAG #2033

Merged
43 commits merged on Oct 18, 2024

KingSkyLi
Contributor

image

@Aries-ckt
Collaborator

It looks like a DB upsert error:

document embedding, failed:xx.txt, Query execution failed: {code: ParserException} {message: ParserException: line 3:12 no viable alternative at input 'CALL db.upsertVertex("chunk", [{id: "62c3475f-3ecf-4520-ae97-88336aaca5fe", name: "none_header_chunk", content: d748118572'}

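A parser error of this shape is often caused by string values being spliced into the query text without quoting or escaping, although the truncated log above does not confirm the root cause here. The following is a minimal sketch under that assumption. The helper names `cypher_string` and `build_upsert_chunk` are hypothetical, and JSON-style escaping of string literals is an assumption about what the query parser accepts:

```python
import json


def cypher_string(value: str) -> str:
    """Render a Python string as a double-quoted, escaped literal.

    json.dumps escapes double quotes, backslashes, and control characters.
    Whether the target parser accepts these escapes is an assumption.
    """
    return json.dumps(value, ensure_ascii=False)


def build_upsert_chunk(chunk_id: str, name: str, content: str) -> str:
    """Build the upsert call seen in the log, with every value quoted."""
    props = (
        f"id: {cypher_string(chunk_id)}, "
        f"name: {cypher_string(name)}, "
        f"content: {cypher_string(content)}"
    )
    return f'CALL db.upsertVertex("chunk", [{{{props}}}])'


query = build_upsert_chunk("1", "none_header_chunk", 'text with "quotes"')
```

Where the driver supports parameterized queries, passing values as parameters is usually preferable to building the query string by hand.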
@fanzhidongyzby changed the title from "Add the document structure of the graph." to "feat: add document structure into GraphRAG" on Sep 27, 2024
@github-actions bot added the enhancement label (New feature or request) on Sep 27, 2024
@fanzhidongyzby added the GraphRAG label (Module: GraphRAG) on Sep 27, 2024
dbgpt/storage/knowledge_graph/community_summary.py (outdated, resolved)
dbgpt/storage/graph_store/graph.py (outdated, resolved)
dbgpt/storage/knowledge_graph/knowledge_graph.py (outdated, resolved)
dbgpt/rag/transformer/graph_extractor.py (outdated, resolved)
dbgpt/storage/graph_store/tugraph_store.py (outdated, resolved)
dbgpt/storage/graph_store/tugraph_store.py (outdated, resolved)
dbgpt/storage/knowledge_graph/community_summary.py (outdated, resolved)
@Appointat
Contributor

@KingSkyLi Hi, I was assigned to help review this PR. Could you please explain what kind of document structure needs to be added to the graph? Thanks a lot!

@Appointat mentioned this pull request on Oct 17, 2024 (6 tasks)
@Aries-ckt
Collaborator

@KingSkyLi @Appointat
I ran into a problem when uploading a file:

2024-10-17 18:06:45 B-V0ECMD6R-0244.local dbgpt.serve.rag.service.service[90267] ERROR document embedding, failed:读写分离.md, 'Vertex' object has no attribute 'type'

Collaborator

@fanzhidongyzby left a comment

Very good refactoring work, although a few points could still be improved.

dbgpt/storage/graph_store/graph.py (outdated, resolved)
dbgpt/storage/graph_store/graph.py (outdated, resolved)
dbgpt/storage/graph_store/graph.py (resolved)
dbgpt/storage/graph_store/tugraph_store.py (resolved)
dbgpt/storage/graph_store/tugraph_store.py (outdated, resolved)
dbgpt/storage/knowledge_graph/community_summary.py (outdated, resolved)
dbgpt/storage/knowledge_graph/community_summary.py (outdated, resolved)
dbgpt/storage/knowledge_graph/community_summary.py (outdated, resolved)
dbgpt/storage/knowledge_graph/community_summary.py (outdated, resolved)
@Aries-ckt
Collaborator

Test Success
image

image

@Aries-ckt previously approved these changes on Oct 18, 2024
Collaborator

@Aries-ckt left a comment

LGTM.

@Appointat
Contributor

image image

Appointat and others added 5 commits October 18, 2024 15:19
Co-authored-by: tpoisonooo <khj.application@aliyun.com>
Co-authored-by: vritser <vritser@163.com>
Collaborator

@csunny left a comment

LGTM

Collaborator

@Aries-ckt left a comment

LGTM

@Aries-ckt merged commit 88e3d12 into eosphoros-ai:main on Oct 18, 2024
4 checks passed
@Aries-ckt added the hacktoberfest-accepted label on Oct 18, 2024
Collaborator

@fanzhidongyzby left a comment

Please fix the comments below in another PR.


def delete_graph(self, graph_name: str) -> None:
    """Delete a graph."""
    """Delete a graph in the Neo4j database if it exists."""
Collaborator

wrong comment

Contributor

fixed


    DOCUMENT = "document"
    CHUNK = "chunk"
    ENTITY = "entity"  # view as general vertex in the general case
Collaborator

This should be described as the default node type in the knowledge graph.

Contributor

fixed

    DOCUMENT = "document"
    CHUNK = "chunk"
    ENTITY = "entity"  # view as general vertex in the general case
    RELATION = "relation"  # view as general edge in the general case
Collaborator

This should be described as the default edge type in the knowledge graph.

Contributor

fixed


def is_edge(self) -> bool:
    """Check if the element is an edge."""
    return not self.is_vertex()
Collaborator

"Not a vertex" does not mean "edge"; enumerate all valid edge types here.

Contributor

fixed
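The point of this thread can be sketched as follows: once an element-type enum might hold anything other than vertex and edge types, `is_edge` is safer as an explicit membership test than as the negation of `is_vertex`. The class name `GraphElemType` and the exact grouping are assumptions for illustration; only the four values shown in the diff above are taken from this PR.

```python
from enum import Enum


class GraphElemType(Enum):
    """Hypothetical element-type enum; only these four values appear in this PR."""

    DOCUMENT = "document"
    CHUNK = "chunk"
    ENTITY = "entity"  # default vertex type
    RELATION = "relation"  # default edge type

    def is_vertex(self) -> bool:
        """True only for explicitly listed vertex types."""
        return self in (
            GraphElemType.DOCUMENT,
            GraphElemType.CHUNK,
            GraphElemType.ENTITY,
        )

    def is_edge(self) -> bool:
        """True only for explicitly listed edge types, not merely 'not a vertex'."""
        return self in (GraphElemType.RELATION,)
```

With this shape, adding a new member forces a deliberate choice about which group it belongs to, instead of silently counting it as an edge.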


async def discover_communities(self, **kwargs) -> List[str]:
    """Run community discovery with leiden."""
    pass
Collaborator

return []

Contributor

fixed
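For context, a method body of `pass` makes the coroutine return `None`, which contradicts the `List[str]` annotation and breaks callers that iterate over the result. A minimal sketch of the requested change, where the class name `CommunityStoreSketch` is hypothetical and stands in for whichever store defines this stub:

```python
import asyncio
from typing import List


class CommunityStoreSketch:
    """Hypothetical stand-in for a store without community discovery."""

    async def discover_communities(self, **kwargs) -> List[str]:
        """Return no communities instead of None, so callers can iterate safely."""
        return []


# Awaiting the stub yields an empty list that matches the annotation.
communities = asyncio.run(CommunityStoreSketch().discover_communities())
for community_id in communities:
    print(community_id)
```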

)

for graph in graphs:
    graph_of_all.upsert_graph(graph)
Collaborator

update graph edge _chunk_id

Contributor

done


self._graph_store_apdater.upsert_graph(graph_of_all)

# use asyncio.gather
Collaborator

remove unused comments

Contributor

done

for graph in graphs:
self._graph_store.insert_graph(graph)
# Support graph search by the document and the chunks
if self._graph_store.get_config().enable_document_graph:
Collaborator

Move this part of the code into _parse_chunks, and rename it to load_document_graph(chunks) -> List[Chunk].

Contributor

done


subgraph_for_doc = self._graph_store_apdater.explore(
    subs=keywords_for_document_graph,
    limit=5,
Collaborator

use config: KNOWLEDGE_GRAPH_CHUNK_SEARCH_TOP_SIZE

Contributor

done
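One way to replace the hard-coded `limit=5` with a configurable value is sketched below. It assumes the setting is read from an environment variable of the same name, which this thread does not confirm. The helper name `chunk_search_top_size` is hypothetical, and the fallback of 5 simply mirrors the literal in the diff above.

```python
import os


def chunk_search_top_size(default: int = 5) -> int:
    """Read the chunk search limit from the environment, with a safe fallback.

    Assumes KNOWLEDGE_GRAPH_CHUNK_SEARCH_TOP_SIZE is an environment variable;
    the project may load it through a different configuration mechanism.
    """
    raw = os.getenv("KNOWLEDGE_GRAPH_CHUNK_SEARCH_TOP_SIZE")
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        # Ignore malformed values rather than failing the search.
        return default
    return value if value > 0 else default
```

The call site would then pass `limit=chunk_search_top_size()` instead of a literal.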

if not subs:
    return MemoryGraph()

if depth is None or depth < 0 or depth > self.MAX_HIERARCHY_LEVEL:
Collaborator

depth = 3 by default

Contributor

done
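One reading of this comment is that a missing or out-of-range depth should fall back to 3. A minimal sketch under that reading follows. The function name `normalize_depth` and the constant `DEFAULT_DEPTH` are hypothetical, and the value given to `MAX_HIERARCHY_LEVEL` is an assumption, since the thread does not state it.

```python
from typing import Optional

DEFAULT_DEPTH = 3  # default requested in the review comment
MAX_HIERARCHY_LEVEL = 3  # assumed value; not stated in this thread


def normalize_depth(depth: Optional[int]) -> int:
    """Fall back to the default when depth is missing or out of range."""
    if depth is None or depth < 0 or depth > MAX_HIERARCHY_LEVEL:
        return DEFAULT_DEPTH
    return depth
```

Keeping the check in one helper means every graph exploration entry point applies the same bounds.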

@Appointat deleted the feature-textlink branch on November 12, 2024 03:17
Labels
enhancement (New feature or request), GraphRAG (Module: GraphRAG), hacktoberfest, hacktoberfest-accepted
Development

Successfully merging this pull request may close these issues.

[Feature][GraphRAG] Knowledge graph extraction supports document structure information
5 participants