Merge entities #1875


Open: 5 commits to merge into main


majidsh97 commented Apr 10, 2025

Description

This pull request introduces an optional workflow called merge_entities, which can be run after the extract_graph workflow. It aims to merge duplicate or near-duplicate entities (e.g., car and cars, or PCA and principal component analysis) in the entity and relationship tables.
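To make the table-level effect of such a merge concrete, here is a minimal sketch. It assumes that a mapping from each alias to its canonical entity name already exists (in the PR, the LLM would produce it). It also uses a simplified, hypothetical row schema (`title`/`description` for entities, `source`/`target`/`weight` for relationships), and the helper name `merge_entity_tables` is invented for illustration. It is not the PR's implementation. Dropping self-loops and summing the weights of collapsed edges are assumed design choices.

```python
def merge_entity_tables(entities, relationships, alias_to_canonical):
    """Apply an alias -> canonical mapping to entity and relationship rows.

    Hypothetical schema: entities are dicts with "title" and "description";
    relationships are dicts with "source", "target" and optional "weight".
    """
    def canon(name):
        # Names without an alias entry are kept unchanged.
        return alias_to_canonical.get(name, name)

    # Group entity rows under their canonical title (dicts keep insertion order).
    merged = {}
    for row in entities:
        title = canon(row["title"])
        entry = merged.setdefault(title, {"descriptions": [], "merged_from": []})
        entry["descriptions"].append(row["description"])
        entry["merged_from"].append(row["title"])
    merged_entities = [
        {"title": t, "description": " ".join(e["descriptions"]), "merged_from": e["merged_from"]}
        for t, e in merged.items()
    ]

    # Rewrite edge endpoints; edges that become identical have their weights summed.
    edges = {}
    for rel in relationships:
        src, tgt = canon(rel["source"]), canon(rel["target"])
        if src == tgt:
            continue  # assumed choice: drop self-loops created by the merge
        edges[(src, tgt)] = edges.get((src, tgt), 0.0) + rel.get("weight", 1.0)
    merged_relationships = [{"source": s, "target": t, "weight": w} for (s, t), w in edges.items()]
    return merged_entities, merged_relationships


# Toy usage with made-up descriptions and weights.
entities = [
    {"title": "SOLDER", "description": "Alloy used to join parts."},
    {"title": "SOLDER JOINT", "description": "Connection formed by solder."},
    {"title": "WAVE SOLDERING", "description": "Bulk soldering process."},
]
relationships = [
    {"source": "WAVE SOLDERING", "target": "SOLDER", "weight": 1.0},
    {"source": "WAVE SOLDERING", "target": "SOLDER JOINT", "weight": 2.0},
    {"source": "SOLDER JOINT", "target": "SOLDER", "weight": 1.0},
]
ents, rels = merge_entity_tables(entities, relationships, {"SOLDER JOINT": "SOLDER"})
print(ents)
print(rels)
```

In this toy data, the two edges from WAVE SOLDERING collapse into one edge with weight 3.0. The SOLDER JOINT → SOLDER edge turns into a self-loop and is dropped.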

Motivation

Currently, GraphRAG may extract entities that are semantically similar but not identical. These duplicates increase the number of sparse or fragmented nodes in the knowledge graph and may negatively affect community detection and other downstream tasks.

By merging these entities, the graph becomes more semantically compact and meaningful, with improved structure and potentially better community coherence.
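One way to quantify the fragmentation argument is to count connected components before and after applying an alias mapping. The sketch below does this with a small union-find over undirected edges. The graph is a made-up toy example that reuses entity names from this PR, not the author's actual soldering graph. The helpers `count_components` and `apply_aliases` are hypothetical.

```python
def count_components(nodes, edges):
    """Count connected components with a small union-find (edges treated as undirected)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    return len({find(n) for n in nodes})


def apply_aliases(nodes, edges, alias_to_canonical):
    """Rename nodes and edge endpoints to their canonical names."""
    def canon(n):
        return alias_to_canonical.get(n, n)
    return {canon(n) for n in nodes}, [(canon(a), canon(b)) for a, b in edges]


# Toy graph: two fragments that are only linked through a duplicated entity.
nodes = {"SOLDERING", "MACHINE SOLDERING", "SOLDERING MACHINE", "INCREASED BOARD COMPLEXITY"}
edges = [("SOLDERING", "MACHINE SOLDERING"), ("SOLDERING MACHINE", "INCREASED BOARD COMPLEXITY")]

before = count_components(nodes, edges)
merged_nodes, merged_edges = apply_aliases(nodes, edges, {"SOLDERING MACHINE": "MACHINE SOLDERING"})
after = count_components(merged_nodes, merged_edges)
print(before, after)  # 2 1
```

Here a single merge (SOLDERING MACHINE → MACHINE SOLDERING) joins the two fragments into one component. Community detection can then group the fragments together.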

To test this, I built a graph about the soldering process. Without merging, the entity "Increased board complexity" formed a separate fragment, and no community report was generated for it. After merging, it is connected to the main node "soldering" and a community is created.

[Screenshots (3): the soldering graph without and with merged entities]

Proposed Changes
  • Add a new optional merge_entities workflow
  • Add a config section for the merge_entities workflow (e.g. enable: true/false, ...)
  • Add the merge_entities workflow to the default workflows
  • Add a merge_entities prompt
  • Write a JSON log file of the LLM output to the output folder
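The config flag and the default-workflow registration described above could fit together as in this minimal sketch. `MergeEntitiesConfig`, its fields other than the enable flag, `build_workflow_list`, and every workflow name except `extract_graph` and `merge_entities` are hypothetical. GraphRAG's real configuration and pipeline APIs are not shown here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MergeEntitiesConfig:
    """Hypothetical settings block for the merge_entities workflow."""
    enabled: bool = False                # the PR's enable: true/false switch
    prompt: Optional[str] = None         # hypothetical: path to a custom merge prompt
    log_llm_output: bool = True          # hypothetical: write the LLM's JSON output to the output folder


def build_workflow_list(base_workflows: List[str], config: MergeEntitiesConfig) -> List[str]:
    """Return a copy of the workflow list with merge_entities right after extract_graph, if enabled."""
    workflows = list(base_workflows)
    if config.enabled and "extract_graph" in workflows and "merge_entities" not in workflows:
        workflows.insert(workflows.index("extract_graph") + 1, "merge_entities")
    return workflows


# Illustrative pipeline; these names are placeholders, not GraphRAG's actual defaults.
base = ["load_documents", "chunk_text", "extract_graph", "detect_communities"]
print(build_workflow_list(base, MergeEntitiesConfig(enabled=True)))
print(build_workflow_list(base, MergeEntitiesConfig()))
```

Placing the step directly after `extract_graph` means every later step (community detection, reports) sees the merged tables. When the flag is off, the pipeline is left untouched.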

Checklist

  • ✅ I have tested these changes locally.
  • ✅ I have reviewed the code changes.
  • ❌ I have updated the documentation (if necessary).
  • ❌ I have added appropriate unit tests (if applicable).

I would really appreciate any feedback. If you think this is a useful feature, I will work on the documentation and unit tests.

Here are some examples of merged entities:

SOLDER
Merged from: SOLDER, MOLTEN SOLDER, SOLDER JOINTS, SOLDER JOINT, SOLDERED JOINT

CLEANING
Merged from: CLEANING, CLEANING PROCESSES, CLEANING PROCESS

WAVE SOLDERING
Merged from: WAVE SOLDERING, CS (WAVE SOLDERING) PROCESS

MACHINE SOLDERING
Merged from: MACHINE SOLDERING, SOLDERING MACHINE
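Merge groups like the ones above could be turned into the alias → canonical mapping that the merge step consumes. The sketch below assumes a hypothetical JSON shape (`name` plus `merged_from`) for the LLM output. The PR does not document the format of its JSON log. The sketch also rejects an alias that the LLM assigns to two different groups, since a silent overwrite there would be hard to debug. `build_alias_map` is a hypothetical helper.

```python
import json

# Hypothetical LLM output shape, filled with the merge groups listed in this PR.
llm_output = json.dumps([
    {"name": "SOLDER", "merged_from": ["SOLDER", "MOLTEN SOLDER", "SOLDER JOINTS", "SOLDER JOINT", "SOLDERED JOINT"]},
    {"name": "CLEANING", "merged_from": ["CLEANING", "CLEANING PROCESSES", "CLEANING PROCESS"]},
    {"name": "WAVE SOLDERING", "merged_from": ["WAVE SOLDERING", "CS (WAVE SOLDERING) PROCESS"]},
    {"name": "MACHINE SOLDERING", "merged_from": ["MACHINE SOLDERING", "SOLDERING MACHINE"]},
])


def build_alias_map(raw: str) -> dict:
    """Turn merge groups into an alias -> canonical map, rejecting conflicting assignments."""
    alias_map = {}
    for group in json.loads(raw):
        canonical = group["name"]
        for alias in group["merged_from"]:
            previous = alias_map.setdefault(alias, canonical)
            if previous != canonical:
                raise ValueError(f"{alias!r} assigned to both {previous!r} and {canonical!r}")
    return alias_map


alias_map = build_alias_map(llm_output)
print(alias_map["SOLDERING MACHINE"])  # MACHINE SOLDERING
```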

majidsh97 requested review from a team as code owners on April 10, 2025 20:57
majidsh97 (Author) commented:

@microsoft-github-policy-service agree

majidsh97 changed the title from "Majid contribution" to "Merge entities" on Apr 10, 2025
Labels: None yet
Projects: None yet
1 participant