
✨ Towards Temporal Knowledge Graph Alignment in the Wild ✨

—————— Under Review at IEEE TPAMI ——————

📰 Introduction | 🏗️ Architecture | ⚙️ Installation | 🚀 Quick Start
📦 Datasets | 📖 Usage | 🔬 Reproducibility | 📜 License | 📬 Contact | 📑 Citation


📰 Latest News

🆕 Updates | 📅 Date | 📝 Description
🎉 Code Release | - | HyDRA codebase and datasets now available

📰 Introduction

Temporal Knowledge Graph Alignment in the Wild (TKGA-Wild) addresses a critical challenge in temporal knowledge graph integration. To the best of our knowledge, this is the first work to formally formulate and solve this problem. The task is difficult because of two properties that are common in real-world scenarios: Multi-Scale Temporal Elements (i.e., multi-granular temporal coexistence and temporal span disparity) and Asymmetric Temporal Structures (i.e., heterogeneous temporal structures and temporal structural incompleteness).

We introduce complete, high-quality TKGA-Wild benchmarks and propose HyDRA, a new paradigm based on multi-scale hypergraph retrieval-augmented generation that systematically addresses these challenges. HyDRA captures complex structural dependencies, models multi-granular temporal features, mitigates temporal disparities, and introduces a scale-weave synergy mechanism to coordinate information across different temporal scales.

🔥 Key Features

Feature | Icon | Description
Multi-Granularity Temporal Encoding | 🔄 | Captures temporal information at different scales (year, month, day)
Scale-Adaptive Entity Projection | 📐 | Adaptive entity projection across different graph scales and dimensions
Multi-Scale Hypergraph Retrieval | 🔍 | Efficient neural retrieval for hypergraph-based search
Scale-Weave Synergy | 🔗 | Coordinates information across different temporal scales
State-of-the-Art Performance | 📈 | Consistently outperforms 28 competitive baselines, achieving up to 43.3% improvement in Hits@1

🏗️ Architecture

HyDRA adopts a multi-scale hypergraph retrieval-augmented generation paradigm, comprising four stages:

Stage 1: Encoding and Integration 🔄

Stage 2: Scale-Adaptive Entity Projection 📐

Stage 3: Multi-Scale Hypergraph Retrieval 🔍

Stage 4: Multi-Scale Fusion 🔗

📖 For detailed architecture descriptions and theoretical foundations, refer to the accompanying paper.


⚙️ Installation

📋 Prerequisites

First, install dependencies:

pip install -r requirements.txt

📦 Main Dependencies

Package | Version | Purpose
🐍 Python | >= 3.7 | Core language (tested on 3.8.10)
🔥 PyTorch | >= 1.10.0 | Deep learning framework
🔍 Faiss | >= 1.7.0 | Efficient similarity search (CPU/GPU)
📊 NumPy | >= 1.21.0 | Numerical computing
🐼 Pandas | >= 1.3.0 | Data manipulation
Tqdm | >= 4.62.0 | Progress bars
🌐 NetworkX | >= 2.6.0 | Graph analysis

💡 Note: For GPU-accelerated FAISS, use faiss-gpu instead of faiss-cpu.
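
The version requirements above can be checked before running the pipeline. The following is a minimal sketch, not part of the HyDRA codebase: the distribution names (torch, faiss-cpu, faiss-gpu, and so on) are assumptions about how each package is published, and importlib.metadata is only in the standard library from Python 3.8 onward.

```python
import re
from importlib import metadata  # standard library from Python 3.8 onward

# Minimum versions copied from the dependency table above.
# The distribution names are assumptions about how each package is published.
MINIMUM_VERSIONS = {
    "torch": "1.10.0",
    "numpy": "1.21.0",
    "pandas": "1.3.0",
    "tqdm": "4.62.0",
    "networkx": "2.6.0",
}
# Faiss ships under two distribution names; either one satisfies the requirement.
FAISS_DISTRIBUTIONS = ("faiss-cpu", "faiss-gpu")
FAISS_MINIMUM = "1.7.0"


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple of ints."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups(default="0"))


def meets_minimum(installed, minimum):
    """Compare versions numerically, so that 1.10.0 counts as newer than 1.9.1."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def check_environment():
    """Return a list of readable problems; an empty list means every check passed."""
    problems = []
    for name, minimum in MINIMUM_VERSIONS.items():
        found = installed_version(name)
        if found is None:
            problems.append(f"{name} is not installed (need >= {minimum})")
        elif not meets_minimum(found, minimum):
            problems.append(f"{name} {found} is older than {minimum}")
    faiss_versions = [v for v in map(installed_version, FAISS_DISTRIBUTIONS) if v]
    if not any(meets_minimum(v, FAISS_MINIMUM) for v in faiss_versions):
        problems.append(f"faiss-cpu or faiss-gpu >= {FAISS_MINIMUM} was not found")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Versions are compared as integer tuples rather than strings because a plain string comparison would rank "1.9.1" above "1.10.0".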


📦 Datasets

For our newly proposed TKGA-Wild scenario, we introduce two novel benchmark datasets: BETA and WildBETA.

Dataset | Description | Fact Size
BETA | Benchmark dataset for TKGA-Wild | 362K+
WildBETA | Extended benchmark dataset for TKGA-Wild | 563K+

🔗 Download Links

Baidu Netdisk Google Drive

🔐 Baidu Netdisk: Extraction Code: pnax | Password: tkgawild

Dataset Format:

Taking the icews_wiki dataset as an example, the folder data/icews_wiki/ should contain:

  • ent_ids_1: Entity IDs in source KG

  • ent_ids_2: Entity IDs in target KG

  • triples_1: Relation triples encoded by IDs in source KG

  • triples_2: Relation triples encoded by IDs in target KG

  • rel_ids_1: Relation IDs in the source KG

  • rel_ids_2: Relation IDs in the target KG

  • time_id: Time IDs in the source KG and the target KG

  • ref_ent_ids: All aligned entity pairs, list of pairs like (e_s \t e_t)
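
The layout listed above can be checked, and the alignment pairs loaded, with a few lines of Python. This is a rough sketch rather than the project's own loader: the helper names missing_files and load_alignment_pairs are hypothetical, and the parser assumes that each line of ref_ent_ids holds exactly one tab-separated pair of integer IDs.

```python
import os

# File names taken from the list above.
EXPECTED_FILES = (
    "ent_ids_1", "ent_ids_2",
    "triples_1", "triples_2",
    "rel_ids_1", "rel_ids_2",
    "time_id", "ref_ent_ids",
)


def missing_files(data_dir):
    """Return the expected file names that are absent from data_dir."""
    return [name for name in EXPECTED_FILES
            if not os.path.isfile(os.path.join(data_dir, name))]


def load_alignment_pairs(path):
    """Read ref_ent_ids as a list of (source_id, target_id) integer pairs.

    Assumes one tab-separated pair of integer IDs per line.
    """
    pairs = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            source_id, target_id = line.split("\t")
            pairs.append((int(source_id), int(target_id)))
    return pairs
```

For example, missing_files("data/icews_wiki") should return an empty list once the dataset has been extracted correctly.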

Note: The representative datasets used in experiments are derived from Dual-AMN, JAPE, GCN-Align, BETA, DAEA, AGROLD, DOREMUS and related works.


🚀 Quick Start

Step 1: Clone the Repository 📥

git clone https://github.com/eduzrh/HyDRA.git

cd HyDRA

Step 2: Prepare Datasets 📦

Download and extract datasets to ./data/

Step 3: Run the Main Experiment ▶️

python HyDRA_main.py --data_dir data/WildBETA

Step 4: View Results 📊

Metric | Description
Hits@1 | Proportion of correct alignments ranked first
Hits@10 | Proportion in top-10 candidates
MRR | Mean Reciprocal Rank

📖 Usage

Basic Usage

Run complete pipeline:

python HyDRA_main.py --data_dir data/WildBETA

Advanced Options

Configure training parameters:

python HyDRA_main.py --data_dir data/WildBETA \
    --cuda 0 \
    --epochs 1500 \
    --max_iterations 5 \
    --min_kg1_entities 100

Parameter Descriptions:

Parameter | Type | Default | Description
--data_dir | str | Required | Path to dataset directory
--cuda | int | 0 | CUDA device ID for training
--epochs | int | 500 | Number of training epochs for encoding stage
--max_iterations | int | 3 | Maximum pipeline iterations
--min_kg1_entities | int | 50 | Minimum entities threshold for stopping
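
To make the types and defaults concrete, the documented options can be mirrored in an argparse parser. This is only an illustrative sketch built from the parameter table; the actual argument handling in HyDRA_main.py may differ, and build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (a sketch, not the project's code)."""
    parser = argparse.ArgumentParser(description="HyDRA pipeline options (illustrative)")
    parser.add_argument("--data_dir", type=str, required=True,
                        help="Path to dataset directory")
    parser.add_argument("--cuda", type=int, default=0,
                        help="CUDA device ID for training")
    parser.add_argument("--epochs", type=int, default=500,
                        help="Number of training epochs for encoding stage")
    parser.add_argument("--max_iterations", type=int, default=3,
                        help="Maximum pipeline iterations")
    parser.add_argument("--min_kg1_entities", type=int, default=50,
                        help="Minimum entities threshold for stopping")
    return parser


# Parse most of the "Advanced Options" example, leaving out --min_kg1_entities
# so that its documented default applies.
args = build_parser().parse_args(
    ["--data_dir", "data/WildBETA", "--cuda", "0", "--epochs", "1500", "--max_iterations", "5"]
)
print(args.epochs, args.max_iterations, args.min_kg1_entities)  # prints: 1500 5 50
```

Any option that is omitted falls back to the default in the table, so only --data_dir has to be supplied.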

Multi-Granularity Time Modeling

HyDRA supports multi-granularity temporal modeling (year and month levels) to handle Multi-Granular Temporal Coexistence. This feature can be enabled through the encoding stage configuration.
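
As an illustration of what multi-granular temporal coexistence means in practice, the sketch below compares timestamps recorded at different levels of detail. It is not HyDRA's encoder: the "YYYY", "YYYY-MM", "YYYY-MM-DD" input format, the Timestamp type, and the function names are all assumptions made for this example. The day level is included because the feature table lists it, although this section names only year and month.

```python
from typing import NamedTuple, Optional


class Timestamp(NamedTuple):
    """A timestamp whose finer components may be missing."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None


def parse_timestamp(text: str) -> Timestamp:
    """Parse 'YYYY', 'YYYY-MM', or 'YYYY-MM-DD' (an assumed input format)."""
    parts = [int(part) for part in text.split("-")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported timestamp: {text!r}")
    return Timestamp(*parts)


def granularity(ts: Timestamp) -> str:
    """Name the finest level at which the timestamp is specified."""
    if ts.day is not None:
        return "day"
    if ts.month is not None:
        return "month"
    return "year"


def agree_at_shared_granularity(a: Timestamp, b: Timestamp) -> bool:
    """Compare two timestamps only on the components that both of them specify."""
    for left, right in zip(a, b):
        if left is None or right is None:
            break  # one side is coarser; finer components cannot be compared
        if left != right:
            return False
    return True


month_level = parse_timestamp("2014-03")
day_level = parse_timestamp("2014-03-15")
print(granularity(month_level), granularity(day_level))     # prints: month day
print(agree_at_shared_granularity(month_level, day_level))  # prints: True
```

A fact dated to the month in one graph and to the day in the other can still be matched at the month level, which is the kind of cross-granularity comparison the task requires.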


🔬 Reproducibility

We are committed to ensuring full reproducibility of our results. The following resources are provided:

📋 Experimental Configuration

  • Hyperparameters: All hyperparameter settings are documented in the code and can be configured via command-line arguments

  • Random Seeds: Seed configurations are embedded in the training scripts for reproducibility

  • Environment: Tested on Python 3.8.10 with dependencies as specified in requirements.txt

📊 Reproducing Main Results

To reproduce the main experimental results reported in the paper:

  1. Download datasets following the format described in the Datasets section

  2. Run the complete pipeline with default settings:

python HyDRA_main.py --data_dir data/WildBETA

  3. Evaluate results using the output files in data/icews_wiki/message_pool/

🏗️ Code Organization

The codebase is organized into modular components for clarity:

  • encoding_and_integration/: Multi-granularity temporal entity encoding and integration

  • scale_adaptive_entity_projection/: Relation alignment and entity projection

  • multi_scale_hypergraph_retrieval/: Neural retrieval and hypergraph decomposition

  • multi_scale_fusion/: Multi-scale fusion and alignment refinement

  • HyDRA_main.py: Main pipeline orchestrator

📝 Documentation

  • Comprehensive inline code comments explaining key design decisions

  • Clear module structure with standardized naming conventions

  • This README with step-by-step usage instructions


📊 Evaluation Metrics

We employ standard knowledge graph alignment metrics for transparency and comparability:

  • Hits@1: Proportion of correct alignments ranked first

  • Hits@10: Proportion of correct alignments in top-10 candidates

  • MRR (Mean Reciprocal Rank): Average reciprocal rank of correct alignments
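
The three metrics can be computed directly from the rank of the correct counterpart of each test entity. The snippet below is a small worked example using the standard definitions; the ranks are made-up values for illustration, not results from the paper, and the function names are hypothetical.

```python
def hits_at_k(ranks, k):
    """Fraction of test entities whose correct counterpart is ranked within the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1 / rank over all test entities."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)


# Hypothetical 1-based ranks of the correct target entity for four source entities.
ranks = [1, 2, 1, 20]
print(hits_at_k(ranks, 1))          # 2 of 4 ranked first -> 0.5
print(hits_at_k(ranks, 10))         # 3 of 4 within the top 10 -> 0.75
print(mean_reciprocal_rank(ranks))  # (1 + 0.5 + 1 + 0.05) / 4, about 0.6375
```

Ranks are 1-based here, so a rank of 1 means the correct entity was the top candidate.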

📜 License

MIT License - Copyright notices preserved.

📬 Contact

Responses targeted within 2-3 business days.

📑 Citation

If you find this work helpful for your research or applications, we would appreciate it if you could cite the following paper:

@article{DBLP:journals/corr/abs-2507-14475,
  author       = {Runhao Zhao and
                  Weixin Zeng and
                  Wentao Zhang and
                  Xiang Zhao and
                  Jiuyang Tang and
                  Lei Chen},
  title        = {Towards Temporal Knowledge Graph Alignment in the Wild},
  journal      = {CoRR},
  volume       = {abs/2507.14475},
  year         = {2025},
  url          = {https://doi.org/10.48550/arXiv.2507.14475},
  doi          = {10.48550/ARXIV.2507.14475},
  eprinttype   = {arXiv},
  eprint       = {2507.14475}
}

🔗 References

🙏 Acknowledgement

The following open source projects were partially referenced in this work. We sincerely appreciate their contributions:

Dual-AMN, JAPE, GCN-Align, Simple-HHEA, BETA, Dual-Match, Faiss, NetworkX, AdaCoAgentEA, DAEA, AGROLD, DOREMUS


This repository corresponds to the paper Towards Temporal Knowledge Graph Alignment in the Wild (under review at IEEE TPAMI), and is an extension of our previous work BETA.
