Digimon Knowledge Graph Project

A comprehensive knowledge graph built from digimon.net/reference to analyze relationships between Digimon based on their characteristics, evolution patterns, and shared attributes.

Project Overview

What It Does

This project creates a searchable, analyzable network of all Digimon and their relationships by:

  1. Collecting - Scraping comprehensive data from the official Japanese Digimon reference
  2. Translating - Converting Japanese content to English for accessibility
  3. Structuring - Parsing unstructured HTML into organized data
  4. Connecting - Building a graph database of relationships
  5. Analyzing - Discovering patterns and insights through network analysis

Goals

  • Comprehensive Data Collection: Capture all 1,249+ Digimon with their complete profiles
  • Relationship Mapping: Identify evolution chains, type similarities, and shared attributes
  • Pattern Discovery: Uncover hidden connections and clustering in the Digimon universe
  • Research Platform: Provide a queryable database for fans and researchers
  • Technical Demonstration: Showcase modern data engineering practices

Expected Outcomes

  • Complete Digimon Database: Neo4j graph with all Digimon as nodes
  • Relationship Network: Edges representing evolutions, shared types, attributes, and moves
  • Analytical Insights: Statistics on type distributions, evolution patterns, and network centrality
  • Visual Reports: Network visualizations and analysis charts
  • Query Interface: Cypher queries for exploring specific relationships

Documentation

Analysis Documentation

Analysis Notebooks Overview

  1. Data Exploration & Profiling: Dataset statistics and quality assessment
  2. Evolution Network Analysis: Evolution chains and branching patterns
  3. Type-Attribute Correlation: Statistical relationships and pattern mining
  4. Move Network Analysis: Move-based connections and clustering
  5. Community Detection: Graph clustering and natural groupings
  6. Centrality & Influence: Network importance metrics
  7. Machine Learning: Predictive models with 85%+ accuracy
  8. Recommendation System: Similarity metrics and team optimization

Architecture

System Architecture

Overall Architecture

The system follows a modular pipeline architecture where each component has a specific responsibility in the data processing flow.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                 β”‚     β”‚                 β”‚     β”‚                 β”‚
β”‚  digimon.net    │────▢│   Scraper       │────▢│  Raw HTML       β”‚
β”‚  (Data Source)  β”‚     β”‚   (Async)       β”‚     β”‚  Storage        β”‚
β”‚                 β”‚     β”‚                 β”‚     β”‚                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                          β”‚
                                                          β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                 β”‚     β”‚                 β”‚     β”‚                 β”‚
β”‚  Translation    │◀────│  Structured     │◀────│   Parser        β”‚
β”‚  (Google API)   β”‚     β”‚  JSON Data      β”‚     β”‚   (BS4)         β”‚
β”‚                 β”‚     β”‚                 β”‚     β”‚                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚
        β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                 β”‚     β”‚                 β”‚     β”‚                 β”‚
β”‚  Loader         │────▢│  Neo4j Graph    │────▢│   Analysis      β”‚
β”‚  (py2neo)       β”‚     β”‚  Database       β”‚     β”‚   (NetworkX)    β”‚
β”‚                 β”‚     β”‚                 β”‚     β”‚                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Data Flow Pipeline

This diagram shows how data flows through the system from source to analysis, including all intermediate storage layers.

flowchart LR
    subgraph DS["Data Sources"]
        A[digimon.net/reference]
    end
    
    subgraph DP["Data Pipeline"]
        B[Scraper<br/>Async HTTP]
        C[Parser<br/>HTML β†’ JSON]
        D[Translator<br/>JP β†’ EN]
        E[Loader<br/>JSON β†’ Neo4j]
    end
    
    subgraph ST["Storage"]
        F[(Raw HTML<br/>Files)]
        G[(Parsed JSON<br/>Files)]
        H[(Translated<br/>JSON)]
        I[(Neo4j<br/>Graph DB)]
    end
    
    subgraph AN["Analysis"]
        J[NetworkX<br/>Analyzer]
        K[Notebooks<br/>& Visualizations]
    end
    
    A -->|HTTP Requests| B
    B -->|Save| F
    F -->|Read| C
    C -->|Save| G
    G -->|Read| D
    D -->|Cache| H
    H -->|Read| E
    E -->|Import| I
    I -->|Query| J
    J -->|Generate| K
    
    style DS fill:#666,stroke:#333,stroke-width:2px,color:#fff
    style DP fill:#666,stroke:#333,stroke-width:2px,color:#fff
    style ST fill:#666,stroke:#333,stroke-width:2px,color:#fff
    style AN fill:#666,stroke:#333,stroke-width:2px,color:#fff
    style A fill:#2a2a2a,stroke:#888,stroke-width:2px,color:#fff
    style B fill:#2a2a2a,stroke:#888,stroke-width:2px,color:#fff
    style C fill:#2a2a2a,stroke:#888,stroke-width:2px,color:#fff
    style D fill:#2a2a2a,stroke:#888,stroke-width:2px,color:#fff
    style E fill:#2a2a2a,stroke:#888,stroke-width:2px,color:#fff
    style F fill:#444,stroke:#666,stroke-width:1px,color:#ccc
    style G fill:#444,stroke:#666,stroke-width:1px,color:#ccc
    style H fill:#444,stroke:#666,stroke-width:1px,color:#ccc
    style I fill:#2a2a2a,stroke:#888,stroke-width:2px,color:#fff
    style J fill:#2a2a2a,stroke:#888,stroke-width:2px,color:#fff
    style K fill:#444,stroke:#666,stroke-width:1px,color:#ccc

System Components

This diagram illustrates the modular architecture showing how the CLI interface connects to core modules and infrastructure.

graph TB
    subgraph CI["CLI Interface"]
        CLI[ygg CLI<br/>Click Framework]
    end
    
    subgraph CM["Core Modules"]
        SCR[Scraper Module<br/>β€’ Rate Limiting<br/>β€’ Async Support<br/>β€’ Error Handling]
        PRS[Parser Module<br/>β€’ BeautifulSoup4<br/>β€’ CSS Selectors<br/>β€’ Data Extraction]
        TRN[Translator Module<br/>β€’ Google Translate<br/>β€’ Caching System<br/>β€’ Batch Processing]
        LDR[Loader Module<br/>β€’ Neo4j Driver<br/>β€’ Schema Creation<br/>β€’ Relationship Building]
        ANL[Analyzer Module<br/>β€’ NetworkX<br/>β€’ Graph Algorithms<br/>β€’ Statistics]
    end
    
    subgraph IN["Infrastructure"]
        NEO[Neo4j Database<br/>Community Edition]
        FS[File System<br/>β€’ HTML Storage<br/>β€’ JSON Storage<br/>β€’ Cache Files]
    end
    
    CLI --> SCR
    CLI --> PRS
    CLI --> TRN
    CLI --> LDR
    CLI --> ANL
    
    SCR --> FS
    PRS --> FS
    TRN --> FS
    LDR --> NEO
    ANL --> NEO
    
    style CI fill:#666,stroke:#333,stroke-width:2px,color:#fff
    style CM fill:#666,stroke:#333,stroke-width:2px,color:#fff
    style IN fill:#666,stroke:#333,stroke-width:2px,color:#fff
    style CLI fill:#2a2a2a,stroke:#888,stroke-width:2px,color:#fff
    style SCR fill:#444,stroke:#666,stroke-width:1px,color:#ccc
    style PRS fill:#444,stroke:#666,stroke-width:1px,color:#ccc
    style TRN fill:#444,stroke:#666,stroke-width:1px,color:#ccc
    style LDR fill:#444,stroke:#666,stroke-width:1px,color:#ccc
    style ANL fill:#444,stroke:#666,stroke-width:1px,color:#ccc
    style NEO fill:#2a2a2a,stroke:#888,stroke-width:2px,color:#fff
    style FS fill:#444,stroke:#666,stroke-width:1px,color:#ccc

Data Flow

  1. Data Collection Phase

    • API fetcher retrieves list of all Digimon URLs
    β€’ Async scraper downloads HTML pages with rate limiting (see the sketch after this list)
    • Raw HTML and images stored locally
  2. Processing Phase

    • Parser extracts structured data from HTML
    • Identifies Japanese/English names, types, attributes, moves
    • Saves as JSON with consistent schema
  3. Translation Phase

    • Translates Japanese profile text to English
    • Uses caching to avoid duplicate API calls
    • Preserves original Japanese for reference
  4. Graph Construction Phase

    • Creates nodes for Digimon, Types, Attributes, Moves
    • Establishes relationships between entities
    • Indexes for efficient querying
  5. Analysis Phase

    • Network analysis identifies central Digimon
    • Community detection finds clusters
    β€’ Evolution chain analysis traces complete lineages
    β€’ Statistical reports summarize the results
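
As referenced in the collection phase above, here is a minimal sketch of the rate-limited, robots.txt-aware async scraping. Names like fetch_page and DELAY are illustrative, not the actual API of src/scraper/:

import asyncio
from urllib.robotparser import RobotFileParser

import aiohttp

DELAY = 1.0  # seconds between requests; mirrors SCRAPE_DELAY in .env

def allowed(url: str) -> bool:
    # Honor robots.txt before fetching, as the scraper module does
    rp = RobotFileParser()
    rp.set_url("https://digimon.net/robots.txt")
    rp.read()
    return rp.can_fetch("*", url)

async def fetch_page(session: aiohttp.ClientSession, url: str) -> str:
    # Download one page, then pause so requests stay politely spaced
    async with session.get(url) as resp:
        resp.raise_for_status()
        html = await resp.text()
    await asyncio.sleep(DELAY)
    return html

async def scrape(urls: list[str]) -> list[str]:
    async with aiohttp.ClientSession() as session:
        return [await fetch_page(session, u) for u in urls if allowed(u)]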

Key Components

  • Scraper (src/scraper/): Async web scraping with robots.txt compliance
  • Parser (src/parser/): BeautifulSoup-based HTML parsing
  β€’ Translator (src/processor/): Google Translate API integration with caching (sketched below)
  • Graph Loader (src/graph/): Neo4j database population
  • Analyzer (src/analysis/): NetworkX-based graph analysis
  • CLI (yggdrasil_cli.py): Unified command-line interface
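
The Translator's cache-then-call pattern, as a rough sketch. Here translate_fn stands in for whichever Google Translate client the project wires up; the real logic lives in src/processor/translator.py:

import json
from pathlib import Path

CACHE_PATH = Path("data/cache/translations.json")

def translate_cached(text_jp: str, translate_fn) -> str:
    # Return a cached translation when one exists; otherwise call the API
    # once and persist the result so interrupted runs resume without rework
    cache = json.loads(CACHE_PATH.read_text(encoding="utf-8")) if CACHE_PATH.exists() else {}
    if text_jp not in cache:
        cache[text_jp] = translate_fn(text_jp)
        CACHE_PATH.parent.mkdir(parents=True, exist_ok=True)
        CACHE_PATH.write_text(json.dumps(cache, ensure_ascii=False), encoding="utf-8")
    return cache[text_jp]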

Quick Start

# Clone the repository
git clone https://github.com/yourusername/project-yggdrasil.git
cd project-yggdrasil

# Enter Nix development environment
nix develop

# Install the CLI
pip install -e .

# Start Neo4j and run full pipeline
ygg start
ygg run

That's it! These commands start Neo4j and run the entire pipeline.

Prerequisites

  • Docker & Docker Compose
  • Python 3.11+
  • One of: Nix (recommended), Poetry, or standard pip/venv

Environment Setup

Option 1: Nix (Recommended)

# Install Nix if you haven't already
curl -L https://nixos.org/nix/install | sh

# Enable flakes (add to ~/.config/nix/nix.conf)
experimental-features = nix-command flakes

# Enter development shell
nix develop

# Or with direnv
direnv allow

Option 2: Poetry

# Install Poetry
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies
poetry install

# Activate shell
poetry shell

Option 3: pyenv + virtualenv

# Install Python 3.11 with pyenv
pyenv install 3.11.8
pyenv local 3.11.8

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Option 4: Standard virtualenv

# Create virtual environment
python3.11 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Complete Pipeline

# Run everything at once
ygg run

# Or run individual steps
ygg scrape       # Scrape data
ygg parse        # Parse HTML to JSON  
ygg translate    # Translate to English
ygg load         # Load into Neo4j
ygg analyze      # Run analysis

Typical Workflows

First Time Setup

# 1. Clone and enter project
git clone https://github.com/yourusername/project-yggdrasil.git
cd project-yggdrasil

# 2. Enter Nix environment (installs Python, dependencies, etc.)
nix develop

# 3. Install the CLI tool
pip install -e .

# 4. Start Neo4j
ygg start

# 5. Run the full pipeline
ygg run

Returning to the Project

# 1. Enter project and Nix environment
cd project-yggdrasil
nix develop  # or use direnv

# 2. Check current status
ygg status

# 3. Start Neo4j if needed
ygg start

# 4. Continue where you left off
ygg run  # or specific step like 'ygg translate'

Common Scenarios

Scenario: Scraping failed midway

# Check what was scraped
ygg status

# Clean up partial data
ygg prune --keep-cache

# Restart scraping
ygg scrape --fetch-api

Scenario: Want to test with a small dataset

# Scrape just a few pages for testing
python -m src.scraper.main --limit 10

# Then run the rest of the pipeline
ygg parse
ygg translate
ygg load
ygg analyze

Scenario: Need to restart from scratch

# Stop Neo4j
ygg stop

# Clean everything including Neo4j database
ygg prune --include-neo4j

# Start fresh
ygg start
ygg run

Scenario: Just want to explore the data

# Make sure Neo4j is running
ygg start

# Open Neo4j Browser
# Go to: http://localhost:7474
# Login: neo4j / digimon123

# Example queries:
# - MATCH (d:Digimon) RETURN d LIMIT 25
# - MATCH (d:Digimon {name_en: "Agumon"})-[r]->(other) RETURN d, r, other

Troubleshooting

Issue: "command not found: ygg"

# Make sure you're in Nix environment
nix develop

# Reinstall the CLI
pip install -e .

Issue: Scraping shows "success=0"

# The save_html fix might not be applied
pip install -e . --force-reinstall --no-deps

# Clean and restart
ygg prune --keep-cache
ygg scrape --fetch-api

Issue: Neo4j won't start

# Check if Docker is running
docker ps

# Check logs
ygg logs

# Try manual start
docker-compose up -d

Issue: Translation taking too long

# Translation uses caching, so you can safely interrupt (Ctrl+C)
# and resume later - it won't retranslate cached items
ygg translate

Time Estimates

  • Scraping: ~40-50 minutes for all 1,249 Digimon
  • Parsing: ~5 minutes
  • Translation: ~60-90 minutes (first time, much faster with cache)
  • Loading: ~5 minutes
  • Analysis: ~1 minute
  • Total: ~2-3 hours for complete pipeline

Analysis Methodology

Statistical Methods

  β€’ Chi-Square Tests: Testing independence between type and attribute distributions
  β€’ CramΓ©r's V: Measuring association strength between categorical variables (both sketched after this list)
  • Markov Chains: Modeling evolution transition probabilities
  • Permutation Tests: Validating network properties against random models
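
A sketch of the type/attribute independence test, assuming a pandas DataFrame with one row per Digimon and illustrative "type"/"attribute" columns exported from the graph:

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def type_attribute_association(df: pd.DataFrame) -> tuple[float, float]:
    # Contingency table of type vs. attribute counts
    table = pd.crosstab(df["type"], df["attribute"])
    chi2, p_value, dof, expected = chi2_contingency(table)
    # Cramer's V: association strength normalized to [0, 1]
    n = table.to_numpy().sum()
    v = float(np.sqrt(chi2 / (n * (min(table.shape) - 1))))
    return p_value, v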

Network Analysis Algorithms

  β€’ Centrality Measures: Degree, Betweenness, Closeness, Eigenvector, PageRank (see the sketch after this list)
  • Community Detection: Louvain, Label Propagation, Spectral Clustering
  • Path Analysis: Shortest paths, evolution chains, cycle detection
  • Graph Embeddings: Node2Vec, DeepWalk for similarity computation
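
For the centrality and community pieces, a minimal NetworkX sketch. G is assumed to be the Digimon graph exported from Neo4j; louvain_communities requires NetworkX 2.8+:

import networkx as nx

def rank_hubs(G: nx.Graph, top: int = 10) -> list:
    # Combine PageRank and betweenness to surface the network hubs
    pagerank = nx.pagerank(G)
    betweenness = nx.betweenness_centrality(G)
    return sorted(G.nodes, key=lambda n: (pagerank[n], betweenness[n]), reverse=True)[:top]

def detect_communities(G: nx.Graph) -> list[set]:
    # Louvain partition; each returned set is one community of Digimon
    return nx.community.louvain_communities(G, seed=42)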

Machine Learning Approaches

  β€’ Classification: Random Forest, XGBoost, Neural Networks for type/attribute prediction (sketched below)
  • Link Prediction: Graph Neural Networks for evolution prediction
  • Feature Engineering: Graph features, text embeddings, move similarity
  • Model Validation: Cross-validation, learning curves, SHAP interpretability
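
As a sketch of the classification setup, where X and y are assumptions about the feature pipeline (X holds per-Digimon graph features such as degree and community id, y the type label to predict):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def evaluate_type_classifier(X, y) -> float:
    # 5-fold cross-validated accuracy for the Random Forest baseline
    model = RandomForestClassifier(n_estimators=300, random_state=42)
    return cross_val_score(model, X, y, cv=5).mean()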

Expected Insights

  β€’ Network Properties: Small-world network with diameter 6-10 and a scale-free degree distribution (see the check sketched after this list)
  • Evolution Patterns: 2-4 paths per Digimon, 72% type stability through evolution
  • Community Structure: 8-12 natural communities aligned with thematic groups
  • Predictive Power: 85%+ accuracy in type prediction using graph features
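
These expectations can be checked directly once the graph is built. A sketch, assuming G is the undirected similarity network (note that diameter is expensive on large graphs):

import networkx as nx

def small_world_summary(G: nx.Graph) -> dict:
    # Restrict to the largest connected component so path metrics are defined
    core = G.subgraph(max(nx.connected_components(G), key=len))
    return {
        "diameter": nx.diameter(core),
        "avg_path_length": nx.average_shortest_path_length(core),
        "avg_clustering": nx.average_clustering(core),
    }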

Project Structure

project-yggdrasil/
β”œβ”€β”€ src/                    # Source code
β”‚   β”œβ”€β”€ scraper/           # Web scraping & API integration
β”‚   β”‚   β”œβ”€β”€ fetcher.py     # Async HTML scraper
β”‚   β”‚   β”œβ”€β”€ api_fetcher.py # API endpoint discovery
β”‚   β”‚   └── robots_checker.py # Robots.txt compliance
β”‚   β”œβ”€β”€ parser/            # HTML parsing & data extraction
β”‚   β”‚   β”œβ”€β”€ html_parser.py # BeautifulSoup parser
β”‚   β”‚   └── main.py        # Parser orchestration
β”‚   β”œβ”€β”€ processor/         # Data processing & translation
β”‚   β”‚   β”œβ”€β”€ translator.py  # Google Translate integration
β”‚   β”‚   └── main.py        # Processing pipeline
β”‚   β”œβ”€β”€ graph/             # Neo4j database layer
β”‚   β”‚   β”œβ”€β”€ loader.py      # Graph construction
β”‚   β”‚   └── main.py        # Database operations
β”‚   β”œβ”€β”€ analysis/          # Network analysis & insights
β”‚   β”‚   └── main.py        # NetworkX analysis
β”‚   └── utils/             # Shared utilities
β”‚       β”œβ”€β”€ config.py      # Configuration management
β”‚       β”œβ”€β”€ cache.py       # Translation caching
β”‚       └── logger.py      # Logging setup
β”‚
β”œβ”€β”€ data/                  # Data storage
β”‚   β”œβ”€β”€ raw/              # Original scraped content
β”‚   β”‚   β”œβ”€β”€ html/         # HTML pages
β”‚   β”‚   └── images/       # Digimon images
β”‚   β”œβ”€β”€ processed/        # Parsed JSON data
β”‚   β”œβ”€β”€ translated/       # English translations
β”‚   └── cache/            # Translation cache
β”‚
β”œβ”€β”€ notebooks/            # Analysis notebooks
β”‚   β”œβ”€β”€ 01_data_exploration.ipynb
β”‚   β”œβ”€β”€ 02_evolution_analysis.ipynb
β”‚   β”œβ”€β”€ 03_type_correlation.ipynb
β”‚   β”œβ”€β”€ 04_move_network.ipynb
β”‚   β”œβ”€β”€ 05_community_detection.ipynb
β”‚   β”œβ”€β”€ 06_centrality_analysis.ipynb
β”‚   β”œβ”€β”€ 07_machine_learning.ipynb
β”‚   └── 08_recommendations.ipynb
β”‚
β”œβ”€β”€ docs/                 # Documentation
β”‚   β”œβ”€β”€ analysis-specification.md
β”‚   β”œβ”€β”€ methodology.md
β”‚   β”œβ”€β”€ visualization-guide.md
β”‚   └── insights-summary.md
β”‚
β”œβ”€β”€ yggdrasil_cli.py      # CLI interface (ygg command)
β”œβ”€β”€ docker-compose.yml    # Neo4j container setup
β”œβ”€β”€ config.yaml           # Application configuration
β”œβ”€β”€ requirements.txt      # Python dependencies
β”œβ”€β”€ pyproject.toml        # Poetry/packaging config
└── flake.nix            # Nix development environment

Configuration

Edit the .env file:

# Scraping settings
SCRAPE_DELAY=1.0  # Be respectful!
MAX_RETRIES=3

# Neo4j connection
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=digimon123
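
These values are read at process start; a sketch of how src/utils/config.py might consume them with python-dotenv (the exact loading mechanism is an assumption):

import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root

NEO4J_URI = os.getenv("NEO4J_URI", "bolt://localhost:7687")
NEO4J_USER = os.getenv("NEO4J_USER", "neo4j")
NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD", "digimon123")
SCRAPE_DELAY = float(os.getenv("SCRAPE_DELAY", "1.0"))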

Data Model

Neo4j Graph Schema

graph TD
    subgraph NT["Node Types"]
        D[Digimon<br/>β€’ name_jp<br/>β€’ name_en<br/>β€’ profile<br/>β€’ image_url]
        L[Level<br/>β€’ name<br/>β€’ order]
        T[Type<br/>β€’ name]
        A[Attribute<br/>β€’ name]
        M[Move<br/>β€’ name<br/>β€’ description]
    end
    
    D -->|HAS_LEVEL| L
    D -->|HAS_TYPE| T
    D -->|HAS_ATTRIBUTE| A
    D -->|CAN_USE| M
    D -->|RELATED_TO| D
    
    subgraph SR["Similarity Relationships"]
        D2[Digimon] -.->|SHARES_TYPE| D3[Digimon]
        D2 -.->|SHARES_LEVEL| D3
        D2 -.->|SHARES_ATTRIBUTE| D3
        D2 -.->|SHARES_MOVE| D3
    end
    
    style NT fill:#666,stroke:#333,stroke-width:2px,color:#fff
    style SR fill:#666,stroke:#333,stroke-width:2px,color:#fff
    style D fill:#2a2a2a,stroke:#888,stroke-width:2px,color:#fff
    style L fill:#444,stroke:#666,stroke-width:1px,color:#ccc
    style T fill:#444,stroke:#666,stroke-width:1px,color:#ccc
    style A fill:#444,stroke:#666,stroke-width:1px,color:#ccc
    style M fill:#444,stroke:#666,stroke-width:1px,color:#ccc
    style D2 fill:#444,stroke:#666,stroke-width:1px,color:#ccc
    style D3 fill:#444,stroke:#666,stroke-width:1px,color:#ccc

Graph Schema Details

Nodes:
β”œβ”€β”€ Digimon (Primary Entity)
β”‚   β”œβ”€β”€ name_jp: Japanese name
β”‚   β”œβ”€β”€ name_en: English name
β”‚   β”œβ”€β”€ profile_jp: Original description
β”‚   β”œβ”€β”€ profile_en: Translated description
β”‚   └── image_url: Character image
β”‚
β”œβ”€β”€ Level (Evolution Stage)
β”‚   └── name: Baby, Rookie, Champion, Ultimate, Mega, etc.
β”‚
β”œβ”€β”€ Type (Species Classification)
β”‚   └── name: Dragon, Machine, Beast, Angel, Demon, etc.
β”‚
β”œβ”€β”€ Attribute (Alignment)
β”‚   └── name: Vaccine, Virus, Data, Free, Variable
β”‚
└── Move (Special Attacks)
    └── name: Attack/technique name

Relationships:
β”œβ”€β”€ (Digimon)-[:HAS_LEVEL]->(Level)
β”œβ”€β”€ (Digimon)-[:HAS_TYPE]->(Type)
β”œβ”€β”€ (Digimon)-[:HAS_ATTRIBUTE]->(Attribute)
β”œβ”€β”€ (Digimon)-[:CAN_USE]->(Move)
β”œβ”€β”€ (Digimon)-[:EVOLVES_FROM]->(Digimon)
β”œβ”€β”€ (Digimon)-[:RELATED_TO]->(Digimon)
β”œβ”€β”€ (Digimon)-[:SHARES_TYPE]->(Digimon)
β”œβ”€β”€ (Digimon)-[:SHARES_LEVEL]->(Digimon)
β”œβ”€β”€ (Digimon)-[:SHARES_ATTRIBUTE]->(Digimon)
└── (Digimon)-[:SHARES_MOVE]->(Digimon)
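
To make the schema concrete, here is a small loading sketch. The project's loader is built on py2neo; this example uses the official neo4j Python driver (v5) instead, with example property values:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "digimon123"))

def load_digimon(tx, name_en: str, name_jp: str, type_name: str, level: str):
    # MERGE keeps the load idempotent: re-running never duplicates nodes
    tx.run(
        """
        MERGE (d:Digimon {name_en: $name_en})
          SET d.name_jp = $name_jp
        MERGE (t:Type {name: $type_name})
        MERGE (l:Level {name: $level})
        MERGE (d)-[:HAS_TYPE]->(t)
        MERGE (d)-[:HAS_LEVEL]->(l)
        """,
        name_en=name_en, name_jp=name_jp, type_name=type_name, level=level,
    )

with driver.session() as session:
    session.execute_write(load_digimon, "Agumon", "アグヒン", "Reptile", "Rookie")
driver.close()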

Example Insights & Queries

Network Analysis Results

After analyzing the complete graph, the system discovers:

  1. Most Connected Digimon - Network hubs that share many relationships
  2. Evolution Chains - Complete paths from Baby to Mega level
  3. Type Clusters - Groups of similar Digimon based on shared characteristics
  4. Rare Combinations - Unique type/attribute pairings
  5. Move Popularity - Most common special attacks across species

Sample Neo4j Queries

// Find all Dragon-type Mega level Digimon
MATCH (d:Digimon)-[:HAS_TYPE]->(t:Type {name: "Dragon Type"})
MATCH (d)-[:HAS_LEVEL]->(l:Level {name: "Mega"})
RETURN d.name_en, d.name_jp
ORDER BY d.name_en;

// Discover evolution paths leading to a specific Digimon
// (EVOLVES_FROM points from the evolved form back to its predecessor)
MATCH path = (end:Digimon {name_en: "Omegamon"})-[:EVOLVES_FROM*]->(start:Digimon)
RETURN path;

// Find Digimon that share the most moves with Agumon
MATCH (agumon:Digimon {name_en: "Agumon"})-[:CAN_USE]->(m:Move)
MATCH (other:Digimon)-[:CAN_USE]->(m)
WHERE other <> agumon
RETURN other.name_en, COUNT(m) as shared_moves
ORDER BY shared_moves DESC
LIMIT 10;

// Identify type distribution by level
MATCH (d:Digimon)-[:HAS_LEVEL]->(l:Level)
MATCH (d)-[:HAS_TYPE]->(t:Type)
RETURN l.name as Level, t.name as Type, COUNT(d) as Count
ORDER BY Level, Count DESC;

// Find the shortest path between two Digimon
MATCH path = shortestPath(
  (d1:Digimon {name_en: "Agumon"})-[*]-(d2:Digimon {name_en: "Gabumon"})
)
RETURN path;
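
Any of these queries can also be run from Python. A short sketch with the official neo4j driver, reusing the shared-moves query and the connection settings from .env:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "digimon123"))

query = """
MATCH (agumon:Digimon {name_en: "Agumon"})-[:CAN_USE]->(m:Move)
MATCH (other:Digimon)-[:CAN_USE]->(m)
WHERE other <> agumon
RETURN other.name_en AS name, COUNT(m) AS shared_moves
ORDER BY shared_moves DESC LIMIT 10
"""

with driver.session() as session:
    for record in session.run(query):
        print(record["name"], record["shared_moves"])
driver.close()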

Development

CLI Commands

ygg start        # Start Neo4j database
ygg stop         # Stop Neo4j database
ygg status       # Check pipeline progress
ygg run          # Run complete pipeline
ygg prune        # Clean up data files
ygg prune --include-neo4j  # Clean data AND Neo4j
ygg --help       # Show all commands

Run Tests

pytest tests/

Code Formatting

black src/
ruff check src/

Type Checking

mypy src/

Jupyter Notebooks

# Run locally after activating your Python environment
jupyter notebook
# Or with JupyterLab
jupyter lab

Docker Services

Environment Variables

Variable        Description                Default
NEO4J_URI       Neo4j connection string    bolt://localhost:7687
SCRAPE_DELAY    Seconds between requests   1.0
LOG_LEVEL       Logging verbosity          INFO
DEBUG           Enable debug mode          false

Quick Reference

Essential Commands

ygg start        # Start Neo4j
ygg stop         # Stop Neo4j
ygg status       # Check progress
ygg run          # Run full pipeline
ygg prune        # Clean data files

Pipeline Steps (in order)

ygg scrape --fetch-api  # 1. Scrape (40-50 min)
ygg parse               # 2. Parse (5 min)
ygg translate           # 3. Translate (60-90 min)
ygg load                # 4. Load to Neo4j (5 min)
ygg analyze             # 5. Analyze (1 min)

Maintenance

ygg prune               # Clean all data files
ygg prune --keep-cache  # Keep translations
ygg prune --include-neo4j  # Clean everything
ygg logs                # View Neo4j logs
ygg db-status           # Check database

Key Files

  • .env - Configuration
  • data/raw/html/ - Scraped HTML
  • data/processed/ - Parsed JSON
  • data/translated/ - English data
  • data/cache/translations.json - Translation cache

License

MIT License - see LICENSE file

Author

Ricardo Ledan ricardoledan@proton.me

Acknowledgments
