Skip to content

genezhang/clickgraph

Β 
Β 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

ClickGraph

ClickGraph - A Fork of Brahmand

A high-performance, stateless, read-only graph query engine for ClickHouse with Neo4j ecosystem compatibility.

Note: ClickGraph is a fork of Brahmand with additional features including Neo4j Bolt protocol support and view-based graph analysis. This is a read-only analytical query engine - write operations are not supported.


πŸš€ What's New (November 9, 2025)

Major Architectural Improvements ✨

Complete Multi-Schema Support

  • βœ… Full schema isolation: Different schemas can map same labels to different tables
  • βœ… Per-request schema selection: USE clause, schema_name parameter, or default
  • βœ… Clean architecture: Single source of truth for schema management (removed redundant GLOBAL_GRAPH_SCHEMA)
  • βœ… Thread-safe: Schema flows through entire query execution path
  • βœ… End-to-end tested: All 4 multi-schema tests passing

Code Quality Improvements

  • 🧹 Removed technical debt: Eliminated duplicate schema storage system
  • πŸ”§ Cleaner codebase: Simplified render layer helper functions
  • πŸ“Š All tests passing: 325 unit tests + 32 integration tests (100% non-benchmark)

πŸš€ Previous Updates (November 1, 2025)

Large-Scale Testing & Bug Fixes

ClickGraph tested successfully on 5 MILLION users and 50 MILLION relationships!

Benchmark Dataset Size Success Rate Status
Large 5M users, 50M follows 9/10 (90%) βœ… Stress Tested
Medium 10K users, 50K follows 10/10 (100%) βœ… Well Validated
Small 1K users, 5K follows 10/10 (100%) βœ… Fully Tested

What We Learned:

  • βœ… Direct relationships: Handling 50M edges successfully
  • βœ… Multi-hop traversals: Working on 5M node graphs
  • βœ… Variable-length paths: Scaling to large datasets
  • βœ… Aggregations: Pattern matching across millions of rows
  • βœ… Performance: ~2 seconds for most queries, even at large scale
  • ⚠️ Shortest paths: Memory limits on largest dataset (ClickHouse config dependent)

Recent Bug Fixes:

  • πŸ› ChainedJoin CTE wrapper for exact hop variable-length paths (*2, *3)
  • πŸ› Shortest path filter rewriting for WHERE clauses on end nodes
  • πŸ› Aggregation table name schema lookup for GROUP BY queries

Tooling:

  • πŸ“Š Comprehensive benchmarking suite with 3 scale levels
  • πŸ”§ ClickHouse-native data generation for efficient loading
  • πŸ“ˆ Performance metrics collection and analysis

πŸ“– Documentation:


Features

Core Capabilities

  • Read-Only Graph Analytics: Translates Cypher graph queries into optimized ClickHouse SQL for analytical workloads
  • ClickHouse-native: Extends ClickHouse with native graph modeling, merging OLAP speed with graph-analysis power
  • Stateless Architecture: Offloads all storage and query execution to ClickHouseβ€”no extra datastore required
  • Cypher Query Language: Industry-standard Cypher read syntax for intuitive, expressive property-graph querying
  • Variable-Length Paths: Recursive traversals with *1..3 syntax using ClickHouse WITH RECURSIVE CTEs
  • Path Variables & Functions: Capture and analyze path data with length(p), nodes(p), relationships(p) functions
  • Analytical-scale Performance: Optimized for very large datasets and complex multi-hop traversals
  • Query Performance Metrics: Phase-by-phase timing with HTTP headers and structured logging for monitoring and optimization

Neo4j Ecosystem Compatibility

  • Bolt Protocol v4.4: Full Neo4j driver compatibility for seamless integration
  • Multi-Schema Support: βœ… Fully working - Complete schema isolation with per-request schema selection:
    • USE clause: Cypher USE database_name syntax (highest priority)
    • Session/request parameter: Bolt session database or HTTP schema_name parameter
    • Default schema: Fallback to "default" schema
    • Schema isolation: Different schemas map same labels to different ClickHouse tables
  • Dual Server Architecture: HTTP REST API and Bolt protocol running simultaneously
  • Authentication Support: Multiple authentication schemes including basic auth
  • Tool Compatibility: Works with existing Neo4j drivers, browsers, and applications

View-Based Graph Model

  • Zero Migration: Transform existing relational data into graph format through YAML configuration
  • Native Performance: Leverages ClickHouse's columnar storage and query optimization
  • Robust Implementation: Comprehensive validation, error handling, and optimization passes

Architecture

ClickGraph runs as a lightweight graph wrapper alongside ClickHouse with dual protocol support:

acrhitecture

HTTP API (Port 8080)

  1. Client sends HTTP POST request with Cypher query to ClickGraph
  2. ClickGraph parses & plans the query, translates to ClickHouse SQL
  3. ClickHouse executes the SQL and returns results
  4. ClickGraph sends JSON results back to the client

Bolt Protocol (Port 7687)

  1. Neo4j Driver/Tool connects via Bolt protocol to ClickGraph
  2. ClickGraph handles Bolt handshake, authentication, and message protocol
  3. Cypher queries are processed through the same query engine as HTTP
  4. Results are streamed back via Bolt protocol format

Both protocols share the same underlying query engine and ClickHouse backend.


πŸš€ Quick Start

New to ClickGraph? See the Getting Started Guide for a complete walkthrough.

⚠️ Windows Users: The HTTP server has a known issue on Windows. Use Docker or WSL for development. See KNOWN_ISSUES.md for details.

5-Minute Setup (Docker - Recommended)

  1. Clone and start services:

    git clone https://github.com/genezhang/clickgraph
    cd clickgraph
    docker-compose up -d

    This starts both ClickHouse and ClickGraph with test data pre-loaded.

  2. Test the setup:

    curl -X POST http://localhost:8080/query \
      -H "Content-Type: application/json" \
      -d '{"query": "MATCH (u:User) RETURN u.full_name LIMIT 5"}'

Native Build (Linux/macOS/WSL)

  1. Start ClickHouse:

    docker-compose up -d clickhouse
  2. Configure and run:

    export CLICKHOUSE_URL="http://localhost:8123"
    export CLICKHOUSE_USER="test_user"
    export CLICKHOUSE_PASSWORD="test_pass"
    export CLICKHOUSE_DATABASE="brahmand"
    
    cargo run --bin clickgraph
  3. Test with HTTP API:

    curl -X POST http://localhost:8080/query \
      -H "Content-Type: application/json" \
      -d '{"query": "RETURN 1 as test"}'
  4. Test with Neo4j driver:

    from neo4j import GraphDatabase
    
    driver = GraphDatabase.driver("bolt://localhost:7687")
    with driver.session() as session:
        result = session.run("RETURN 1 as test")
  5. Use the USE clause for multi-database queries:

    -- Query specific database using Neo4j-compatible USE clause
    USE social_network
    MATCH (u:User)-[:FOLLOWS]->(friend)
    RETURN u.name, collect(friend.name) AS friends
    
    -- USE overrides session/request parameters
    USE ecommerce
    MATCH (p:Product) WHERE p.price > 100 RETURN p.name

πŸ“– Complete Setup Guide β†’

πŸ“Š View-Based Graph Model

Transform existing relational data into graph format through YAML configuration:

Example: Map your users and user_follows tables to a social network graph:

views:
  - name: social_network
    nodes:
      user:                    # Node label in Cypher queries
        source_table: users
        id_column: user_id
        property_mappings:
          name: full_name
    relationships:
      follows:                 # Relationship type in Cypher queries
        source_table: user_follows
        from_node: user        # Source node label
        to_node: user          # Target node label
        from_id: follower_id
        to_id: followed_id

Then query with standard Cypher:

MATCH (u:user)-[:follows]->(friend:user)
WHERE u.name = 'Alice'
RETURN friend.name

OPTIONAL MATCH for handling optional patterns:

-- Find all users and their friends (if any)
MATCH (u:user)
OPTIONAL MATCH (u)-[:follows]->(friend:user)
RETURN u.name, friend.name

-- Mixed required and optional patterns
MATCH (u:user)-[:authored]->(p:post)
OPTIONAL MATCH (p)-[:liked_by]->(liker:user)
RETURN u.name, p.title, COUNT(liker) as likes

β†’ Generates efficient LEFT JOIN SQL with NULL handling for unmatched patterns

πŸš€ Examples

⚑ Quick Start - 5 Minutes to Graph Analytics

Perfect for first-time users! Simple social network demo with:

  • 3 users, friendships - minimal setup with Memory tables
  • Basic Cypher queries - find friends, mutual connections
  • HTTP & Neo4j drivers - both integration methods
  • 5-minute setup - zero to working graph analytics

πŸ“Š E-commerce Analytics - Comprehensive Demo

Complete end-to-end demonstration with:

  • Complete data setup with realistic e-commerce schema (customers, products, orders, reviews)
  • Advanced graph queries for customer segmentation, product recommendations, and market basket analysis
  • Real-world workflows with both HTTP REST API and Neo4j driver examples
  • Performance optimization techniques and expected benchmarks
  • Business insights from customer journeys, seasonal patterns, and cross-selling opportunities

Start with Quick Start, then explore E-commerce Analytics for advanced usage! 🎯

οΏ½πŸ”§ Configuration

ClickGraph supports flexible configuration via command-line arguments and environment variables:

# View all options
cargo run --bin clickgraph -- --help

# Custom ports
cargo run --bin clickgraph -- --http-port 8081 --bolt-port 7688

# Disable Bolt protocol (HTTP only)
cargo run --bin clickgraph -- --disable-bolt

# Custom host binding
cargo run --bin clickgraph -- --http-host 127.0.0.1 --bolt-host 127.0.0.1

# Configure CTE depth limit for variable-length paths (default: 100)
cargo run --bin clickgraph -- --max-cte-depth 150
export CLICKGRAPH_MAX_CTE_DEPTH=150  # Or via environment variable

See docs/configuration.md for complete configuration documentation.

οΏ½ Running in Background (Windows)

For Windows users, ClickGraph supports running in the background using PowerShell jobs:

PowerShell Background Jobs (Recommended)

# Start server in background
.\start_server_background.ps1

# Check if server is running
Invoke-WebRequest -Uri "http://localhost:8080/health"

# Stop the server (replace JOB_ID with actual job ID shown)
Stop-Job -Id JOB_ID; Remove-Job -Id JOB_ID

Alternative: New Command Window

Use the batch file to start the server in a separate command window:

start_server_background.bat

Manual Daemon Mode

The server also supports a --daemon flag for Unix-like daemon behavior:

cargo run --bin clickgraph -- --daemon --http-port 8080

οΏ½πŸ“š Documentation

User Guides

Technical Documentation

For Contributors

Reference

πŸš€ Performance

Preliminary informal tests on a MacBook Pro (M3 Pro, 18 GB RAM) running ClickGraph in Docker against a ~12 million-node Stack Overflow dataset show multihop traversals running approximately 10Γ— faster than Neo4j v2025.03. These early, unoptimized results are for reference only; a full benchmark report is coming soon.

πŸ§ͺ Development Status

Latest Update: November 1, 2025 - 100% Benchmark Success Rate πŸŽ‰

Production-Ready Features

  • βœ… All Graph Query Types Working: 10/10 benchmark queries passing (100% success rate)
    • Simple node lookups and filtered scans
    • Direct and multi-hop relationship traversals
    • Variable-length paths with exact (*2) and range (*1..3) specifications
    • Shortest path algorithms with WHERE clause filtering
    • Aggregations with GROUP BY and ORDER BY
    • Bidirectional patterns (mutual relationships)
  • βœ… Query Performance Metrics: Phase-by-phase timing with HTTP headers and structured logging
  • βœ… Neo4j Bolt Protocol v4.4: Full compatibility with Neo4j drivers and tools
  • βœ… PageRank Algorithm: Graph centrality analysis with CALL pagerank(iterations: 10, damping: 0.85)
  • βœ… OPTIONAL MATCH: LEFT JOIN semantics for optional graph patterns with NULL handling
  • βœ… Variable-Length Paths: Recursive CTEs with chained JOIN optimization for exact hop counts
  • βœ… Shortest Path Functions: shortestPath() and allShortestPaths() with early termination
  • βœ… Path Variables & Functions: MATCH p = (a)-[*]->(b) RETURN length(p), nodes(p), relationships(p)
  • βœ… Multiple Relationship Types: [:FOLLOWS|FRIENDS_WITH] with UNION ALL SQL generation
  • βœ… View-Based Graph Model: Transform existing tables to graphs via YAML configuration
  • βœ… Dual Server Architecture: HTTP REST API and Bolt protocol simultaneously
  • βœ… Comprehensive Testing: 312/312 tests passing (100% success rate)
  • βœ… Flexible Configuration: CLI options, environment variables, Docker deployment

Recent Bug Fixes (November 1, 2025)

  • πŸ› Fixed: ChainedJoin CTE wrapper for exact hop queries (*2, *3)
  • πŸ› Fixed: Shortest path filter rewriting for WHERE clauses
  • πŸ› Fixed: Table name schema lookup for aggregation queries
  • πŸ“Š Validated: All fixes confirmed with production benchmark suite

Known Considerations

  • ⚠️ Read-Only Engine: Write operations (CREATE, SET, DELETE, MERGE) are not supported
  • ⚠️ Schema warnings: Cosmetic warnings about internal catalog system (functionality unaffected)
  • πŸ”§ Memory vs MergeTree: Use Memory engine for development, MergeTree for persistent storage
  • 🐳 Docker permissions: May require volume permission fixes on some systems

Benchmark Results

Tested with 1,000 users, 4,997 relationships on social_benchmark.yaml:

  • Success Rate: 10/10 queries (100%)
  • Performance: All query types executing correctly
  • Documentation: See notes/benchmarking.md for detailed results

🀝 Contributing

ClickGraph welcomes contributions! Key areas for development:

  • Additional Cypher language features
  • Query optimization improvements
  • Neo4j compatibility enhancements
  • Performance benchmarking
  • Documentation improvements

πŸ“„ License

ClickGraph is licensed under the Apache License, Version 2.0. See the LICENSE file for details.

This project is a fork of Brahmand with significant enhancements for Neo4j ecosystem compatibility and enterprise deployment capabilities.

About

ClickGraph is an open-source graph query layer on top of ClickHouse.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Rust 69.0%
  • Python 23.6%
  • Jupyter Notebook 5.3%
  • PowerShell 1.7%
  • Batchfile 0.3%
  • Cypher 0.1%