This project is NOT production-ready and should NOT be used in production environments.
This is a DEMONSTRATION project exploring the feasibility of building a full SQL layer with ACID transactions on top of Apache Cassandra. It is intended for Proof-of-concept demonstrations only. The intent is to spark a discussion about what "SQL" could look like for Apache Cassandra, if the Cassandra community believes that's an appropriate direction. This is to demonstrate that one COULD build a SQL-like database on top of Cassandra, without worrying about problems like wide partitions.
Motivation This type of project approaches feasibility in 2026 due to a series of recent improvements, the combination of which start to provide us with the primitives one needs to build a distributed SQL database.
In particular:
- Transactional Cluster Metadata makes it viable to run ByteOrderedPartitioner, because the frequent range splitting should become safe and reliable.
- Accord provides multi-key transactions, which allows us to break out of the classic Cassandra partition boundary
Without these two improvements, one would be forced to try to translate SQL types and grammar into CQL partitions / model, and deal with wide partitions and similar classic Cassandra data model challenges.
With these two improvements, one can treat Cassandra as an ordered, transactional key-value store, which is the same basic interface nearly every other distributed SQL database builds upon.
Known Limitations:
⚠️ Not intended for production use - This is a proof of concept only. Don't run this in production.⚠️ Security features excluded - Authentication, authorization, and encryption are deliberately excluded to prevent production use. Don't run this in production.⚠️ Performance not optimized - Not optimized for production workloads. There are numerous patterns here that are safe but not performant. For example, Sequences / IDs likely hot spot onto a single host/process in real life. That's not great.⚠️ Incomplete error handling - Many edge cases and error conditions not fully handled.⚠️ Limited test coverage - See docs/TESTING.md for details⚠️ No formal correctness proofs - Basic testing exists (see src/test/java/com/geico/poc/cassandrasql/kv/TransactionSafetyTest.java) but no formal verification
Cassandra Limitations:
⚠️ Cassandra doesn't support reverse range reads - Some orderings are profoundly slow⚠️ Accord doesn't support variable sized keys - Byte Order Partitioner + Accord is poorly tested, journals are not compacting, gets slower over time
Cassandra-SQL provides a PostgreSQL wire protocol interface to Apache Cassandra, enabling standard SQL queries with ACID transaction support. The project demonstrates how to build a distributed SQL database by combining:
- Apache Cassandra for distributed storage and replication
- Cassandra Accord for distributed transactions and consensus
- Apache Calcite for SQL parsing and query planning
- PostgreSQL Wire Protocol for client compatibility
- KV Storage Backend: MVCC-based key-value storage with optimistic concurrency control
- Implementation:
kv/KvStore.java
- Implementation:
- Percolator-Style Transactions: Two-phase commit protocol for ACID guarantees
- Implementation:
kv/KvTransactionCoordinator.java - Uses Accord transactions for atomicity - see docs/TRANSACTIONS.md
- Implementation:
- Cost-Based Query Optimizer: Index selection, join ordering, and predicate pushdown
- Implementation:
optimizer/QueryOptimizer.java
- Implementation:
- Lazy Schema Operations: O(1) DROP TABLE and TRUNCATE using table versioning
- Verified by:
LazyDropTest.java
- Verified by:
- Background Jobs: Vacuum, constraint checking, and index consistency maintenance
- Java 17 or higher
- Apache Cassandra 5.0+ with Accord enabled
- PostgreSQL client (
psql) for testing
Cassandra must be configured with Accord enabled in cassandra.yaml:
# Set the partitioner to ByteOrderedPartitioner
partitioner: org.apache.cassandra.dht.ByteOrderedPartitioner
# Enable accord
accord:
enabled: true-
Clone the repository:
git clone https://github.com/geico/cassandra-sql.git cd cassandra-sql -
Start Cassandra:
./bin/start-cassandra.sh
-
Build and run the SQL server:
./gradlew bootRun
-
Connect with psql:
psql -h localhost -p 5432 -d cassandra_sql
An e-commerce demo script showcases the SQL capabilities:
./demo-ecommerce.shThis demonstrates:
- Complex DDL (tables, indexes, enums, views)
- Transactions with multiple statements
- JOINs, aggregations, and subqueries
- Materialized views
See docs/DEMO_ECOMMERCE.md for detailed demo walkthrough.
See docs/SQL_GRAMMAR.md for detailed implementation status.
DDL (Data Definition Language):
- ✅ CREATE/DROP TABLE
- ✅ CREATE/DROP INDEX
- ✅ CREATE/DROP VIEW (including MATERIALIZED VIEW)
- ✅ CREATE TYPE (ENUM)
- ✅ ALTER TABLE (limited)
- ✅ TRUNCATE TABLE
DML (Data Manipulation Language):
- ✅ SELECT with WHERE, ORDER BY, LIMIT, OFFSET
- ✅ INSERT, UPDATE, DELETE
- ✅ INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN
- ✅ Subqueries (correlated and uncorrelated) - see
kv/FromSubqueryTest.javaandSubqueryEnhancedTest.java - ✅ UNION, UNION ALL - see
kv/UnionQueryTest.java - ✅ Common Table Expressions (WITH) - see
kv/DemoScriptFeaturesTest.java
Data Types:
- ✅ INT, BIGINT, DOUBLE, DECIMAL
- ✅ VARCHAR, TEXT
- ✅ BOOLEAN
- ✅ TIMESTAMP
- ✅ ARRAY
- ✅ ENUM (custom types)
Aggregations:
- ✅ COUNT, SUM, AVG, MIN, MAX
- ✅ GROUP BY, HAVING
- ✅ DISTINCT
Transactions:
- ✅ BEGIN, COMMIT, ROLLBACK - see
kv/SimpleTransactionTest.java - ✅ Multi-statement transactions - see
kv/TransactionAtomicityTest.testMultiKeyAtomicity() - ✅ Serializable isolation level - see
kv/TransactionAtomicityTest.testSnapshotIsolation()
- ❌ Authentication - Deliberately excluded so you don't use this in production
- ❌ Foreign keys - Constraints defined but not enforced (see
kv/ForeignKeyConstraintTest.javafor current status) - ❌ Triggers - Not implemented
- ❌ Stored procedures - Not implemented
- ❌ Window functions - Parsed but not executed (see
window/WindowFunctionTest.java) - ❌ Full-text search - Not implemented
- ❌ JSON/JSONB types - Not implemented
- ❌ User-defined functions - Not implemented
- ❌ Partitioning - Not implemented
- ❌ Replication control - Uses Cassandra's native replication
See docs/COMPARISON.md for a detailed comparison.
| Feature | PostgreSQL | Cassandra-SQL |
|---|---|---|
| Architecture | Single-node or primary-replica | Distributed, leaderless |
| Consistency | Strong (ACID) | Strong (Accord enabled tables) |
| Transactions | MVCC with 2PL | Percolator-like 2PC with Accord |
| Scalability | Vertical (with read replicas) | Horizontal (distributed) |
| SQL Compliance | ~99% | ~40% (core features only) |
| Maturity | Production-ready (30+ years) | Proof of concept (Do not use) |
| Performance | Optimized for single-node | Not optimized |
| Use Case | General-purpose OLTP | Distributed OLTP (experimental) |
- ✅ Production applications
- ✅ Complex SQL queries and analytics
- ✅ Strong consistency requirements
- ✅ Mature ecosystem and tooling
- ✅ Single-datacenter deployments
- 🔬 Convincing Cassandra Engineers that SQL is closer than they thought
Again: Do NOT use in production!
See docs/TRANSACTIONS.md for detailed information.
Cassandra-SQL implements Serializable isolation using a Percolator-style two-phase commit protocol:
- Prewrite Phase: Acquire locks and write uncommitted data
- Commit Phase: Atomically commit primary lock, then secondary locks
- ✅ Atomicity: All statements in a transaction commit or rollback together
- Verified by:
kv/TransactionAtomicityTest.java
- Verified by:
- ✅ Consistency: Constraints are checked (though enforcement is limited)
- ✅ Isolation: Transactions are serializable (no dirty reads, no phantom reads)
- Verified by:
kv/TransactionAtomicityTest.testSnapshotIsolation()
- Verified by:
- ✅ Durability: Committed data persists (via Cassandra replication)
⚠️ Not formally verified: The implementation has not been proven correct- See docs/TRANSACTIONS.md for detailed warnings
⚠️ Will contain bugs: Edge cases and race conditions almost certainly exist⚠️ Limited testing: Test coverage is incomplete (~95 test files, many edge cases not covered)- See docs/TESTING.md for test coverage details
⚠️ Lock conflicts: Concurrent writes to the same keys will conflict and abort- Tested by:
kv/SimpleConflictTest.java
- Tested by:
See src/test/java/com/geico/poc/cassandrasql/ for transaction tests:
kv/TransactionSafetyTest.java- Atomic lock acquisition, commit verification, timestamp allocationkv/TransactionAtomicityTest.java- Multi-key atomicity, snapshot isolation, write conflictskv/PercolatorTransactionTest.java- Percolator protocol correctnesskv/SimpleTransactionTest.java- Basic transaction semanticskv/TransactionCorrectnessTest.java- Transaction correctness verificationkv/KvCorrectnessTest.java- MVCC snapshot isolation, delete/update correctnessLazyDropTest.java- Table versioning and vacuum
See docs/TESTING.md for testing philosophy and coverage.
- Unit Tests: Individual component testing
- Integration Tests: End-to-end SQL execution
- Transaction Tests: ACID property verification
- Concurrency Tests: Multi-client scenarios
- Performance Tests: Benchmarking (not comprehensive)
# Run all tests
./gradlew test
# Run specific test class
./gradlew test --tests "com.geico.poc.cassandrasql.LazyDropTest"
# Run with verbose output
./gradlew test --info- Test what matters: Focus on correctness of core features
- Integration over unit: Prefer end-to-end tests for SQL semantics
- Isolation: Each test cleans up after itself
- Deterministic: Tests should pass consistently
- Fast feedback: Tests should run quickly (< 1 minute per class)
- ✅ 95+ Java test files (see src/test/java/com/geico/poc/cassandrasql/)
- ✅ Core SQL operations (SELECT, INSERT, UPDATE, DELETE) - see
kv/KvCorrectnessTest.java - ✅ JOIN execution (binary and multi-way) - see
kv/JoinOrderByLimitTest.java - ✅ Transaction isolation - see
kv/TransactionAtomicityTest.java - ✅ Lazy DROP TABLE and TRUNCATE - see
LazyDropTest.javaandTruncateTest.java ⚠️ Some edge cases not covered - See docs/TESTING.md for coverage gaps⚠️ Performance tests are minimal - No comprehensive benchmarking suite
See docs/ARCHITECTURE.md for detailed architecture documentation. Key components are implemented in:
- Transaction Coordinator:
kv/KvTransactionCoordinator.java- Percolator-style two-phase commit - KV Store:
kv/KvStore.java- MVCC key-value storage - Query Executor:
kv/KvQueryExecutor.java- SQL query execution - Timestamp Oracle:
kv/TimestampOracle.java- Distributed timestamp allocation
┌─────────────────────────────────────────────────────────────┐
│ PostgreSQL Clients │
│ (psql, JDBC, pgAdmin, etc.) │
└────────────────────────┬────────────────────────────────────┘
│ PostgreSQL Wire Protocol
┌────────────────────────▼────────────────────────────────────┐
│ PostgreSQL Protocol Server │
│ (PostgresConnectionHandler.java) │
└────────────────────────┬────────────────────────────────────┘
│
┌────────────────────────▼────────────────────────────────────┐
│ Query Service │
│ (QueryService.java) │
└────────────────────────┬────────────────────────────────────┘
│
┌────────────────────────▼────────────────────────────────────┐
│ Calcite SQL Parser │
│ (CalciteParser.java) │
└────────────────────────┬────────────────────────────────────┘
│
┌────────────────────────▼────────────────────────────────────┐
│ Query Executor (KV Mode) │
│ (KvQueryExecutor.java) │
│ ┌──────────────┬──────────────┬──────────────┬──────────┐ │
│ │ Join Exec │ Agg Exec │Subquery Exec │Union Exec│ │
│ └──────────────┴──────────────┴──────────────┴──────────┘ │
└────────────────────────┬────────────────────────────────────┘
│
┌────────────────────────▼────────────────────────────────────┐
│ Transaction Coordinator │
│ (KvTransactionCoordinator.java) │
│ (Percolator-Like Protocol) │
└────────────────────────┬────────────────────────────────────┘
│
┌────────────────────────▼────────────────────────────────────┐
│ KV Store Layer │
│ (KvStore.java) │
│ (MVCC, Timestamp Oracle) │
└────────────────────────┬────────────────────────────────────┘
│
┌────────────────────────▼────────────────────────────────────┐
│ Apache Cassandra 5.0+ │
│ (with Accord) │
└─────────────────────────────────────────────────────────────┘
- KV Storage Backend: All data stored as key-value pairs with encoded keys
- See
kv/KeyEncoder.javafor key encoding logic - See
bin/decode-key.shfor key decoding utility
- See
- MVCC: Multi-version concurrency control with timestamp-based versioning
- See
kv/KvStore.javafor MVCC implementation
- See
- Percolator Transactions: Two-phase commit with lock tables
- Lazy Schema Operations: Table versioning instead of immediate data deletion
- Verified by:
LazyDropTest.java
- Verified by:
- Background Jobs: Asynchronous vacuum and constraint checking
- See docs/BACKGROUND_JOBS.md for details
See docs/CONFIGURATION.md for all configuration options.
# application.yml
cassandra-sql:
# PostgreSQL protocol server
postgres:
port: 5432
max-connections: 100
# Cassandra connection
cassandra:
contact-points: localhost
port: 9042
keyspace: cassandra_sql
# Background jobs
background-jobs:
vacuum:
enabled: true
interval-minutes: 60
retention-hours: 24This is a proof-of-concept project. Contributions are welcome for:
- Bug fixes
- Test coverage improvements
- Documentation improvements
- Performance optimizations
- New SQL features
Important: This project is not intended for production use. Major architectural changes should be discussed first via GitHub issues.
- Prerequisites: Java 17+, Apache Cassandra 5.0+ with Accord enabled
- Build:
./gradlew build - Test:
./gradlew test - Run:
./gradlew bootRun
See docs/CONFIGURATION.md for detailed setup instructions.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Apache Cassandra - Distributed database foundation
- Apache Calcite - SQL parsing and query planning
- PostgreSQL - Wire protocol specification
- Google Percolator - Transaction protocol inspiration
For questions, issues, or discussions, please open an issue on GitHub.
This software is provided "as is", without warranty of any kind, express or implied. The authors and contributors are not liable for any damages arising from the use of this software.
Use at your own risk. Do not use in production.