JParq

JParq is a JDBC driver for Apache Parquet files. It treats a directory as a database and every .parquet file in that directory as a table. The table name is the filename without the .parquet extension. Sub directories are treated as schemas. For example, given the following directory structure:

/data
  ├── employees.parquet
  ├── departments.parquet
  └── sales
      ├── orders.parquet
      └── customers.parquet

The tables employees and departments would be in the default PUBLIC schema, while orders and customers would be in the sales schema.

JParq uses Apache Arrow and Apache Parquet for efficient columnar reads and jsqlparser to parse SQL statements. It aims to be 100% compliant with the read part of the SQL standard. There are a few common extensions supported as well e.g. LIMIT, ASCII, REPEAT, DIFFERENCE.

Note: When common implementations differ from the SQL standard, we stick to the standard. An example of this is convert which in many databases is used for data type casting, but in the SQL standard is used for character set conversion. If you want to convert data types, use CAST instead.

Note The majority of the code was created in collaboration with (vibe coded with) ChatGPT Codex with Copilot and myself (Per Nyfelt) as code reviewers.

Requirements

Java 21 (higher is not supported due to Hadoop restrictions)
Parquet files stored in a directory accessible from the JVM running the driver

Installation

Add the dependency to your build. Replace x.y.z with the latest version number from the Maven Central badge above.

<dependency>
  <groupId>se.alipsa</groupId>
  <artifactId>jparq</artifactId>
  <version>x.y.z</version>
</dependency>

Usage

The driver registers itself when the se.alipsa.jparq.JParqDriver class is loaded, so simply placing the JAR on the classpath is normally enough. If your runtime requires explicit registration you can call Class.forName("se.alipsa.jparq.JParqDriver") before obtaining a connection.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import se.alipsa.jparq.JParqSql;

public class JParqExample {

  // Standard JDBC
  void selectMtcarsLimit() throws SQLException {
    String jdbcUrl = "jdbc:jparq:/home/user/data";
    try (Connection conn = DriverManager.getConnection(jdbcUrl);
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT * FROM mtcars LIMIT 5")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));
      }
    }
  }

  // Using the JParqSql helper
  void selectMtcarsToyotas() {
    String jdbcUrl = "jdbc:jparq:/home/user/data";
    JParqSql jparqSql = new JParqSql(jdbcUrl);
    jparqSql.query("SELECT model, cyl, mpg FROM mtcars WHERE model LIKE 'Toyota%'", rs -> {
      try {
        while (rs.next()) {
          System.out.println(rs.getString(1) + ", " + rs.getInt(2) + ", " + rs.getDouble(3));
        }
      } catch (SQLException e) {
        System.out.println("Query failed: " + e);
      }
    });
  }
}

Connection options

caseSensitive — defaults to false. Set to true to make table-name resolution case-sensitive.
```
jdbc:jparq:/home/user/data?caseSensitive=true
```
Paths may be specified directly or using the file:// prefix.

Fat Jar

A "fat" jar, which includes all dependencies, is also created. This is useful when you want to add the driver to the classpath of a tool. The fat jar is named jparq-x.y.z-fat.jar i.e. fat is the maven classifier. It is available in maven central and in the releases section.

JParq CLI

An interactive command line interface is available for querying Parquet files using JParq. The CLI supports command history and basic line editing features. It also includes several commands starting with a slash (/) for controlling the CLI itself. To start the CLI, run the following command, replacing path/to/jparq.jar with the actual path to the JParq fat jar and the {path to jparq dir} with the path the directory where your parquet files reside.

java -jar path/to/jparq.jar {path to jparq dir}

To download the fat jar, you can either download it from releases or use the following command:

LATEST_VERSION=$(curl -s https://repo1.maven.org/maven2/se/alipsa/jparq/maven-metadata.xml | sed -nE 's/.*<release>(.*)<\/release>.*/\1/p') && \
curl -o jparq.jar https://repo1.maven.org/maven2/se/alipsa/jparq/$LATEST_VERSION/jparq-$LATEST_VERSION-fat.jar

For simpler access you can create a script to it (e.g called jparq) that you put in your ~/bin dir

#!/usr/bin/env bash
java -jar /path/to/jparq.jar $@

Once the CLI is started, you can use the following commands:

/connect - Connect to a Parquet database directory (dir with parquet files)
/close - Close the current connection
/list - List available tables
/describe - Show column definitions for a table
/info - Show information about the current connection
/help - Display this help text
/exit - Exit the CLI

Anything else will be treated as a SQL query and executed against the connected dir.

Example session:

$ java -jar ~/lib/jparq.jar target/test-classes/acme
JParq CLI version 1.2.0
Connected to /home/user1/project/JParq/target/test-classes/acme
jparq(acme)> select * from employees e left join salary s on e.id = salary.employee
e__id | first_name | last_name  | s__id | employee | salary   | change_date
------+------------+------------+-------+----------+----------+------------
1     | Per        | Andersson  | 1     | 1        | 150000.0 | 2020-03-01 
1     | Per        | Andersson  | 6     | 1        | 160000.0 | 2021-01-01 
1     | Per        | Andersson  | 8     | 1        | 165000.0 | 2021-08-01 
2     | Karin      | Pettersson | 2     | 2        | 180000.0 | 2020-03-01 
3     | Tage       | Lundström  | 3     | 3        | 130000.0 | 2021-01-01 
3     | Tage       | Lundström  | 7     | 3        | 140000.0 | 2021-12-01 
4     | Arne       | Larsson    | 4     | 4        | 195000.0 | 2020-10-01 
5     | Sixten     | Svensson   | 5     | 5        | 230000.0 | 2020-12-15 
jparq(acme)> /exit
$

JDBC support

Driver scope — Read-only. Only SELECT statements are supported; executeUpdate/DML, generated keys, and updatable result sets are not implemented. Result sets are TYPE_FORWARD_ONLY, CONCUR_READ_ONLY with CLOSE_CURSORS_AT_COMMIT holdability.
Prepared statements — SQL is planned when the PreparedStatement is created. Positional ? parameters are bound via the standard setXxx methods and rendered as safe SQL literals (named parameters are also supported). Parameter metadata is available but reports generic VARCHAR/IN characteristics because actual typing occurs at execution. Only forward-only/read-only statements with CLOSE_CURSORS_AT_COMMIT holdability are accepted; overloads requesting other result-set types or generated keys/column-based key selection throw SQLFeatureNotSupportedException. Prepared-statement batching without a SQL argument is supported for convenience and reports SUCCESS_NO_INFO counts per entry. Statement batching and addBatch(String) remain unsupported.
Transactions and sessions — The connection is always read-only with TRANSACTION_NONE; commit, rollback, and savepoints are no-ops. getAutoCommit() always returns true, and changing isolation level, holdability, or poolability is not supported.
Metadata — DatabaseMetaData exposes tables and columns derived from the base directory and subdirectories (schemas); the catalog is the base directory name. Callable statements, stored procedures, and custom type mappings are not supported.
Other JDBC surfaces — CallableStatement, createClob/createBlob/createArrayOf, network timeout, and client-info features are not implemented. Connection validation (isValid) is not supported.

Non standard JDBC extensions

Named parameters i.e. :paramName syntax for prepared statements

SQL Support

JParq aims to be fully compliant with the read part of the SQL standard (SQL:2016 and earlier). As of version 1.1.1, the support for the standard is complete (as far as I know). The following is a detailed list of supported features and functions.

The following SQL statements are supported:

SELECT with support for
- * to select all columns
- Qualified wildcard projections (table.*).
- alias support for columns and tables
- Support computed expressions with aliases (e.g. SELECT mpg*2 AS double_mpg)
- Quoted identifiers using "double quotes" as per the SQL standard.
- CASE support
SELECT statements with WHERE supporting:
- BETWEEN, IN, LIKE operators
- AND, OR, NOT logical operators
- Comparison operators: =, !=, <, >, <=, >=
- Null checks: IS NULL, IS NOT NULL
ORDER BY clause with multiple columns and ASC/DESC options
- support for optional explicit NULLS FIRST and NULLS LAST
OFFSET and LIMIT support
- Also support the PostgreSQL shorthand LIMIT n OFFSET m syntax
Standard row-limiting syntax (FETCH FIRST / OFFSET … FETCH)
DISTINCT support
Functions support
- Date functions
- Aggregate functions (count, sum, avg, max, min)
- CAST support
- CONVERT support (for both character set conversion with USING and data type conversion without USING, as per SQL standard and common extensions)
- coalesce (The COALESCE() function returns the first non-null value in a list.)
- String functions (all SQL standard string function supported)
  - also support || operator for string concatenations
- Numeric functions (abs, ceil/ceiling, floor, round, sqrt, truncate/trunc, mod, power/pow, exp, log, log10, rand/random, sign, sin, cos, tan, cot, asin, acos, atan, atan2, degrees, radians, pi)
- ARRAY constructor function
comments (line --) and block (/* */)
Subquery support
- In the SELECT Clause : Used to return a single value or a set of values. e.g. SELECT first_name, ( SELECT department_name FROM departments WHERE departments.department_id = employees.department_id ) AS department_name FROM employees;
- In the FROM Clause : Treated as a derived table or inline view. E.g: SELECT * FROM (SELECT first_name, salary FROM employees WHERE salary > 5000) AS "high_salaried"
- In the WHERE Clause : Used to filter the results. e.g SELECT first_name FROM employees WHERE department_id IN (SELECT department_id FROM departments WHERE location_id>1500);
- In the HAVING Clause : Used to filter groups. E.g: SELECT department_id, AVG(salary) FROM employees GROUP BY department_id HAVING AVG(salary) > (SELECT AVG(salary) FROM employees);
GROUP BY support
- COUNT(*) aggregation
- HAVING clause with conditions
- support aggregation functions and case statements in the GROUP BY and SELECT clause
EXISTS support
ANY and ALL support
Join support: INNER, LEFT, RIGHT, FULL, CROSS, and Self Join, join ... using syntax
UNION and UNION ALL support
Complete set-operation coverage.
- EXCEPT
- INTERSECT
- INTERSECT ALL
- EXCEPT ALL
- nesting of set operations
CTE (Common Table Expressions) support
Windowing
- Ranking functions
  - ROW_NUMBER, RANK, DENSE_RANK, PERCENT_RANK, CUME_DIST, NTILE
- Aggregate window functions
- SUM, AVG, MIN, MAX, COUNT
- Analytic Value/Navigation Functions
  - LAG, LEAD, FIRST_VALUE, LAST_VALUE, NTH_VALUE
Advanced GROUP BY constructs i.e:
- GROUPING SETS
- ROLLUP
- CUBE
Derived Tables: UNNEST with or without a table wrapper, LATERAL derived tables, VALUES table constructors
INFORMATION_SCHEMA.COLUMNS and INFORMATION_SCHEMA.TABLES

String functions support details

Character Length and Position

CHAR_LENGTH(string) or CHARACTER_LENGTH(string) Returns number of characters in a string. CHAR_LENGTH('hello') → 5
OCTET_LENGTH(string) Returns number of bytes in the string (depends on encoding). OCTET_LENGTH('Å') → 2 (in UTF-8)
POSITION(substring IN string) Finds the position (1-based) of substring in string. POSITION('l' IN 'hello') → 3

Substrings and Extraction

SUBSTRING(string FROM start [FOR length]) Extracts substring starting at start, optionally limited by length. SUBSTRING('abcdef' FROM 2 FOR 3) → 'bcd'
LEFT(string, count) (optional extension) Leftmost characters. LEFT('abcdef', 3) → 'abc'
RIGHT(string, count) (optional extension) Rightmost characters. RIGHT('abcdef', 2) → 'ef'

Concatenation

CONCAT(string1, string2, …) Concatenates two or more strings (SQL:2016 added variadic support). CONCAT('a','b','c') → 'abc'
Support for the || operator, covering both binary and text concatenation, as well as null propagation.

Case Conversion

UPPER(string) Converts to uppercase. UPPER('sql') → 'SQL'
LOWER(string) Converts to lowercase. LOWER('SQL') → 'sql'

Trimming and Padding

TRIM([LEADING TRAILING BOTH] [characters] FROM string)
LTRIM(string) (extension) Trims leading spaces. LTRIM(' hi') → 'hi'
RTRIM(string) (extension) Trims trailing spaces. RTRIM('hi ') → 'hi'
LPAD(string, length [, fill]) (SQL:2008 optional) Pads string on the left. LPAD('42', 5, '0') → '00042'
RPAD(string, length [, fill]) (SQL:2008 optional) Pads string on the right. RPAD('42', 5, '0') → '42000'

Searching and Replacing

OVERLAY(string PLACING replacement FROM start [FOR length]) Replaces part of string starting at start with replacement. OVERLAY('abcdef' PLACING 'xyz' FROM 3 FOR 2) → 'abxyze f'
REPLACE(string, search, replace) (SQL:2008) Replaces all occurrences of search with replace. REPLACE('banana', 'na', 'xy') → 'baxyxy'

Collation and Comparison

COLLATE(string, collation_name) Applies a specific collation to a string. 'abc' COLLATE "sv_SE"
SIMILAR TO e.g. 'cat' SIMILAR TO '(cat|dog)' → TRUE 'cab' SIMILAR TO 'c(a|o)b' → TRUE 'cab' SIMILAR TO 'c(a|e)b' → FALSE 'abc' SIMILAR TO 'a%' → TRUE
REGEXP_LIKE (pattern matching operators) e.g: REGEXP_LIKE('abc123', '^[a-z]+[0-9]+$') → TRUE REGEXP_LIKE('AbC', 'abc', 'i') → TRUE -- 'i' = case-insensitive REGEXP_LIKE('cat', 'dog|cat') → TRUE

Unicode and Codepoints

CHAR(code) Returns the character corresponding to a code point. CHAR(65) → 'A'
UNICODE(string) Returns Unicode code point of first character. UNICODE('A') → 65

SQL:2016–2023 Additions

NORMALIZE(string [USING form]) Normalizes Unicode text (SQL:2016). NORMALIZE('é') → 'é'
STRING_AGG(expression, separator) Aggregates values into a single string with a separator. STRING_AGG(name, ', ') → 'Alice, Bob, Carol'
JSON_VALUE, JSON_QUERY, JSON_OBJECT, JSON_ARRAY JSON construction/extraction—technically not core string functions but string-returning functions standardized in SQL:2016–2023.

Non-standard extensions

ASCII(string) Returns the ASCII code of the first character. ASCII('A') → 65
LOCATE(substring, string[, start]) Locates substring in string with optional start position. LOCATE('c', 'abcabc', 3) → 3
REPEAT(string, count) Repeats the string N times. REPEAT('a', 4) → 'aaaa'
SPACE(count) Creates a string of N space characters. SPACE(5) → ' '
INSERT(string, start, length, replacement) Inserts replacement at position after removing length characters. INSERT('abcdef', 3, 2, 'XYZ') → 'abXYZef'
SOUNDEX(string) Computes the Soundex phonetic code. SOUNDEX('Robert') → 'R163'
DIFFERENCE(string1, string2) Calculates similarity based on Soundex codes (0-4). DIFFERENCE('Smith', 'Smyth') → 4

Build and test

mvn verify

To run without spotless, add -Dspotless.check.skip=true
To skip unit tests, add -DskipTests
To skip integration tests add -DskipITs=true

This project uses Checkstyle, PMD, and Spotless. The checks run automatically as part of the Maven build. Use the spotless:apply goal before committing if you need to fix formatting issues.

Release notes

See release.md for the full version history and work in progress.

Roadmap: Might be implemented in the future

Non standard extensions

Allow omission of from clause in some cases
- Several popular databases allow you to omit the FROM clause when you are only selecting literal values, performing arithmetic, or evaluating scalar functions (functions that return a single value). In these cases, the query returns a single row e.g: SELECT CURRENT_DATE as cur_date.
Support for variable assignment and use within SQL scripts.
- @variable_name syntax to define a variable that exists for the duration of the connection
  - Example (direct assignment, connection scope): declare @myVar INT = 10; SELECT * FROM myTable WHERE myColumn > @myVar; SELECT * FROM anotherTable LIMIT @myVar;
PIVOT and UNPIVOT operators.
TABLESAMPLE clause for sampling rows from a table.
Support for modular encryption via a keystore and/or a Key Management Service

Out of scope (will not be supported, at least not in the foreseeable future)

Data modification statements (INSERT, UPDATE, DELETE, MERGE)
Transaction control (COMMIT, ROLLBACK, SAVEPOINT)
Data definition statements (CREATE, ALTER, DROP, TRUNCATE)
User management and security (GRANT, REVOKE, CREATE USER, etc.)
Stored procedures and functions (CREATE PROCEDURE, CREATE FUNCTION)
Triggers (CREATE TRIGGER, DROP TRIGGER)
Advanced indexing and optimization hints
Full-text search capabilities
TEMPORARY TABLES, you need to use CTE's or value tables instead.

Name		Name	Last commit message	Last commit date
Latest commit History 834 Commits
.github		.github
reports		reports
requirements		requirements
src		src
tools		tools
.Rhistory		.Rhistory
.gitignore		.gitignore
AGENTS.md		AGENTS.md
LICENSE		LICENSE
README.md		README.md
build.sh		build.sh
checkstyle.xml		checkstyle.xml
count-loc.sh		count-loc.sh
jparq.sh		jparq.sh
pmd-ruleset.xml		pmd-ruleset.xml
pom.xml		pom.xml
release.md		release.md
release.sh		release.sh
relocate.pom		relocate.pom
spotless-formatting.xml		spotless-formatting.xml
usageExample.groovy		usageExample.groovy
version-plugin-rules.xml		version-plugin-rules.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

JParq

Requirements

Installation

Usage

Connection options

Fat Jar

JParq CLI

JDBC support

Non standard JDBC extensions

SQL Support

String functions support details

Character Length and Position

Substrings and Extraction

Concatenation

Case Conversion

Trimming and Padding

Searching and Replacing

Collation and Comparison

Unicode and Codepoints

SQL:2016–2023 Additions

Non-standard extensions

Build and test

Release notes

Roadmap: Might be implemented in the future

Non standard extensions

Out of scope (will not be supported, at least not in the foreseeable future)

About

Uh oh!

Releases 10

Packages

Contributors 3

Uh oh!

Languages

License

Alipsa/JParq

Folders and files

Latest commit

History

Repository files navigation

JParq

Requirements

Installation

Usage

Connection options

Fat Jar

JParq CLI

JDBC support

Non standard JDBC extensions

SQL Support

String functions support details

Character Length and Position

Substrings and Extraction

Concatenation

Case Conversion

Trimming and Padding

Searching and Replacing

Collation and Comparison

Unicode and Codepoints

SQL:2016–2023 Additions

Non-standard extensions

Build and test

Release notes

Roadmap: Might be implemented in the future

Non standard extensions

Out of scope (will not be supported, at least not in the foreseeable future)

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 10

Packages 0

Contributors 3

Uh oh!

Languages

Packages