JParq is a JDBC driver for Apache Parquet files. It treats a directory as a database and every .parquet file in that
directory as a table. The table name is the filename without the .parquet extension.
Sub directories are treated as schemas.
For example, given the following directory structure:
/data
├── employees.parquet
├── departments.parquet
└── sales
├── orders.parquet
└── customers.parquet
The tables employees and departments would be in the default PUBLIC schema, while orders and customers would be in the sales schema.
JParq uses Apache Arrow and Apache Parquet for efficient columnar reads and jsqlparser to parse SQL statements. It aims to be 100% compliant with the read part of the SQL standard. There are a few common extensions supported as well e.g. LIMIT, ASCII, REPEAT, DIFFERENCE.
Note: When common implementations differ from the SQL standard, we stick to the standard. An example of this is convert which in many databases is used for data type casting, but in the SQL standard is used for character set conversion. If you want to convert data types, use CAST instead.
Note The majority of the code was created in collaboration with (vibe coded with) ChatGPT Codex with Copilot and myself (Per Nyfelt) as code reviewers.
- Java 21 (higher is not supported due to Hadoop restrictions)
- Parquet files stored in a directory accessible from the JVM running the driver
Add the dependency to your build. Replace x.y.z with the latest version number from the Maven Central badge above.
<dependency>
<groupId>se.alipsa</groupId>
<artifactId>jparq</artifactId>
<version>x.y.z</version>
</dependency>The driver registers itself when the se.alipsa.jparq.JParqDriver class is loaded, so simply placing the JAR on the
classpath is normally enough. If your runtime requires explicit registration you can call
Class.forName("se.alipsa.jparq.JParqDriver") before obtaining a connection.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import se.alipsa.jparq.JParqSql;
public class JParqExample {
// Standard JDBC
void selectMtcarsLimit() throws SQLException {
String jdbcUrl = "jdbc:jparq:/home/user/data";
try (Connection conn = DriverManager.getConnection(jdbcUrl);
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT * FROM mtcars LIMIT 5")) {
while (rs.next()) {
System.out.println(rs.getString(1));
}
}
}
// Using the JParqSql helper
void selectMtcarsToyotas() {
String jdbcUrl = "jdbc:jparq:/home/user/data";
JParqSql jparqSql = new JParqSql(jdbcUrl);
jparqSql.query("SELECT model, cyl, mpg FROM mtcars WHERE model LIKE 'Toyota%'", rs -> {
try {
while (rs.next()) {
System.out.println(rs.getString(1) + ", " + rs.getInt(2) + ", " + rs.getDouble(3));
}
} catch (SQLException e) {
System.out.println("Query failed: " + e);
}
});
}
}caseSensitive— defaults tofalse. Set totrueto make table-name resolution case-sensitive.jdbc:jparq:/home/user/data?caseSensitive=true- Paths may be specified directly or using the
file://prefix.
A "fat" jar, which includes all dependencies, is also created. This is useful when you want to add the driver to the classpath of a tool. The fat jar is named jparq-x.y.z-fat.jar i.e. fat is the maven classifier. It is available in maven central and in the releases section.
An interactive command line interface is available for querying Parquet files using JParq. The CLI supports command
history and basic line editing features. It also includes several commands starting with a slash (/) for controlling the CLI itself.
To start the CLI, run the following command, replacing path/to/jparq.jar with the actual path to the JParq fat jar and the {path to jparq dir} with the path the directory where your parquet files reside.
java -jar path/to/jparq.jar {path to jparq dir}To download the fat jar, you can either download it from releases or use the following command:
LATEST_VERSION=$(curl -s https://repo1.maven.org/maven2/se/alipsa/jparq/maven-metadata.xml | sed -nE 's/.*<release>(.*)<\/release>.*/\1/p') && \
curl -o jparq.jar https://repo1.maven.org/maven2/se/alipsa/jparq/$LATEST_VERSION/jparq-$LATEST_VERSION-fat.jarFor simpler access you can create a script to it (e.g called jparq) that you put in your ~/bin dir
#!/usr/bin/env bash
java -jar /path/to/jparq.jar $@Once the CLI is started, you can use the following commands:
- /connect - Connect to a Parquet database directory (dir with parquet files)
- /close - Close the current connection
- /list - List available tables
- /describe - Show column definitions for a table
- /info - Show information about the current connection
- /help - Display this help text
- /exit - Exit the CLI
- Driver scope — Read-only. Only
SELECTstatements are supported;executeUpdate/DML, generated keys, and updatable result sets are not implemented. Result sets areTYPE_FORWARD_ONLY,CONCUR_READ_ONLYwithCLOSE_CURSORS_AT_COMMITholdability. - Prepared statements — SQL is planned when the
PreparedStatementis created. Positional?parameters are bound via the standardsetXxxmethods and rendered as safe SQL literals (named parameters are also supported). Parameter metadata is available but reports genericVARCHAR/INcharacteristics because actual typing occurs at execution. Only forward-only/read-only statements withCLOSE_CURSORS_AT_COMMITholdability are accepted; overloads requesting other result-set types or generated keys/column-based key selection throwSQLFeatureNotSupportedException. Prepared-statement batching without a SQL argument is supported for convenience and reportsSUCCESS_NO_INFOcounts per entry.Statementbatching andaddBatch(String)remain unsupported. - Transactions and sessions — The connection is always read-only with
TRANSACTION_NONE;commit,rollback, and savepoints are no-ops.getAutoCommit()always returnstrue, and changing isolation level, holdability, or poolability is not supported. - Metadata —
DatabaseMetaDataexposes tables and columns derived from the base directory and subdirectories (schemas); the catalog is the base directory name. Callable statements, stored procedures, and custom type mappings are not supported. - Other JDBC surfaces —
CallableStatement,createClob/createBlob/createArrayOf, network timeout, and client-info features are not implemented. Connection validation (isValid) is not supported. - Named parameters i.e. :paramName syntax for prepared statements
SELECTwith support for*to select all columns- Qualified wildcard projections (table.*).
- alias support for columns and tables
- Support computed expressions with aliases (e.g. SELECT mpg*2 AS double_mpg)
- Quoted identifiers using "double quotes" as per the SQL standard.
CASEsupport
SELECTstatements withWHEREsupporting:BETWEEN,IN,LIKEoperatorsAND,OR,NOTlogical operators- Comparison operators:
=,!=,<,>,<=,>= - Null checks:
IS NULL,IS NOT NULL
ORDER BYclause with multiple columns andASC/DESCoptions- support for optional explicit NULLS FIRST and NULLS LAST
OFFSETandLIMITsupport- Also support the PostgreSQL shorthand
LIMIT n OFFSET msyntax
- Also support the PostgreSQL shorthand
- Standard row-limiting syntax (FETCH FIRST / OFFSET … FETCH)
DISTINCTsupport- Functions support
- Date functions
- Aggregate functions (count, sum, avg, max, min)
- CAST support
- CONVERT support (for both character set conversion with USING and data type conversion without USING, as per SQL standard and common extensions)
- coalesce (The COALESCE() function returns the first non-null value in a list.)
- String functions (all SQL standard string function supported)
- also support || operator for string concatenations
- Numeric functions (abs, ceil/ceiling, floor, round, sqrt, truncate/trunc, mod, power/pow, exp, log, log10, rand/random, sign, sin, cos, tan, cot, asin, acos, atan, atan2, degrees, radians, pi)
- ARRAY constructor function
- comments (line --) and block (/* */)
- Subquery support
- In the SELECT Clause : Used to return a single value or a set of values. e.g. SELECT first_name, ( SELECT department_name FROM departments WHERE departments.department_id = employees.department_id ) AS department_name FROM employees;
- In the FROM Clause : Treated as a derived table or inline view. E.g: SELECT * FROM (SELECT first_name, salary FROM employees WHERE salary > 5000) AS "high_salaried"
- In the WHERE Clause : Used to filter the results. e.g SELECT first_name FROM employees WHERE department_id IN (SELECT department_id FROM departments WHERE location_id>1500);
- In the HAVING Clause : Used to filter groups. E.g: SELECT department_id, AVG(salary) FROM employees GROUP BY department_id HAVING AVG(salary) > (SELECT AVG(salary) FROM employees);
GROUP BYsupportCOUNT(*)aggregationHAVINGclause with conditions- support aggregation functions and case statements in the
GROUP BYandSELECTclause
EXISTSsupportANYandALLsupport- Join support: INNER, LEFT, RIGHT, FULL, CROSS, and Self Join, join ... using syntax
UNIONandUNION ALLsupport- Complete set-operation coverage.
EXCEPTINTERSECTINTERSECT ALLEXCEPT ALL- nesting of set operations
- CTE (Common Table Expressions) support
- Windowing
- Ranking functions
ROW_NUMBER,RANK,DENSE_RANK,PERCENT_RANK,CUME_DIST,NTILE
- Aggregate window functions
SUM,AVG,MIN,MAX,COUNT- Analytic Value/Navigation Functions
LAG,LEAD,FIRST_VALUE,LAST_VALUE,NTH_VALUE
- Ranking functions
- Advanced GROUP BY constructs i.e:
GROUPING SETSROLLUPCUBE
- Derived Tables:
UNNESTwith or without a table wrapper,LATERALderived tables,VALUEStable constructors INFORMATION_SCHEMA.COLUMNSandINFORMATION_SCHEMA.TABLES- CHAR_LENGTH(string) or CHARACTER_LENGTH(string) Returns number of characters in a string. CHAR_LENGTH('hello') → 5
- OCTET_LENGTH(string) Returns number of bytes in the string (depends on encoding). OCTET_LENGTH('Å') → 2 (in UTF-8)
- POSITION(substring IN string) Finds the position (1-based) of substring in string. POSITION('l' IN 'hello') → 3
- SUBSTRING(string FROM start [FOR length]) Extracts substring starting at start, optionally limited by length. SUBSTRING('abcdef' FROM 2 FOR 3) → 'bcd'
- LEFT(string, count) (optional extension) Leftmost characters. LEFT('abcdef', 3) → 'abc'
- RIGHT(string, count) (optional extension) Rightmost characters. RIGHT('abcdef', 2) → 'ef'
- CONCAT(string1, string2, …) Concatenates two or more strings (SQL:2016 added variadic support). CONCAT('a','b','c') → 'abc'
- Support for the || operator, covering both binary and text concatenation, as well as null propagation.
- UPPER(string) Converts to uppercase. UPPER('sql') → 'SQL'
- LOWER(string) Converts to lowercase. LOWER('SQL') → 'sql'
- TRIM([LEADING TRAILING BOTH] [characters] FROM string)
- LTRIM(string) (extension) Trims leading spaces. LTRIM(' hi') → 'hi'
- RTRIM(string) (extension) Trims trailing spaces. RTRIM('hi ') → 'hi'
- LPAD(string, length [, fill]) (SQL:2008 optional) Pads string on the left. LPAD('42', 5, '0') → '00042'
- RPAD(string, length [, fill]) (SQL:2008 optional) Pads string on the right. RPAD('42', 5, '0') → '42000'
- OVERLAY(string PLACING replacement FROM start [FOR length]) Replaces part of string starting at start with replacement. OVERLAY('abcdef' PLACING 'xyz' FROM 3 FOR 2) → 'abxyze f'
- REPLACE(string, search, replace) (SQL:2008) Replaces all occurrences of search with replace. REPLACE('banana', 'na', 'xy') → 'baxyxy'
- COLLATE(string, collation_name) Applies a specific collation to a string. 'abc' COLLATE "sv_SE"
- SIMILAR TO e.g. 'cat' SIMILAR TO '(cat|dog)' → TRUE 'cab' SIMILAR TO 'c(a|o)b' → TRUE 'cab' SIMILAR TO 'c(a|e)b' → FALSE 'abc' SIMILAR TO 'a%' → TRUE
- REGEXP_LIKE (pattern matching operators) e.g: REGEXP_LIKE('abc123', '^[a-z]+[0-9]+$') → TRUE REGEXP_LIKE('AbC', 'abc', 'i') → TRUE -- 'i' = case-insensitive REGEXP_LIKE('cat', 'dog|cat') → TRUE
- CHAR(code) Returns the character corresponding to a code point. CHAR(65) → 'A'
- UNICODE(string) Returns Unicode code point of first character. UNICODE('A') → 65
- NORMALIZE(string [USING form]) Normalizes Unicode text (SQL:2016). NORMALIZE('é') → 'é'
- STRING_AGG(expression, separator) Aggregates values into a single string with a separator. STRING_AGG(name, ', ') → 'Alice, Bob, Carol'
- JSON_VALUE, JSON_QUERY, JSON_OBJECT, JSON_ARRAY JSON construction/extraction—technically not core string functions but string-returning functions standardized in SQL:2016–2023.
- ASCII(string) Returns the ASCII code of the first character. ASCII('A') → 65
- LOCATE(substring, string[, start]) Locates substring in string with optional start position. LOCATE('c', 'abcabc', 3) → 3
- REPEAT(string, count) Repeats the string N times. REPEAT('a', 4) → 'aaaa'
- SPACE(count) Creates a string of N space characters. SPACE(5) → ' '
- INSERT(string, start, length, replacement) Inserts replacement at position after removing length characters. INSERT('abcdef', 3, 2, 'XYZ') → 'abXYZef'
- SOUNDEX(string) Computes the Soundex phonetic code. SOUNDEX('Robert') → 'R163'
- DIFFERENCE(string1, string2) Calculates similarity based on Soundex codes (0-4). DIFFERENCE('Smith', 'Smyth') → 4
- To run without spotless, add -Dspotless.check.skip=true
- To skip unit tests, add -DskipTests
- To skip integration tests add -DskipITs=true
- Allow omission of from clause in some cases
- Several popular databases allow you to omit the FROM clause when you are only selecting literal values, performing arithmetic, or evaluating scalar functions (functions that return a single value). In these cases, the query returns a single row e.g:
SELECT CURRENT_DATE as cur_date.
- Several popular databases allow you to omit the FROM clause when you are only selecting literal values, performing arithmetic, or evaluating scalar functions (functions that return a single value). In these cases, the query returns a single row e.g:
- Support for variable assignment and use within SQL scripts.
- @variable_name syntax to define a variable that exists for the duration of the connection
- Example (direct assignment, connection scope): declare @myVar INT = 10; SELECT * FROM myTable WHERE myColumn > @myVar; SELECT * FROM anotherTable LIMIT @myVar;
- @variable_name syntax to define a variable that exists for the duration of the connection
- PIVOT and UNPIVOT operators.
- TABLESAMPLE clause for sampling rows from a table.
- Support for modular encryption via a keystore and/or a Key Management Service
- Data modification statements (INSERT, UPDATE, DELETE, MERGE)
- Transaction control (COMMIT, ROLLBACK, SAVEPOINT)
- Data definition statements (CREATE, ALTER, DROP, TRUNCATE)
- User management and security (GRANT, REVOKE, CREATE USER, etc.)
- Stored procedures and functions (CREATE PROCEDURE, CREATE FUNCTION)
- Triggers (CREATE TRIGGER, DROP TRIGGER)
- Advanced indexing and optimization hints
- Full-text search capabilities
- TEMPORARY TABLES, you need to use CTE's or value tables instead.
Anything else will be treated as a SQL query and executed against the connected dir.
Example session:
$ java -jar ~/lib/jparq.jar target/test-classes/acme
JParq CLI version 1.2.0
Connected to /home/user1/project/JParq/target/test-classes/acme
jparq(acme)> select * from employees e left join salary s on e.id = salary.employee
e__id | first_name | last_name | s__id | employee | salary | change_date
------+------------+------------+-------+----------+----------+------------
1 | Per | Andersson | 1 | 1 | 150000.0 | 2020-03-01
1 | Per | Andersson | 6 | 1 | 160000.0 | 2021-01-01
1 | Per | Andersson | 8 | 1 | 165000.0 | 2021-08-01
2 | Karin | Pettersson | 2 | 2 | 180000.0 | 2020-03-01
3 | Tage | Lundström | 3 | 3 | 130000.0 | 2021-01-01
3 | Tage | Lundström | 7 | 3 | 140000.0 | 2021-12-01
4 | Arne | Larsson | 4 | 4 | 195000.0 | 2020-10-01
5 | Sixten | Svensson | 5 | 5 | 230000.0 | 2020-12-15
jparq(acme)> /exit
$
JParq aims to be fully compliant with the read part of the SQL standard (SQL:2016 and earlier). As of version 1.1.1, the support for the standard is complete (as far as I know). The following is a detailed list of supported features and functions.
The following SQL statements are supported:
mvn verifyThis project uses Checkstyle, PMD, and Spotless. The checks run automatically as part of the Maven build. Use the
spotless:apply goal before committing if you need to fix formatting issues.
See release.md for the full version history and work in progress.