ParquetSQL

A high-performance Qt5-based application for browsing and querying Parquet and CSV files using DuckDB as the SQL engine.

Features

File Browser: Navigate and select Parquet (.parquet) and CSV (.csv, .tsv) files
SQL Editor: Syntax-highlighted SQL editor with auto-completion
Fast Queries: Powered by DuckDB for optimized analytical queries
Pagination: Handle large result sets with built-in pagination (1000 rows per page)
Threading: Non-blocking SQL execution using background threads
Dark Theme: Modern dark UI theme optimized for data analysis
Performance: Tuned for large datasets with a 2GB memory limit, 4-thread query execution, and paginated result loading
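
The Threading feature above (non-blocking SQL execution on background threads) can be illustrated with a minimal standard-library sketch. This is not the application's actual code: `Rows` and `runQueryAsync` are hypothetical names, the query is a stand-in callable, and a Qt application would more likely use QThread or QtConcurrent to deliver results back to the UI thread.

```cpp
#include <functional>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type: one string per row, for illustration only.
using Rows = std::vector<std::string>;

// Start `query` on a background thread and return immediately.
// std::launch::async requests a separate thread instead of deferred execution.
std::future<Rows> runQueryAsync(std::function<Rows()> query) {
    return std::async(std::launch::async, std::move(query));
}
```

Calling `runQueryAsync` returns a `std::future` at once, so the caller stays responsive; `get()` on that future blocks only at the point where the rows are actually needed.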

Requirements

Qt5 (Core, Widgets, Sql)
DuckDB library
CMake 3.16+
C++17 compiler
pthread support

Building

Install Dependencies

Ubuntu/Debian:

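sudo apt-get install qt5-default libqt5widgets5 libqt5sql5 libduckdb-dev cmake build-essential

Note: the qt5-default package was removed in Ubuntu 21.04 and Debian 11. On those and later releases, install qtbase5-dev in its place.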

macOS (with Homebrew):

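brew install qt5 duckdb cmake

Note: Homebrew's Qt5 formula (qt@5) is keg-only, so CMake may need -DCMAKE_PREFIX_PATH="$(brew --prefix qt@5)" to locate it.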

Windows:

Install Qt5 from https://www.qt.io/download
Download DuckDB from https://duckdb.org/docs/installation

Build Instructions

mkdir build
cd build
cmake ..
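make -j$(nproc)

On macOS, where nproc is not available by default, run cmake --build . --parallel instead of the make command.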

Usage

Launch the application:
```
./ParquetSQL
```

Load a file:
- Use the file browser on the left to navigate to your data files
- Select a .parquet, .csv, or .tsv file
- Click "Load Selected File"

Query your data:
- The SQL editor will populate with a default SELECT query
- Modify the query as needed
- Click "Execute Query" to run
- Results appear in the table below with pagination controls

Navigate results:
- Use pagination buttons (First, Previous, Next, Last)
- View up to 1000 rows per page for optimal performance
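
To illustrate the default SELECT query that appears after loading a file, the sketch below builds a starter query from the file's extension. It assumes the application reads files through DuckDB's `read_parquet` and `read_csv_auto` table functions; `defaultQueryFor` is a hypothetical helper, and the query the application actually generates may differ.

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical helper: build a starter query for a selected data file.
std::string defaultQueryFor(const std::string& path) {
    // Take everything from the last '.' as the extension and lowercase it.
    const auto dot = path.rfind('.');
    std::string ext = (dot == std::string::npos) ? "" : path.substr(dot);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    // Pick the DuckDB table function that matches the file type.
    std::string reader;
    if (ext == ".parquet") {
        reader = "read_parquet";
    } else if (ext == ".csv" || ext == ".tsv") {
        reader = "read_csv_auto";
    } else {
        throw std::invalid_argument("unsupported file type: " + path);
    }

    // Double any single quotes so the path is a valid SQL string literal.
    std::string quoted;
    for (char c : path) {
        quoted += c;
        if (c == '\'') quoted += '\'';
    }

    return "SELECT * FROM " + reader + "('" + quoted + "') LIMIT 1000;";
}
```

For a file such as `data/sales.parquet`, this yields a query over `read_parquet('data/sales.parquet')`; files with any other extension are rejected with an exception rather than guessed at.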

Performance Features

Memory Management: 2GB memory limit with efficient data streaming
Multi-threading: 4-thread execution for parallel query processing
Vectorized Operations: DuckDB's columnar processing for fast analytics
Lazy Loading: Results loaded on-demand with pagination
Query Optimization: Automatic query planning and optimization
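
The Lazy Loading item pairs with the 1000-row pages described under Usage. The source does not say how the application pages results; one common approach is to wrap the user's query in LIMIT/OFFSET, and the sketch below shows that arithmetic. `kPageSize`, `pageCount`, and `pageQuery` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

constexpr std::int64_t kPageSize = 1000;  // rows per page, as stated in this README

// Number of pages needed to show `totalRows` rows (ceiling division).
std::int64_t pageCount(std::int64_t totalRows) {
    assert(totalRows >= 0);
    return (totalRows + kPageSize - 1) / kPageSize;
}

// Wrap a query so that only the rows of the zero-based `page` are fetched.
// Assumes `query` is a single SELECT without a trailing semicolon.
std::string pageQuery(const std::string& query, std::int64_t page) {
    assert(page >= 0);
    return "SELECT * FROM (" + query + ") AS paged LIMIT " +
           std::to_string(kPageSize) + " OFFSET " +
           std::to_string(page * kPageSize) + ";";
}
```

With this scheme the Last button would map to page `pageCount(totalRows) - 1`. Row order across pages is only stable if the wrapped query has an ORDER BY clause.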

SQL Examples

Basic queries:

SELECT * FROM your_table LIMIT 100;
SELECT column1, COUNT(*) FROM your_table GROUP BY column1;
SELECT * FROM your_table WHERE column1 > 1000 ORDER BY column2 DESC;

Analytical queries:

SELECT
    column1,
    AVG(column2) AS avg_value,
    MIN(column2) AS min_value,
    MAX(column2) AS max_value
FROM your_table
GROUP BY column1
HAVING COUNT(*) > 10
ORDER BY avg_value DESC;

Keyboard Shortcuts

Ctrl+O: Open file dialog
Ctrl+Space: SQL auto-completion
Ctrl+Enter: Execute query
Ctrl+Q: Quit application

License

This project is licensed under the MIT License.
