A fast, reliable CLI tool for bidirectional conversion between CSV and Apache Parquet formats. Built in Go with the Cobra CLI framework, it is designed for data workflows that need efficient, schema-aware columnar storage.
- Bidirectional conversion: CSV ↔ Parquet
- High performance: Batch processing with configurable flush intervals
- Compression support: Multiple compression algorithms
- Schema-aware: Automatic schema detection and type inference
- Verbose statistics: Runtime performance and memory usage reporting
- Flexible CLI: Powered by Cobra with intuitive subcommands
- Cobra CLI Framework: github.com/spf13/cobra v1.10.1
- Parquet Processing: github.com/xitongsys/parquet-go v1.6.2
- High-Performance JSON: github.com/bytedance/sonic v1.14.1
- Error Handling: github.com/pkg/errors v0.9.1
- String Utilities: github.com/iancoleman/strcase v0.3.0
- Dynamic Structs: github.com/ompluscator/dynamic-struct v1.4.0
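The automatic type inference listed among the features could work roughly as follows. This is a minimal sketch using only the standard library, not the tool's actual implementation: the function names (`inferType`, `mergeTypes`, `inferColumn`) and the type labels, loosely modeled on Parquet type names, are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// inferType returns a hypothetical type label for a single CSV cell.
// It tries the narrowest type first and falls back to a string type.
func inferType(value string) string {
	if _, err := strconv.ParseInt(value, 10, 64); err == nil {
		return "INT64"
	}
	if _, err := strconv.ParseFloat(value, 64); err == nil {
		return "DOUBLE"
	}
	if strings.EqualFold(value, "true") || strings.EqualFold(value, "false") {
		return "BOOLEAN"
	}
	return "UTF8"
}

// mergeTypes widens two labels to one that can hold both:
// INT64 and DOUBLE widen to DOUBLE, any other mismatch becomes UTF8.
func mergeTypes(a, b string) string {
	switch {
	case a == b:
		return a
	case (a == "INT64" && b == "DOUBLE") || (a == "DOUBLE" && b == "INT64"):
		return "DOUBLE"
	default:
		return "UTF8"
	}
}

// inferColumn infers one label for a whole column. Empty cells are
// skipped (treated as missing); a column with no values defaults to UTF8.
func inferColumn(values []string) string {
	result := ""
	for _, v := range values {
		if v == "" {
			continue
		}
		t := inferType(v)
		if result == "" {
			result = t
		} else {
			result = mergeTypes(result, t)
		}
	}
	if result == "" {
		return "UTF8"
	}
	return result
}

func main() {
	fmt.Println(inferColumn([]string{"1", "2", "3.5"})) // prints "DOUBLE"
	fmt.Println(inferColumn([]string{"true", "false"})) // prints "BOOLEAN"
	fmt.Println(inferColumn([]string{"1", "abc"}))      // prints "UTF8"
}
```

Widening on mismatch keeps the sketch safe for mixed columns: a single non-numeric cell turns the whole column into a string column instead of failing the conversion.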
git clone https://github.com/dbunt1tled/parquet2csv.git
cd parquet2csv
go build -o csv2parquet main.go

go install github.com/dbunt1tled/parquet2csv@latest

csv2parquet # Root command
├── parquet <input> <output> # Convert CSV to Parquet
└── csv <input> <output> # Convert Parquet to CSV
| Flag | Short | Type | Default | Description |
|---|---|---|---|---|
| `--compression` | `-c` | int | 0 | Compression type (0=UNCOMPRESSED, 1=SNAPPY, 2=GZIP, 3=LZO) |
| `--delimiter` | `-d` | string | "," | Field delimiter for CSV files |
| `--flush` | `-f` | int | 10000 | Number of rows to process before flushing to disk |
| `--verbose` | `-v` | bool | false | Show detailed statistics and performance metrics |
| `--help` | `-h` | bool | false | Display help information |
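The integer codes accepted by `--compression` map to codec names as listed in the table. A small sketch of how such a value might be validated before use is shown here; the names `compressionNames` and `compressionName` are hypothetical and the tool's own handling may differ.

```go
package main

import "fmt"

// compressionNames mirrors the codes documented for --compression.
var compressionNames = map[int]string{
	0: "UNCOMPRESSED",
	1: "SNAPPY",
	2: "GZIP",
	3: "LZO",
}

// compressionName translates a numeric code into its codec name and
// rejects any value outside the documented range.
func compressionName(code int) (string, error) {
	name, ok := compressionNames[code]
	if !ok {
		return "", fmt.Errorf("unsupported compression code %d (expected 0-3)", code)
	}
	return name, nil
}

func main() {
	for _, code := range []int{0, 1, 2, 3, 9} {
		name, err := compressionName(code)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%d -> %s\n", code, name)
	}
}
```

Rejecting unknown codes early gives a clear error message instead of a failure later in the write path.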
./csv2parquet --help # General help
./csv2parquet parquet --help # CSV to Parquet help
./csv2parquet csv --help # Parquet to CSV help

# CSV to Parquet with default settings
./csv2parquet parquet data.csv
# Parquet to CSV with custom delimiter
./csv2parquet csv data.parquet --delimiter ";"
# CSV to Parquet with compression and verbose output
./csv2parquet parquet large_dataset.csv --compression 1 --verbose

# Process large files with custom flush interval
./csv2parquet parquet big_file.csv big_file.parquet \
--flush 50000 \
--compression 2 \
--verbose
# Convert with pipe delimiter and detailed stats
./csv2parquet csv analytics.parquet analytics.csv \
--delimiter "|" \
--flush 1000 \
--verbose

- Batch Processing: Configurable row batch sizes for optimal memory usage
- Compression: Support for multiple compression algorithms (SNAPPY, GZIP, LZO)
- Memory Management: Efficient memory pooling and garbage collection
- Progress Tracking: Runtime statistics including processing time and memory usage
- Schema Optimization: Automatic type inference and schema generation
- Custom delimiters (comma, semicolon, pipe, tab, etc.)
- Header row detection and processing
- Automatic type inference
- Large file handling with streaming
- Columnar storage optimization
- Schema preservation
- Multiple compression algorithms
- Efficient read/write operations
- Row group size optimization (128MB default)
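The batch processing and streaming behavior described above can be illustrated with the standard library alone. The sketch below reads a delimited file row by row and hands rows to a callback every `flushEvery` rows, similar in spirit to the `--flush` and `--delimiter` options. The function names (`processInBatches`, `batchSizes`) and the callback shape are assumptions for illustration; the real tool writes Parquet through parquet-go, which this sketch leaves out.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"io"
	"strings"
	"unicode/utf8"
)

// processInBatches streams CSV rows from r, skips the header row, and calls
// flush with groups of at most flushEvery rows. It returns the number of
// data rows read. The batch slice is reused between calls, so flush must
// not keep a reference to it after returning.
func processInBatches(r io.Reader, delimiter string, flushEvery int, flush func(rows [][]string) error) (int, error) {
	if flushEvery <= 0 {
		return 0, errors.New("flushEvery must be positive")
	}
	comma, size := utf8.DecodeRuneInString(delimiter)
	if comma == utf8.RuneError || size != len(delimiter) {
		return 0, fmt.Errorf("delimiter must be a single character, got %q", delimiter)
	}

	reader := csv.NewReader(r)
	reader.Comma = comma

	// The first record is treated as the header row.
	if _, err := reader.Read(); err != nil {
		return 0, fmt.Errorf("reading header: %w", err)
	}

	batch := make([][]string, 0, flushEvery)
	total := 0
	for {
		record, err := reader.Read()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return total, err
		}
		batch = append(batch, record)
		total++
		if len(batch) == flushEvery {
			if err := flush(batch); err != nil {
				return total, err
			}
			batch = batch[:0] // keep the allocation, drop the rows
		}
	}
	// Flush whatever is left after the last full batch.
	if len(batch) > 0 {
		if err := flush(batch); err != nil {
			return total, err
		}
	}
	return total, nil
}

// batchSizes is a demonstration helper: it runs processInBatches over an
// in-memory string and records how many rows each flush received.
func batchSizes(input, delimiter string, flushEvery int) ([]int, int, error) {
	var sizes []int
	total, err := processInBatches(strings.NewReader(input), delimiter, flushEvery, func(rows [][]string) error {
		sizes = append(sizes, len(rows))
		return nil
	})
	return sizes, total, err
}

func main() {
	sizes, total, err := batchSizes("id;name\n1;a\n2;b\n3;c\n", ";", 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sizes, total) // prints "[2 1] 3"
}
```

Because only one batch is held in memory at a time, peak memory depends on the flush interval and not on the file size, which is the trade-off the `--flush` option exposes.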
├── cmd/ # Cobra CLI commands
│   ├── root.go # Root command definition
│   ├── csv2parquet.go # CSV to Parquet conversion
│   └── parquet2csv.go # Parquet to CSV conversion
├── internal/
│   ├── file/ # File operations and I/O
│   ├── helper/ # Utility functions
│   └── schema/ # Schema management
└── main.go # Application entry point
go test ./... # Run all tests
go test -v ./... # Verbose test output
go test -bench . ./... # Run benchmarks

go build -o csv2parquet main.go # Build binary
make build # Using Makefile (if available)

- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with xitongsys/parquet-go for Parquet file handling
- CLI powered by spf13/cobra
- High-performance JSON processing with bytedance/sonic