A protoc plugin that generates parquet schemas from protobuf files. See examples.
go get github.com/simo7/protoc-gen-parquet

Alternatively, clone the repo and build the plugin:
go build -o bin/protoc-gen-parquet .
export PATH=$PWD/bin:$PATH

Generate parquet schema:
protoc \
--parquet_out=no_unsigned=true,go_file=true:./ \
--parquet_opt=paths=source_relative \
--go_opt=paths=source_relative \
--go_out=./ \
examples/person.proto

Re-generate parquet_options stubs:
protoc \
--go_opt=paths=source_relative \
--go_out=./ \
parquet_options/parquet_options.proto

no_unsigned (bool): Avoid unsigned integers and use the corresponding signed integer type instead.
timestamp_int96 (bool): Fields extended as timestamps (see timestamp_type in parquet options)
can be defined as INT96 instead of INT64 to ensure compatibility with all Hive and Presto versions.
go_file (bool): Generate an additional .go file containing the schema as a string constant. This makes it easier to import
a versioned schema into a Go application.
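
As a rough illustration of how such a constant might be consumed, the sketch below defines a stand-in constant and inspects it. The identifier PersonSchema, the schema text, and the helper fieldCount are all assumptions made for this example; the actual names and contents depend on the file the plugin generates.

```go
package main

import (
	"fmt"
	"strings"
)

// PersonSchema is a hypothetical stand-in for the constant in the generated
// .go file; the real identifier and schema text come from the plugin output.
const PersonSchema = `message Person {
  required binary name (UTF8);
  optional int64 id;
}`

// fieldCount counts field declarations in a schema string by counting the
// lines that end with a semicolon.
func fieldCount(schema string) int {
	n := 0
	for _, line := range strings.Split(schema, "\n") {
		if strings.HasSuffix(strings.TrimSpace(line), ";") {
			n++
		}
	}
	return n
}

func main() {
	// In a real application the constant would be imported from the
	// generated package and handed to a parquet writer.
	fmt.Printf("schema has %d fields\n", fieldCount(PersonSchema))
}
```

Because the schema is an ordinary Go constant, it is versioned together with the module that contains it, so an application can pin a schema version through its normal dependency management.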
The following annotations are not implemented:
- (DATE)
- (UUID)
- (MAP), (MAP_KEY_VALUE)
- (STRING) (all UTF8 by default)
Reference: https://developers.google.com/protocol-buffers/docs/reference/google.protobuf.
The following types are supported:
- google.protobuf.Timestamp
It's tested against the new protobuf API google.golang.org/protobuf or version 1.4.0 of the legacy API github.com/golang/protobuf.