
Commit e7b28dd

alamb and Fokko authored
Add primitive_null metadata for example Variant values (#84)
* Add primitive_null metadata

* Update variant/README.md

  Co-authored-by: Fokko Driesprong <fokko@apache.org>

* Update variant/README.md

---------

Co-authored-by: Fokko Driesprong <fokko@apache.org>
1 parent 107b366 commit e7b28dd


2 files changed (+16, -2 lines)


variant/README.md

Lines changed: 16 additions & 2 deletions
@@ -45,8 +45,22 @@ Each example consists of 2 files:
 
 ## Regenerating these files
 
-The files were generated by running the [`regen.py`](regen.py) script that uses Apache Spark to
-generate the files.
+The files in this directory were initially generated by running the [`regen.py`](regen.py)
+script which used Apache Spark to generate the files. The files have been subsequently modified
+when necessary to ensure that they conform to the Parquet spec.
+
+### Modification 1: Created metadata for `primitive_null` as a single byte (`0x01`)
+
+Per <https://github.com/apache/parquet-testing/issues/81>, Spark did not generate
+any metadata for `null` and left `primitive_null.metadata` empty.
+The metadata for `primitive_null` should be the same 3 bytes as other primitive types
+* header = `0x01`
+* dictionary_size = `0x00`
+* `dictionary_size + 1 = 1` byte values: `0x00`
+
+```shell
+cp primitive_int8.metadata primitive_null.metadata
+```
 
 [Variant]: https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
 [primitive types listed in the spec]: https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-data-for-primitive-type-basic_type0

variant/primitive_null.metadata

Binary file, 3 bytes (contents not shown).
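The 3-byte metadata layout described in the README change (header `0x01`, dictionary_size `0x00`, one offset `0x00`) can be sketched in POSIX shell. This is an illustrative check only: it writes the three bytes to a temporary file (a stand-in, not the committed `primitive_null.metadata`) and decodes the header assuming the bit layout from the Variant encoding spec (bits 0-3 version, bit 4 sorted_strings, bits 6-7 offset_size_minus_one). The temporary path and variable names are hypothetical.

```shell
#!/bin/sh
# Sketch only: rebuild the 3-byte Variant metadata for an empty dictionary
# (header 0x01, dictionary_size 0x00, one offset 0x00) and decode it again.
# The temporary file is a hypothetical stand-in for variant/primitive_null.metadata.
out=$(mktemp)
printf '\001\000\000' > "$out"

# Read the bytes back as unsigned decimal values and make them $1 $2 $3
set -- $(od -An -tu1 "$out")
rm -f "$out"

header=$1
version=$(( header & 15 ))                  # bits 0-3: version (must be 1)
sorted_strings=$(( (header >> 4) & 1 ))     # bit 4: sorted_strings flag
offset_size=$(( ((header >> 6) & 3) + 1 ))  # bits 6-7: offset_size_minus_one

# With offset_size = 1, byte 2 is dictionary_size and byte 3 is the single offset.
echo "bytes=$# version=$version sorted_strings=$sorted_strings offset_size=$offset_size dictionary_size=$2 offset0=$3"
# prints: bytes=3 version=1 sorted_strings=0 offset_size=1 dictionary_size=0 offset0=0
```

If the committed file follows the layout described in the README, running `od -An -tx1 variant/primitive_null.metadata` should likewise show the bytes `01 00 00`.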
