Serialization Design
Ilya Sher edited this page May 31, 2022 · 8 revisions
- Forward and backward compatibility
- Easily processable (as much as practical) by existing text processing tools -> Tools that would convert to and from the serialized format
- Support graph of objects
- Support transient fields/values
- Support streaming read/write
- Support enum (when added to the language)
All values except for strings are little endian.
Overall layout:
- The string `NGS-SERIALIZED--` (the last two characters are padding to 16 bytes)
- Format version - 2 bytes int with value 1
- Any amount of type-meta-length-value
  - Format
    - Type - 2 bytes int
    - Meta - 2 bytes int
      - Bit 0 - action for a filter program that doesn't recognize the type (only if bit 1 is 0)
        - 0 - keep
        - 1 - remove
      - Bit 1 - it's an error if the reader doesn't recognize the type
        - 0 - not an error
        - 1 - an error
      - Bits 2 to 15 - reserved, must be set to 0
    - Length - 4 bytes, can be zero
  - Types (type & meta)
    - 1 & 2 - end marker
    - 2 & 0 - type definition chunk
    - 3 & 1 - cryptographic algorithm and parameters (TBD, at beginning of the stream)
    - 4 & 1 - cryptographic signature (TBD, at end of stream)
    - 16 & 2 - object start
    - 17 & 2 - object end
    - 3 to 255 - reserved types
    - 256 to 32767 - predefined types
- End marker: type=1, meta=2, len=0
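The layout above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the NGS implementation: the names `write_stream`, `read_chunks` and `filter_action` are hypothetical, the integers are assumed to be unsigned, and Length is assumed to count only the value bytes that follow it (the page gives field sizes but not these details). Treating bit 1 as an error for a filter program as well as for a reader is also an interpretation.

```python
import io
import struct

MAGIC = b"NGS-SERIALIZED--"            # 16 bytes, as given in the layout
FORMAT_VERSION = 1
# type (2 bytes), meta (2 bytes), length (4 bytes), all little endian.
# Assumption: unsigned fields; the page only specifies their sizes.
CHUNK_HEADER = struct.Struct("<HHI")

TYPE_END = 1
META_FILTER_REMOVE = 0b01              # bit 0
META_ERROR_IF_UNKNOWN = 0b10           # bit 1


def write_chunk(out, type_, meta, value=b""):
    """Write one type-meta-length-value chunk."""
    out.write(CHUNK_HEADER.pack(type_, meta, len(value)))
    out.write(value)


def write_stream(out, chunks):
    """Write magic string, format version, the chunks, then the end marker."""
    out.write(MAGIC)
    out.write(struct.pack("<H", FORMAT_VERSION))
    for type_, meta, value in chunks:
        write_chunk(out, type_, meta, value)
    write_chunk(out, TYPE_END, META_ERROR_IF_UNKNOWN)  # type=1, meta=2, len=0


def read_chunks(inp):
    """Yield (type, meta, value) for each chunk up to the end marker."""
    if inp.read(len(MAGIC)) != MAGIC:
        raise ValueError("not an NGS serialized stream")
    (version,) = struct.unpack("<H", inp.read(2))
    if version != FORMAT_VERSION:
        raise ValueError(f"unsupported format version {version}")
    while True:
        header = inp.read(CHUNK_HEADER.size)
        if len(header) != CHUNK_HEADER.size:
            raise ValueError("truncated stream: end marker not found")
        type_, meta, length = CHUNK_HEADER.unpack(header)
        if meta >> 2:
            raise ValueError("reserved meta bits 2 to 15 must be 0")
        value = inp.read(length)
        if len(value) != length:
            raise ValueError("truncated chunk value")
        if type_ == TYPE_END:
            return
        yield type_, meta, value


def filter_action(meta):
    """Decide what to do with a chunk whose type is not recognized."""
    if meta & META_ERROR_IF_UNKNOWN:   # bit 1 takes precedence over bit 0
        return "error"
    return "remove" if meta & META_FILTER_REMOVE else "keep"
```

Because every chunk carries its own length, a reader or filter program can skip a chunk it does not understand without parsing its value, which is what makes the keep/remove bit usable.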
Type definition chunk (data section):
- type - 2 bytes int - the new integer being assigned to a type
- length - 4 bytes
- data - JSON array, specifying
- For compatibility with external tools
  - Line-based format
  - JSON will be used
  - JSON parts will be easily extractable
  - JSON parts will convey information that is of interest to external tools: the main data.
  - Easily extractable JSON parts will not convey information that is of interest mostly to NGS, such as types and view options.
- For forward and backward compatibility
  - Each metadata item (key-value pair) will be classified into one of the following categories, specifying the behaviour of an unserializer that doesn't know how to handle the item.
    - "error" - unserializer must know how to process the given item, otherwise it's an error.
    - "keep" - keep the item for further processing down the line
    - "remove" - remove the item
- Support type versions like Java's `serialVersionUID`?
- Consider a place for external tools to place their data, which will be ignored and preserved by NGS
- Cryptographically sign locally generated serialized data so it could be more "trusted"?
  - If yes, JWT is probably the best signature format
  - Allow several signatures? Should allow easy certificate rotation, etc.
- Network friendliness (frames)
- `echo()` on non-tty will output serialized data?
- Track/keep all commands that were involved in creation of the data?
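The "error" / "keep" / "remove" categories for metadata items listed above could be handled along these lines. This is only a sketch: representing each item as a `(key, value, category)` tuple, and the names `process_metadata` and `known_handlers`, are assumptions made for illustration, since the page does not define how a category is attached to an item.

```python
def process_metadata(items, known_handlers):
    """Process known items; treat unknown ones according to their category.

    items: iterable of (key, value, category) tuples (hypothetical layout).
    known_handlers: dict mapping a key to a function that processes its value.
    Returns the items that are passed on for further processing.
    """
    kept = []
    for key, value, category in items:
        if key in known_handlers:
            # The unserializer knows this item: the category is irrelevant.
            kept.append((key, known_handlers[key](value), category))
        elif category == "error":
            raise ValueError(f"cannot process required metadata item {key!r}")
        elif category == "keep":
            kept.append((key, value, category))  # pass through untouched
        elif category == "remove":
            continue                             # drop silently
        else:
            raise ValueError(f"unknown category {category!r} for item {key!r}")
    return kept
```

An older unserializer can then read data written by a newer one as long as the newer metadata items are classified as "keep" or "remove".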
Note: this section is not related to the sections above. It is motivated by the urgent need for serialization for communicating with the UI and is probably not that well thought through.
{
"ngs-serialization": "0.1",
"data": ...
}
{
"type": "UNIQUE-TYPE",
"id": "UNIQUE-ID",
"fields": {...},
"items": [...],
"value": ...
}
- `type` - `ngs:type:ngs-lang.org/types/xxx` (the resource at the URL does not have to exist)
- `id` - `ngs:id:1:random-id` (version `1` of ids is a purely random globally unique id)
- Only one of `fields`, `items`, or `value` can be present. For some types, all three can be omitted.
  - `fields` is used for map/hash-like objects
  - `items` is used for list/array-like objects
  - `value` is used for scalars such as numbers, booleans, strings
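As an illustration of the JSON shape described above, the following sketch wraps plain Python values. Several things here are assumptions rather than part of the design: the function names `wrap` and `serialize`, the use of Python type names as the `xxx` part of the type URL, the use of `uuid4` for the random id, and the choice to wrap nested values recursively (the page leaves the contents of `fields` and `items` unspecified).

```python
import json
import uuid

TYPE_PREFIX = "ngs:type:ngs-lang.org/types/"


def new_id():
    """Version 1 id: purely random. uuid4 is an assumed source of randomness."""
    return f"ngs:id:1:{uuid.uuid4().hex}"


def wrap(value):
    """Wrap a Python value in a {"type", "id", fields|items|value} object."""
    # Placeholder type names (dict, list, int, ...), not real NGS type URLs.
    obj = {"type": TYPE_PREFIX + type(value).__name__, "id": new_id()}
    if isinstance(value, dict):
        obj["fields"] = {k: wrap(v) for k, v in value.items()}   # map/hash-like
    elif isinstance(value, (list, tuple)):
        obj["items"] = [wrap(v) for v in value]                  # list/array-like
    elif value is not None:
        obj["value"] = value                                     # scalar
    # None: fields, items and value are all omitted.
    return obj


def serialize(value):
    """Produce the top-level document with the version and the wrapped data."""
    return json.dumps({"ngs-serialization": "0.1", "data": wrap(value)})
```

For example, `serialize({"a": [1, "x"]})` yields a document whose `data` object has `fields`, whose `a` field has `items`, and whose items each carry a `value`.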
NGS official website is at https://ngs-lang.org/