Commit add69dd

committed
feat: implement new binary format, BITE
1 parent 718542e commit add69dd

File tree

14 files changed

+1937
-6
lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,8 @@
3232
- Added `multicall` support for the CLI ([#1141](https://github.com/0xMiden/miden-vm/pull/2081))
3333
- Made `miden-prover`'s metal prover async-compatible. ([#2133](https://github.com/0xMiden/miden-vm/pull/2133)).
3434
- Abstract away the fast processor's operation execution into a new `Processor` trait ([#2141](https://github.com/0xMiden/miden-vm/pull/2141))
35+
- Added ability to serialize packages using BITE ([#2071](https://github.com/0xMiden/miden-vm/pull/2071))
36+
- Implemented new binary serialization format, BITE ([#2071](https://github.com/0xMiden/miden-vm/pull/2071))
3537

3638
## 0.17.1 (2025-08-29)
3739

Cargo.lock

Lines changed: 21 additions & 0 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 2 additions & 3 deletions
@@ -59,9 +59,8 @@ miden-formatting = { version = "0.1", default-features = false }
5959
midenc-hir-type = { version = "0.1", default-features = false }
6060

6161
# Third-party crates
62-
insta = { version = "1.43", default-features = false, features = [
63-
"colors",
64-
] }
62+
env_logger = "0.11"
63+
insta = { version = "1.43", default-features = false, features = ["colors"] }
6564
log = { version = "0.4", default-features = false }
6665
paste = { version = "1.0", default-features = false }
6766
proptest = { version = "1.7", default-features = false, features = [

assembly-syntax/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ logging = ["dep:env_logger"]
3030

3131
[dependencies]
3232
aho-corasick = { version = "1.1", default-features = false }
33-
env_logger = { version = "0.11", optional = true }
33+
env_logger = { workspace = true, optional = true }
3434
lalrpop-util = { version = "0.22", default-features = false }
3535
log.workspace = true
3636
miden-core.workspace = true

assembly/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ testing = ["logging", "miden-assembly-syntax/testing"]
2424
logging = ["dep:env_logger"]
2525

2626
[dependencies]
27-
env_logger = { version = "0.11", optional = true }
27+
env_logger = { workspace = true, optional = true }
2828
log.workspace = true
2929
miden-assembly-syntax.workspace = true
3030
miden-core.workspace = true

crates/utils/bite/Cargo.toml

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
[package]
2+
name = "miden-bite"
3+
version = "0.18.0"
4+
description = "BITE is Binary Interchange, Tiny Encoding - a compact binary file format"
5+
documentation = "https://docs.rs/miden-bite/0.18.0"
6+
readme = "README.md"
7+
categories = ["no-std"]
8+
edition.workspace = true
9+
rust-version.workspace = true
10+
license.workspace = true
11+
authors.workspace = true
12+
homepage.workspace = true
13+
repository.workspace = true
14+
exclude.workspace = true
15+
16+
[features]
17+
default = ["std"]
18+
std = ["indexmap/std", "serde/std", "thiserror/std"]
19+
20+
[dependencies]
21+
bumpalo = { version = "3.19", default-features = false }
22+
log.workspace = true
23+
indexmap = { version = "2.10", default-features = false }
24+
rustc-hash = { version = "2.1", default-features = false }
25+
serde.workspace = true
26+
smallvec.workspace = true
27+
thiserror.workspace = true
28+
29+
[dev-dependencies]
30+
env_logger.workspace = true
31+
proptest.workspace = true

crates/utils/bite/README.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
1+
# miden-bite
2+
3+
This crate implements BITE (_Binary Interchange, Tiny Encoding_), a `serde`-compatible binary encoding. As the name implies, it is designed for exchanging data in binary form efficiently. In particular, we aim to use it as the underlying encoding for the Miden package format and similar use cases.
4+
5+
## Design principles
6+
7+
BITE is designed with the following properties in mind:
8+
9+
* Compact
10+
* Versioned
11+
* Validatable, i.e. the ability to validate the structure of the input without
12+
needing to deserialize it.
13+
* Use `serde`'s data model to allow for encoding arbitrary Rust data types
14+
* Enable re-use of `serde`'s `Serialize` and `Deserialize` trait impls for both human-readable and BITE-encoded formats.
15+
* Minimally self-describing to allow for supporting `serde` features which require this, e.g. conditionally-skipped fields
16+
17+
## Size reduction techniques
18+
19+
We use the following techniques to achieve a compact encoding of arbitrary Rust data structures:
20+
21+
* _Transparent string interning_. This de-duplicates strings that are encoded when serializing a given data structure, storing only a single copy of each unique string, assigning each an integer identifier, and then storing only the identifier at each place the string is used. For structures with many copies of the same string (e.g. an AST where each node holds a reference to the file and line it corresponds to at the source level), this vastly reduces the size of the encoded binary. If all strings in the input data structure are already unique, interning introduces some minimal overhead; for our use cases, however, duplication is far more common than not, so the technique is a net win in practice.
22+
* _All integers use variable-length encoding_. Statistically, most integer values are small in practice. Encoding such values using the number of bits needed for their maximum possible value is immensely wasteful. Consider using a `u32` to represent an index into an array, where every instance of that array has fewer than 256 elements: encoding this value as a fixed-width `u32` wastes 3 bytes for every index. With a variable-length encoding that stores 7 bits of value per byte, indices in the range 0-127 require only a single byte, and indices in the range 128-255 require a second byte. These savings add up considerably given how frequently integer values are encoded.
23+
* _Booleans are intrinsically tagged_. In other words, the encodings of `true` and `false`, `1` and `0` respectively, are themselves valid type tags for the boolean type. Storing a boolean therefore requires only a single byte for both type tag and value, rather than a byte for each.
24+
* _`Option<T>` is intrinsically tagged_. Similar to booleans, the value for `None` is just the type tag for `None`, while `Some` is encoded as the tag for `Some` followed by the encoded value of type `T`.
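The string interning technique described in the list above can be illustrated with a minimal sketch. The `StringTable` type and its methods are hypothetical names chosen for this example; they are not taken from the `miden-bite` crate, whose actual interner may differ (for instance, it may use `indexmap` or an arena, both of which appear in its dependencies).

```rust
use std::collections::HashMap;

/// Hypothetical interner (not the crate's API): maps each unique string
/// to a small integer identifier and keeps one copy of its contents.
#[derive(Default)]
pub struct StringTable {
    ids: HashMap<String, u32>,
    strings: Vec<String>,
}

impl StringTable {
    /// Returns the identifier for `s`, adding it to the table on first use.
    pub fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.strings.len() as u32;
        self.strings.push(s.to_owned());
        self.ids.insert(s.to_owned(), id);
        id
    }

    /// Looks up the string behind a previously returned identifier.
    pub fn resolve(&self, id: u32) -> Option<&str> {
        self.strings.get(id as usize).map(String::as_str)
    }

    /// Number of unique strings stored so far.
    pub fn len(&self) -> usize {
        self.strings.len()
    }
}
```

With a table like this, a serializer would write the table once and emit only the integer identifier at each use site, which is where the size reduction for repeated strings comes from.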
25+
26+
## Structure
27+
28+
The structure of a BITE-encoded stream is described using the Kaitai Struct language in [bite.ksy].
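To make the intrinsic tagging of booleans and `Option<T>` concrete, here is a small sketch for `Option<bool>`. The tag values (`false` = 0, `true` = 1, `none` = 2, `some` = 3) are taken from the `tag` enum in `bite.ksy`; the function names are hypothetical and are not the crate's API.

```rust
// Tag values as listed in the `tag` enum of bite.ksy.
const TAG_FALSE: u8 = 0;
const TAG_TRUE: u8 = 1;
const TAG_NONE: u8 = 2;
const TAG_SOME: u8 = 3;

/// A boolean is its own tag: one byte carries both type and value.
pub fn encode_bool(value: bool, out: &mut Vec<u8>) {
    out.push(if value { TAG_TRUE } else { TAG_FALSE });
}

/// `None` is a bare tag; `Some` is a tag followed by the encoded inner value.
pub fn encode_opt_bool(value: Option<bool>, out: &mut Vec<u8>) {
    match value {
        None => out.push(TAG_NONE),
        Some(inner) => {
            out.push(TAG_SOME);
            encode_bool(inner, out);
        }
    }
}

/// Decodes exactly one `Option<bool>`; returns `None` for any other input.
pub fn decode_opt_bool(bytes: &[u8]) -> Option<Option<bool>> {
    match bytes {
        [TAG_NONE] => Some(None),
        [TAG_SOME, TAG_FALSE] => Some(Some(false)),
        [TAG_SOME, TAG_TRUE] => Some(Some(true)),
        _ => None,
    }
}
```

Under these assumptions, `None` occupies one byte and `Some(true)` occupies two, with no separate value byte for the boolean.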

crates/utils/bite/bite.ksy

Lines changed: 170 additions & 0 deletions
@@ -0,0 +1,170 @@
1+
meta:
2+
id: bite
3+
title: BITE-encoded file
4+
file-extension: bite
5+
endian: le
6+
bit-endian: le
7+
seq:
8+
- id: magic
9+
contents: 'BITE\0'
10+
- id: version
11+
type: u1
12+
- id: strings
13+
type: string_table
14+
- id: payload
15+
type: payload
16+
repeat: eos
17+
enums:
18+
tag:
19+
0: false
20+
1: true
21+
2: none
22+
3: some
23+
4: int
24+
5: sint
25+
6: i8
26+
7: u8
27+
8: f32
28+
9: f64
29+
10: char
30+
11: bytes
31+
12: str
32+
13: seq
33+
14: map
34+
15: unit_variant
35+
16: newtype_variant
36+
17: struct_variant
37+
18: tuple_variant
38+
types:
39+
unit:
40+
seq:
41+
- id: empty
42+
size: 0
43+
payload:
44+
doc: An encoded value tagged with its serde data type
45+
seq:
46+
- id: tag
47+
type: u1
48+
enum: tag
49+
- id: value
50+
type:
51+
switch-on: tag
52+
cases:
53+
'tag::false': unit
54+
'tag::true': unit
55+
'tag::none': unit
56+
'tag::some': payload
57+
'tag::int': varint
58+
'tag::sint': varint
59+
'tag::i8': s1
59+
'tag::u8': u1
60+
'tag::f32': f4
61+
'tag::f64': f8
63+
'tag::char': varint
64+
'tag::bytes': payload_bytes
65+
'tag::str': varint
66+
'tag::seq': payload_seq
67+
'tag::map': payload_map
68+
'tag::unit_variant': unit
69+
'tag::newtype_variant': payload
70+
'tag::struct_variant': payload_variant
71+
'tag::tuple_variant': payload_variant
72+
payload_variant:
73+
doc: A hint to the deserializer that a given enum variant is next in the stream
74+
seq:
75+
- id: variant_id
76+
type: varint
77+
payload_seq:
78+
seq:
79+
- id: num_elements
80+
type: varint
81+
- id: num_element_bytes
82+
type: varint
83+
if: num_elements.value > 0
84+
doc: The number of subsequent bytes holding the encoded elements of this sequence
85+
- id: elements
86+
type: payload
87+
if: num_elements.value > 0
88+
size: num_element_bytes.value
89+
repeat: expr
90+
repeat-expr: num_elements.value
91+
payload_map:
92+
seq:
93+
- id: num_elements
94+
type: varint
95+
- id: num_key_bytes
96+
type: varint
97+
if: num_elements.value > 0
98+
doc: The number of subsequent bytes holding the encoded keys of this map
99+
- id: keys
100+
type: payload
101+
if: num_elements.value > 0
102+
size: num_key_bytes.value
103+
repeat: expr
104+
repeat-expr: num_elements.value
105+
- id: num_value_bytes
106+
type: varint
107+
if: num_elements.value > 0
108+
doc: The number of subsequent bytes holding the encoded values of this map
109+
- id: values
110+
type: payload
111+
if: num_elements.value > 0
112+
size: num_value_bytes.value
113+
repeat: expr
114+
repeat-expr: num_elements.value
115+
payload_bytes:
116+
seq:
117+
- id: num_bytes
118+
type: varint
119+
- id: bytes
120+
size: num_bytes.value
121+
string_table:
122+
doc: The interned strings table
123+
seq:
124+
- id: num_strings
125+
type: varint
126+
- id: string_entries
127+
type: string_entry
128+
repeat: expr
129+
repeat-expr: num_strings.value
130+
string_entry:
131+
doc: An entry in the interned string table
132+
seq:
133+
- id: string_len
134+
type: varint
135+
- id: string_data
136+
type: str
137+
size: string_len.value
138+
encoding: UTF-8
139+
varint:
140+
doc: A variable-length encoded integer value
141+
seq:
142+
- id: varint_groups
143+
type: varint_group
144+
repeat: until
145+
repeat-until: not _.has_next
146+
instances:
147+
last:
148+
value: varint_groups.size - 1
149+
value:
150+
value: >-
151+
varint_groups[last].value
152+
+ (last >= 1 ? (varint_groups[last - 1].value << 7) : 0)
153+
+ (last >= 2 ? (varint_groups[last - 2].value << 14) : 0)
154+
+ (last >= 3 ? (varint_groups[last - 3].value << 21) : 0)
155+
+ (last >= 4 ? (varint_groups[last - 4].value << 28) : 0)
156+
+ (last >= 5 ? (varint_groups[last - 5].value << 35) : 0)
157+
+ (last >= 6 ? (varint_groups[last - 6].value << 42) : 0)
158+
+ (last >= 7 ? (varint_groups[last - 7].value << 49) : 0)
159+
doc: Resulting value as normal integer
160+
varint_group:
161+
seq:
162+
- id: b
163+
type: u1
164+
instances:
165+
has_next:
166+
value: (b & 0b1000_0000) != 0
167+
doc: If true, then we have more bytes to read
168+
value:
169+
value: b & 0b0111_1111
170+
doc: The 7-bit (base128) numeric value chunk of this group
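The `varint` type above can be illustrated with a short encoder and decoder. This is a sketch that follows the `value` expression in `bite.ksy` as written: the high bit of each byte signals that another byte follows, and the last group holds the least significant 7 bits. The function names are hypothetical, and whether the crate itself orders groups this way is an assumption based solely on the spec. The spec's expression sums at most 8 groups, whereas this sketch accepts any value that fits in a `u64`.

```rust
/// Encodes `value` as 7-bit groups, most significant group first.
/// Every byte except the last has its high (continuation) bit set.
pub fn encode_varint(mut value: u64) -> Vec<u8> {
    // Collect groups least significant first; the first one collected
    // becomes the final byte, so it has no continuation bit.
    let mut groups = vec![(value & 0x7f) as u8];
    value >>= 7;
    while value != 0 {
        groups.push((value & 0x7f) as u8 | 0x80);
        value >>= 7;
    }
    groups.reverse();
    groups
}

/// Decodes one varint from the front of `bytes`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input is truncated or would overflow a `u64`.
pub fn decode_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &b) in bytes.iter().enumerate() {
        // The top 7 bits must be clear before shifting in another group.
        if value >> 57 != 0 {
            return None;
        }
        value = (value << 7) | u64::from(b & 0x7f);
        if b & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

Under this scheme, values 0 through 127 occupy one byte and values 128 through 16383 occupy two, which is the source of the savings the README describes for small indices.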
