Description
## Problem Statement

**Context7 Benchmark Impact:** Question 1 scored 78/100

**Context7 Q1 feedback:**

> "However, the context lacks detailed explanations of the actual serialization/deserialization mechanisms at the binary level, error handling patterns, and practical troubleshooting guidance when type mismatches occur between Rust and TypeScript implementations."

**Specific gaps:**

- Missing binary-level serialization mechanism explanations
- No low-level Borsh format documentation
- Lack of hex dump interpretation guides
- No debugging tools for serialization issues

## Proposed Solution

Create comprehensive documentation explaining how LUMOS uses Borsh serialization at the binary level, including type-specific encodings, debugging techniques, and common serialization bugs.

### 1. New File: `docs/internals/borsh-serialization.md`

**Comprehensive Binary Format Guide:**
# Borsh Serialization Internals
## Overview
LUMOS uses [Borsh](https://borsh.io) (Binary Object Representation Serializer for Hashing) to ensure deterministic serialization between Rust and TypeScript. This guide explains the low-level binary format for each type.
## Serialization Pipeline
### 1. LUMOS Schema → IR
```rust
// schema.lumos
#[solana]
#[account]
struct PlayerAccount {
    wallet: PublicKey,
    level: u16,
    score: u64,
}
```

### 2. IR → Rust Struct with BorshSerialize

```rust
use borsh::{BorshSerialize, BorshDeserialize};

#[derive(BorshSerialize, BorshDeserialize)]
pub struct PlayerAccount {
    pub wallet: Pubkey,
    pub level: u16,
    pub score: u64,
}
```

### 3. Data → Binary Format

When serialized, the struct becomes:

```
[32 bytes: wallet][2 bytes: level][8 bytes: score]
Total: 42 bytes
```
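
The 42-byte layout can be reproduced by hand with nothing but the standard library. The sketch below is an illustration, not the code LUMOS generates: `PlayerAccountRaw` and `encode_player_account` are hypothetical names, and the wallet is modelled as a plain `[u8; 32]` instead of Solana's `Pubkey` so that no crates are needed.

```rust
/// Hypothetical stand-in for the generated struct; `wallet` is a raw
/// 32-byte array rather than `Pubkey` so the sketch needs no crates.
pub struct PlayerAccountRaw {
    pub wallet: [u8; 32],
    pub level: u16,
    pub score: u64,
}

/// Concatenate the fields in declaration order, integers little-endian,
/// with no padding and no field names in the output.
pub fn encode_player_account(account: &PlayerAccountRaw) -> Vec<u8> {
    let mut out = Vec::with_capacity(32 + 2 + 8);
    out.extend_from_slice(&account.wallet);
    out.extend_from_slice(&account.level.to_le_bytes());
    out.extend_from_slice(&account.score.to_le_bytes());
    out
}
```

With `level: 10` and `score: 500`, bytes 32..34 of the result are `0A 00` and bytes 34..42 are `F4 01 00 00 00 00 00 00`.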
## Binary Layout by Type

### Primitive Types

#### Unsigned Integers (Little-Endian)

- `u8`: 1 byte
  - Example: `255` → `0xFF`
- `u16`: 2 bytes
  - Example: `1000` → `0xE8 0x03`
- `u32`: 4 bytes
  - Example: `1000000` → `0x40 0x42 0x0F 0x00`
- `u64`: 8 bytes
  - Example: `1000000000` → `0x00 0xCA 0x9A 0x3B 0x00 0x00 0x00 0x00`
- `u128`: 16 bytes
  - Example: Large value serialized in little-endian

#### Signed Integers (Little-Endian, Two's Complement)

- `i8`: 1 byte
  - Example: `-1` → `0xFF`
- `i16`: 2 bytes
  - Example: `-1000` → `0x18 0xFC`
- `i32`: 4 bytes
- `i64`: 8 bytes
- `i128`: 16 bytes

#### Boolean

- `bool`: 1 byte
  - `true` → `0x01`
  - `false` → `0x00`
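
The primitive encodings listed here match what Rust's built-in `to_le_bytes` produces, so they can be checked without the `borsh` crate. The helpers below are a small sketch for that purpose; `hex` and `encode_bool` are hypothetical names introduced only for this illustration.

```rust
/// Render bytes in the notation used by this guide, e.g. "0xE8 0x03".
pub fn hex(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("0x{:02X}", b))
        .collect::<Vec<_>>()
        .join(" ")
}

/// A bool is encoded as a single byte: 1 for true, 0 for false.
pub fn encode_bool(value: bool) -> [u8; 1] {
    [value as u8]
}
```

For instance, `hex(&1000u16.to_le_bytes())` yields `0xE8 0x03` and `hex(&(-1000i16).to_le_bytes())` yields `0x18 0xFC`.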
### Solana-Specific Types

#### PublicKey

- Size: 32 bytes (fixed)
- Format: Raw bytes of the public key
- Example: `PublicKey("11111111111111111111111111111111")` → `0x00 0x00 0x00 0x00 ...` (32 bytes)

#### Signature

- Size: 64 bytes (fixed)
- Format: Raw signature bytes
### String

- Format: `[4-byte length prefix][UTF-8 bytes]`. The prefix is a little-endian `u32` that counts bytes, not characters.
- Example: `"hello"`

```
Length: 5 → 0x05 0x00 0x00 0x00
UTF-8: "hello" → 0x68 0x65 0x6C 0x6C 0x6F
Total: [0x05 0x00 0x00 0x00 0x68 0x65 0x6C 0x6C 0x6F]
```
### Vec

- Format: `[4-byte length][element 1][element 2]...[element n]`
- Example: `Vec<u16>([10, 20, 30])`

```
Length: 3 → 0x03 0x00 0x00 0x00
Element 1: 10 → 0x0A 0x00
Element 2: 20 → 0x14 0x00
Element 3: 30 → 0x1E 0x00
Total: [0x03 0x00 0x00 0x00 0x0A 0x00 0x14 0x00 0x1E 0x00]
```
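
Both length-prefixed formats, `String` and `Vec`, follow the same pattern: a little-endian `u32` count, then the payload. The following is a hand-rolled sketch using only the standard library; `encode_string` and `encode_vec_u16` are hypothetical helper names, and returning `None` for oversized inputs is a choice made for this example rather than the `borsh` crate's behavior.

```rust
use std::convert::TryFrom;

/// Encode a string as a u32 little-endian byte length followed by its UTF-8 bytes.
/// Returns None if the string is longer than u32::MAX bytes.
pub fn encode_string(value: &str) -> Option<Vec<u8>> {
    let len = u32::try_from(value.len()).ok()?;
    let mut out = Vec::with_capacity(4 + value.len());
    out.extend_from_slice(&len.to_le_bytes());
    out.extend_from_slice(value.as_bytes());
    Some(out)
}

/// Encode a slice of u16 values as a u32 little-endian element count
/// followed by each element in little-endian order.
pub fn encode_vec_u16(items: &[u16]) -> Option<Vec<u8>> {
    let count = u32::try_from(items.len()).ok()?;
    let mut out = Vec::with_capacity(4 + items.len() * 2);
    out.extend_from_slice(&count.to_le_bytes());
    for item in items {
        out.extend_from_slice(&item.to_le_bytes());
    }
    Some(out)
}
```

Note the difference in what the prefix counts: bytes for a string, elements for a vector. A three-element `Vec<u16>` has prefix 3 but a 6-byte payload.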
### Option

- Format: `[1-byte discriminant][value if Some]`
- Discriminant:
  - `None` → `0x00`
  - `Some(value)` → `0x01 [serialized value]`
- Example: `Option<u32>`
  - `None` → `0x00`
  - `Some(1000)` → `0x01 0xE8 0x03 0x00 0x00`
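
The one-byte tag followed by an optional payload can be sketched in a few lines. `encode_option_u32` is a hypothetical helper written for this guide, not part of the `borsh` crate.

```rust
/// Encode Option<u32>: tag byte 0 for None, or tag byte 1 followed by
/// the value as 4 little-endian bytes.
pub fn encode_option_u32(value: Option<u32>) -> Vec<u8> {
    match value {
        None => vec![0x00],
        Some(v) => {
            let mut out = vec![0x01];
            out.extend_from_slice(&v.to_le_bytes());
            out
        }
    }
}
```

`encode_option_u32(Some(1000))` returns the five bytes `01 E8 03 00 00`, since 1000 is `0x03E8`; `None` costs exactly one byte.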
### Enum

- Format: `[1-byte discriminant][variant data]`
- Discriminant: Sequential (0, 1, 2, ...)

Example Enum:

```rust
enum GameState {
    Active,                   // discriminant: 0
    Paused,                   // discriminant: 1
    Finished { score: u64 },  // discriminant: 2
}
```

Serialization:

- `GameState::Active` → `0x00`
- `GameState::Paused` → `0x01`
- `GameState::Finished { score: 1000 }` → `0x02 0xE8 0x03 0x00 0x00 0x00 0x00 0x00 0x00`
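
A manual encoder makes the variant-index rule concrete. This is a sketch under the assumption that variants are numbered in declaration order starting at 0; `encode_game_state` is a hypothetical function, not generated code.

```rust
pub enum GameState {
    Active,
    Paused,
    Finished { score: u64 },
}

/// Encode the variant index as one byte, then the variant's fields (if any).
pub fn encode_game_state(state: &GameState) -> Vec<u8> {
    match state {
        GameState::Active => vec![0],
        GameState::Paused => vec![1],
        GameState::Finished { score } => {
            let mut out = vec![2];
            out.extend_from_slice(&score.to_le_bytes());
            out
        }
    }
}
```

Unit variants take one byte in total, while `Finished` takes nine. Because the index is positional, reordering or inserting variants changes the encoding of every variant that follows.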
### Struct

- Format: Fields serialized in declaration order
- No padding: Fields are tightly packed
- Field order matters: Changing field order breaks serialization

Example:

```rust
struct Player {
    level: u16,    // Offset 0: 2 bytes
    score: u64,    // Offset 2: 8 bytes
    name: String,  // Offset 10: 4 + string length
}
```

Serialization of `Player { level: 5, score: 100, name: "Alice" }`:

```
[0x05 0x00]                                // level: 5
[0x64 0x00 0x00 0x00 0x00 0x00 0x00 0x00]  // score: 100
[0x05 0x00 0x00 0x00]                      // name length: 5
[0x41 0x6C 0x69 0x63 0x65]                 // name: "Alice"
```
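
Reading the same bytes back is a matter of walking a cursor through the fields in declaration order. The decoder below is a minimal sketch with explicit error cases; `DecodeError`, `take`, and `decode_player` are hypothetical names, and rejecting trailing bytes is a strictness choice made for this example.

```rust
#[derive(Debug, PartialEq)]
pub struct Player {
    pub level: u16,
    pub score: u64,
    pub name: String,
}

#[derive(Debug, PartialEq)]
pub enum DecodeError {
    UnexpectedEof, // fewer bytes than the next field needs
    InvalidUtf8,   // string payload is not valid UTF-8
    TrailingBytes, // bytes left over after the last field
}

/// Take `n` bytes from the front of `input` and advance it.
fn take<'a>(input: &mut &'a [u8], n: usize) -> Result<&'a [u8], DecodeError> {
    if input.len() < n {
        return Err(DecodeError::UnexpectedEof);
    }
    let (head, tail) = input.split_at(n);
    *input = tail;
    Ok(head)
}

/// Decode level (u16), score (u64), then a length-prefixed name.
pub fn decode_player(mut input: &[u8]) -> Result<Player, DecodeError> {
    let mut level = [0u8; 2];
    level.copy_from_slice(take(&mut input, 2)?);
    let mut score = [0u8; 8];
    score.copy_from_slice(take(&mut input, 8)?);
    let mut name_len = [0u8; 4];
    name_len.copy_from_slice(take(&mut input, 4)?);
    let name_bytes = take(&mut input, u32::from_le_bytes(name_len) as usize)?;
    let name = String::from_utf8(name_bytes.to_vec()).map_err(|_| DecodeError::InvalidUtf8)?;
    if !input.is_empty() {
        return Err(DecodeError::TrailingBytes);
    }
    Ok(Player {
        level: u16::from_le_bytes(level),
        score: u64::from_le_bytes(score),
        name,
    })
}
```

Distinguishing the three error cases is useful when troubleshooting: a truncated buffer, a bad string, and leftover bytes usually point to different root causes (wrong slice, wrong type, wrong schema version).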
## Nested Structures

Example:

```rust
struct Inventory {
    items: Vec<String>,
}

struct Player {
    wallet: Pubkey,
    inventory: Inventory,
}
```

Serialization (nested structures are inlined):

```
[32 bytes: wallet]
[4 bytes: items length]
[4 bytes: item 1 length][item 1 UTF-8 bytes]
[4 bytes: item 2 length][item 2 UTF-8 bytes]
...
```
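
Inlining falls out naturally when every type appends its own bytes to a shared buffer. The sketch below uses a hypothetical `Encode` trait as a stand-in for the real crate's `BorshSerialize`, and a `[u8; 32]` in place of `Pubkey`; it illustrates the composition, not the library's implementation.

```rust
use std::convert::TryFrom;

/// Hypothetical minimal trait for this sketch.
pub trait Encode {
    fn encode(&self, out: &mut Vec<u8>);
}

impl Encode for String {
    fn encode(&self, out: &mut Vec<u8>) {
        let len = u32::try_from(self.len()).expect("string longer than u32::MAX bytes");
        out.extend_from_slice(&len.to_le_bytes());
        out.extend_from_slice(self.as_bytes());
    }
}

impl<T: Encode> Encode for Vec<T> {
    fn encode(&self, out: &mut Vec<u8>) {
        let count = u32::try_from(self.len()).expect("more than u32::MAX elements");
        out.extend_from_slice(&count.to_le_bytes());
        for item in self {
            item.encode(out);
        }
    }
}

pub struct Inventory {
    pub items: Vec<String>,
}

pub struct Player {
    pub wallet: [u8; 32], // stand-in for Pubkey
    pub inventory: Inventory,
}

impl Encode for Inventory {
    fn encode(&self, out: &mut Vec<u8>) {
        self.items.encode(out);
    }
}

impl Encode for Player {
    fn encode(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.wallet);
        // The nested struct contributes only its fields: no header, no length.
        self.inventory.encode(out);
    }
}
```

A player holding the items `"ab"` and `"c"` encodes to 32 + 4 + (4 + 2) + (4 + 1) = 47 bytes.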
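## Anchor Accounts

Anchor adds an 8-byte discriminator at the start of account data:

```
[8-byte discriminator][borsh-serialized data]
```

By default, the discriminator is the first 8 bytes of the SHA-256 hash of `account:<StructName>` (for example, `account:PlayerAccount`). It is used for type safety: an account whose data starts with a different prefix is rejected.

Example:

```rust
#[account]
pub struct PlayerAccount {
    pub level: u16,
    pub score: u64,
}
```

On-chain data:

```
[8 bytes: discriminator][2 bytes: level][8 bytes: score]
Total: 18 bytes
```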
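
When handling raw account bytes outside of Anchor, the prefix has to be separated from the payload by hand. The helpers below are a sketch of that step; `split_discriminator` and `payload_if_matches` are hypothetical names, and the expected discriminator is passed in by the caller rather than computed, since hashing would require an external crate.

```rust
pub const DISCRIMINATOR_LEN: usize = 8;

/// Split raw account data into (discriminator, Borsh payload).
/// Returns None if the data is too short to hold a discriminator.
pub fn split_discriminator(data: &[u8]) -> Option<(&[u8], &[u8])> {
    if data.len() < DISCRIMINATOR_LEN {
        return None;
    }
    Some(data.split_at(DISCRIMINATOR_LEN))
}

/// Return the payload only if the prefix equals the expected discriminator.
pub fn payload_if_matches<'a>(data: &'a [u8], expected: &[u8; 8]) -> Option<&'a [u8]> {
    let (disc, payload) = split_discriminator(data)?;
    if disc == &expected[..] {
        Some(payload)
    } else {
        None
    }
}
```

Checking the prefix before decoding turns a silent misread of the wrong account type into an explicit failure.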
## Debugging Serialization Issues

### Hex Dump Interpretation

Tool: `hexdump`

```bash
hexdump -C account_data.bin
```

Example Output:

```
00000000  92 bc 2c 1a 8e 4f 7a 6d  05 00 64 00 00 00 00 00  |..,..Ozm..d.....|
00000010  00 00                                             |..|
```

Interpretation:

- Bytes 0-7: Discriminator `92 bc 2c 1a 8e 4f 7a 6d`
- Bytes 8-9: `level = 5` → `0x05 0x00` (little-endian u16)
- Bytes 10-17: `score = 100` → `0x64 0x00 0x00 0x00 0x00 0x00 0x00 0x00`
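
The same interpretation can be automated by pasting the hex column of a dump into a small parser. This is a sketch for the 18-byte account shown here only; `parse_hex` and `decode_dump` are hypothetical helpers, and the fixed offsets assume an 8-byte discriminator followed by a `u16` and a `u64`.

```rust
/// Parse space-separated hex pairs such as "05 00 64" into bytes.
/// Returns None if any token is not a valid hex byte.
pub fn parse_hex(text: &str) -> Option<Vec<u8>> {
    text.split_whitespace()
        .map(|pair| u8::from_str_radix(pair, 16).ok())
        .collect()
}

/// Decode (level, score) from an 18-byte account:
/// bytes 0..8 discriminator, 8..10 level, 10..18 score.
pub fn decode_dump(bytes: &[u8]) -> Option<(u16, u64)> {
    if bytes.len() != 18 {
        return None;
    }
    let mut level = [0u8; 2];
    level.copy_from_slice(&bytes[8..10]);
    let mut score = [0u8; 8];
    score.copy_from_slice(&bytes[10..18]);
    Some((u16::from_le_bytes(level), u64::from_le_bytes(score)))
}
```

Feeding it the 18 hex pairs from the dump gives level 5 and score 100.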
### Common Serialization Bugs

#### 1. Field Order Mismatch

Problem:

```rust
// Rust
struct Player {
    level: u16,
    score: u64,
}
```

```typescript
// TypeScript (WRONG!)
interface Player {
    score: number; // Wrong order!
    level: number;
}
```

Solution: LUMOS ensures field order matches. If manually writing schemas, maintain declaration order.
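
What makes this bug hard to spot is that nothing fails: the bytes carry no field names, so a reader with the wrong order still decodes successfully, just to the wrong values. The sketch below demonstrates this with two hypothetical functions, a correct writer and a deliberately wrong reader.

```rust
/// Writer: `level` (u16) first, then `score` (u64), as in the Rust struct.
pub fn encode_level_then_score(level: u16, score: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(10);
    out.extend_from_slice(&level.to_le_bytes());
    out.extend_from_slice(&score.to_le_bytes());
    out
}

/// Buggy reader: assumes `score` comes first, mirroring the wrong interface.
pub fn decode_score_then_level(bytes: &[u8; 10]) -> (u64, u16) {
    let mut score = [0u8; 8];
    score.copy_from_slice(&bytes[0..8]);
    let mut level = [0u8; 2];
    level.copy_from_slice(&bytes[8..10]);
    (u64::from_le_bytes(score), u16::from_le_bytes(level))
}
```

Writing `level = 5, score = 100` and reading it back in the wrong order produces `score = 6553605` and `level = 0`. Implausible values like these are a typical symptom of an order mismatch.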
#### 2. Endianness Issues

Problem: Reading multi-byte integers in the wrong byte order.

Solution: Borsh uses little-endian for all integers. Ensure your tooling expects this.
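
A quick way to recognize this bug is to compare both interpretations of the same bytes. The helper below is a tiny sketch; `read_u16_both_ways` is a hypothetical name.

```rust
/// Interpret two bytes as a u16 in both byte orders: (little-endian, big-endian).
pub fn read_u16_both_ways(bytes: [u8; 2]) -> (u16, u16) {
    (u16::from_le_bytes(bytes), u16::from_be_bytes(bytes))
}
```

For the bytes `E8 03`, the little-endian reading is 1000 and the big-endian reading is 59395. If a decoded value only makes sense after swapping bytes, the reader is using the wrong byte order.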
#### 3. String Encoding

Problem: Non-UTF-8 strings causing deserialization failures

Solution: Validate UTF-8 before serialization:

```rust
let name = String::from_utf8(bytes).map_err(|_| ErrorCode::InvalidUtf8)?;
```

#### 4. Discriminator Confusion

Problem: Forgetting to skip the 8-byte discriminator in Anchor accounts

Solution:

```typescript
// ❌ WRONG
const player = borsh.deserialize(PlayerAccountSchema, accountInfo.data);

// ✅ CORRECT
const player = borsh.deserialize(PlayerAccountSchema, accountInfo.data.slice(8));
```

## Manual Serialization Example

```rust
use borsh::BorshSerialize;

let player = PlayerAccount {
    wallet: Pubkey::default(),
    level: 10,
    score: 500,
};
// `try_to_vec` is the borsh 0.x method; with borsh 1.x use `borsh::to_vec(&player)`.
let bytes = player.try_to_vec().unwrap();
println!("Serialized bytes: {:02X?}", bytes);
```

Output:

```
Serialized bytes: [00, 00, 00, ..., 0A, 00, F4, 01, 00, 00, 00, 00, 00, 00]
                  [ 32-byte PublicKey ][level][           score          ]
```
### 2. New Directory: `examples/borsh-internals/`
```
examples/borsh-internals/
├── schema.lumos              # Test schema with various types
├── Cargo.toml
├── src/
│   ├── binary_inspector.rs   # Print hex dumps of serialized data
│   ├── manual_serialize.rs   # Manual Borsh encoding examples
│   ├── type_sizes.rs         # Calculate sizes of all types
│   └── lib.rs                # Export utilities
└── README.md                 # Binary format reference
```
**`src/binary_inspector.rs`:**
```rust
use borsh::BorshSerialize;
use generated::PlayerAccount;

pub fn inspect_account(account: &PlayerAccount) {
    let bytes = account.try_to_vec().unwrap();
    println!("Total size: {} bytes", bytes.len());
    println!("Hex dump:");
    for (i, chunk) in bytes.chunks(16).enumerate() {
        print!("{:08x} ", i * 16);
        for byte in chunk {
            print!("{:02x} ", byte);
        }
        println!();
    }
    println!("\nField breakdown:");
    println!(" wallet (32 bytes): {:02X?}", &bytes[0..32]);
    println!(" level (2 bytes): {:02X?}", &bytes[32..34]);
    println!(" score (8 bytes): {:02X?}", &bytes[34..42]);
}
```

**`src/type_sizes.rs`:**
```rust
use std::mem::size_of;
use borsh::BorshSerialize;

pub fn print_type_sizes() {
    // For fixed-width integers and Pubkey, the in-memory size reported by
    // `size_of` equals the Borsh-encoded size. This does not hold for structs
    // (which may be padded in memory) or for variable-length types.
    println!("Primitive Types:");
    println!(" u8: {} byte", size_of::<u8>());
    println!(" u16: {} bytes", size_of::<u16>());
    println!(" u32: {} bytes", size_of::<u32>());
    println!(" u64: {} bytes", size_of::<u64>());
    println!(" u128: {} bytes", size_of::<u128>());
    println!("\nSolana Types:");
    println!(" Pubkey: {} bytes", size_of::<anchor_lang::prelude::Pubkey>());

    // Variable-length types are measured by serializing them.
    println!("\nVariable-Length Types:");
    let empty_vec: Vec<u8> = vec![];
    let vec_3: Vec<u8> = vec![1, 2, 3];
    println!(" Vec<u8> (empty): {} bytes", empty_vec.try_to_vec().unwrap().len());
    println!(" Vec<u8> (3 items): {} bytes", vec_3.try_to_vec().unwrap().len());
    let none: Option<u64> = None;
    let some: Option<u64> = Some(100);
    println!(" Option<u64> (None): {} byte", none.try_to_vec().unwrap().len());
    println!(" Option<u64> (Some): {} bytes", some.try_to_vec().unwrap().len());
}
```

### 3. Add Binary Layout Diagrams
In `docs/reference/type-mapping.md`, add visual diagrams:

## Binary Layout Examples

### Example: PlayerAccount

```rust
struct PlayerAccount {
    wallet: PublicKey,  // 32 bytes
    level: u16,         // 2 bytes
    score: u64,         // 8 bytes
}
```

Binary Layout (byte offsets of the serialized data):

```
┌─────────────────────────────────────┬──────────┬─────────────────────┐
│          wallet (32 bytes)          │  level   │   score (8 bytes)   │
│                                     │(2 bytes) │                     │
└─────────────────────────────────────┴──────────┴─────────────────────┘
0                                   31 32      33 34                 41
Total Size: 42 bytes
```
## Acceptance Criteria
- [ ] `docs/internals/borsh-serialization.md` written with:
  - [ ] Complete binary format reference for all types
  - [ ] Serialization pipeline explanation
  - [ ] Nested structure examples
  - [ ] Anchor discriminator documentation
  - [ ] Common bug patterns and solutions
- [ ] New `examples/borsh-internals/` directory with:
  - [ ] Binary inspector tool (hex dump utility)
  - [ ] Manual serialization examples
  - [ ] Type size calculator
  - [ ] Comprehensive README
- [ ] Binary layout diagrams added to `docs/reference/type-mapping.md`
  - [ ] Reference table for all type encodings
- [ ] **Target:** Context7 Q1 score ≥ 88 (+10 points)
## Impact
**Context7 Benchmark:**
- Q1: 78 → 88 (+10 points)
**Overall Score:** 84.1 → 85.1 (+1.0 point)
**User Value:**
- Deep understanding of serialization format
- Better debugging capabilities
- Reduced serialization bugs
- Educational resource for Borsh/Solana development
## Related
- Context7 Benchmark Question 1
- Borsh specification: https://borsh.io
- Type mapping documentation
## Priority Justification
🟢 **MEDIUM** - Technical depth improvement, valuable for advanced users and debugging