|
| 1 | +# ADR 0035: SameDiff Unified Container Format |
| 2 | + |
| 3 | +## Status |
| 4 | + |
| 5 | +Implemented |
| 6 | + |
| 7 | +Proposed by: Adam Gibson (15-04-2025) |
| 8 | + |
| 9 | + |
| 10 | +## Context |
| 11 | + |
| 12 | +The current SameDiff serialization relies on FlatBuffers for graph representation and handles large arrays (>2GB) using a chunking mechanism. However, this approach has several limitations: |
| 13 | + |
| 14 | +1. **Single File Deployment**: Current format often requires multiple files when externalizing large arrays |
| 15 | +2. **Large Model Support**: Limited efficiency when dealing with very large models |
| 16 | +3. **Metadata Management**: Lack of standardized metadata for model tracking and versioning |
| 17 | +4. **Model Sharding**: Limited explicit support for sharding large models |
| 18 | +5. **Compatibility**: Each format change risks breaking backward compatibility |
| 19 | + |
| 20 | +We need a more robust serialization format that addresses these challenges while maintaining compatibility with existing systems. |
| 21 | + |
| 22 | +## Decision |
| 23 | + |
| 24 | +We have implemented a unified container format for SameDiff that encapsulates both graph structure and arrays in a single file, with support for optional externalization and sharding when needed. This format maintains full backward compatibility with the original serialization approach. |
| 25 | + |
| 26 | +### Key Components |
| 27 | + |
| 28 | +1. **Multi-Format Support**: |
| 29 | + - SDNB Format: Single-file internal format (.sdnb) |
| 30 | + - SDZ Format: ZIP-based container format (.sdz) |
| 31 | + - Sharded formats for both SDNB and SDZ |
| 32 | + |
| 33 | +2. **SDNB Format**: |
| 34 | + - Section-based container with header, metadata, graph, and arrays |
| 35 | + - Efficient memory mapping for large arrays |
| 36 | + - Optimized for performance with direct I/O |
| 37 | + - Compatible with 32-bit FlatBuffers limitations |
| 38 | + |
| 39 | +3. **SDZ Format**: |
| 40 | + - Standard ZIP archive containing internal .sdnb files |
| 41 | + - Compressed storage to reduce file size |
| 42 | + - Standard tools compatibility for inspection and extraction |
| 43 | + - Single file deployment for complex models |
| 44 | + - Simplicity of implementation using standard ZIP libraries |
| 45 | + |
| 46 | +4. **Metadata Management**: |
| 47 | + - Standardized keys for common model attributes |
| 48 | + - Support for custom metadata |
| 49 | + - Versioning and provenance information |
| 50 | + - Extensible key-value metadata system similar to GGUF, the model file format used by GGML and llama.cpp |
| 51 | + - Ability to add metadata later without reserializing model parameters |
| 52 | + |
| 53 | +5. **Sharding Support**: |
| 54 | + - Explicit first-class support for model sharding in both formats |
| 55 | + - Smart distribution of variables across shards |
| 56 | + - Automatic shard count determination based on model size |
| 57 | + - Consistent naming convention for shards |
| 58 | + - Support for NDArrays of any size through intelligent sharding |
| 59 | + |
| 60 | +6. **Backward Compatibility**: |
| 61 | + - Automatic format detection between SDNB and SDZ formats |
| 62 | + - Support for loading both internal and externalized original formats |
| 63 | + - Legacy model conversion utilities |
| 64 | + |
| 65 | +### Implementation Details |
| 66 | + |
| 67 | +1. **SDNB Format Structure**: |
| 68 | + ``` |
| 69 | + MAGIC_BYTES (4 bytes: "SDNB") |
| 70 | + VERSION (4 bytes) |
| 71 | + MANIFEST_OFFSET (8 bytes) |
| 72 | + MANIFEST_LENGTH (8 bytes) |
| 73 | + METADATA_OFFSET (8 bytes) |
| 74 | + [FLATBUFFER_GRAPH_DATA] |
| 75 | + [APPENDED_ARRAYS_DATA] |
| 76 | + [SERIALIZED_MANIFEST] |
| 77 | + ``` |
| 78 | + |
| 79 | +2. **SDZ Format Structure**: |
| 80 | + ``` |
| 81 | + ZIP_HEADER |
| 82 | + [ENTRY: model.sdnb] # Graph structure shard |
| 83 | + [ENTRY: model.shard0-of-N.sdnb] # Alternative naming for graph shard |
| 84 | + [ENTRY: model.shard1-of-N.sdnb] # Variable shard 1 |
| 85 | + [ENTRY: model.shard2-of-N.sdnb] # Variable shard 2 |
| 86 | + ... |
| 87 | + [ENTRY: model.shardM-of-N.sdnb] # Variable shard M |
| 88 | + ZIP_DIRECTORY |
| 89 | + ZIP_END |
| 90 | + ``` |
| 91 | + |
| 92 | +3. **Sharding Strategy**: |
| 93 | + - Graph structure in shard 0 |
| 94 | + - Variables distributed across remaining shards |
| 95 | + - Dynamic shard count calculation based on variable sizes |
| 96 | + - Maximum shard size limit of 1GB per shard |
| 97 | + - Smart variable grouping to minimize cross-shard dependencies |
| 98 | + |
| 99 | +4. **API Design**: |
| 100 | + ```java |
| 101 | + // SDNB Format API |
| 102 | + SameDiffSerializer.save(sameDiff, file, saveUpdaterState, metadata); |
| 103 | + SameDiffSerializer.saveAutoShard(sameDiff, baseFile, saveUpdaterState, metadata); |
| 104 | + SameDiffSerializer.saveSharded(sameDiff, baseFile, saveUpdaterState, estimatedShards, metadata); |
| 105 | + SameDiff model = SameDiffSerializer.load(file, loadUpdaterState); |
| 106 | + SameDiff shardedModel = SameDiffSerializer.loadSharded(baseFile, loadUpdaterState); |
| 107 | + |
| 108 | + // SDZ Format API |
| 109 | + SDZSerializer.save(sameDiff, outputZipFile, saveUpdaterState, metadata); |
| 110 | + SameDiff sdzModel = SDZSerializer.load(modelZipFile, loadUpdaterState); |
| 111 | + ``` |
| 112 | + |
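The fixed-size header listed under "SDNB Format Structure" can be illustrated with a small encoder/decoder. This is only a sketch: the class name `SdnbHeaderSketch` is hypothetical, the little-endian byte order is an assumption (the ADR does not state the header's byte order), and the real serializer may lay out or validate the fields differently.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical model of the SDNB header fields named in the ADR. */
final class SdnbHeaderSketch {
    static final byte[] MAGIC = "SDNB".getBytes(StandardCharsets.US_ASCII);
    // 4 (magic) + 4 (version) + 8 + 8 + 8 (manifest offset/length, metadata offset) = 32 bytes
    static final int HEADER_SIZE = 4 + 4 + 8 + 8 + 8;

    final int version;
    final long manifestOffset;
    final long manifestLength;
    final long metadataOffset;

    SdnbHeaderSketch(int version, long manifestOffset, long manifestLength, long metadataOffset) {
        this.version = version;
        this.manifestOffset = manifestOffset;
        this.manifestLength = manifestLength;
        this.metadataOffset = metadataOffset;
    }

    /** Encodes the header in field order; LITTLE_ENDIAN is an assumption. */
    byte[] encode() {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(MAGIC)
           .putInt(version)
           .putLong(manifestOffset)
           .putLong(manifestLength)
           .putLong(metadataOffset);
        return buf.array();
    }

    /** Decodes a header, rejecting input that is too short or lacks the magic bytes. */
    static SdnbHeaderSketch decode(byte[] bytes) {
        if (bytes.length < HEADER_SIZE) {
            throw new IllegalArgumentException("Header too short: " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        byte[] magic = new byte[MAGIC.length];
        buf.get(magic);
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IllegalArgumentException("Missing SDNB magic bytes");
        }
        return new SdnbHeaderSketch(buf.getInt(), buf.getLong(), buf.getLong(), buf.getLong());
    }
}
```

Because the offsets are 8-byte values, the header can point past the 2GB mark even though the FlatBuffers graph section itself stays within 32-bit limits.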
| 113 | +## Implementation |
| 114 | + |
| 115 | +### SDZ Format Details |
| 116 | + |
| 117 | +The SDZ format addresses the need for single-file distribution of large models through the following implementation: |
| 118 | + |
| 119 | +1. **ZIP Container**: The SDZ format uses a standard ZIP archive as its container, enabling compatibility with standard zip tools for inspection and extraction. |
| 120 | + |
| 121 | +2. **Internal Structure**: |
| 122 | + - The ZIP archive contains one or more SDNB format files |
| 123 | + - The first file (shard0) contains the graph structure |
| 124 | + - Subsequent files contain variables distributed across shards |
| 125 | + - Consistent naming convention ensures proper loading sequence |
| 126 | + |
| 127 | +3. **Sharding Implementation**: |
| 128 | + - `SDZSerializer.save()` internally calls `SameDiffSerializer.saveAutoShard()` to create SDNB files |
| 129 | + - These files are then compressed and packaged into the ZIP archive |
| 130 | + - Automatic cleanup of temporary files after ZIP creation |
| 131 | + - Distributed variable serialization across shards based on size |
| 132 | + |
| 133 | +4. **Loading Process**: |
| 134 | + - `SDZSerializer.load()` extracts all SDNB files to a temporary directory |
| 135 | + - Loads shard 0 first to establish graph structure |
| 136 | + - Loads variable data from remaining shards |
| 137 | + - Ensures temporary directory cleanup |
| 138 | + - Returns fully reconstituted SameDiff instance |
| 139 | + |
| 140 | +5. **ZIP Operations**: |
| 141 | + - Uses standard Java ZIP APIs for maximum compatibility |
| 142 | + - Implements efficient I/O with buffering for large file handling |
| 143 | + - Security measures against zip slip vulnerabilities |
| 144 | + - Validation of ZIP structure integrity |
| 145 | + |
| 146 | +6. **Optimizations**: |
| 147 | + - Manifest-based array lookup for efficient loading |
| 148 | + - Smart buffer management to minimize memory pressure |
| 149 | + - Native byte order handling for cross-platform compatibility |
| 150 | + - Verification steps to validate loaded model integrity |
| 151 | + |
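The size-based distribution of variables across shards can be sketched with a simple greedy planner. This is not the algorithm used by `saveAutoShard()`, which the ADR does not spell out. The class `ShardPlanSketch`, the greedy strategy, the rule that an oversized variable gets a shard of its own, and the reading of `N` in `shardK-of-N` as the total shard count are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical greedy shard planner; shard 0 is reserved for the graph structure. */
final class ShardPlanSketch {
    /** 1GB per-shard cap, taken from the sharding strategy in the ADR. */
    static final long MAX_SHARD_BYTES = 1L << 30;

    /**
     * Groups variable names in iteration order so that each group stays within
     * maxShardBytes. Element i of the result holds the variables for shard i + 1.
     * A single variable larger than the cap is placed alone in its own group
     * (assumption: the real format may split such arrays instead).
     */
    static List<List<String>> plan(Map<String, Long> sizesInBytes, long maxShardBytes) {
        List<List<String>> shards = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long used = 0;
        for (Map.Entry<String, Long> e : sizesInBytes.entrySet()) {
            long size = e.getValue();
            // Start a new shard when adding this variable would exceed the cap.
            if (!current.isEmpty() && used + size > maxShardBytes) {
                shards.add(current);
                current = new ArrayList<>();
                used = 0;
            }
            current.add(e.getKey());
            used += size;
        }
        if (!current.isEmpty()) {
            shards.add(current);
        }
        return shards;
    }

    /** Builds a name following the "model.shardK-of-N.sdnb" convention shown in the ADR. */
    static String shardName(String base, int index, int total) {
        return base + ".shard" + index + "-of-" + total + ".sdnb";
    }
}
```

With this sketch, the total shard count would be `plan(...).size() + 1`, the extra one being the graph shard. A planner that also minimizes cross-shard dependencies, as the ADR describes, would need graph information that this example ignores.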
| 152 | +### Performance Considerations |
| 153 | + |
| 154 | +The SDZ format balances compression benefits against performance requirements: |
| 155 | + |
| 156 | +1. **Serialization Performance**: |
| 157 | + - Slight additional overhead for ZIP compression |
| 158 | + - Parallelized compression when possible |
| 159 | + - Progressive ZIP writing to avoid memory spikes |
| 160 | + |
| 161 | +2. **Deserialization Performance**: |
| 162 | + - Sequential extraction for predictable memory usage |
| 163 | + - Lazy loading strategies for large variables |
| 164 | + - Efficient memory mapping for large arrays when possible |
| 165 | + - Verification during loading to ensure data integrity |
| 166 | + |
| 167 | +3. **Storage Efficiency**: |
| 168 | + - Typically 30-50% size reduction through compression |
| 169 | + - Optimal balance between compression level and performance |
| 170 | + - Compression ratio varies based on parameter data patterns |
| 171 | + |
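"Progressive ZIP writing" can be pictured as streaming each file into the archive through a fixed-size buffer, so memory use stays flat regardless of shard size. The following is a minimal sketch using only `java.util.zip`; the class `ZipWriteSketch`, the method `writeZip`, and the 64KB buffer size are illustrative choices, not the project's actual `createZipArchive` helper.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical streaming ZIP writer; entries are named after each file's base name. */
final class ZipWriteSketch {
    static void writeZip(Path outputZip, List<Path> files) throws IOException {
        byte[] buffer = new byte[64 * 1024]; // reused for every file, so heap use is constant
        try (ZipOutputStream zos = new ZipOutputStream(
                new BufferedOutputStream(Files.newOutputStream(outputZip)))) {
            for (Path file : files) {
                zos.putNextEntry(new ZipEntry(file.getFileName().toString()));
                try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
                    int n;
                    // Copy chunk by chunk instead of loading the whole shard into memory.
                    while ((n = in.read(buffer)) != -1) {
                        zos.write(buffer, 0, n);
                    }
                }
                zos.closeEntry();
            }
        }
    }
}
```

`ZipOutputStream` compresses entries one at a time on the calling thread, so any parallel compression would need a different approach than the one shown here.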
| 172 | +## Trade-offs and Consequences |
| 173 | + |
| 174 | +### Design Trade-offs |
| 175 | + |
| 176 | +1. **FlatBuffers Compatibility vs. Unlimited Model Size**: |
| 177 | + - We maintain compatibility with 32-bit FlatBuffers for graph structure |
| 178 | + - We overcome FlatBuffers' 2GB size limitation through our sharding approach |
| 179 | + - This allows us to leverage FlatBuffers' efficiency for small graph structures while supporting NDArrays of any size |
| 180 | + |
| 181 | +2. **Single File Format vs. Performance**: |
| 182 | + - We chose ZIP for its ubiquity, tooling support, and single-file deployment benefits |
| 183 | + - ZIP allows self-contained distribution at the cost of some performance overhead during compression/decompression |
| 184 | + - This trades some loading speed for a better deployment experience and reduced operational complexity |
| 185 | + |
| 186 | +3. **Metadata Extensibility vs. Format Complexity**: |
| 187 | + - We implement an extensible metadata system similar to GGUF |
| 188 | + - This allows adding/updating metadata without reserializing the entire model |
| 189 | + - The increased format complexity is justified by the flexibility to evolve models over time |
| 190 | + |
| 191 | +4. **Cross-Platform Support vs. Optimization**: |
| 192 | + - We prioritize cross-platform compatibility over platform-specific optimizations |
| 193 | + - This ensures models can be shared across environments but may not achieve maximum performance on specialized hardware |
| 194 | + |
| 195 | +### Advantages |
| 196 | + |
| 197 | +1. **Simplified Deployment**: |
| 198 | + - Single file deployment with SDZ format |
| 199 | + - Easier distribution and management |
| 200 | + - Reduced risk of missing files or shard mismatches |
| 201 | + |
| 202 | +2. **Enhanced Model Storage**: |
| 203 | + - Support for NDArrays and models of any size |
| 204 | + - Efficient storage with ZIP compression |
| 205 | + - Selective loading of model components |
| 206 | + |
| 207 | +3. **Better Metadata Management**: |
| 208 | + - Standardized tracking of model attributes |
| 209 | + - Version management for compatibility |
| 210 | + - Custom metadata for specific requirements |
| 211 | + - Post-training metadata additions without reserializing parameters |
| 212 | + |
| 213 | +4. **First-Class Sharding**: |
| 214 | + - Explicit support for very large models |
| 215 | + - Intelligent variable distribution |
| 216 | + - Efficient loading of sharded models |
| 217 | + |
| 218 | +5. **Complete Backward Compatibility**: |
| 219 | + - Seamless support for reading existing formats |
| 220 | + - Automatic format detection and handling |
| 221 | + - No disruption to existing workflows |
| 222 | + - Migration path for older models |
| 223 | + |
| 224 | +### Disadvantages |
| 225 | + |
| 226 | +1. **Implementation Complexity**: |
| 227 | + - More complex than the previous FlatBuffers-only approach |
| 228 | + - Additional code paths for format handling |
| 229 | + - Need for comprehensive testing across formats |
| 230 | + |
| 231 | +2. **Performance Considerations**: |
| 232 | + - Compression/decompression time with SDZ format |
| 233 | + - Temporary storage requirements during extraction |
| 234 | + - Slight overhead for small models |
| 235 | + |
| 236 | +3. **Tool Ecosystem**: |
| 237 | + - Need for updates to existing tooling |
| 238 | + - Additional format documentation requirements |
| 239 | + - Migration guidance for existing models |
| 240 | + |
| 241 | +## Technical Implementation |
| 242 | + |
| 243 | +### Format Detection Algorithm |
| 244 | +```java |
| 245 | +public static SameDiff load(File file, boolean loadUpdaterState) throws IOException { |
| 246 | + // Check if it's a ZIP file first (SDZ format) |
| 247 | + if (isZipFile(file)) { |
| 248 | + return SDZSerializer.load(file, loadUpdaterState); |
| 249 | + } |
| 250 | + |
| 251 | + // Not a ZIP, check if it's a native SDNB file |
| 252 | + if (isValidSdnbFile(file)) { |
| 253 | + return SameDiffSerializer.load(file, loadUpdaterState); |
| 254 | + } |
| 255 | + |
| 256 | + // Check if it's a base name for sharded files |
| 257 | + if (hasShardedFiles(file)) { |
| 258 | + return SameDiffSerializer.loadSharded(file, loadUpdaterState); |
| 259 | + } |
| 260 | + |
| 261 | + // Unsupported format |
| 262 | + throw new UnsupportedOperationException("Unrecognized model format"); |
| 263 | +} |
| 264 | +``` |
| 265 | + |
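The detection code above relies on helpers such as `isZipFile` and `isValidSdnbFile`, whose bodies are not shown. One plausible way to implement such checks is to compare the first bytes of the file against known magic numbers. The sketch below is an assumption about how that could work, with hypothetical names (`FormatSniffSketch`, `looksLikeZip`, `looksLikeSdnb`); the real helpers may validate more than the magic bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical magic-byte sniffing for the two container formats. */
final class FormatSniffSketch {
    // Local file header signature that starts a non-empty ZIP archive.
    // (An empty archive starts with "PK\5\6" and is not matched here.)
    private static final byte[] ZIP_MAGIC = {'P', 'K', 3, 4};
    // Magic bytes listed for the SDNB header in the ADR.
    private static final byte[] SDNB_MAGIC = "SDNB".getBytes(StandardCharsets.US_ASCII);

    static boolean looksLikeZip(Path file) throws IOException {
        return startsWith(file, ZIP_MAGIC);
    }

    static boolean looksLikeSdnb(Path file) throws IOException {
        return startsWith(file, SDNB_MAGIC);
    }

    /** Returns true only for regular files whose leading bytes equal the given magic. */
    private static boolean startsWith(Path file, byte[] magic) throws IOException {
        if (!Files.isRegularFile(file)) {
            return false;
        }
        byte[] head = new byte[magic.length];
        try (InputStream in = Files.newInputStream(file)) {
            int read = in.readNBytes(head, 0, head.length);
            return read == head.length && Arrays.equals(head, magic);
        }
    }
}
```

Since the two magic values differ, the order of the ZIP and SDNB checks would not affect the result under this approach; only the sharded base-name check needs to come after both, because a base name may not exist as a file at all.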
| 266 | +### SDZ Implementation |
| 267 | +```java |
| 268 | +public static void save(SameDiff sameDiff, File outputZipFile, boolean saveUpdaterState, |
| 269 | + Map<String, String> metadata) throws IOException { |
| 270 | + // Create temporary directory for SDNB files |
| 271 | + Path tempDir = Files.createTempDirectory("sdz-serializer-save-"); |
| 272 | + |
| 273 | + try { |
| 274 | + // Save using SDNB serializer to temporary directory |
| 275 | + File internalSavePath = new File(tempDir.toFile(), "model"); |
| 276 | + SameDiffSerializer.saveAutoShard(sameDiff, internalSavePath, saveUpdaterState, metadata); |
| 277 | + |
| 278 | + // Collect all files to add to ZIP |
| 279 | + List<File> filesToZip = new ArrayList<>(); |
| 280 | + findAllFilesRecursively(tempDir.toFile(), filesToZip); |
| 281 | + |
| 282 | + // Create ZIP archive |
| 283 | + createZipArchive(outputZipFile, filesToZip); |
| 284 | + } finally { |
| 285 | + // Clean up temporary directory |
| 286 | + FileUtils.deleteDirectory(tempDir.toFile()); |
| 287 | + } |
| 288 | +} |
| 289 | + |
| 290 | +public static SameDiff load(File modelZipFile, boolean loadUpdaterState) throws IOException { |
| 291 | + // Extract ZIP to temporary directory |
| 292 | + Path tempDir = Files.createTempDirectory("sdz-serializer-load-"); |
| 293 | + |
| 294 | + try { |
| 295 | + // Extract ZIP contents |
| 296 | + extractZip(modelZipFile, tempDir.toFile()); |
| 297 | + |
| 298 | + // Determine the path to load from |
| 299 | + File loadPath = determineLoadPath(tempDir.toFile()); |
| 300 | + |
| 301 | + // Load using SDNB serializer |
| 302 | + return SameDiffSerializer.load(loadPath, loadUpdaterState); |
| 303 | + } finally { |
| 304 | + // Clean up temporary directory |
| 305 | + FileUtils.deleteDirectory(tempDir.toFile()); |
| 306 | + } |
| 307 | +} |
| 308 | +``` |
| 309 | + |
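The `extractZip` helper called in `load()` is where the zip slip protection mentioned earlier would live. A common guard is to resolve every entry name against the target directory, normalize the result, and reject entries that land outside it. The sketch below shows that technique with the standard `java.util.zip` and `java.nio.file` APIs; `SafeUnzipSketch.extract` is a hypothetical stand-in, not the project's actual implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical ZIP extraction with a zip slip guard. */
final class SafeUnzipSketch {
    static void extract(Path zipFile, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zis = new ZipInputStream(Files.newInputStream(zipFile))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                // Normalizing collapses "../" segments; a malicious entry then
                // resolves to a path that no longer starts with the target root.
                Path resolved = root.resolve(entry.getName()).normalize();
                if (!resolved.startsWith(root)) {
                    throw new IOException("Blocked ZIP entry outside target directory: "
                            + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(resolved);
                } else {
                    Files.createDirectories(resolved.getParent());
                    // Streams the current entry to disk without buffering it fully in memory.
                    Files.copy(zis, resolved, StandardCopyOption.REPLACE_EXISTING);
                }
                zis.closeEntry();
            }
        }
    }
}
```

`Path.startsWith` compares whole path components, so a sibling directory such as `out-evil` is not mistaken for a child of `out`, which a plain string prefix check would allow.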
| 310 | + |
| 311 | +## Migration Guidelines |
| 312 | + |
| 313 | +For existing users: |
| 314 | + |
| 315 | +1. **Loading Existing Models**: |
| 316 | + - No changes needed; automatic format detection handles existing models |
| 317 | + |
| 318 | +2. **Converting to SDZ Format**: |
| 319 | + - Use `SDZSerializer.save()` with existing SameDiff instances |
| 320 | + - Alternatively, load existing models and save in SDZ format |
| 321 | + |
| 322 | +3. **When to Use Each Format**: |
| 323 | + - SDNB: For highest performance, particularly during training |
| 324 | + - SDZ: For deployment, storage efficiency, and single-file distribution |
| 325 | + - Sharded formats: For very large models exceeding memory limits |