Commit 50f230e

add samediff serialization changes (deeplearning4j#10209)
1 parent 52ccca5 commit 50f230e

77 files changed: +10623 −582 lines changed


ADRs/0034 - FlatBuffers upgrade.md

Lines changed: 2 additions & 2 deletions

@@ -4,8 +4,8 @@
 Implemented

-Proposed by: Assistant (20-02-2025)
-Discussed with: Adam Gibson
+Proposed by: Assistant (14-04-2025)
+Discussed with: Paul Dubs

 ## Context

Lines changed: 325 additions & 0 deletions

@@ -0,0 +1,325 @@
# ADR 0035: SameDiff Unified Container Format

## Status

Implemented

Proposed by: Adam Gibson (15-04-2025)

## Context

The current SameDiff serialization relies on FlatBuffers for graph representation and handles large arrays (>2GB) using a chunking mechanism. However, this approach has several limitations:

1. **Single File Deployment**: The current format often requires multiple files when externalizing large arrays
2. **Large Model Support**: Limited efficiency when dealing with very large models
3. **Metadata Management**: Lack of standardized metadata for model tracking and versioning
4. **Model Sharding**: Limited explicit support for sharding large models
5. **Compatibility**: Each format change risks breaking backward compatibility

We need a more robust serialization format that addresses these challenges while maintaining compatibility with existing systems.

## Decision

We have implemented a unified container format for SameDiff that encapsulates both graph structure and arrays in a single file, with support for optional externalization and sharding when needed. This format maintains full backward compatibility with the original serialization approach.

### Key Components

1. **Multi-Format Support**:
   - SDNB Format: Single-file internal format (.sdnb)
   - SDZ Format: ZIP-based container format (.sdz)
   - Sharded formats for both SDNB and SDZ

2. **SDNB Format**:
   - Section-based container with header, metadata, graph, and arrays
   - Efficient memory mapping for large arrays
   - Optimized for performance with direct I/O
   - Compatible with 32-bit FlatBuffers limitations

3. **SDZ Format**:
   - Standard ZIP archive containing internal .sdnb files
   - Compressed storage to reduce file size
   - Compatible with standard ZIP tools for inspection and extraction
   - Single-file deployment for complex models
   - Simple implementation using standard ZIP libraries

4. **Metadata Management**:
   - Standardized keys for common model attributes
   - Support for custom metadata
   - Versioning and provenance information
   - Extensible metadata system similar to GGUF (the GGML/llama.cpp model file format)
   - Ability to add metadata later without reserializing model parameters

5. **Sharding Support** (a shard-count sketch follows this list):
   - Explicit first-class support for model sharding in both formats
   - Smart distribution of variables across shards
   - Automatic shard count determination based on model size
   - Consistent naming convention for shards
   - Support for NDArrays of any size through intelligent sharding

6. **Backward Compatibility**:
   - Automatic format detection between SDNB and SDZ formats
   - Support for loading both internal and externalized original formats
   - Legacy model conversion utilities

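
As a rough, minimal sketch of the automatic shard-count determination mentioned above (not the actual implementation), the count can be derived from the total size of all variable arrays and a per-shard byte limit. The 1GB limit mirrors the sharding strategy under Implementation Details; the helper logic and variable names here are illustrative assumptions:

```java
// Illustrative sketch only: derive a shard count from total variable size.
// Assumes a SameDiff instance `sameDiff` is in scope; the 1GB limit mirrors the
// sharding strategy below, but this is not the actual serializer code.
long maxShardBytes = 1L << 30; // 1GB per shard
long totalBytes = 0;
for (SDVariable v : sameDiff.variables()) {
    INDArray arr = v.getArr();
    if (arr != null) {
        totalBytes += arr.length() * arr.data().getElementSize();
    }
}
int variableShards = (int) Math.max(1, (totalBytes + maxShardBytes - 1) / maxShardBytes); // ceiling division
int totalShards = variableShards + 1; // shard 0 holds the graph structure
```
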
### Implementation Details

1. **SDNB Format Structure** (a header-reading sketch follows this list):
```
MAGIC_BYTES (4 bytes: "SDNB")
VERSION (4 bytes)
MANIFEST_OFFSET (8 bytes)
MANIFEST_LENGTH (8 bytes)
METADATA_OFFSET (8 bytes)
[FLATBUFFER_GRAPH_DATA]
[APPENDED_ARRAYS_DATA]
[SERIALIZED_MANIFEST]
```

2. **SDZ Format Structure**:
```
ZIP_HEADER
[ENTRY: model.sdnb]              # Graph structure shard
[ENTRY: model.shard0-of-N.sdnb]  # Alternative naming for graph shard
[ENTRY: model.shard1-of-N.sdnb]  # Variable shard 1
[ENTRY: model.shard2-of-N.sdnb]  # Variable shard 2
...
[ENTRY: model.shardM-of-N.sdnb]  # Variable shard M
ZIP_DIRECTORY
ZIP_END
```

3. **Sharding Strategy**:
   - Graph structure in shard 0
   - Variables distributed across remaining shards
   - Dynamic shard count calculation based on variable sizes
   - Maximum shard size limit of 1GB per shard
   - Smart variable grouping to minimize cross-shard dependencies

4. **API Design**:
```java
// SDNB Format API
SameDiffSerializer.save(sameDiff, file, saveUpdaterState, metadata);
SameDiffSerializer.saveAutoShard(sameDiff, baseFile, saveUpdaterState, metadata);
SameDiffSerializer.saveSharded(sameDiff, baseFile, saveUpdaterState, estimatedShards, metadata);
SameDiff model = SameDiffSerializer.load(file, loadUpdaterState);
SameDiff model = SameDiffSerializer.loadSharded(baseFile, loadUpdaterState);

// SDZ Format API
SDZSerializer.save(sameDiff, outputZipFile, saveUpdaterState, metadata);
SameDiff model = SDZSerializer.load(modelZipFile, loadUpdaterState);
```
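
As a usage illustration of the API above (not a prescribed workflow), the sketch below saves and reloads a model in each format; the file names, metadata keys, and graph contents are placeholders:

```java
// Illustrative usage of the serializer APIs listed above; names are placeholders.
SameDiff sd = SameDiff.create();
// ... define variables/ops or train the model ...

Map<String, String> metadata = new HashMap<>();
metadata.put("model.name", "example-model");    // placeholder metadata key
metadata.put("created.by", "example-pipeline"); // placeholder metadata key

// Single-file SDNB save/load
SameDiffSerializer.save(sd, new File("example-model.sdnb"), true, metadata);
SameDiff restored = SameDiffSerializer.load(new File("example-model.sdnb"), true);

// Automatic sharding for large models; shard files are written alongside the base path
SameDiffSerializer.saveAutoShard(sd, new File("example-model"), true, metadata);
SameDiff sharded = SameDiffSerializer.loadSharded(new File("example-model"), true);

// Single-file SDZ artifact for deployment
SDZSerializer.save(sd, new File("example-model.sdz"), true, metadata);
SameDiff fromZip = SDZSerializer.load(new File("example-model.sdz"), true);
```
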
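
To make the header layout in item 1 concrete, the following is a minimal, hypothetical reading sketch. It assumes the fixed header fields are encoded in DataInput-style big-endian order; the actual serializer may use a different byte order or helper classes:

```java
// Hypothetical sketch: read the fixed SDNB header fields from item 1 above.
// The encoding assumptions are illustrative, not the actual implementation.
try (RandomAccessFile raf = new RandomAccessFile(new File("example-model.sdnb"), "r")) {
    byte[] magic = new byte[4];
    raf.readFully(magic);                               // MAGIC_BYTES: "SDNB"
    if (!"SDNB".equals(new String(magic, StandardCharsets.US_ASCII))) {
        throw new IOException("Not an SDNB file");
    }
    int version = raf.readInt();                        // VERSION (4 bytes)
    long manifestOffset = raf.readLong();               // MANIFEST_OFFSET (8 bytes)
    long manifestLength = raf.readLong();               // MANIFEST_LENGTH (8 bytes)
    long metadataOffset = raf.readLong();               // METADATA_OFFSET (8 bytes)
    // The FlatBuffers graph data, appended arrays, and serialized manifest follow,
    // addressed via the offsets read above.
}
```
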
## Implementation

### SDZ Format Details

The SDZ format addresses the need for single-file distribution of large models through the following implementation:

1. **ZIP Container**: The SDZ format uses a standard ZIP archive as its container, enabling compatibility with standard ZIP tools for inspection and extraction (see the listing sketch after this list).

2. **Internal Structure**:
   - The ZIP archive contains one or more SDNB format files
   - The first file (shard0) contains the graph structure
   - Subsequent files contain variables distributed across shards
   - Consistent naming convention ensures proper loading sequence

3. **Sharding Implementation**:
   - `SDZSerializer.save()` internally calls `SameDiffSerializer.saveAutoShard()` to create SDNB files
   - These files are then compressed and packaged into the ZIP archive
   - Automatic cleanup of temporary files after ZIP creation
   - Distributed variable serialization across shards based on size

4. **Loading Process**:
   - `SDZSerializer.load()` extracts all SDNB files to a temporary directory
   - Loads shard 0 first to establish graph structure
   - Loads variable data from remaining shards
   - Ensures temporary directory cleanup
   - Returns fully reconstituted SameDiff instance

5. **ZIP Operations**:
   - Uses standard Java ZIP APIs for maximum compatibility
   - Implements efficient I/O with buffering for large file handling
   - Security measures against zip slip vulnerabilities
   - Validation of ZIP structure integrity

6. **Optimizations**:
   - Manifest-based array lookup for efficient loading
   - Smart buffer management to minimize memory pressure
   - Native byte order handling for cross-platform compatibility
   - Verification steps to validate loaded model integrity

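
Because the container is an ordinary ZIP archive (point 1 above), its shards can be inspected with standard ZIP tooling. The sketch below lists the entries of a container using the standard java.util.zip API; the file name is a placeholder:

```java
// List the shards inside an .sdz container with the standard java.util.zip API.
try (ZipFile zip = new ZipFile(new File("example-model.sdz"))) {
    Enumeration<? extends ZipEntry> entries = zip.entries();
    while (entries.hasMoreElements()) {
        ZipEntry entry = entries.nextElement();
        System.out.printf("%s (%d bytes compressed, %d bytes uncompressed)%n",
                entry.getName(), entry.getCompressedSize(), entry.getSize());
    }
}
```

Equivalently, `unzip -l example-model.sdz` shows the same listing from the command line.
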
### Performance Considerations

The SDZ format balances compression benefits against performance requirements:

1. **Serialization Performance**:
   - Slight additional overhead for ZIP compression
   - Parallelized compression when possible
   - Progressive ZIP writing to avoid memory spikes

2. **Deserialization Performance**:
   - Sequential extraction for predictable memory usage
   - Lazy loading strategies for large variables
   - Efficient memory mapping for large arrays when possible
   - Verification during loading to ensure data integrity

3. **Storage Efficiency**:
   - Typically 30-50% size reduction through compression
   - Optimal balance between compression level and performance
   - Compression ratio varies based on parameter data patterns

## Trade-offs and Consequences

### Design Trade-offs

1. **FlatBuffers Compatibility vs. Unlimited Model Size**:
   - We maintain compatibility with 32-bit FlatBuffers for graph structure
   - We overcome FlatBuffers' 2GB size limitation through our sharding approach
   - This allows us to leverage FlatBuffers' efficiency for small graph structures while supporting NDArrays of any size

2. **Single File Format vs. Performance**:
   - We chose ZIP for its ubiquity, tooling support, and single-file deployment benefits
   - ZIP allows self-contained distribution while accepting some performance overhead during compression/decompression
   - This trades some loading speed for a better deployment experience and reduced operational complexity

3. **Metadata Extensibility vs. Format Complexity**:
   - We implement an extensible metadata system similar to GGUF
   - This allows adding/updating metadata without reserializing the entire model
   - The increased format complexity is justified by the flexibility to evolve models over time

4. **Cross-Platform Support vs. Optimization**:
   - We prioritize cross-platform compatibility over platform-specific optimizations
   - This ensures models can be shared across environments but may not achieve maximum performance on specialized hardware

### Advantages

1. **Simplified Deployment**:
   - Single-file deployment with the SDZ format
   - Easier distribution and management
   - Reduced risk of missing files or shard mismatches

2. **Enhanced Model Storage**:
   - Support for NDArrays and models of any size
   - Efficient storage with ZIP compression
   - Selective loading of model components

3. **Better Metadata Management**:
   - Standardized tracking of model attributes
   - Version management for compatibility
   - Custom metadata for specific requirements
   - Post-training metadata additions without reserializing parameters

4. **First-Class Sharding**:
   - Explicit support for very large models
   - Intelligent variable distribution
   - Efficient loading of sharded models

5. **Complete Backward Compatibility**:
   - Seamless support for reading existing formats
   - Automatic format detection and handling
   - No disruption to existing workflows
   - Migration path for older models

### Disadvantages

1. **Implementation Complexity**:
   - More complex than the previous FlatBuffers-only approach
   - Additional code paths for format handling
   - Need for comprehensive testing across formats

2. **Performance Considerations**:
   - Compression/decompression time with the SDZ format
   - Temporary storage requirements during extraction
   - Slight overhead for small models

3. **Tool Ecosystem**:
   - Need for updates to existing tooling
   - Additional format documentation requirements
   - Migration guidance for existing models

## Technical Implementation

### Format Detection Algorithm
```java
public static SameDiff load(File file, boolean loadUpdaterState) throws IOException {
    // Check if it's a ZIP file first (SDZ format)
    if (isZipFile(file)) {
        return SDZSerializer.load(file, loadUpdaterState);
    }

    // Not a ZIP, check if it's a native SDNB file
    if (isValidSdnbFile(file)) {
        return SameDiffSerializer.load(file, loadUpdaterState);
    }

    // Check if it's a base name for sharded files
    if (hasShardedFiles(file)) {
        return SameDiffSerializer.loadSharded(file, loadUpdaterState);
    }

    // Unsupported format
    throw new UnsupportedOperationException("Unrecognized model format");
}
```
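
The `isZipFile` and `isValidSdnbFile` checks referenced above can be implemented by probing the leading magic bytes: a ZIP local file header starts with `PK\x03\x04`, and an SDNB file starts with the ASCII bytes `SDNB`. The helpers below are assumptions about how those checks might look, not the actual code:

```java
// Hypothetical magic-byte checks for the helpers used in the detection code above.
private static boolean isZipFile(File f) throws IOException {
    try (FileInputStream in = new FileInputStream(f)) {
        byte[] sig = new byte[4];
        if (in.read(sig) != 4) return false;
        // ZIP local file header signature: 0x50 0x4B 0x03 0x04 ("PK\3\4")
        return sig[0] == 0x50 && sig[1] == 0x4B && sig[2] == 0x03 && sig[3] == 0x04;
    }
}

private static boolean isValidSdnbFile(File f) throws IOException {
    try (FileInputStream in = new FileInputStream(f)) {
        byte[] magic = new byte[4];
        if (in.read(magic) != 4) return false;
        // SDNB magic bytes from the format structure above
        return magic[0] == 'S' && magic[1] == 'D' && magic[2] == 'N' && magic[3] == 'B';
    }
}
```
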
### SDZ Implementation
```java
public static void save(SameDiff sameDiff, File outputZipFile, boolean saveUpdaterState,
                        Map<String, String> metadata) throws IOException {
    // Create temporary directory for SDNB files
    Path tempDir = Files.createTempDirectory("sdz-serializer-save-");

    try {
        // Save using SDNB serializer to temporary directory
        File internalSavePath = new File(tempDir.toFile(), "model");
        SameDiffSerializer.saveAutoShard(sameDiff, internalSavePath, saveUpdaterState, metadata);

        // Collect all files to add to ZIP
        List<File> filesToZip = new ArrayList<>();
        findAllFilesRecursively(tempDir.toFile(), filesToZip);

        // Create ZIP archive
        createZipArchive(outputZipFile, filesToZip);
    } finally {
        // Clean up temporary directory
        FileUtils.deleteDirectory(tempDir.toFile());
    }
}

public static SameDiff load(File modelZipFile, boolean loadUpdaterState) throws IOException {
    // Extract ZIP to temporary directory
    Path tempDir = Files.createTempDirectory("sdz-serializer-load-");

    try {
        // Extract ZIP contents
        extractZip(modelZipFile, tempDir.toFile());

        // Determine the path to load from
        File loadPath = determineLoadPath(tempDir.toFile());

        // Load using SDNB serializer
        return SameDiffSerializer.load(loadPath, loadUpdaterState);
    } finally {
        // Clean up temporary directory
        FileUtils.deleteDirectory(tempDir.toFile());
    }
}
```
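
As a sketch of the zip-slip protection noted under ZIP Operations, an extraction helper can canonicalize each entry path and reject anything that resolves outside the target directory. This is an illustrative version of such an `extractZip` helper, not the actual implementation:

```java
// Illustrative zip-slip-safe extraction: reject entries that escape the target directory.
private static void extractZip(File zipFile, File targetDir) throws IOException {
    try (ZipInputStream zis = new ZipInputStream(new BufferedInputStream(new FileInputStream(zipFile)))) {
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            File outFile = new File(targetDir, entry.getName());
            // Zip slip check: the canonical path must stay inside the target directory
            if (!outFile.getCanonicalPath().startsWith(targetDir.getCanonicalPath() + File.separator)) {
                throw new IOException("Blocked ZIP entry outside target directory: " + entry.getName());
            }
            if (entry.isDirectory()) {
                outFile.mkdirs();
            } else {
                outFile.getParentFile().mkdirs();
                try (OutputStream os = new BufferedOutputStream(new FileOutputStream(outFile))) {
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = zis.read(buf)) > 0) {
                        os.write(buf, 0, n);
                    }
                }
            }
            zis.closeEntry();
        }
    }
}
```
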
## Migration Guidelines

For existing users:

1. **Loading Existing Models**:
   - No changes needed; automatic format detection handles existing models

2. **Converting to SDZ Format** (see the example after this list):
   - Use `SDZSerializer.save()` with existing SameDiff instances
   - Alternatively, load existing models and save them in the SDZ format

3. **When to Use Each Format**:
   - SDNB: For highest performance, particularly during training
   - SDZ: For deployment, storage efficiency, and single-file distribution
   - Sharded formats: For very large models exceeding memory limits
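
For example, converting an existing single-file model to the SDZ deployment format might look like the sketch below; the file names are placeholders, and passing an empty metadata map is an assumption rather than a documented requirement:

```java
// Illustrative conversion of an existing model to the SDZ deployment format.
SameDiff existing = SameDiffSerializer.load(new File("legacy-model.sdnb"), true);
SDZSerializer.save(existing, new File("legacy-model.sdz"), true, new HashMap<>());
```
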
Lines changed: 40 additions & 0 deletions

@@ -0,0 +1,40 @@
//
// Created by agibsonccc on 4/11/25.
//

/* ******************************************************************************
 *
 * Copyright (c) 2024 Konduit K.K.
 * This program and the accompanying materials are made available under the
 * terms of the Apache License, Version 2.0 which is available at
 * https://www.apache.org/licenses/LICENSE-2.0.
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
 * License for the specific language governing permissions and limitations
 * under the License.
 *
 * SPDX-License-Identifier: Apache-2.0
 ******************************************************************************/

//
// @author Adam Gibson
//

#ifndef LIBND4J_CUDALIMITTYPE_H
#define LIBND4J_CUDALIMITTYPE_H

#ifndef __JAVACPP_HACK__
// Types of CUDA device limits that can be queried or configured through libnd4j
enum CudaLimitType {
  CUDA_LIMIT_STACK_SIZE = 0,
  CUDA_LIMIT_MALLOC_HEAP_SIZE = 1,
  CUDA_LIMIT_PRINTF_FIFO_SIZE = 2,
  CUDA_LIMIT_DEV_RUNTIME_SYNC_DEPTH = 3,
  CUDA_LIMIT_DEV_RUNTIME_PENDING_LAUNCH_COUNT = 4,
  CUDA_LIMIT_MAX_L2_FETCH_GRANULARITY = 5,
  CUDA_LIMIT_PERSISTING_L2_CACHE_SIZE = 6
};
#endif
#endif  // LIBND4J_CUDALIMITTYPE_H
