Welcome to the SAM3 project! We are building a pure C11 inference engine for Meta's Segment Anything Model 3.
To ensure the codebase remains clean, fast, and maintainable, we have strict coding standards. Please review these guidelines before submitting any pull requests.
Before you start, familiarize yourself with the layout:
- include/sam3/ - Public API headers (sam3.h, sam3_types.h)
- src/core/ - Tensor ops, arena allocator, compute graph
- src/backend/ - Backend abstraction + Metal/CPU implementations
- src/model/ - SAM3 layers (image encoder, prompt encoder, mask decoder)
- src/util/ - Logging, error codes
- tools/ - CLI binaries (inference, weight conversion)
- tests/ - Unit and integration tests
- docs/ - Documentation (format specs, reference material)
Model weights use the .sam3 binary format. See
docs/weight-format.md for the full specification.
The format is a flat binary container: a 48-byte header with model config,
followed by an array of 176-byte tensor descriptors, then a page-aligned data
blob with 64-byte per-tensor alignment. Files are loaded via mmap() with
O(1) tensor lookup using FNV-1a hashing.
Supported data types: F32, F16, BF16, I32, I8, Q8_0 (block-quantized int8).
Format structs are defined in src/core/weight.h. The loader and writer live
in src/core/weight.c. Convert from SafeTensors with sam3_convert.
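
As a rough illustration of the lookup and alignment scheme described above, the sketch below shows 32-bit FNV-1a hashing of a tensor name and rounding a data offset up to a 64-byte boundary. The names `demo_fnv1a_32` and `demo_align_up`, and the choice of the 32-bit FNV variant, are assumptions made for this example; the authoritative definitions are in src/core/weight.h and docs/weight-format.md.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * demo_fnv1a_32 - Hypothetical 32-bit FNV-1a hash of a byte string.
 *
 * FNV-1a XORs each byte into the hash first, then multiplies by the
 * FNV prime. The hash width used by the real loader is not assumed.
 */
static uint32_t demo_fnv1a_32(const char *s, size_t len)
{
	uint32_t h = 2166136261u;	/* FNV-1a 32-bit offset basis */

	for (size_t i = 0; i < len; i++) {
		h ^= (uint8_t)s[i];
		h *= 16777619u;		/* FNV 32-bit prime */
	}
	return h;
}

/*
 * demo_align_up - Round off up to the next multiple of align.
 *
 * align must be a power of two (64 for per-tensor alignment).
 */
static uint64_t demo_align_up(uint64_t off, uint64_t align)
{
	return (off + align - 1) & ~(align - 1);
}
```

A loader built this way could mask the hash to index a power-of-two table and compare the stored name on a hit; whether the real loader does exactly that is not stated here.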
These rules are non-negotiable. Every file must follow them exactly.
- C11 only. No C++ features, no GNU extensions unless guarded by #ifdef.
- Your code must compile with: -std=c11 -Wall -Wextra -Wpedantic
- If it doesn't compile with -std=c11, it doesn't ship.
- Indentation: Use tabs, 8 characters wide (Linux kernel convention). Keep nesting shallow.
- Line length: 80-column soft limit, 100 hard limit. Break long lines at operators or after commas.
- Braces: K&R brace style for functions (opening brace on its own line). Same-line braces for if, for, while, switch, struct.
- Cases: snake_case for everything (functions, variables, types, enum values, macros).
- Prefixes: Prefix public symbols with sam3_. Internal symbols use their subsystem prefix (e.g., tensor_, metal_, graph_).
- Anti-patterns: No Hungarian notation (pFoo, m_bar, szName). No typedefs hiding pointers (the * must be visible). Typedefs are acceptable for opaque structs in the public API (typedef struct sam3_ctx sam3_ctx;).
- Allocations: We use arena allocators for inference. No malloc/free in hot paths. All allocations go through sam3_alloc_* functions.
- State: No global mutable state. All state lives in sam3_ctx or is passed as function arguments. YAGNI: do not use alloca().
- Ownership: Ownership is explicit. If a function allocates, its doc comment says who frees. Prefer arena allocation where the arena owns everything.
- Return Codes: Use enum sam3_error codes. Never use errno for sam3 errors.
- Cleanup: Use the goto cleanup pattern for functions that acquire multiple resources.
- Never silently ignore errors. Log or propagate them.
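
The goto cleanup pattern mentioned above can be sketched as follows. This is a minimal illustration, not project code: `enum demo_error` and `demo_read_prefix` are hypothetical names, and the comments mark where a real function would call the logging macros before returning.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error codes; the real ones live in enum sam3_error. */
enum demo_error {
	DEMO_OK = 0,
	DEMO_EIO,
	DEMO_ENOMEM,
};

/*
 * demo_read_prefix - Sum up to n leading bytes of a file.
 *
 * Acquires two resources (a FILE and a heap buffer). Every failure
 * path jumps to a single label, so nothing leaks and nothing is
 * released twice. The caller owns nothing after return.
 */
static enum demo_error demo_read_prefix(const char *path, size_t n,
					unsigned long *sum_out)
{
	enum demo_error err = DEMO_OK;
	FILE *fp = NULL;
	unsigned char *buf = NULL;
	unsigned long sum = 0;
	size_t got;

	fp = fopen(path, "rb");
	if (!fp) {
		/* real code would log the failure here */
		err = DEMO_EIO;
		goto cleanup;
	}

	buf = malloc(n ? n : 1);
	if (!buf) {
		err = DEMO_ENOMEM;
		goto cleanup;
	}

	got = fread(buf, 1, n, fp);
	if (ferror(fp)) {
		err = DEMO_EIO;
		goto cleanup;
	}

	for (size_t i = 0; i < got; i++)
		sum += buf[i];
	*sum_out = sum;		/* only written on success */

cleanup:
	free(buf);		/* free(NULL) is a no-op */
	if (fp)
		fclose(fp);
	return err;
}
```

Initializing every resource to NULL up front is what lets the single cleanup label handle all exit paths.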
All diagnostic output uses the macros in src/util/log.h. Never use raw
printf/fprintf for diagnostics.
Log levels (enum sam3_log_level):
| Level | Macro | Use for |
|---|---|---|
| SAM3_LOG_DEBUG | sam3_log_debug(...) | Detailed tracing (suppressed by default) |
| SAM3_LOG_INFO | sam3_log_info(...) | Operational milestones |
| SAM3_LOG_WARN | sam3_log_warn(...) | Non-fatal issues |
| SAM3_LOG_ERROR | sam3_log_error(...) | Failures that affect correctness |
Example:
sam3_log_error("unsupported dtype %d", t->dtype);
sam3_log_info("patch embedding evaluated (%d patches)", np);

Output goes to stderr in the format [LEVEL] file:line: message. The macros
capture __FILE__ and __LINE__ automatically.
Configuration: Default level is SAM3_LOG_INFO. CLI tools accept -v to
enable SAM3_LOG_DEBUG. Call sam3_log_set_level() to change at runtime.
Guidelines:
- Always log before returning an error code.
- Use sam3_log_error for failures, sam3_log_warn for non-fatal issues.
- Keep messages short with relevant numeric context (sizes, counts, indices).
- Do not log in tight loops — one message per operation, not per iteration.
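
To show how a macro can capture `__FILE__` and `__LINE__` at the call site, here is a small sketch that stays within strict C11 (no `##__VA_ARGS__`). Everything prefixed `demo_` is hypothetical and is not the contents of src/util/log.h. For testability it formats into a caller-supplied buffer, whereas the real macros write to stderr.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

enum demo_log_level {
	DEMO_LOG_DEBUG,
	DEMO_LOG_INFO,
	DEMO_LOG_WARN,
	DEMO_LOG_ERROR,
};

static const char *const demo_level_name[] = {
	"DEBUG", "INFO", "WARN", "ERROR",
};

/*
 * demo_log_format - Write "[LEVEL] file:line: message" into out.
 *
 * Returns the number of characters the full message needs, or a
 * negative value on a formatting error (same contract as snprintf).
 */
static int demo_log_format(char *out, size_t cap, enum demo_log_level lvl,
			   const char *file, int line, const char *fmt, ...)
{
	va_list ap;
	int head, body;

	head = snprintf(out, cap, "[%s] %s:%d: ",
			demo_level_name[lvl], file, line);
	if (head < 0 || (size_t)head >= cap)
		return head;

	va_start(ap, fmt);
	body = vsnprintf(out + head, cap - (size_t)head, fmt, ap);
	va_end(ap);
	return body < 0 ? body : head + body;
}

/* The format string is part of __VA_ARGS__, so it is never empty. */
#define demo_log(out, cap, lvl, ...) \
	demo_log_format((out), (cap), (lvl), __FILE__, __LINE__, __VA_ARGS__)
```

Because the macro expands at the call site, the file and line in the message identify the caller rather than the logging helper.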
Every .c and .h file MUST begin with this header.
/*
* <relative/path/to/file> - <one-line description>
*
* <2-4 sentences explaining purpose, role in the system, and key design
* decisions. Mention what subsystem this belongs to and how it fits into
* the larger architecture.>
*
* Key types: <primary structs/enums defined or used here>
* Depends on: <direct header dependencies, not transitive>
* Used by: <files that directly include or call into this>
*
* Copyright (c) 2026 Rifky Bujana Bisri
* SPDX-License-Identifier: MIT
*/

Document non-trivial functions with a comment block above them:
/*
* sam3_tensor_reshape - Change tensor dimensions without copying data.
*
* @t: Tensor to reshape (must not be a view)
* @new_dims: Array of new dimension sizes
* @n_dims: Number of dimensions (1-4)
*
* Returns 0 on success, -SAM3_EINVAL if total element count changes.
* The tensor data pointer is not modified.
*/
int sam3_tensor_reshape(struct sam3_tensor *t, const int *new_dims, int n_dims);

- Include system headers first (<stdint.h>, <stdlib.h>), then project headers.
- Use #include "sam3/header.h" for public headers, and #include "local_header.h" for same-directory private headers.
- Use standard include guards (#ifndef SAM3_CORE_TENSOR_H). Do not use #pragma once.
- Do not #include a .c file.
- Backends must implement struct sam3_backend_ops (vtable of function pointers).
- Never call Metal/CUDA/CPU functions directly from model code; always go through the backend vtable.
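
The vtable rule above could look roughly like the sketch below. The struct layout, the single `add_f32` operation, and all `demo_` names are assumptions for illustration; the real struct sam3_backend_ops defines its own set of function pointers.

```c
#include <stddef.h>

/* Hypothetical vtable with a single operation. */
struct demo_backend_ops {
	const char *name;
	int (*add_f32)(void *state, const float *a, const float *b,
		       float *out, size_t n);
};

struct demo_backend {
	const struct demo_backend_ops *ops;
	void *state;	/* backend-private data; unused by the CPU sketch */
};

/* CPU implementation of the one operation in the sketch. */
static int demo_cpu_add_f32(void *state, const float *a, const float *b,
			    float *out, size_t n)
{
	(void)state;
	for (size_t i = 0; i < n; i++)
		out[i] = a[i] + b[i];
	return 0;
}

static const struct demo_backend_ops demo_cpu_ops = {
	.name = "cpu",
	.add_f32 = demo_cpu_add_f32,
};

/*
 * Model code sees only the vtable. Swapping in another backend means
 * passing a different struct demo_backend; this function is unchanged.
 */
static int demo_model_residual(const struct demo_backend *be, const float *x,
			       const float *y, float *out, size_t n)
{
	return be->ops->add_f32(be->state, x, y, out, n);
}
```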
These rules apply to every hot path change. If your code runs during inference (per-token, per-pixel, per-layer), all eight rules are mandatory.
Use stack buffers or arena allocators. Never malloc/free inside a loop.
/* BAD */
for (int i = 0; i < n; i++) {
	char *key = malloc(len_a + len_b + 2);
	/* ... */
	free(key);
}
/* GOOD */
char key_buf[128];
for (int i = 0; i < n; i++) {
	/* build key in key_buf */
}

Track derived quantities in parallel arrays. If you call strlen() on the
same string more than once, store the length.
/* BAD: recompute every iteration */
for (int i = 0; i < n - 1; i++) {
	size_t la = strlen(symbols[i]);
	size_t lb = strlen(symbols[i + 1]);
}
/* GOOD: parallel length array, update on mutation */
int sym_len[MAX_SYMBOLS];

If you need the length and the content, do both in one pass.
/* BAD */
int len = (int)strlen(text);
int n = len < limit ? len : limit;
for (int i = 0; i < n; i++) { /* process */ }
/* GOOD */
int i = 0;
while (i < limit && text[i]) { /* process */ i++; }

memcpy/memset use SIMD internally. For non-zero fill patterns, copy from
a static const array.
/* BAD */
for (int i = pos; i < max; i++)
	tokens[i] = EOT_TOKEN;
/* GOOD */
static const int32_t eot_pad[77] = { E_, E_, ... };
memcpy(tokens + pos, eot_pad, (max - pos) * sizeof(int32_t));

Replace predictable branches with arithmetic.
/* BAD */
if (c >= 'A' && c <= 'Z')
	c += 'a' - 'A';
/* GOOD */
c |= (unsigned char)(((unsigned)(c - 'A') < 26u) << 5);

Use NEON (ARM64) or SSE (x86) to process 16 bytes at a time. Always provide
a scalar fallback. Mark SIMD helpers that intentionally over-read with
no_sanitize("address").
#ifdef __aarch64__
__attribute__((no_sanitize("address")))
static int neon_process(const uint8_t *src, int32_t *dst, int limit)
{
	/* 16 bytes per iteration, scalar fallback after */
}
#endif

If a deterministic function is called repeatedly with the same inputs, add a direct-mapped hash cache.
int slot = fnv1a(word, len) & (CACHE_SIZE - 1);
if (cache[slot].key_len == len && memcmp(...) == 0)
	return cache[slot].result; /* hit */
/* miss: compute, then store in cache[slot] */

ASan adds 5-20x overhead per memory access. Debug builds measure sanitizer cost, not your code.
# Correctness
cd build && ctest --output-on-failure
# Performance
cd build-release && ./bench_tokenizer
Benchmark files live in tests/bench_*.c and are auto-registered via the
foreach loop in CMakeLists.txt. They are not included in CTest.
To add a new benchmark:
- Create tests/bench_<module>.c following existing patterns
- Use clock_gettime(CLOCK_MONOTONIC) for timing
- Include warmup iterations before timed iterations
- Report meaningful metrics (enc/s, GFLOPS, GB/s, ns/op)
- Run from a Release build: cd build-release && ./bench_<module>
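
The timing steps in the list above might be combined as in this sketch. It is not one of the existing tests/bench_*.c files: `demo_work` is a hypothetical stand-in for the code under test, and the `_POSIX_C_SOURCE` define is an assumption about what is needed to expose `clock_gettime()` under `-std=c11` on a POSIX system.

```c
#define _POSIX_C_SOURCE 199309L	/* expose clock_gettime() under -std=c11 */

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds. */
static uint64_t demo_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical workload standing in for the code under test. */
static uint32_t demo_work(uint32_t x)
{
	for (int i = 0; i < 64; i++)
		x = x * 1664525u + 1013904223u;
	return x;
}

/*
 * demo_bench_ns_per_op - Average nanoseconds per demo_work() call.
 *
 * Runs warmup iterations first so caches and branch predictors settle,
 * then times iters calls. The result is stored through sink so the
 * compiler cannot discard the loop.
 */
static double demo_bench_ns_per_op(int warmup, int iters, uint32_t *sink)
{
	uint32_t acc = 1;
	uint64_t t0, t1;

	assert(iters > 0);
	for (int i = 0; i < warmup; i++)
		acc = demo_work(acc);

	t0 = demo_now_ns();
	for (int i = 0; i < iters; i++)
		acc = demo_work(acc);
	t1 = demo_now_ns();

	*sink = acc;
	return (double)(t1 - t0) / (double)iters;
}
```

A benchmark's main() would call `demo_bench_ns_per_op()` and print the result with its unit, for example as ns/op.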
- Write tests for new modules in tests/test_<module>.c.
- Name test functions test_<module>_<behavior>.
- Ensure tests can be run via CTest. Use the predefined assertions in tests/test_helpers.h.
- Make one logical change per commit.
- Use the imperative mood for subject lines (e.g., "Add tensor reshape", not "Added tensor reshape").
- Follow the format: <subsystem>: <description> (e.g., core/tensor: add reshape operation).
- Do not add features "for later" (YAGNI). Build only what is needed now.
Thank you for contributing to SAM3!