93 changes: 93 additions & 0 deletions .claude/skills/msgspec-patterns/SKILL.md
@@ -0,0 +1,93 @@
---
name: msgspec-patterns
description: Reference guide for msgspec.Struct usage patterns and performance tips. Use when writing or reviewing code that defines msgspec Structs, encodes/decodes data, or needs performance optimization for serialization.
user-invocable: false
---

## Use Structs for Structured Data

Always prefer `msgspec.Struct` over `dict`, `dataclasses`, or `attrs` for structured data with a known schema. Structs are 5-60x faster for common operations.

## Struct Configuration Options

| Option | Description | Default |
| ----------------------- | --------------------------------------------- | ------- |
| `omit_defaults` | Omit fields with default values when encoding | `False` |
| `forbid_unknown_fields` | Error on unknown fields when decoding | `False` |
| `frozen` | Make instances immutable and hashable | `False` |
| `kw_only` | Make all fields keyword-only | `False` |
| `tag` | Enable tagged union support | `None` |
| `array_like` | Encode/decode as arrays instead of objects | `False` |
| `gc` | Enable garbage collector tracking | `True` |

## Omit Default Values

Set `omit_defaults=True` when default values are known on both encoding and decoding ends. Reduces encoded message size and improves performance.

```python
class Config(msgspec.Struct, omit_defaults=True):
    host: str = "localhost"
    port: int = 8080
```

## Avoid Decoding Unused Fields

Define smaller "view" Struct types that only contain the fields you actually need. msgspec skips decoding fields not defined in your Struct.

## Use `encode_into` for Buffer Reuse

In hot loops, use `Encoder.encode_into()` with a pre-allocated `bytearray` instead of `encode()`. Always measure before adopting.

```python
encoder = msgspec.json.Encoder()
buffer = bytearray(1024)
# encode_into() returns None; it truncates `buffer` to the encoded length
encoder.encode_into(msg, buffer)
socket.sendall(buffer)
```

## Use MessagePack for Internal APIs

`msgspec.msgpack` is more compact and can be more performant than `msgspec.json` for internal service communication.

## gc=False

Set `gc=False` on Struct types that will never participate in reference cycles. Reduces GC overhead by up to 75x and saves 16 bytes per instance. See the `msgspec-struct-gc-check` skill for the full safety analysis.

## array_like=True

Set `array_like=True` when both ends know the field schema. Encodes structs as arrays instead of objects, removing field names from the message.

```python
class Point(msgspec.Struct, array_like=True):
    x: float
    y: float

# Encodes as [1.0, 2.0] instead of {"x": 1.0, "y": 2.0}
```

## Tagged Unions

Use `tag=True` on Struct types when handling multiple message types in a single union for efficient type discrimination during decoding.

```python
class GetRequest(msgspec.Struct, tag=True):
    key: str


class PutRequest(msgspec.Struct, tag=True):
    key: str
    value: str

Request = GetRequest | PutRequest
decoder = msgspec.msgpack.Decoder(Request)
```

## NDJSON with encode_into

For line-delimited JSON, use `encode_into()` with `buffer.extend()` to avoid copies:

medium

The description for NDJSON says to use encode_into() with buffer.extend(), but the code example doesn't use buffer.extend(). This is confusing as encode_into and buffer.extend serve different purposes. The example shows writing a single encoded message to a file, which is a valid use case for encode_into to avoid intermediate allocations.

To improve clarity, I suggest updating the description to better match the example. For instance:
"For line-delimited JSON, use encode_into() to write into a reusable buffer before sending the data to a file or socket. This avoids extra memory copies."


```python
encoder = msgspec.json.Encoder()
buffer = bytearray(64)
# encode_into() returns None; it truncates `buffer` to the encoded length
encoder.encode_into(msg, buffer)
file.write(buffer)
file.write(b"\n")
```
76 changes: 76 additions & 0 deletions .claude/skills/msgspec-struct-gc-check/SKILL.md
@@ -0,0 +1,76 @@
---
name: msgspec-struct-gc-check
description: Check whether msgspec.Struct types can safely use gc=False. Use when adding or changing msgspec.Struct definitions, or when reviewing code that uses msgspec structs.
allowed-tools: Read, Grep, Glob
---

# msgspec.Struct gc=False Safety Check

## When to use this skill

- Adding or modifying a class that inherits from `msgspec.Struct`
- Reviewing or refactoring code that defines or uses msgspec structs
- Deciding whether to add or remove `gc=False` on a Struct

## Why gc=False matters

Setting `gc=False` on a Struct means instances are **never tracked** by Python's garbage collector. This reduces GC pressure and can improve performance when many structs are allocated. The **only** risk: if a **reference cycle** involves only gc=False structs (or objects not tracked by GC), that cycle will **never be collected** (memory leak).

## Verified safety constraints

All must hold for gc=False to be safe.

### 1. No reference cycles

- The struct (and any container it references) must never be part of a reference cycle.
- **Multiple variables** pointing to the same struct (`x = s; y = x`) are **safe** — that is not a cycle.
- **Returning** a struct from a function is **safe**. What matters is whether any reference path leads back to the struct (e.g. struct's list contains the struct or something that holds the struct).
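
The difference between shared references and a cycle can be sketched with plain containers and the standard library. `leads_back_to` is a hypothetical helper, not part of msgspec. It only sees references that objects report to the `gc` module and is meant for small container graphs, so treat it as an illustration rather than an audit tool:

```python
import gc


def leads_back_to(obj) -> bool:
    """Return True if some reference path starting at obj reaches obj again."""
    seen = set()
    stack = list(gc.get_referents(obj))
    while stack:
        cur = stack.pop()
        if cur is obj:
            return True
        if id(cur) in seen:
            continue
        seen.add(id(cur))
        stack.extend(gc.get_referents(cur))
    return False


shared = [1, 2, 3]
x = shared
y = x  # several names, one object: no path from the list back to itself

looped = {"items": []}
looped["items"].append(looped)  # dict -> list -> dict
```

Here `leads_back_to(shared)` is false and `leads_back_to(looped)` is true, which mirrors the "multiple variables" and "struct's list contains the struct" cases above.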

### 2. No mutation that could create cycles

- **Do not mutate** struct fields after construction in a way that could introduce a cycle.
- **Frozen structs** (`frozen=True`) prevent field reassignment; `force_setattr` in `__post_init__` is one-time init only, so that's acceptable.
- Assigning **scalars** (int, str, bool, float, None) to fields is safe — they cannot form cycles.

### 3. Mutable containers (list, dict, set) on the struct

- If the struct has list/dict/set fields, either:
- **Never mutate** those containers after creation, and never store in them any object that references the struct, or
- Do not use `gc=False` (conservative).
- **Reading** from containers does not create cycles and is allowed.

### 4. Nested structs

- If a struct holds another Struct, the same rules apply to the whole reference graph: no cycles, no mutation that could create cycles.

### 5. Generic / mixins

- With `gc=False`, the type must be compatible with `__slots__` (e.g. if using `Generic`, the mixin must define `__slots__ = ()`).

## Quick per-struct analysis steps

1. List all fields and their types (scalars vs containers vs nested Structs).
2. Search the codebase for: assignments to this struct's fields, mutations of its container fields (`.append`, `.update`, etc.), and any place the struct instance is stored.
3. If only scalars or immutable types, or frozen with no container mutation -> likely safe for gc=False.
4. If mutable containers and they're never mutated (and never made to reference the struct) -> likely safe; otherwise -> do not use gc=False.
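
Step 1 can be roughed out mechanically from type annotations. The sketch below is an assumption-laden helper, not part of the skill: `classify_fields` and the `QueryResult` stand-in are hypothetical, and a plain annotated class is used so the example runs without msgspec installed.

```python
import typing

SCALARS = (int, str, bool, float, type(None))
MUTABLE_CONTAINERS = (list, dict, set)


def classify_fields(cls) -> dict:
    """Label each annotated field as scalar, mutable-container, or other."""
    report = {}
    for name, hint in typing.get_type_hints(cls).items():
        base = typing.get_origin(hint) or hint  # list[str] -> list
        if base in SCALARS:
            report[name] = "scalar"
        elif base in MUTABLE_CONTAINERS:
            report[name] = "mutable-container"
        else:
            report[name] = "other"
    return report


class QueryResult:  # hypothetical stand-in for an annotated msgspec.Struct subclass
    id: int
    score: float
    metadata: dict[str, str]
    tags: list[str]
```

Fields reported as `mutable-container` or `other` are the ones that need the codebase search in step 2.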

## Risky structs: audit and at-risk comment

A struct is **risky** for gc=False if it has a condition that would normally disallow gc=False (e.g. mutable list/dict/set fields), but that condition might never arise in practice.

### When audit passes

- Set `gc=False` on the struct.
- Add an **at-risk comment** above the class:

`# gc=False: audit YYYY-MM: <condition> is only read, never mutated.`

- Add a docstring note:

`AT-RISK (gc=False): Has <brief condition>. Any change that <what would violate safety> must be audited; if so, remove gc=False.`

### When touching an at-risk struct

1. Re-run the audit for that struct.
2. If your change mutates the at-risk field(s) or creates a cycle, **remove** `gc=False`.
3. If your change does not touch the at-risk field, the existing gc=False remains; you may update the audit date.
119 changes: 119 additions & 0 deletions .cursor/skills/msgspec-struct-gc-check/SKILL.md
@@ -0,0 +1,119 @@
---
Collaborator


Please fold this into the Claude skills, or symlink to them, to avoid duplication.

name: msgspec-struct-gc-check
description: Check whether msgspec.Struct types can safely use gc=False. Use when adding or changing msgspec.Struct definitions, or when reviewing code that uses msgspec structs.
---

# msgspec.Struct gc=False Safety Check

## When to use this skill

- Adding or modifying a class that inherits from `msgspec.Struct`
- Reviewing or refactoring code that defines or uses msgspec structs
- Deciding whether to add or remove `gc=False` on a Struct

## Why gc=False matters

Setting `gc=False` on a Struct means instances are **never tracked** by Python's garbage collector. This reduces GC pressure and can improve performance when many structs are allocated. The **only** risk: if a **reference cycle** involves only gc=False structs (or objects not tracked by GC), that cycle will **never be collected** (memory leak).

Reference: [msgspec Structs – Disabling Garbage Collection](https://jcristharif.com/msgspec/structs.html#struct-gc).

## Verified safety constraints

Use these constraints to decide if a Struct can use `gc=False`. All must hold.

### 1. No reference cycles

- The struct (and any container it references) must never be part of a reference cycle.
- **Multiple variables** pointing to the same struct (`x = s; y = x`) are **safe** — that is not a cycle. A cycle is A → B → … → A.
- **Returning** a struct from a function is **safe**. What matters is whether any reference path leads back to the struct (e.g. struct’s list contains the struct or something that holds the struct).

### 2. No mutation that could create cycles

- **Do not mutate** struct fields after construction in a way that could introduce a cycle (e.g. set a field to an object that references the struct, or append the struct to its own list/dict).
- **Frozen structs** (`frozen=True`) prevent field reassignment; `force_setattr` in `__post_init__` is one-time init only, so that’s acceptable.
- Assigning **scalars** (int, str, bool, float, None) to fields is safe — they cannot form cycles.

### 3. Mutable containers (list, dict, set) on the struct

- If the struct has list/dict/set fields, either:
- **Never mutate** those containers after creation (no `.append`, `.update`, `[...] = ...`, etc.), and never store in them any object that references the struct, or
- Do not use `gc=False` (conservative).
- **Reading** from containers (e.g. `x = struct.foobars[i]`) does not create cycles and is allowed.

### 4. Nested structs

- If a struct holds another Struct (or holds containers that hold Structs), the same rules apply to the whole reference graph: no cycles, no mutation that could create cycles. If any nested Struct uses `gc=False`, the whole graph must still be cycle-free.

### 5. Generic / mixins

- With `gc=False`, the type must be compatible with `__slots__` (e.g. if using `Generic`, the mixin must define `__slots__ = ()`). See msgspec issue #631 / PR #635.

## Checklist for “can use gc=False”

- [ ] Struct and everything it references can never participate in a reference cycle.
- [ ] No mutation of struct fields after construction that could introduce a cycle (frozen or init-only mutation is ok; scalar assignment is ok).
- [ ] Any list/dict/set fields are never mutated after creation, or we do not use gc=False.
- [ ] No storing the struct (or anything that references it) inside its own container fields.
- [ ] If Generic/mixins are used, `__slots__` compatibility is satisfied.

## Checklist for “must NOT use gc=False”

- [ ] Struct is mutated after creation in a way that could create a cycle (e.g. appending self to a list field).
- [ ] Container fields are mutated after creation and could hold the struct or back-references.
- [ ] Struct is used in a pattern where it’s stored in a container that the struct (or its fields) also references.

## Quick per-struct analysis steps

1. List all fields and their types (scalars vs containers vs nested Structs).
2. Search the codebase for: assignments to this struct’s fields, mutations of its container fields (`.append`, `.update`, etc.), and any place the struct instance is stored (e.g. in a list/dict that might be referenced by the struct).
3. If only scalars or immutable types, or frozen with no container mutation → likely safe for gc=False.
4. If mutable containers and they’re never mutated (and never made to reference the struct) → likely safe; otherwise → do not use gc=False.

## Risky structs: audit and at-risk comment

A struct is **risky** for gc=False if it has a condition that would normally disallow gc=False (e.g. mutable list/dict/set fields), but that condition might never arise in practice (e.g. the field is only ever read, never mutated after construction).

### Auditing a risky struct

1. Identify the at-risk condition (e.g. "has `metadata: dict` that could be mutated").
2. Search the codebase for all uses of that struct and of the at-risk field:
- Any assignment to the field: `obj.field = ...`, `obj.field[key] = ...`, `obj.field.append(...)`, `obj.field.update(...)`, etc.
- Any code path that could store the struct (or something holding it) inside that container.
3. If the audit finds **no** such mutation or cycle-creating storage, the condition never arises and gc=False is acceptable **provided** you add the at-risk marker so future changes are re-audited.
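
The search in step 2 can be approximated with a text scan. `find_field_mutations` is a hypothetical helper and a crude heuristic: it can miss aliased references such as `m = obj.field; m.append(...)`, so it complements a manual review rather than replacing it.

```python
import re


def find_field_mutations(source: str, field: str) -> list:
    """Return (line_number, line) pairs that appear to mutate `<obj>.<field>`."""
    f = re.escape(field)
    pattern = re.compile(
        rf"\.{f}\s*=(?!=)"  # obj.field = ...
        rf"|\.{f}\[[^\]]*\]\s*=(?!=)"  # obj.field[key] = ...
        rf"|\.{f}\.(append|extend|insert|update|add|pop|remove|clear|setdefault)\("
    )
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if pattern.search(line):
            hits.append((number, line.strip()))
    return hits


sample = """\
value = result.metadata["model"]
result.metadata["seen"] = True
if result.metadata == {}:
    result.metadata.update(extra)
"""

hits = find_field_mutations(sample, "metadata")
```

In this sample only the item assignment and the `.update(...)` call are flagged; the read and the equality comparison are not.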

### When audit passes

- Set `gc=False` on the struct.
- Add an **at-risk comment** and docstring note:

- **Above the class**: a short comment stating why gc=False is used despite the at-risk condition, and when the audit was done (e.g. `# gc=False: audit YYYY-MM: <condition> is only read, never mutated.`).
- **In the docstring**: a line that signals to future readers and to this skill that changes touching this struct must be re-audited. Use this format:

`AT-RISK (gc=False): Has <brief condition>. Any change that <what would violate safety> must be audited; if so, remove gc=False.`

- Example (for a struct with a `metadata` dict that is only ever read):

```python
# gc=False: audit 2026-03: metadata dict is only ever read, never mutated after construction.
class QueryResult(msgspec.Struct, ..., gc=False):
    """Result of a completed inference query.

    AT-RISK (gc=False): Has mutable container field `metadata`. Any change that
    mutates `metadata` after construction or stores this struct in a container
    referenced by this struct must be audited; if so, remove gc=False.
    ...
```

### When touching an at-risk struct

If you are adding or changing code that uses a struct marked AT-RISK (gc=False):

1. Re-run the audit for that struct (searches above).
2. If your change mutates the at-risk field(s) or creates a cycle (e.g. stores the struct in its own container), **remove** `gc=False` from the struct and remove the at-risk comment/docstring line.
3. If your change does not touch the at-risk field or create cycles, the existing gc=False and at-risk comment remain; you may add a short note in the at-risk comment if the audit was re-checked (e.g. update the audit date).

## References

- [msgspec Structs – Disabling Garbage Collection](https://jcristharif.com/msgspec/structs.html#struct-gc)
- [msgspec Performance Tips – Use gc=False](https://jcristharif.com/msgspec/perf-tips.html#use-gc-false)
- [msgspec #631 – Generic structs and gc=False](https://github.com/jcrist/msgspec/issues/631)
7 changes: 6 additions & 1 deletion .gitignore
Expand Up @@ -189,5 +189,10 @@ outputs/
# Example vLLM virtualenv
examples/03_BenchmarkComparison/vllm_venv/

# Cursor artifacts (local development only)
# Agent artifacts (local development only)
.cursor_artifacts/
.claude/agent-memory/

# User-specific local rules (local Docker dev); do not commit
.cursor/rules/local-docker-dev.mdc
CLAUDE.local.md