NORTHTEKDevs/sumerian-agent-patterns

Sumerian Agent Patterns

Empirical mining of the SumTablets cuneiform corpus (91,606 tablets, 6.97M glyphs) for software-design primitives applicable to modern multi-agent AI systems.

Sumerian scribes ran a multi-agent bureaucracy 4,000 years ago. Their clay tablets carry sealed envelopes, named time periods, periodic audits, RPC headers, and witness sets — the same primitives modern agent systems are reinventing. This repo statistically validates which patterns are real (with p-values and cited tablet IDs) and translates them into agent-framework code shapes.

Who this is for:

  • Multi-agent / LLM agent framework developers (LangGraph, AutoGen, CrewAI, custom runtimes)
  • Researchers in agentic AI, cognitive architectures, distributed systems
  • Anyone designing memory layers, identity/auth subsystems, or audit logs for AI agents
  • Cuneiform / Assyriology researchers curious about cross-disciplinary applications

Start Here — The Evidence-Backed Artifacts

Of the ~158 ideas in outputs/FULL_IDEAS.md, the following 9 first-class artifacts carry real evidence (statistical results, cited tablet IDs, or measured benchmarks). The rest of the catalog is brainstorm-grade — useful for ideation, but not load-bearing. Start here:

| # | Artifact | Where | What it gives you |
|---|---|---|---|
| 1 | Empirical method: statistical mining of an ancient corpus to derive software-design primitives, with shuffled-baseline controls and Bonferroni correction | scripts/phase{0,1,3}_*.py | A reproducible pipeline you can re-run on any corpus to extract templates + structure |
| 2 | 9 named agent primitives with cited tablet IDs and contracts | outputs/primitives.json | Single-responsibility agent designs grounded in real attestations (P/Q tablet IDs) |
| 3 | Zipf-as-DSL detector finding (Admin s=1.746, Royal s=1.737, Lexical s=1.114) | outputs/compression_findings.md §1 | Empirical method for unsupervised "is-this-a-DSL?" classification of any corpus |
| 4 | RULING-parity finding (Royal p=0.002, Admin p=0.005) | outputs/compression_findings.md §4 | Statistical evidence that physical ruling lines mark logical row boundaries; informs vector-chunk strategy |
| 5 | ELS-null result (0 / 495 tests Bonferroni-significant) | outputs/compression_findings.md §3 | Defensive prior art against future numerology / "hidden code" claims on cuneiform |
| 6 | Reference architecture: composes the primitives into a multi-agent design with Python + Rust pseudocode | outputs/reference_architecture.md | Drop-in design doc you can adapt to any agent runtime |
| 7 | Quantitative benchmark: sealed envelope vs anonymous baseline, measured | benchmarks/RESULTS.md | Hard numbers (+59% bytes, +37 estimated tokens, +7 µs per write) and a 5/5-vs-0/5 capability comparison |
| 8 | Reference implementation of kishib3 (sealed envelope) in ~250 LoC of stdlib Python | benchmarks/kishib3.py | Working code for one of the primitives: clone, adapt, ship |
| 9 | 5 unmined research directions from data we already have on disk | outputs/FULL_IDEAS.md §N (items N1, N4, N5, N9, N12) | Named-person social network (N1), region-scoped authority (N4), votive ledger pattern (N5), multi-tablet narratives (N9), kišib₃ undercount (N12) |

Honest framing: the ~158-idea catalog in FULL_IDEAS.md is preserved for browsability, but most of it is restatements of these 9 artifacts in different framings, branding gimmicks (Sumerian-named libraries that are the primitives renamed), or ephemera. If you only have 30 minutes, read outputs/summary.md + benchmarks/RESULTS.md. If you have an hour, add outputs/reference_architecture.md.


Why You Should Care (Findings → Actions)

Three concrete things you can build differently after reading this:

1. Wrap every agent write in a sealed envelope

Finding. 25.4% of administrative tablets carry kišib₃ (seal of so-and-so), 74.2% are dated by year, and witness clauses (igi PN-šè) are common. The norm for a significant write is attribution, a date, and often witnesses, not anonymity. Example tablets: P101440, P132611, P117793, P145759.

Action. Make every state-changing call in your agent runtime carry (payload, by_seal, witnesses, period). Audit becomes a property of the envelope, not a separate concern bolted on per agent. Replay across any time window becomes trivial.
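A minimal sketch of such an envelope follows. It assumes an HMAC-SHA256 key per principal stands in for the seal (Sumerian seals were physical and socially backed, not cryptographic). SealedWrite, seal, and verify are hypothetical names, not the API of benchmarks/kishib3.py.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SealedWrite:
    """Hypothetical sealed envelope: the payload plus who, before whom, and when."""
    payload: dict
    by_seal: str                # principal that authorized the write (kišib₃)
    witnesses: tuple = ()       # principals attesting the write (igi PN-šè)
    period_id: str = ""         # named period the write belongs to (mu / iti)
    signature: str = ""         # hex HMAC over every field above

    def body(self) -> bytes:
        # Canonical JSON so an unchanged envelope always yields the same bytes.
        return json.dumps(
            {"payload": self.payload, "by_seal": self.by_seal,
             "witnesses": list(self.witnesses), "period_id": self.period_id},
            sort_keys=True, ensure_ascii=False,
        ).encode("utf-8")


def seal(payload, by_seal, key, witnesses=(), period_id=""):
    """Build an envelope and sign it with the principal's key."""
    unsigned = SealedWrite(payload, by_seal, tuple(witnesses), period_id)
    sig = hmac.new(key, unsigned.body(), hashlib.sha256).hexdigest()
    return SealedWrite(payload, by_seal, tuple(witnesses), period_id, sig)


def verify(write, key):
    """True only if no field changed since the write was sealed with `key`."""
    expected = hmac.new(key, write.body(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, write.signature)
```

Because the signature covers by_seal, witnesses, and period_id as well as the payload, changing any of them invalidates the write, and audit queries reduce to filters over envelopes.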

Measured impact (benchmarks/RESULTS.md). We built a minimal sealed-envelope library (benchmarks/kishib3.py, 250 LoC, stdlib only) and ran 100,000 writes through both it and an anonymous-log baseline:

  • Cost: +37 tokens / write (estimated), +7 µs latency / write, +59% bytes at 250-byte payloads. That is roughly 150 bytes of fixed envelope overhead, so the relative cost drops to ~14% at 1 KB payloads and ~1.4% at 10 KB.
  • Capability: sealed log answers 5/5 audit queries (who wrote X, all writes by principal, all writes in period, integrity verification, replay-after-cascade-revoke). Anonymous log answers 0/5. The capability gap is total, not partial.
  • The killer query: replay-as-of after revoking a parent principal. Baseline returns all 50 writes (silent drift). Sealed log returns 0 — every revoked descendant is correctly excluded.
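The cascade-revoke query can be sketched as follows, assuming principals form a delegation tree, revocation carries a logical timestamp, and replay as of time T hides every write sealed by a lineage revoked at or before T. Ledger and its methods are illustrative, not taken from benchmarks/kishib3.py.

```python
from dataclasses import dataclass


@dataclass
class Write:
    seq: int          # logical timestamp of the write
    by_seal: str      # principal that sealed it
    payload: str


class Ledger:
    def __init__(self):
        self.parent = {}        # principal -> delegating parent (None for roots)
        self.revoked_at = {}    # principal -> seq at which it was revoked
        self.writes = []

    def add_principal(self, name, parent=None):
        self.parent[name] = parent

    def write(self, seq, by_seal, payload):
        self.writes.append(Write(seq, by_seal, payload))

    def revoke(self, name, at_seq):
        self.revoked_at[name] = at_seq

    def _revoked(self, name, as_of):
        # Walk up the delegation chain: revoking a parent revokes every descendant.
        while name is not None:
            at = self.revoked_at.get(name)
            if at is not None and at <= as_of:
                return True
            name = self.parent.get(name)
        return False

    def replay(self, as_of):
        """Writes visible as of `as_of`, excluding anything sealed by a revoked lineage."""
        return [w for w in self.writes
                if w.seq <= as_of and not self._revoked(w.by_seal, as_of)]
```

An anonymous log has no by_seal to walk, so the same filter cannot be written against it.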

2. Tier your agent memory as session → topic → row

Finding. Sumerian tablets have three layers of structure: physical surface (obverse/reverse), logical column, atomic row marked by <RULING>. We statistically confirmed <RULING> is a real row separator: adjacent ruling-bounded chunks share trigrams 30–500× more than shuffled baselines (Royal Inscription p=0.002, Administrative p=0.005).

Action. Replace flat agent memory with three tiers: SURFACE (session) → COLUMN (topic) → RULING (row). Reads return the smallest tier that satisfies the query — never drag back the whole session when one row answers.
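A sketch of the three tiers, assuming the input uses the SumTablets markers from the glossary and that plain substring matching stands in for real retrieval. parse_tiers and smallest_match are hypothetical helpers; <BLANK_SPACE> is kept inside row text rather than trimmed.

```python
def parse_tiers(text):
    """Split a transliteration into surfaces -> columns -> rows on SumTablets markers."""
    surfaces = []
    for surface in text.split("<SURFACE>"):
        columns = []
        for column in surface.split("<COLUMN>"):
            rows = [r.strip() for r in column.split("<RULING>") if r.strip()]
            if rows:
                columns.append(rows)
        if columns:
            surfaces.append(columns)
    return surfaces


def smallest_match(surfaces, terms):
    """Return (tier, text) for the smallest unit that contains every query term."""
    def hit(text):
        return all(term in text for term in terms)

    for columns in surfaces:                 # RULING tier: a single row
        for rows in columns:
            for row in rows:
                if hit(row):
                    return "row", row
    for columns in surfaces:                 # COLUMN tier: one topic
        for rows in columns:
            if hit("\n".join(rows)):
                return "column", "\n".join(rows)
    for columns in surfaces:                 # SURFACE tier: the whole session
        text = "\n".join(row for rows in columns for row in rows)
        if hit(text):
            return "surface", text
    return None
```

The search widens only when a narrower tier cannot answer, which is the "smallest tier that satisfies the query" rule.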

3. Replace silent token/cost rollups with periodic signed audits

Finding. Many administrative tablets close with šu-nigin₂ (sum-total), a periodic reconciliation. Shortfalls and excesses are named explicitly: la₂-ia₃ (deficit owed by named person), diri (excess). Nothing drifts silently.

Action. For any ledger-shaped agent state (token usage, tool-call counts, cost tracking, evidence accumulation), close periods at fixed intervals with a signed audit. Deficits and excesses must be named and attributed to a counterparty.
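A period close can be sketched like this, assuming `expected` maps each counterparty to what it owed for the period and `actual` to what it delivered, and that an HMAC over canonical JSON stands in for the auditor's signature. close_period and its report fields are hypothetical.

```python
import hashlib
import hmac
import json


def close_period(period_id, expected, actual, auditor_key):
    """Close one period: a sum-total plus every deficit/excess named by counterparty."""
    discrepancies = []
    for who in sorted(set(expected) | set(actual)):
        delta = actual.get(who, 0) - expected.get(who, 0)
        if delta < 0:
            discrepancies.append({"kind": "deficit", "by": who, "amount": -delta})  # la₂-ia₃
        elif delta > 0:
            discrepancies.append({"kind": "excess", "by": who, "amount": delta})    # diri
    report = {
        "period_id": period_id,
        "sum_total": sum(actual.values()),                                          # šu-nigin₂
        "discrepancies": discrepancies,
    }
    # Sign the canonical report so later edits to the closed period are detectable.
    body = json.dumps(report, sort_keys=True).encode("utf-8")
    report["signature"] = hmac.new(auditor_key, body, hashlib.sha256).hexdigest()
    return report
```

Every nonzero difference becomes a named line item, so a shortfall cannot disappear into the total.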

Plus a defensive null result: a 99-skip (skips 2–100) × 5-genre × 1,000-shuffle ELS scan found no hidden codes (0 of 495 tests Bonferroni-significant). Useful prior art if anyone tries to sell you "Sumerian secret-code AI."
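The shape of such a scan, as a sketch: it assumes the test statistic is the number of target glyph n-grams found in the skip-k decimated stream, summed over all k start offsets, compared against shuffled glyph streams. The statistic, the target list, and the function names are assumptions; scripts/phase3_compression.py may compute something different.

```python
import random


def els_stream(glyphs, skip, start=0):
    """Every `skip`-th glyph starting at `start` (an equidistant-letter sequence)."""
    return glyphs[start::skip]


def count_hits(stream, targets):
    """Occurrences of any target n-gram (a tuple of glyphs) in the decimated stream."""
    hits = 0
    for target in targets:
        n = len(target)
        hits += sum(1 for i in range(len(stream) - n + 1) if tuple(stream[i:i + n]) == target)
    return hits


def els_pvalue(glyphs, skip, targets, n_shuffles=1000, seed=42):
    """Empirical p-value of the skip-`skip` hit count against shuffled glyph streams."""
    rng = random.Random(seed)
    observed = sum(count_hits(els_stream(glyphs, skip, s), targets) for s in range(skip))
    pool = list(glyphs)
    at_least = 0
    for _ in range(n_shuffles):
        rng.shuffle(pool)
        null = sum(count_hits(els_stream(pool, skip, s), targets) for s in range(skip))
        if null >= observed:
            at_least += 1
    return (at_least + 1) / (n_shuffles + 1)   # add-one estimator: never exactly 0
```

One arithmetic point when reading such results: Bonferroni over 495 tests puts the per-test cutoff at 0.05 / 495 ≈ 1.0 × 10⁻⁴, while 1,000 shuffles with the add-one estimator used here cannot produce a p-value below 1/1001 ≈ 1.0 × 10⁻³. A scan built exactly like this sketch could therefore not flag any test whatever the data; a z-score against the shuffle distribution, or many more shuffles, removes that floor.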

Want more? outputs/summary.md ranks the top 10 ideas by novelty × implementability. outputs/FULL_IDEAS.md lists ~158 ideas across 16 categories. outputs/reference_architecture.md has full code shapes in Python and Rust.


Architecture Overview

                    ┌────────────────────────────────────────┐
                    │            RoyalDecreeAgent            │
                    │  (policy/version registry, broadcasts) │
                    └──────────────────┬─────────────────────┘
                                       │ subscribes
        ┌──────────────────────────────┼──────────────────────────────┐
        │                              │                              │
        ▼                              ▼                              ▼
┌────────────────┐          ┌─────────────────────┐        ┌─────────────────────┐
│ TempleLedger   │──uses───▶│ CommodityLedgerLine │        │ AddressedMessage    │
│ Agent          │          │ Agent (stateless)   │        │ Agent (RPC-on-clay) │
└───────┬────────┘          └──────────┬──────────┘        └──────────┬──────────┘
        │ writes                       │ canonicalizes                │ delivers
        ▼                              ▼                              ▼
┌────────────────┐          ┌─────────────────────┐        ┌─────────────────────┐
│ SealAuthority  │◀─────────│ LexicalOntology     │        │ RitualSequence      │
│ Agent          │          │ Agent (taxonomy)    │        │ Agent (workflow)    │
└───────┬────────┘          └─────────────────────┘        └─────────────────────┘
        │ identity                                                    ▲
        ▼                                                             │
┌────────────────┐                                         ┌──────────┴──────────┐
│ ScribalSchool  │                                         │ YearNameRegistry    │
│ Agent          │                                         │ Agent (time)        │
└────────────────┘                                         └─────────────────────┘

Nine named agent primitives. Three transverse subsystems: memory tiers, identity/provenance, taxonomy. See outputs/reference_architecture.md for the full design with Python + Rust code shapes.
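The "subscribes" edges can be sketched as a versioned policy registry with in-process callbacks. RoyalDecreeRegistry is a hypothetical name, and the design in outputs/reference_architecture.md may differ.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RoyalDecreeRegistry:
    """Hypothetical policy/version registry that broadcasts each new decree."""
    version: int = 0
    policies: dict = field(default_factory=dict)       # key -> (version, value)
    subscribers: list = field(default_factory=list)

    def subscribe(self, callback: Callable[[int, str, object], None]) -> None:
        self.subscribers.append(callback)

    def decree(self, key: str, value: object) -> int:
        # Every change bumps a monotonic version so agents can tell which rules applied.
        self.version += 1
        self.policies[key] = (self.version, value)
        for notify in self.subscribers:
            notify(self.version, key, value)
        return self.version
```

Agents that stamp each action with the current version can later say which rules they acted under.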


Concrete Example — One Tablet, Decomposed

Tablet P101440 (Ur III administrative, 39 cuneiform glyphs):

<SURFACE>
la₂-ia₃ 1(aš) gun₂ 5(u) 5(diš) ma-na siki du                  ← deficit line + commodity quantities
<unk> 5(gešʾu) 4(geš₂) 7(diš) a₂ geme₂ u₄ 1(diš)-še₃            ← labor accounting
kišib₃ {d}šul-gi-i₃-li₂                                         ← seal of Shulgi-ili
<SURFACE>
<BLANK_SPACE>
iti ezem-me-ki-gal₂                                             ← month: festival of Mekigal
mu us₂-sa ki-maš{ki} ba-...                                     ← year after the destruction of Kimaš

The same tablet, decomposed into modern primitives:

WriteEnvelope(
  payload = LedgerEntry(
    lines = [
      Line(qty=Rational(115), unit="ma-na", commodity="siki", instrument="deficit"),  # la₂-ia₃ 1(aš) gun₂ 5(u) 5(diš) ma-na = 1 talent 55 minas
      Line(qty=3247, unit="labor-day", commodity="female-worker"),  # 5(gešʾu) 4(geš₂) 7(diš) = 5×600 + 4×60 + 7
    ],
  ),
  by_seal      = SealId("shulgi-ili-001"),         # kišib₃ {d}šul-gi-i₃-li₂
  witnesses    = [],                                # none recorded on this tablet
  period_id    = PeriodRegistry.resolve(
    name="iti ezem-me-ki-gal₂",
    year=YearName(derived_from="ki-maš destruction year", offset="us₂-sa"),  # mu us₂-sa
  ),
)

Each entry in outputs/templates.json makes the same move: it takes a real tablet's structural pattern and shows the modern primitive it implies.
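The quantities on this tablet use sign-value notation. A small converter makes them explicit, assuming the standard values of the counting signs (diš and aš = 1, u = 10, geš₂ = 60, geš'u = 600, šar₂ = 3,600) and the weight relation 1 gun₂ = 60 ma-na. count_value and wool_in_minas are hypothetical helpers, and only the signs listed are handled.

```python
import re

# Standard values of the counting signs as spelled in transliteration;
# ʾ and ' are treated as two spellings of geš'u.
SIGN_VALUES = {"diš": 1, "aš": 1, "u": 10, "geš₂": 60, "gešʾu": 600, "geš'u": 600, "šar₂": 3600}
NUMBER = re.compile(r"(\d+)\(([^)]+)\)")      # matches groups like 5(gešʾu)
MA_NA_PER_GUN2 = 60                           # 1 talent (gun₂) = 60 minas (ma-na)


def count_value(text):
    """Sum every N(sign) group, e.g. '5(gešʾu) 4(geš₂) 7(diš)' -> 3247."""
    total = 0
    for n, sign in NUMBER.findall(text):
        if sign not in SIGN_VALUES:
            raise ValueError(f"unknown counting sign: {sign}")
        total += int(n) * SIGN_VALUES[sign]
    return total


def wool_in_minas(talents_text, minas_text):
    """Convert a 'talents + minas' weight into minas."""
    return count_value(talents_text) * MA_NA_PER_GUN2 + count_value(minas_text)
```

For P101440 this gives 3,247 female-worker days for the labor line and 115 ma-na for the wool deficit.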


Glossary — Sumerian Terms Used

| Term | Literal | Modern equivalent |
|---|---|---|
| kišib₃ (PN) | seal of [person name] | Cryptographic signature / write attribution |
| mu X | year of X | Named time period (event-named, not numeric) |
| mu us₂-sa X | year after the year of X | Relative time reference resolved at write-time |
| iti X | month of X | Calendar month sub-period |
| šu-nigin₂ | sum-total | Periodic audit / signed reconciliation |
| la₂-ia₃ | deficit | Named outstanding obligation (never silent) |
| diri | excess | Named surplus requiring disposition |
| igi PN-šè | before [person] | Witness clause: live attestation at write-time |
| dumu PN | son of [person] | Filiation edge in principal/identity graph |
| u₃-na-a-du₁₁ | speak to him | Letter address formula: RPC envelope opener |
| dub-ba-ni | his tablet | Reference to a prior message (thread-id) |
| lugal | king | Top-tier role in authority graph |
| ensi₂ | governor | Region-scoped authority role |
| niga 4(diš)-kam | grade-4 grain-fed | Numbered quality tier on a commodity (SLA tier) |
| <SURFACE> | physical face of tablet | L1: Frame / session boundary |
| <COLUMN> | column on a surface | L2: Section / topic boundary |
| <RULING> | drawn dividing line | L3: Row / atomic record boundary |
| <BLANK_SPACE> | intentional gap | Semantic whitespace: preserve, don't trim |
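A year-name registry for the mu / mu us₂-sa formulas can be sketched as an ordered list of event names, since years were named after events rather than counted. The class and the placeholder events are hypothetical.

```python
class YearNameRegistry:
    """Hypothetical registry mapping event-named years to ordinal positions."""

    AFTER = "mu us₂-sa "   # "year after the year of X"
    YEAR = "mu "           # "year of X"

    def __init__(self):
        self._order = {}   # event name -> ordinal position

    def register(self, event):
        # Years are named as events happen; registration order is the only clock.
        if event in self._order:
            raise ValueError(f"year already named: {event}")
        self._order[event] = len(self._order)
        return self._order[event]

    def resolve(self, formula):
        """Resolve 'mu X' or 'mu us₂-sa X' to an ordinal (check the longer prefix first)."""
        if formula.startswith(self.AFTER):
            return self._order[formula[len(self.AFTER):]] + 1
        if formula.startswith(self.YEAR):
            return self._order[formula[len(self.YEAR):]]
        raise ValueError(f"not a year formula: {formula}")
```

Resolving the relative formula when the write is made, and storing the ordinal alongside the original text, keeps later queries from depending on the registry's state at read time.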

Findings (Detail)

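The numbers behind the implications above. Probe, template, and structural-marker rates (seal-of, year-formula, letter address, RULING density) come from scripts/phase1_templates.py; Zipf, compression, ELS, and RULING-parity statistics come from scripts/phase3_compression.py. All scripts are seeded with 42.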

| Finding | Statistic | Genre / Coverage |
|---|---|---|
| Admin tablets are a domain-specific language | Zipf exponent s = 1.746 (R² = 0.93) | Administrative |
| Royal Inscription is the most-templated genre | Compression-Δ = +0.099 vs shuffled baseline | Royal Inscription |
| Lexical lists are closest to natural language | Zipf s = 1.114 (R² = 0.92) | Lexical |
| Letters are short single-purpose RPCs | Lowest marker density (0.02 RULING per tablet) | Letter |
| <RULING> is a logical row separator (not just a visual one) | Cross-ruling trigram parity: p = 0.002 (Royal), p = 0.005 (Admin) | Royal, Admin |
| No hidden encodings in the corpus | 0 of 495 ELS tests Bonferroni-significant | All genres |
| Seal-of-PN clauses are common | 25.4% of Admin tablets | Administrative |
| Year-formulas are widespread envelopes | 74.2% Admin, 65% Letter, 62.4% Royal | Across genres |
| Letters are addressed RPCs | u₃-na-a-du₁₁ in 58.8% of Letters | Letter |

Full statistics with shuffled-baseline controls in outputs/compression_findings.md. Per-tablet pattern citations in outputs/templates.json.
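A sketch of a cross-RULING trigram parity test, assuming each tablet is given as a list of ruling-bounded token chunks and that the null shuffles tokens within the tablet while keeping chunk lengths. The statistic (trigrams shared by adjacent chunks) and the function names are assumptions, not necessarily what scripts/phase3_compression.py computes.

```python
import random


def trigrams(tokens):
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}


def adjacent_shared(chunks):
    """Total trigrams shared by each pair of adjacent ruling-bounded chunks."""
    return sum(len(trigrams(a) & trigrams(b)) for a, b in zip(chunks, chunks[1:]))


def ruling_parity_pvalue(chunks, n_shuffles=1000, seed=42):
    """Compare adjacent-chunk trigram sharing with a within-tablet token shuffle.

    The shuffle keeps every chunk's length, so only token order is destroyed;
    a small p suggests <RULING> separates rows that repeat a common template.
    """
    rng = random.Random(seed)
    observed = adjacent_shared(chunks)
    sizes = [len(c) for c in chunks]
    pool = [tok for c in chunks for tok in c]
    at_least = 0
    for _ in range(n_shuffles):
        rng.shuffle(pool)
        shuffled, i = [], 0
        for n in sizes:                      # re-cut at the original chunk lengths
            shuffled.append(pool[i:i + n])
            i += n
        if adjacent_shared(shuffled) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_shuffles + 1)
```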


Repository Layout

.
├── README.md                          this file
├── LICENSE                            CC BY 4.0 (docs and analysis artifacts)
├── LICENSE-CODE                       MIT (Python scripts)
├── requirements.txt                   Python deps
├── scripts/
│   ├── phase0_sample.py               loads SumTablets, builds stratified samples
│   ├── phase1_templates.py            extracts genre templates and probe hits
│   └── phase3_compression.py          Zipf, compression, ELS, RULING-parity analysis
├── benchmarks/
│   ├── kishib3.py                     reference sealed-envelope implementation (~250 LoC)
│   ├── baseline_log.py                anonymous-log comparison point (~50 LoC)
│   ├── benchmark.py                   harness — overhead + capability comparison
│   ├── results.json                   raw measurement output
│   └── RESULTS.md                     report with measured numbers and caveats
└── outputs/
    ├── templates.json                 229 templates × {genre, pattern, role, frequency, tablet IDs}
    ├── primitives.json                9 named agent primitives (6 rubric + 3 data-justified)
    ├── compression_findings.md        Phase 3 statistics with p-values
    ├── phase3_raw.json                machine-readable Phase 3 metric rows
    ├── reference_architecture.md      multi-agent reference architecture with code shapes
    ├── summary.md                     top-10 ideas ranked novelty × implementability
    └── FULL_IDEAS.md                  ~158 ideas across 16 categories

How to Reproduce

pip install -r requirements.txt
python scripts/phase0_sample.py       # downloads SumTablets, persists local parquet, builds samples
python scripts/phase1_templates.py    # writes outputs/templates.json
python scripts/phase3_compression.py  # writes outputs/compression_findings.md + phase3_raw.json

All scripts are seeded (random_state=42, np.random.default_rng(42)) and reproducible end to end. Phase 0 downloads ~50 MB of corpus data from Hugging Face on first run, caches it locally as parquet, and reuses the cache on subsequent runs.

reference_architecture.md, summary.md, FULL_IDEAS.md, and primitives.json are hand-authored design artifacts that cite outputs from the scripts.

Methodology

  1. Sample. Stratified-sample up to 500 tablets per genre (Administrative, Literary, Lexical, Royal Inscription, Letter) plus up to 50 long-form tablets per genre. Total sample: 2,069 tablets (4 × 500, plus 69 Lexical) + 197 long-form.
  2. Templates. For each genre: structural-marker statistics (<SURFACE>, <COLUMN>, <RULING>, <BLANK_SPACE>); per-position opening/closing line templates; distinctive bigrams and trigrams via genre log-odds vs other genres; hand-coded regex probes for known bureaucratic primitives (seal-of, year-formula, total/audit, deficit, witness, etc.). Every template carries cited tablet IDs.
  3. Compression and ELS. Per-genre Zipfian fit; compression-ratio Δ between raw and shuffled token streams; equidistant-letter-sequence (ELS) decimation at skips 2–100 with 1,000 shuffled-baseline controls (Bonferroni-corrected for 495 tests); cross-RULING trigram parity vs within-tablet shuffled baseline.
  4. Mapping. For each empirical template, propose a named single-responsibility agent primitive with inputs, outputs, state, tools, and guardrails. Distinguish validated-by-data from speculative.
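The Zipf and compression steps can be sketched with the standard library, assuming an ordinary least-squares fit of log frequency on log rank and zlib as the compressor. The sign convention for Δ (shuffled ratio minus original ratio, so positive means more templated) is an assumption; scripts/phase3_compression.py may fit, compress, or sign differently.

```python
import math
import random
import zlib
from collections import Counter


def zipf_fit(tokens):
    """Least-squares fit of log(freq) = c - s*log(rank); returns (s, r_squared)."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    ss_res = sum((y - (my + slope * (x - mx))) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return -slope, (1 - ss_res / ss_tot) if ss_tot else 1.0


def compression_delta(tokens, seed=42):
    """zlib ratio of a shuffled token stream minus that of the original order."""
    def ratio(toks):
        raw = " ".join(toks).encode("utf-8")
        return len(zlib.compress(raw, 9)) / len(raw)

    shuffled = list(tokens)
    random.Random(seed).shuffle(shuffled)
    return ratio(shuffled) - ratio(tokens)
```

The fit needs at least two distinct tokens; a steeper exponent (s well above 1, as for the Administrative genre) means a few tokens dominate, which is the "DSL-like" signal.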

Honest Limits

  • We sampled 2.3% of the corpus. Findings are strong for Ur III administrative tablets and Old Babylonian literary tablets; weaker for everything else.
  • The corpus is 92%+ administrative — generalizing about "Sumerian thought" from this sample would be like generalizing about "civilization" from accounting receipts.
  • Lexical findings rely on only 69 tablets; the Lexical-list architectural slot is real but the actual taxonomic content needs to come from external sources (CDLI, ePSD2).
  • Sumerian seals were physical, witnessed, and socially backed — not cryptographic. The SealAuthorityAgent primitive borrows the shape (named principals, revocation, witness sets), not the threat model.
  • Year-names are political artifacts named after royal acts, not a neutral monotonic clock.
  • The general observation "Sumerian admin = proto-information-system" is well-established in popular essays. The contribution here is the empirical statistical mining with cited tablet IDs and shuffled-baseline controls — not the metaphor itself.

Citation

If you find this useful in your own work, please cite the underlying corpus:

Simmons, C., Diehl Martinez, R., & Jurafsky, D. (2024). SumTablets: A Transliteration Dataset of Sumerian Tablets. Workshop on Machine Learning for Ancient Languages (ML4AL), ACL 2024. https://aclanthology.org/2024.ml4al-1.20.pdf

Tablet IDs cited throughout (P-numbers and Q-numbers) are CDLI catalog entries and resolvable at:

License

  • Documentation and analysis artifacts (outputs/*, *.md): CC BY 4.0. Use freely with attribution.
  • Python scripts (scripts/*): MIT.

Contributing

Issues and PRs welcome — particularly:

  • Cross-validation against CDLI / Oracc / ePSD2
  • Extension to Akkadian or other periods
  • Counter-examples to any cited template
  • Bug fixes in the analysis scripts
  • Independent reproduction of Phase 3 statistics

Acknowledgements

Built on top of the SumTablets corpus (Simmons et al., 2024, CC BY 4.0) and the broader work of the Cuneiform Digital Library Initiative, Oracc, ETCSL, ePSD2, and the cuneiform NLP community.
