Capture the essence of your codebase - pack directories into single files for LLM consumption. Zig library + CLI application

zeitgeist

/ˈtsaɪtɡaɪst/ — German for "spirit of the age": the defining spirit or mood of a particular period of history as shown by the ideas and beliefs of the time.

A fast, powerful directory-to-file compiler built in Zig for maximum performance and zero dependencies.

Just as the word captures the essence of an era, zeitgeist captures the essence of your codebase — compiling it into a single, coherent file optimized for LLM consumption.

Features

  • Shortword/Longword Usage: Use zg as a short alias for zeitgeist; both commands work identically, like rg for ripgrep or erd for erdtree
  • Multiple Output Formats: XML, Markdown, JSON, and plain text
  • Custom Templates: Define your own output format with placeholders
  • Smart Filtering: Glob patterns for include/exclude, respects .gitignore by default
  • Unignore Patterns: Override ignore rules for specific files (--unignore)
  • Custom Ignore Files: Support for .zeitgeistignore for project-specific exclusions
  • .ignore File Support: Respects ripgrep-style .ignore files (disable with --no-dot-ignore)
  • Directory Tree: Visual tree structure in output header
  • Tree-Only Mode: Output directory structure without file contents
  • Line Numbers: Optional line number annotations
  • Token Counting: Accurate token estimation with multiple encodings (cl100k_base, o200k_base)
  • Top Files Summary: Show largest files by token count (--no-file-summary to omit)
  • Statistics Summary: Detailed stats with --stats flag
  • Output Chunking: Split large outputs for LLM context limits (--max-tokens 128k --split)
  • Remote Repository Support: Pack GitHub/GitLab/Bitbucket/Gitea/Codeberg repos directly
  • Security Scanning: Detect secrets and API keys before packing
  • Git Integration: Sort by recency, include diffs and logs
  • Category Sorting: Sort files by category (source, docs, tests, config)
  • Priority Rules: Fine-grained control with pattern-based priority boosts
  • Config Files: zeitgeist.json, zeitgeist.yaml, or zeitgeist.toml
  • Custom Instructions: Include instruction files for AI context
  • Clipboard Support: Copy output directly to clipboard
  • Stdout Mode: Pipe output directly (zeitgeist --stdout | pbcopy)
  • MCP Server: Model Context Protocol support for tool-based access
  • AI Skill Output: Generate structured documentation for AI agents (--skill-output)
  • Cross-Platform: Windows, Linux, and macOS support
  • Self-Update: Check for and install updates from GitHub (zeitgeist update)
  • TOML Config: Support for zeitgeist.toml configuration files
  • Global Config: User-wide configuration with zeitgeist init --global
  • Multi-Encoding: Automatic detection of UTF-16 LE/BE and Latin-1 files

Installation

From Source

Requires Zig 0.15.2 or later.

git clone https://github.com/bkataru/zeitgeist.git
cd zeitgeist
zig build -Doptimize=ReleaseFast

# The binaries will be at ./zig-out/bin/zeitgeist and ./zig-out/bin/zg
# Optionally, install to PATH (pass the optimize flag again so the installed build is also ReleaseFast):
zig build -Doptimize=ReleaseFast --prefix ~/.local

Note: Building automatically creates both zeitgeist and zg executables. Use whichever you prefer.

Pre-built Binaries

Download from the Releases page.

Quick Start

# Pack current directory (use 'zg' or 'zeitgeist' interchangeably)
zg

# Pack and pipe to clipboard (macOS)
zg --stdout | pbcopy

# Pack a GitHub repository
zg owner/repo

# Create a config file
zg init

# Check for updates
zg update --check

# Update to latest version
zg update

Usage

Basic Usage

# Pack current directory to zeitgeist-output.xml
zeitgeist

# Pack specific directory
zeitgeist ./src

# Specify output file and format
zeitgeist ./src -o output.md -s markdown

# Output to stdout for piping
zeitgeist --stdout | pbcopy

Output Formats

# XML (default) - best for structured parsing
zeitgeist -s xml

# Markdown - readable, good for documentation
zeitgeist -s markdown

# JSON - machine-readable
zeitgeist -s json

# Plain text - minimal overhead
zeitgeist -s text

Filtering

# Include only specific patterns
zeitgeist -i "*.zig,*.md"

# Exclude patterns
zeitgeist -e "test_*,*.tmp"

# Include hidden files
zeitgeist --hidden

# Ignore .gitignore rules
zeitgeist --no-gitignore

# Use custom ignore file
zeitgeist --ignore-file .zeitgeistignore

# Maximum file size
zeitgeist --max-size 1MB

Output Modes

# Tree-only mode (just directory structure, no file contents)
zeitgeist --tree-only

# Metadata only (tree + stats, no file contents)
zeitgeist --no-files

# Include custom instruction file (for AI context)
zeitgeist --instruction INSTRUCTIONS.md

# Add custom header text
zeitgeist --header-text "Project: MyApp v1.0"

# Show top N files by token count
zeitgeist --top-files 10

# Sort by file category (source files last for LLM context)
zeitgeist --categorize

# Disable progress spinner (for CI/scripting)
zeitgeist --no-progress

Remote Repositories

Pack GitHub, GitLab, Bitbucket, Gitea, or Codeberg repositories directly:

# GitHub (multiple URL formats)
zeitgeist owner/repo
zeitgeist github.com/owner/repo
zeitgeist https://github.com/owner/repo

# Specific branch
zeitgeist owner/repo -b develop

# Private repository (with token)
zeitgeist owner/private-repo --token ghp_xxxxx

# GitLab and Bitbucket
zeitgeist gitlab.com/owner/repo
zeitgeist bitbucket.org/owner/repo

# Gitea and Codeberg
zeitgeist codeberg.org/owner/repo
zeitgeist gitea.example.com/owner/repo
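
To keep a token out of your shell history, it can be read from an environment variable instead of being typed inline. The wrapper below is a minimal sketch and not part of zeitgeist: the function name pack_private_repo, the GITHUB_TOKEN variable, and the ZG_BIN override (which only lets you swap the binary, for example for a dry run) are assumptions. Only the --token flag comes from the examples above.

```shell
# Hypothetical helper: pack a private repository using a token from the environment.
# ZG_BIN is an assumed override for the binary name; it defaults to zeitgeist.
pack_private_repo() {
    if [ -z "${GITHUB_TOKEN:-}" ]; then
        echo "GITHUB_TOKEN is not set" >&2
        return 1
    fi
    "${ZG_BIN:-zeitgeist}" "$1" --token "$GITHUB_TOKEN"
}
```

With GITHUB_TOKEN exported by a CI secret or a secrets manager, pack_private_repo owner/private-repo runs the same command as the inline example. The token is still visible in the process list while the command runs.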

Output Chunking

Split large outputs for LLM context limits:

# Split by token count (e.g., 128k context limit)
zeitgeist --max-tokens 128k --split

# Split by byte size
zeitgeist --max-output-size 10MB --split

# Use specific token encoding
zeitgeist --max-tokens 128k --split --token-encoding o200k_base

# Include header in all chunks
zeitgeist --max-tokens 32k --split --include-header-all

Output files are named zeitgeist-output-1.xml, zeitgeist-output-2.xml, etc.
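
A plain shell glob sorts these names lexically, so zeitgeist-output-10.xml would come before zeitgeist-output-2.xml. The sketch below walks the chunks in numeric order instead. It assumes the default XML naming shown above and that chunks are numbered contiguously from 1 in the current directory; list_chunks is a hypothetical helper name.

```shell
# List chunk files in numeric order (1, 2, ..., 10).
# Stops at the first missing number, so it assumes contiguous numbering from 1.
list_chunks() {
    i=1
    while [ -f "zeitgeist-output-$i.xml" ]; do
        printf '%s\n' "zeitgeist-output-$i.xml"
        i=$((i + 1))
    done
}
```

For example, list_chunks | while read -r f; do wc -c "$f"; done prints the size of each chunk in order.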

Git Integration

# Sort files by git recency (oldest first, so recent changes appear last for LLMs)
zeitgeist --git-sort

# Include git diffs in output
zeitgeist --include-diffs

# Include git log
zeitgeist --include-logs
zeitgeist --include-logs --log-count 20

Security Scanning

Zeitgeist automatically scans for secrets and API keys:

# Default: security scanning is enabled
zeitgeist

# Disable security scanning
zeitgeist --no-security-check

# Write security report to file
zeitgeist --security-report security.txt

# Block output if secrets detected
zeitgeist --block-secrets

Detected patterns include:

  • AWS Access Keys and Secret Keys
  • GitHub/GitLab/Slack tokens
  • Private keys (RSA, DSA, EC, OpenSSH)
  • Database connection strings
  • JWT tokens
  • Generic API keys and passwords
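
To give a feel for what pattern-based detection looks like, here is a minimal sketch for one of the categories above. It is an illustration only and not zeitgeist's actual rule: it relies on the well-known shape of AWS access key IDs (the prefix AKIA followed by 16 uppercase letters or digits), and the function name is hypothetical.

```shell
# Illustrative check only, not zeitgeist's detection rule.
# Succeeds if the file contains something shaped like an AWS access key ID.
contains_aws_access_key_id() {
    grep -E -q 'AKIA[0-9A-Z]{16}' "$1"
}
```

A real scanner combines many such patterns and usually adds context checks to reduce false positives.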

Config Files

Create a config file (zeitgeist.json by default) for persistent project settings:

# Create config interactively
zeitgeist init

# Create with defaults (non-interactive)
zeitgeist init -f

# Create YAML format
zeitgeist init --yaml

# Create TOML format
zeitgeist init --toml

# Create global config (user-wide)
zeitgeist init --global

# Use specific config file
zeitgeist --config ./custom-config.json

Example zeitgeist.json:

{
  "$schema": "https://raw.githubusercontent.com/bkataru/zeitgeist/main/schema/zeitgeist.schema.json",
  "output": {
    "style": "markdown",
    "filePath": "output.md",
    "showLineNumbers": false,
    "includeTree": true
  },
  "include": ["**/*.zig", "**/*.md"],
  "exclude": ["**/test_*", "**/.*"],
  "security": {
    "enableSecurityCheck": true
  },
  "ignore": {
    "useGitignore": true,
    "useDefaultPatterns": true,
    "customIgnoreFile": ".zeitgeistignore"
  },
  "processing": {
    "categorize": true,
    "convertNotebooks": true,
    "instructionFile": "INSTRUCTIONS.md",
    "headerText": "Project Documentation",
    "topFilesCount": 5
  },
  "categoryWeights": {
    "source": 20,
    "docs": 15,
    "test": 10,
    "config": 5,
    "other": 1
  },
  "priorityRules": [
    {"pattern": "src/**/*.zig", "boost": 10},
    {"pattern": "tests/**", "boost": -5},
    {"pattern": "README.md", "boost": 20}
  ]
}

Example zeitgeist.yaml:

output:
  style: markdown
  filePath: output.md
  includeTree: true

include:
  - "**/*.zig"
  - "**/*.md"

exclude:
  - "**/test_*"

ignore:
  useGitignore: true
  customIgnoreFile: .zeitgeistignore

processing:
  categorize: true
  headerText: "Project Documentation"
  topFilesCount: 5

categoryWeights:
  source: 20
  docs: 15
  test: 10
  config: 5
  other: 1

priorityRules:
  - pattern: "src/**/*.zig"
    boost: 10
  - pattern: "tests/**"
    boost: -5
  - pattern: "README.md"
    boost: 20

Example zeitgeist.toml:

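# Top-level keys must come before the first table header in TOML;
# placed after [output], they would be parsed as output.include and output.exclude.
include = ["**/*.zig", "**/*.md"]
exclude = ["**/test_*", "**/.*"]

[output]
style = "markdown"
filePath = "output.md"
showLineNumbers = false
includeTree = true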

[security]
enableSecurityCheck = true

[ignore]
useGitignore = true
useDefaultPatterns = true
customIgnoreFile = ".zeitgeistignore"

[processing]
categorize = true
convertNotebooks = true
instructionFile = "INSTRUCTIONS.md"
headerText = "Project Documentation"
topFilesCount = 5

[categoryWeights]
source = 20
docs = 15
test = 10
config = 5
other = 1

[[priorityRules]]
pattern = "src/**/*.zig"
boost = 10

[[priorityRules]]
pattern = "tests/**"
boost = -5

[[priorityRules]]
pattern = "README.md"
boost = 20

Config files are searched in order:

  1. zeitgeist.json (project-local)
  2. .zeitgeist.json (hidden variant)
  3. zeitgeist.yaml (YAML variant)
  4. zeitgeist.toml (TOML variant)
  5. Global config location:
    • Windows: %APPDATA%\zeitgeist\config.json (or .yaml/.toml)
    • macOS: ~/.config/zeitgeist/config.json
    • Linux: ~/.config/zeitgeist/config.json
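
The lookup order above can be sketched as a small shell function. This is an illustration of the documented order, not zeitgeist's implementation: it assumes the first match wins, checks only the config.json global path used on Linux and macOS, and the name find_zeitgeist_config is hypothetical.

```shell
# Sketch of the documented lookup order; prints the first config file found.
# Only the Linux/macOS global config.json path is checked, for brevity.
find_zeitgeist_config() {
    dir=${1:-.}
    for name in zeitgeist.json .zeitgeist.json zeitgeist.yaml zeitgeist.toml; do
        if [ -f "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    global="$HOME/.config/zeitgeist/config.json"
    if [ -f "$global" ]; then
        printf '%s\n' "$global"
        return 0
    fi
    return 1
}
```

Running find_zeitgeist_config . in a project shows which file would be expected to take effect when several are present.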

JSON Schema: For editor autocompletion and validation, add the $schema property to your zeitgeist.json:

{
  "$schema": "https://raw.githubusercontent.com/bkataru/zeitgeist/main/schema/zeitgeist.schema.json"
}

Category Weights & Priority Rules

When using --categorize, files are sorted by category priority so that more important files appear later in the output (optimal for LLM context, where later content gets more attention).

Category Weights control the base priority for each category:

  • source (default: 20) - Source code files
  • docs (default: 15) - Documentation files
  • test (default: 10) - Test files
  • config (default: 5) - Configuration files
  • other (default: 1) - Everything else

Priority Rules allow fine-grained control with glob patterns:

  • pattern: Glob pattern to match (supports *, **, ?)
  • boost: Integer to add/subtract from the file's priority

Final priority = Category Weight + Priority Boost

Example use case: Boost README.md to always appear last:

{
  "priorityRules": [
    {"pattern": "README.md", "boost": 100}
  ]
}
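
As a worked example of the formula, the sketch below applies the default category weights and the boosts from the example zeitgeist.json (10, -5, 20) to three files, then sorts ascending so the highest priority lands last. The file names and their category assignments are assumed for illustration, and the helper names are hypothetical.

```shell
# Final priority = category weight + priority boost.
final_priority() {
    echo $(( $1 + $2 ))
}

# Category assignments below are assumed for illustration.
rank_example() {
    {
        printf '%s %s\n' "$(final_priority 20 10)" "src/main.zig"    # source + 10
        printf '%s %s\n' "$(final_priority 10 -5)" "tests/pack.zig"  # test - 5
        printf '%s %s\n' "$(final_priority 15 20)" "README.md"       # docs + 20
    } | sort -n
}
```

rank_example lists tests/pack.zig (5) first, then src/main.zig (30), then README.md (35), which matches the rule that more important files appear later in the output.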

MCP Server Mode

Zeitgeist can run as an MCP (Model Context Protocol) server for integration with LLM tools:

zeitgeist serve

This exposes tools and resources that can be accessed by MCP-compatible clients:

Tools:

  • pack: Pack a directory with configurable options
  • scan: Scan a directory and return file listing

Resources:

  • zeitgeist://cwd - Returns the current working directory path
  • zeitgeist://tree - Returns a directory tree listing of the current directory
  • zeitgeist://file/{path} - Returns the content of a specific file (replace {path} with the relative file path)
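
MCP clients normally build requests for you, but it can help to see what reading a file resource looks like on the wire. The sketch below prints a JSON-RPC 2.0 resources/read request, the method the Model Context Protocol defines for reading a resource by URI. The helper name is hypothetical, the path is assumed to need no JSON escaping, and a real session must complete the MCP initialize handshake before sending it.

```shell
# Build a JSON-RPC 2.0 "resources/read" request for a zeitgeist://file/ resource.
# $1 = request id, $2 = relative file path (assumed to contain no quotes or backslashes).
mcp_read_file_request() {
    printf '{"jsonrpc":"2.0","id":%s,"method":"resources/read","params":{"uri":"zeitgeist://file/%s"}}\n' "$1" "$2"
}
```

For example, mcp_read_file_request 1 src/main.zig produces a request for zeitgeist://file/src/main.zig.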

AI Skill Output

Generate structured documentation optimized for AI agents and skills:

# Generate skill documentation for current directory
zeitgeist --skill-output ./my-skill .

# Generate skill for a specific directory
zeitgeist --skill-output ./lib-skill ./src/lib

# Generate skill from a GitHub repository
zeitgeist owner/repo --skill-output ./repo-skill

This creates a directory structure designed for AI consumption:

my-skill/
├── SKILL.md                      # Main entry point with usage guide
└── references/
    ├── summary.md                # Purpose, format, and statistics
    ├── project-structure.md      # Directory tree with line counts
    ├── files.md                  # All file contents
    └── tech-stack.md             # Languages, frameworks detected

AI agents can use these files to:

  • Understand project structure via project-structure.md
  • Search for code patterns by grepping files.md
  • Find specific files with ## File: <path> markers
  • Check tech stack via tech-stack.md
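
The marker lookup can be sketched with sed. This assumes each marker starts at the beginning of a line in exactly the form ## File: followed by the path; list_skill_files is a hypothetical helper name.

```shell
# Print the paths recorded by "## File: <path>" markers in a generated skill.
# $1 = skill directory (for example ./my-skill).
list_skill_files() {
    sed -n 's/^## File: //p' "$1/references/files.md"
}
```

list_skill_files ./my-skill prints one path per packed file. A packed file that itself contains a line starting with ## File: would show up as a false match.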

Self-Update

Zeitgeist can update itself to the latest version from GitHub:

# Check for updates without installing
zeitgeist update --check

# Update to the latest version
zeitgeist update

The update command:

  • Checks GitHub releases for the latest version
  • Compares with the current installed version
  • Downloads and replaces the binary (with appropriate permissions)
  • Supports all platforms (Windows, macOS, Linux)

All CLI Options

Note: In all examples below, you can use zg instead of zeitgeist.

USAGE:
    zeitgeist [OPTIONS] [PATH|URL]...
    zeitgeist init [-f] [--yaml|--toml] [--global]
    zeitgeist update [--check]
    zeitgeist serve
    zeitgeist help | version

    Alias: zg (works identically to zeitgeist)

FILE SELECTION:
    --stdin                   Read file paths from stdin (one per line)
    -i, --include <patterns>  Include patterns (comma-separated globs)
    -e, --exclude <patterns>  Exclude patterns (comma-separated globs)
    --unignore <patterns>     Override ignore patterns (comma-separated globs)
    --no-gitignore            Don't respect .gitignore
    --no-dot-ignore           Don't respect .ignore files (ripgrep-style)
    --hidden                  Include hidden files
    --follow-symlinks         Follow symbolic links (default: skip)
    --include-empty-directories  Include empty directories in tree
    --max-size <bytes>        Maximum file size to include
    --max-depth <n>           Maximum directory depth to traverse
    --max-files <n>           Maximum number of files to process
    --config <file>           Use a specific config file
    --no-config               Skip loading config files

OUTPUT OPTIONS:
    -o, --output <file>       Output file path (default: zeitgeist-output.xml)
    --output-dir <dir>        Output directory with auto-generated filename
    --stdout                  Write output to stdout (for piping)
    -s, --style <format>      Output format: xml, markdown, md, text, txt, json
    -t, --template <tmpl>     Custom output template with placeholders
    --no-tree                 Don't include directory tree header
    -n, --line-numbers        Include line numbers in output
    -c, --copy                Copy output to clipboard
    --compress                Compress output (remove comments/whitespace)
    --tree-only               Output only directory tree (no file contents)
    --token-count-tree [n]    Show tree with token counts (n = min threshold)
    --no-files                Output metadata only (tree, stats) without file contents
    --no-progress             Disable progress spinner (for scripting/CI)
    --stats                   Show detailed statistics summary
    --instruction <file>      Include instruction file content for AI context
    --header-text <text>      Custom header text at the top of output
    --top-files <n>           Show top N files by token count in summary
    --no-file-summary         Omit top files summary section from output
    --ignore-file <file>      Load additional ignore patterns from file
    --max-tokens <limit>      Max tokens per chunk (e.g., 128k)
    --max-output-size <size>  Max bytes per chunk (e.g., 10MB)
    --split                   Enable output splitting
    --include-header-all      Include header/tree in all chunks
    --token-encoding <enc>    Token counting: simple, cl100k_base, o200k_base
    --remove-empty-lines      Remove consecutive empty lines from output
    --truncate-base64         Truncate base64 data URIs to reduce tokens
    --skill-output <dir>      Generate AI skill documentation to directory
    -q, --quiet               Suppress all non-error output

GIT OPTIONS:
    --git-sort                Sort files by git recency
    --categorize              Sort files by category (source files last)
    --include-diffs           Include git diffs in output
    --include-logs            Include git log in output
    --log-count <n>           Number of log entries (default: 10)

NOTEBOOK OPTIONS:
    --convert-notebooks       Convert .ipynb files to markdown (default)
    --no-convert-notebooks    Keep .ipynb files as raw JSON

SECURITY OPTIONS:
    --security-check          Enable security scanning (default)
    --no-security-check       Disable security scanning
    --security-report <file>  Write security report to file
    --block-secrets           Block output if secrets detected

REMOTE OPTIONS:
    -r, --remote              Force treat input as remote URL
    -b, --branch <branch>     Branch to clone
    --tag <tag>               Git tag to clone
    --commit <sha>            Git commit SHA to checkout
    --token <token>           Auth token for private repos
    --recurse-submodules      Clone git submodules recursively
    --subpath <path>          Only clone specific subdirectory (sparse checkout)

GENERAL OPTIONS:
    -v, --verbose             Verbose output
    -q, --quiet               Suppress non-error output
    --no-color                Disable colored output
    -h, --help                Show help
    --version                 Show version

INIT OPTIONS:
    -f, --force               Create config without prompts (non-interactive)
    -g, --global              Create config in global location
    --yaml                    Create config in YAML format
    --toml                    Create config in TOML format

UPDATE OPTIONS:
    --check                   Check for updates without installing

Library Usage

Zeitgeist can also be used as a Zig library:

const std = @import("std");
const zeitgeist = @import("zeitgeist");

pub fn main() !void {
    // DebugAllocator is the current name for GeneralPurposeAllocator (renamed in Zig 0.14).
    var gpa: std.heap.DebugAllocator(.{}) = .init;
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var packer = try zeitgeist.Packer.init(allocator, .{
        .output_style = .markdown,
        .include_tree = true,
    });
    defer packer.deinit();

    try packer.scan("./src");
    const result = try packer.pack();
    defer allocator.free(result.output);

    std.debug.print("Packed {d} files, ~{d} tokens\n", .{
        result.stats.files_processed,
        result.stats.estimated_tokens,
    });
}

Add to your build.zig.zon:

.dependencies = .{
    .zeitgeist = .{
        .url = "git+https://github.com/bkataru/zeitgeist.git",
        .hash = "...", // Run `zig fetch --save <url>` to fill in the hash
    },
},

Performance

Zeitgeist is designed for speed:

  • Single-pass file scanning
  • Efficient memory allocation
  • Zero runtime dependencies
  • Native binary (no interpreter startup)

Typical performance on a medium-sized codebase (~1000 files):

  • Scanning: <100ms
  • Processing: <500ms
  • Total: <1s

Comparison

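| Feature         | Zeitgeist      | Repomix    | Yek       | Gitingest    |
|-----------------|----------------|------------|-----------|--------------|
| Language        | Zig            | TypeScript | Rust      | Python       |
| Binary Size     | ~500KB         | N/A (Node) | ~2MB      | N/A (Python) |
| Startup Time    | <10ms          | ~200ms     | ~50ms     | ~500ms       |
| Output Formats  | 4              | 4          | 2         | 2            |
| Config Formats  | JSON/YAML/TOML | JSON       | YAML/TOML | No           |
| Remote Repos    | Yes            | Yes        | No        | Yes          |
| Config Files    | Yes            | Yes        | Yes       | No           |
| Global Config   | Yes            | Yes        | No        | No           |
| Security Scan   | Yes            | Yes        | No        | No           |
| Token Counting  | Yes            | Yes        | Yes       | Basic        |
| Output Chunking | Yes            | Yes        | Yes       | No           |
| Git Integration | Yes            | Yes        | Yes       | No           |
| MCP Server      | Yes            | Yes        | No        | No           |
| Library API     | Yes            | No         | Yes       | Yes          |
| Self-Update     | Yes            | Yes        | No        | No           |
| Multi-Encoding  | Yes            | No         | No        | No           |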

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

# Run unit tests
zig build test --summary all

# Run integration tests
zig build integration --summary all

# Build release binary
zig build -Doptimize=ReleaseFast

See CHANGELOG.md for version history.

License

MIT License - see LICENSE for details.

Acknowledgments

Inspired by:

  • Repomix - TypeScript codebase packer
  • Yek - Rust codebase serializer
  • Gitingest - Python repo ingester
