A local-first command-line tool for coding qualitative interview data and other UTF-8 text corpora.
Bewley is built around four principles:
- No silent data loss — every action is recorded as an immutable event; nothing is ever overwritten.
- Text-first — corpora are plain UTF-8 files; no proprietary formats.
- Full provenance — every coding decision is traceable to a specific document revision.
- Rebuildable state — the SQLite index is a cache; it can always be reconstructed from the event log alone.
Bewley is designed for researchers who want a rigorous, inspectable audit trail for their qualitative analysis — closer in spirit to git than to a GUI NVivo-style tool.
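The event-log and rebuildable-state principles can be illustrated with a minimal Python sketch. This is not Bewley's implementation: the event type names (code_created, annotation_applied), the JSON fields, and the table layout are all hypothetical, chosen only to show how a SQLite projection can be thrown away and replayed from an append-only log.

```python
import json
import sqlite3

def append_event(log, event_type, payload):
    """Append a new event; existing entries are never modified."""
    event = {"seq": len(log) + 1, "type": event_type, "payload": payload}
    log.append(json.dumps(event, sort_keys=True))  # stored as JSON text
    return event["seq"]

def rebuild_index(log):
    """Replay every event in order into a fresh in-memory SQLite projection."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE codes (name TEXT PRIMARY KEY)")
    db.execute(
        "CREATE TABLE annotations (seq INTEGER PRIMARY KEY, code TEXT, document TEXT)"
    )
    for line in log:
        event = json.loads(line)
        payload = event["payload"]
        if event["type"] == "code_created":  # hypothetical event type
            db.execute("INSERT INTO codes VALUES (?)", (payload["name"],))
        elif event["type"] == "annotation_applied":  # hypothetical event type
            db.execute(
                "INSERT INTO annotations VALUES (?, ?, ?)",
                (event["seq"], payload["code"], payload["document"]),
            )
    db.commit()
    return db

log = []
append_event(log, "code_created", {"name": "trust"})
append_event(
    log,
    "annotation_applied",
    {"code": "trust", "document": "corpus/interview-alice.txt"},
)

index = rebuild_index(log)  # first projection
index.close()               # simulate losing the cache
index = rebuild_index(log)  # reconstructed from the log alone
rows = index.execute("SELECT code, document FROM annotations").fetchall()
```

Because the projection is derived entirely from the log, losing it costs only the time needed to replay the events.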
Install from source:
git clone https://github.com/expectedparrot/bewley.git
cd bewley
pip install -e .

Verify the install:
bewley --version

| Term | Meaning |
|---|---|
| Project | A directory containing corpus/ and .bewley/ metadata. |
| Document | A UTF-8 text file tracked in the corpus. Has a stable identity even if the file is renamed. |
| Revision | An immutable snapshot of a document's content, addressed by SHA-256. |
| Code | A named analytic label (e.g. trust, friction, workaround). |
| Annotation | An application of a code to a whole document or a specific text span. |
| Event | An immutable JSON record of every state-changing operation. The source of truth. |
| Anchor | Metadata stored with each span annotation so it can be relocated when the document is updated. |
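To make the Revision entry concrete, here is a small Python sketch of content addressing with SHA-256. The ObjectStore class and its methods are hypothetical and only illustrate the idea; how Bewley lays out its object store on disk is not shown here.

```python
import hashlib

def revision_id(content: bytes) -> str:
    """Address a snapshot by the SHA-256 digest of its bytes."""
    return hashlib.sha256(content).hexdigest()

class ObjectStore:
    """Hypothetical in-memory store keyed by content digest."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        digest = revision_id(content)
        # setdefault never replaces an existing snapshot
        self._objects.setdefault(digest, content)
        return digest

    def get(self, digest: str) -> bytes:
        return self._objects[digest]

store = ObjectStore()
original = "Alice: I trust the platform.\n".encode("utf-8")
corrected = "Alice: I mostly trust the platform.\n".encode("utf-8")

first = store.put(original)
second = store.put(corrected)  # edited text gets a different address
again = store.put(original)    # identical bytes map to the same address
```

Identical content always yields the same identifier, so a revision can be referenced by annotations without any risk of it changing underneath them.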
mkdir my-study && cd my-study
bewley init

This creates:
my-study/
  corpus/    ← put your text files here
  .bewley/   ← metadata, event log, object store, SQLite index
Copy or write your interview transcripts into the corpus/ directory, then track them:
bewley add corpus/interview-alice.txt
bewley add corpus/interview-bob.txt

Check what is tracked:
bewley status
bewley list documents

If you have source interview audio, Bewley can ask OpenAI to transcribe it and then add the transcript as a normal corpus document:
export OPENAI_API_KEY=...
bewley add-audio recordings/interview-alice.m4a --output corpus/interview-alice.txt

For speaker-turn transcripts with timestamps, use the diarization response:
bewley add-audio recordings/interview-alice.m4a \
  --output corpus/interview-alice.txt \
  --model gpt-4o-transcribe-diarize \
  --response-format diarized_json

Inspect the linkage later:
bewley show document corpus/interview-alice.txt
bewley show audio corpus/interview-alice.txt

Video works the same way, but Bewley first extracts the audio with ffmpeg and splits long recordings into chunks small enough to transcribe, then merges the pieces back into a single transcript:
bewley add-video recordings/interview-alice.mp4 \
  --output corpus/interview-alice.txt \
  --response-format verbose_json
bewley show video corpus/interview-alice.txt

Define your analytic codes:
bewley code create trust
bewley code create friction
bewley code create workaround
bewley code list

Whole-document annotation (mark a document as belonging to a theme):
bewley annotate apply trust corpus/interview-alice.txt --document

Span annotation by line range (apply a code to specific lines):
bewley annotate apply friction corpus/interview-alice.txt --lines 14:22

Span annotation by byte offset (for precision):
bewley annotate apply workaround corpus/interview-bob.txt --bytes 1024:1280

Add an optional memo to any annotation:
bewley annotate apply trust corpus/interview-alice.txt --lines 5:10 --memo "Explicit trust in the platform despite past issues"

Show all snippets for a code:
bewley show snippets --code friction

Inspect a specific document's revision history and annotations:
bewley show document corpus/interview-alice.txt

Inspect a specific annotation:
bewley annotate show <annotation-id>

Boolean queries return documents (or overlapping annotations) that match:
# Documents that have both codes somewhere in them
bewley query "trust AND friction"
# Documents with one code but not the other
bewley query "workaround AND NOT trust"
# Annotation-level: only where spans actually overlap
bewley query "trust AND friction" --mode annotation

The default mode is document. Use --mode annotation for stricter, co-located matching.
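The difference between the two modes can be sketched in Python for the simple case of two codes joined by AND. The Span type, the half-open byte ranges, and both function names are assumptions made for illustration; Bewley's query engine and its exact overlap rule may differ.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    code: str
    document: str
    start: int  # byte offset, inclusive (assumed convention)
    end: int    # byte offset, exclusive (assumed convention)

def documents_with_both(spans, code_a, code_b):
    """Document mode: both codes appear somewhere in the same document."""
    docs_a = {s.document for s in spans if s.code == code_a}
    docs_b = {s.document for s in spans if s.code == code_b}
    return sorted(docs_a & docs_b)

def overlapping_pairs(spans, code_a, code_b):
    """Annotation mode: the two coded spans must actually overlap."""
    pairs = []
    for a in spans:
        if a.code != code_a:
            continue
        for b in spans:
            same_doc = b.code == code_b and b.document == a.document
            if same_doc and a.start < b.end and b.start < a.end:
                pairs.append((a, b))
    return pairs

spans = [
    Span("trust", "alice.txt", 0, 100),
    Span("friction", "alice.txt", 50, 150),  # overlaps the trust span
    Span("trust", "bob.txt", 0, 40),
    Span("friction", "bob.txt", 200, 260),   # same document, no overlap
]

doc_hits = documents_with_both(spans, "trust", "friction")
span_hits = overlapping_pairs(spans, "trust", "friction")
```

In this toy data both documents match in document mode, but only alice.txt contributes a match in annotation mode, because the spans in bob.txt never touch.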
Rename a code without losing any history:
bewley code rename workaround coping-strategy

Add an alias so old queries still resolve:
bewley code alias coping-strategy workaround

Merge two codes into one:
bewley code merge trust reliability --into credibility

Show a code and all its annotations:
bewley code show credibility

When an interview transcript is corrected or extended, update it in place. Bewley creates a new immutable revision and attempts to relocate all existing annotations automatically:
# Edit corpus/interview-alice.txt, then:
bewley update corpus/interview-alice.txt

If an annotation cannot be relocated with confidence, it is marked conflicted. Resolve it manually:
bewley status # shows conflicted annotations
bewley annotate resolve <annotation-id> --lines 18:25

Export snippets for a code as JSONL (with surrounding context lines):
bewley export snippets --code friction --format jsonl --context-lines 3

Export verbatim quotes with byte-exact provenance:
bewley export quotes --code friction --format jsonl --context-lines 3

Export a full interactive HTML code explorer:
bewley export html --output analysis.html --title "My Study"

Export a single annotated document as HTML:
bewley export document-html corpus/interview-alice.txt --output alice-annotated.html

View the full event log:
bewley history
bewley history --document corpus/interview-alice.txt
bewley history --code friction

Undo a specific event (where supported):
bewley undo <event-id>

Verify that every event, object, and projection is internally consistent:
bewley fsck

If the SQLite index is ever corrupted or deleted, rebuild it from the event log:
bewley rebuild-index

my-study/
  corpus/
    interview-alice.txt
    interview-bob.txt
  .bewley/
    config.toml          ← project settings
    HEAD                 ← pointer to latest event
    events/              ← append-only event log (one JSON file per action)
    objects/documents/   ← immutable document snapshots (SHA-256 addressed)
    objects/audio/       ← immutable stored audio sources for transcribed documents
    objects/video/       ← immutable stored video sources for transcribed documents
    index/bewley.sqlite  ← rebuildable query index (not the source of truth)
    locks/write.lock     ← prevents concurrent writes
    logs/rebuild.log
To back up a project, copy corpus/ and .bewley/; nothing else is needed. The SQLite index inside .bewley/ can always be discarded and rebuilt.
bewley init
bewley status
bewley fsck
bewley rebuild-index
bewley add <path>
bewley add-audio <audio-path> [--output <path>] [--model <model>] [--response-format json|verbose_json|diarized_json]
bewley add-video <video-path> [--output <path>] [--model <model>] [--response-format json|verbose_json|diarized_json]
bewley update <path>
bewley list documents
bewley show document <ref>
bewley show audio <ref>
bewley show video <ref>
bewley code create <name> [--description <text>]
bewley code list
bewley code show <ref>
bewley code rename <old> <new>
bewley code alias <ref> <alias>
bewley code merge <source>... --into <target>
bewley code split <source> --new <target>
bewley annotate apply <code> <document> --document [--memo <text>]
bewley annotate apply <code> <document> --lines <start>:<end> [--memo <text>]
bewley annotate apply <code> <document> --bytes <start>:<end> [--memo <text>]
bewley annotate remove <annotation-id>
bewley annotate show <annotation-id>
bewley annotate resolve <annotation-id> --lines <start>:<end>
bewley annotate resolve <annotation-id> --bytes <start>:<end>
bewley show snippets --code <ref>
bewley query "<expr>" [--mode document|annotation]
bewley export snippets --code <ref> --format jsonl [--context-lines N]
bewley export quotes --code <ref> --format jsonl [--context-lines N]
bewley export quotes --query "<expr>" --format jsonl [--context-lines N]
bewley export html --output <file> [--title <text>]
bewley export document-html <ref> --output <file> [--title <text>]
bewley history [--document <ref>] [--code <ref>] [--annotation <id>]
bewley undo <event-id>
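The --lines and --bytes forms listed above address the same kind of span in two coordinate systems. The Python sketch below shows one way to convert a line range into a byte range over UTF-8 content. It assumes 1-based, inclusive line numbers and a half-open byte range; Bewley's own conventions are not stated here, so treat both as assumptions, and line_range_to_bytes as a hypothetical helper.

```python
def line_range_to_bytes(content: bytes, start_line: int, end_line: int):
    """Map 1-based inclusive line numbers to a half-open byte range."""
    lines = content.splitlines(keepends=True)  # keep newline bytes in the count
    if not (1 <= start_line <= end_line <= len(lines)):
        raise ValueError("line range outside document")
    start = sum(len(line) for line in lines[: start_line - 1])
    end = start + sum(len(line) for line in lines[start_line - 1 : end_line])
    return start, end

# "é" takes two bytes in UTF-8, so byte offsets and character offsets diverge.
text = "héllo\nsecond line\nthird\n".encode("utf-8")
start, end = line_range_to_bytes(text, 2, 3)
snippet = text[start:end].decode("utf-8")
```

Working in bytes rather than characters keeps offsets unambiguous for non-ASCII transcripts, which is one reason a byte range can be the more precise of the two forms.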
- The event log (events/) is append-only. No command ever modifies or deletes prior events.
- Undo is implemented as a new compensating event, not by erasing history.
- Document revisions are content-addressed by SHA-256 and stored in objects/. They are immutable.
- The SQLite database is a projection of the event log: a cache, not a database of record.
- One writer at a time is enforced via a file lock. Concurrent reads are safe.
- Annotation relocation across revisions is best-effort; uncertain relocations produce an explicit conflicted status rather than silent best guesses.
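The last point, preferring an explicit conflict over a silent guess, can be sketched in Python. This is a deliberately simplified stand-in, not Bewley's relocation algorithm: it assumes the anchor is just the exact annotated bytes, and it accepts a relocation only when those bytes occur exactly once in the new revision. The relocate function and the "active" status name are hypothetical; "conflicted" is the status named above.

```python
def relocate(anchor_text: bytes, new_content: bytes) -> dict:
    """Relocate a span only when its text has exactly one match."""
    first = new_content.find(anchor_text)
    if first == -1:
        return {"status": "conflicted", "reason": "anchor text not found"}
    if new_content.find(anchor_text, first + 1) != -1:
        return {"status": "conflicted", "reason": "anchor text is ambiguous"}
    return {"status": "active", "start": first, "end": first + len(anchor_text)}

anchor = b"export to a spreadsheet"
revised = (
    b"Intro, corrected.\n"
    b"Extra line.\n"
    b"I had to export to a spreadsheet every week.\n"
)

moved = relocate(anchor, revised)                       # unique match
lost = relocate(anchor, b"I gave up on exporting.\n")   # text was rewritten
dup = relocate(anchor, revised + revised)               # text appears twice
```

Both failure cases return a conflicted result for a human to resolve instead of picking a position, which mirrors the manual resolve step available through the CLI.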
See LICENSE if present, or contact the maintainers.