obsidian-parse

If you're building tools on top of an Obsidian vault — graph visualizers, backlink finders, migration scripts, or second-brain analytics — obsidian-parse gives you the structured data to work with.

Parses .md, .canvas, and .base files to extract wikilinks, embeds, tags, and frontmatter, then converts them into a D3-compatible knowledge graph. Covers Obsidian-specific syntax (wikilinks, embeds, nested tags, canvas nodes, vault ignore rules, shortest-path link resolution) and leaves everything else to the caller.

Use Cases

Knowledge graph visualization — feed the D3 output directly into force-directed graph layouts
Backlink analysis — find all notes that link to a given note, detect orphaned notes
Tag auditing — enumerate all tags used across a vault, expand nested hierarchies
Vault migration tooling — extract link structure before reformatting or moving notes
Static site generation — resolve wikilinks to build nav and cross-reference systems
Second-brain analytics — track how topics connect across your personal knowledge base

Installation

uv add obsidian-parse

pip install obsidian-parse

Quick Start

from obsidian_parse import parse, results_to_d3

results = parse(["/path/to/vault"])

for r in results:
    print(r.file_id)            # "Project Ideas"
    print(r.wikilink_targets)   # ["Daily Notes", "Tasks.canvas"]
    print(r.tag_names)          # ["python", "python/tools", "status/done"]

graph = results_to_d3(results)
# → {"nodes": [...], "links": [...]}

Output Examples

ParseResult

results = parse(["/vault"])
r = results[0]

r.file_id        # "Project Ideas"
r.path           # PosixPath('/vault/Projects/Project Ideas.md')
r.frontmatter    # {"tags": ["python", "tools"], "status": "active"}
r.wikilink_targets  # ["Daily Notes", "Tasks.canvas", "References/Paper"]
r.tag_names      # ["python", "tools", "python/tools"]  (body + frontmatter, merged)

r.wikilinks[0]
# WikiLink(target="Daily Notes", section="#Week 3", alias="this week", line=5, col=0)

r.wikilinks[1]
# WikiLink(target="Paper", section="^abc123", alias=None, line=12, col=4)

D3 Graph

graph = results_to_d3(results)

{
  "nodes": [
    {"id": "Project Ideas", "type": "file", "label": "Project Ideas"},
    {"id": "Daily Notes",   "type": "file", "label": "Daily Notes"},
    {"id": "Tasks.canvas",  "type": "file", "label": "Tasks.canvas"},
    {"id": "#python",       "type": "tag",  "label": "python"},
    {"id": "#python/tools", "type": "tag",  "label": "python/tools"}
  ],
  "links": [
    {"source": "Project Ideas", "target": "Daily Notes",  "relation": "wikilink"},
    {"source": "Project Ideas", "target": "Tasks.canvas", "relation": "wikilink"},
    {"source": "Project Ideas", "target": "#python",      "relation": "tag"},
    {"source": "#python/tools", "target": "#python",      "relation": "parent"}
  ]
}

Link relation values: wikilink, embed, tag, parent (tag hierarchy).

API

`parse(paths)`

Accepts a list of file or directory paths. Directories are scanned recursively. Respects .obsidian/app.json ignore rules and skips dotfiles/dotfolders.

Returns a list of ParseResult objects.

Raises:

NoPathsProvidedError — if paths is empty
PathNotFoundError — if none of the paths exist
NoMarkdownFilesError — if paths exist but contain no parseable files

`ParseResult`

Property	Type	Description
`file_id`	`str`	Node ID — `.md` extension omitted, `.canvas`/`.base` kept (e.g. `"Note"`, `"Board.canvas"`)
`path`	`Path`	Original file path
`frontmatter`	`dict`	Parsed YAML frontmatter
`wikilinks`	`list[WikiLink]`	Wikilinks with line/col positions
`embeds`	`list[Embed]`	Embeds with line/col positions
`tags`	`list[TagRef]`	Tags with line/col positions
`wikilink_targets`	`list[str]`	Deduplicated link targets (computed)
`embed_targets`	`list[str]`	Deduplicated embed targets (computed)
`tag_names`	`list[str]`	Merged body + frontmatter tags (computed)

`WikiLink`

Field	Type	Description
`target`	`str`	Link target — `.md` extension omitted, other extensions kept
`section`	`str \| None`	Heading (`#Section`) or block ref (`^id`)
`alias`	`str \| None`	Display alias after `\|`
`line`	`int \| None`	Source line number
`col`	`int \| None`	Source column number

`Embed`

Field	Type	Description
`target`	`str`	Embed target — `.md` extension omitted, other extensions kept
`section`	`str \| None`	Heading or block ref
`line`	`int \| None`	Source line number
`col`	`int \| None`	Source column number

`TagRef`

Field	Type	Description
`name`	`str`	Tag name without leading `#`
`line`	`int \| None`	Source line number
`col`	`int \| None`	Source column number

`parse_file(file_path)`

Parses a single file by dispatching to the correct parser based on extension.

Raises UnsupportedFileTypeError for unregistered extensions.

`parse_markdown_file(file_path)`

Reads and parses a single .md file directly, returning a ParseResult.

`parse_markdown_content(content, file_id, path)`

Parses raw markdown string content without reading from disk. Useful for testing or in-memory workflows.

`find_file_by_id(vault_root, file_id, *, known_files=None)`

Resolves a file_id to a vault-relative path.

Bare stem or "stem.md" → searches .md files only
"stem.canvas" / "stem.base" → searches that exact extension

When multiple files match, the shallowest path wins (Obsidian's shortest-path behavior). Pass known_files (from discover_files()) to avoid repeated filesystem traversal.

from pathlib import Path
from obsidian_parse import find_file_by_id

vault = Path("/path/to/vault")
find_file_by_id(vault, "Note")          # → Path("folder/Note.md") or None
find_file_by_id(vault, "Board.canvas")  # → Path("Board.canvas") or None

`expand_nested_tag(tag)`

Expands a nested tag into all ancestor segments.

from obsidian_parse.utils.tags import expand_nested_tag

expand_nested_tag("topic/subtopic/leaf")  # ["topic", "topic/subtopic", "topic/subtopic/leaf"]

`results_to_d3(results)`

Converts a list of ParseResult into a D3-compatible dict with nodes and links.

Supported File Types

Extension	Parsing
`.md`	Frontmatter, wikilinks, embeds, tags
`.canvas`	Wikilinks from file nodes; wikilinks + tags from text nodes
`.base`	Registered as graph nodes (no link extraction)

Extraction is block-aware: wikilinks and tags inside code fences or HTML blocks are ignored.

Development

uv sync --group dev   # install with dev dependencies
ruff check src/       # lint
mypy src/             # type check

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
src/obsidian_parse		src/obsidian_parse
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

obsidian-parse

Use Cases

Installation

Quick Start

Output Examples

ParseResult

D3 Graph

API

`parse(paths)`

`ParseResult`

`WikiLink`

`Embed`

`TagRef`

`parse_file(file_path)`

`parse_markdown_file(file_path)`

`parse_markdown_content(content, file_id, path)`

`find_file_by_id(vault_root, file_id, *, known_files=None)`

`expand_nested_tag(tag)`

`results_to_d3(results)`

Supported File Types

Development

License

About

Uh oh!

Releases

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

obsidian-parse

Use Cases

Installation

Quick Start

Output Examples

ParseResult

D3 Graph

API

parse(paths)

ParseResult

WikiLink

Embed

TagRef

parse_file(file_path)

parse_markdown_file(file_path)

parse_markdown_content(content, file_id, path)

find_file_by_id(vault_root, file_id, *, known_files=None)

expand_nested_tag(tag)

results_to_d3(results)

Supported File Types

Development

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Contributors

Uh oh!

Languages

`parse(paths)`

`ParseResult`

`WikiLink`

`Embed`

`TagRef`

`parse_file(file_path)`

`parse_markdown_file(file_path)`

`parse_markdown_content(content, file_id, path)`

`find_file_by_id(vault_root, file_id, *, known_files=None)`

`expand_nested_tag(tag)`

`results_to_d3(results)`