Skip to content

feat: complete v0.1 — MCP server, HTTP API, audio/video, graph, ingest#2

Merged
ojspace merged 3 commits intomainfrom
feat/v0.1-complete
Mar 18, 2026
Merged

feat: complete v0.1 — MCP server, HTTP API, audio/video, graph, ingest#2
ojspace merged 3 commits intomainfrom
feat/v0.1-complete

Conversation

@ojspace
Copy link
Copy Markdown
Owner

@ojspace ojspace commented Mar 13, 2026

Summary

  • MCP server (src/mcp.ts): Claude/Cursor integration via bunx md-anything-mcp
  • HTTP REST API (src/server.ts): Hono server with /doctor, /convert, /ingest endpoints
  • Audio/video providers: OpenAI Whisper-based transcription with graceful fallbacks
  • Knowledge graph extraction (src/core/graph.ts): local heuristic entity/relation extraction — no cloud APIs
  • PHO-139 runtime capabilities refactor: binary detection pre-computed once, injected into providers
  • Ingest improvements: --graph and --index flags fully wired; generates _index.md table
  • npm publish readiness: files, exports, build, prepublishOnly in package.json
  • README: motivation story, personal usage guide, source manifests docs
  • Live regression tests: YouTube + URL (skipped unless LIVE_TESTS=1)

Test plan

  • bun run test:required — 36 pass, 0 fail
  • bun run lint — type check clean
  • Smoke test: bun run src/cli.ts doctor
  • Smoke test: bun run src/cli.ts convert <local-pdf>
  • Smoke test: bun run src/cli.ts ingest <folder> --index --graph -o /tmp/out
  • Live tests: bun run test:live (requires network)
  • MCP test: add .mcp.json to a project and verify tools appear in Claude

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • HTTP API server for convert/ingest and a stdio MCP server exposing convert, ingest, and doctor tools
    • Audio and video transcription support; improved PDF, HTML, EPUB handling and graph extraction from documents
    • CLI: dedicated convert subcommand, ingest output directory and index generation
  • Documentation

    • Comprehensive README and local runbook with examples and troubleshooting
  • Tests / CI

    • New live integration tests, fixtures, and GitHub Actions workflow

ojspace and others added 2 commits March 13, 2026 20:15
…t improvements

Core additions:
- src/mcp.ts: MCP stdio server exposing convert/ingest/doctor tools
- src/server.ts: Hono HTTP REST API (GET /doctor, POST /convert, POST /ingest)
- src/providers/audio.ts: Whisper-based audio transcription with graceful fallback
- src/providers/video.ts: ffmpeg + Whisper video transcription with graceful fallback
- src/core/graph.ts: local heuristic entity/relation extraction (no cloud APIs)

Refactors:
- PHO-139: runtime capabilities injected into providers (image/pdf/mobi) instead of re-detected per call
- src/core/route-input.ts: routes audio/video kinds; threads runtime.capabilities to all providers
- src/core/ingest.ts: graph extraction, _index.md generation, audio/video support
- src/cli.ts: --graph and --index flags wired up; real subcommand routing

Tests & config:
- Live regression tests for YouTube and URL (skipped unless LIVE_TESTS=1)
- .mcp.json for MCP host integration
- package.json: files/exports/build/prepublishOnly for npm publish readiness
- README: motivation story, personal usage guide, source manifests docs, --graph/--index docs
- CLAUDE.md: local runbook for future sessions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 13, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8fac319a-ae18-4373-934d-6b0ce1476dac

📥 Commits

Reviewing files that changed from the base of the PR and between d78ec37 and 68cd340.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • .github/workflows/ci.yml
  • README.md
  • package.json
  • src/core/route-input.ts
  • src/core/support-levels.ts
  • src/mcp.ts
  • src/providers/epub.ts
  • src/providers/html.ts
  • src/providers/pdf.ts
  • src/server.ts
  • src/utils/html-to-markdown.ts
  • src/utils/split-sections.ts
  • tests/integration/convert-pdf.test.ts
  • tests/unit/support-levels.test.ts

📝 Walkthrough

Walkthrough

Adds MCP stdio server and an HTTP API, new providers (audio, video, enhanced pdf/image/mobi/epub/html), heuristic graph extraction, CLI/ingest changes (convert command, disk output, index), HTML→Markdown utilities, docs, CI, and several integration tests and fixtures.

Changes

Cohort / File(s) Summary
MCP & HTTP Server
.mcp.json, src/mcp.ts, src/server.ts
Adds an MCP stdio server exposing convert/ingest/doctor tools and a Hono-based HTTP API with endpoints for convert, ingest, doctor, and health.
CLI & Ingest
src/cli.ts, src/core/ingest.ts
Adds convert subcommand, writes ingest outputs to disk when requested, integrates graph extraction, and generates _index.md when indexing is enabled.
Providers — new & extended
src/providers/audio.ts, src/providers/video.ts, src/providers/pdf.ts, src/providers/image.ts, src/providers/mobi.ts, src/providers/epub.ts, src/providers/html.ts
Adds audio/video transcription modules, upgrades PDF extraction (unpdf + pdftotext fallback), moves OCR/ebook checks to caller flags, converts HTML/EPUB via HTML→Markdown pipeline, and expands metadata returned by providers.
Graph & Content utilities
src/core/graph.ts, src/utils/html-to-markdown.ts, src/utils/split-sections.ts
Introduces heuristic graph extraction (entities/relations) and utilities to convert HTML→Markdown and split Markdown into sections.
Routing & Capability Passing
src/core/route-input.ts
Adds audio/video input kinds and passes runtime capability flags (tesseract, pdftotext, ebook-convert, whisper) into provider converters.
Server/Dev tooling & packaging
package.json, .github/workflows/ci.yml
Expands package metadata, adds md-anything-mcp and md-anything-server bins, adds scripts (server, mcp, build, test:live), new deps (@modelcontextprotocol/sdk, hono, unpdf), and CI workflow including optional tools setup.
Docs & Runbook
README.md, CLAUDE.md
Adds comprehensive README and local runbook with quick-start, CLI examples, MCP usage, troubleshooting, and smoke checks.
Tests & Fixtures
tests/fixtures/sample.html, tests/fixtures/sample.json, tests/integration/..., tests/unit/support-levels.test.ts
Adds HTML/JSON fixtures, live URL and YouTube integration tests (conditional on LIVE_TESTS), and updates support-level tests for PDF support change.
Support-level tweak
src/core/support-levels.ts
Changes PDF support level from "best-effort" to "strong", updating related tests.

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client/API Caller
  participant HTTP as HTTP Server (Hono)
  participant MCP as MCP Server (stdio)
  participant Runtime as Runtime Providers
  participant FS as Filesystem
  participant Ext as External Tools (unpdf, ffmpeg, whisper, tesseract, pdftotext)

  rect rgba(200,230,255,0.5)
    Client->>HTTP: POST /convert or POST /ingest
    HTTP->>Runtime: validate & call convert/ingest
    Runtime->>FS: read input (file or download URL)
    Runtime->>Ext: call tool(s) as needed (unpdf/ffmpeg/whisper/ocr)
    Ext-->>Runtime: return transcripts/text/metadata
    Runtime-->>HTTP: return Markdown/summary/json
    HTTP-->>Client: 200 + payload
  end

  rect rgba(230,255,200,0.5)
    Client->>MCP: MCP tool request (convert/ingest/doctor) via stdio
    MCP->>Runtime: dispatch to same convert/ingest/doctor handlers
    Runtime->>Ext: external tool calls (as above)
    Runtime-->>MCP: tool result
    MCP-->>Client: tool response
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Poem

🐰 Hops in the meadow with a tiny tap-tap,
I stitch transcripts, graphs, and a server-map,
From HTML to Markdown the carrots align,
MCP hums softly — everything's fine,
A rabbit's patch built one small hop at a time. 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main feature additions of the v0.1 release, including MCP server, HTTP API, audio/video support, graph extraction, and ingest improvements.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/v0.1-complete
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can use OpenGrep to find security vulnerabilities and bugs across 17+ programming languages.

OpenGrep is compatible with Semgrep configurations. Add an opengrep.yml or semgrep.yml configuration file to your project to enable OpenGrep analysis.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/providers/mobi.ts (1)

50-53: ⚠️ Potential issue | 🟠 Major

Potential command injection via filePath.

The filePath is interpolated directly into a shell command string. If the path contains shell metacharacters (e.g., backticks, $(), semicolons), arbitrary commands could be executed.

Consider using execFileSync with an arguments array instead of execSync with string interpolation to avoid shell interpretation:

🔒 Proposed fix using execFileSync
-import { execSync } from "node:child_process";
+import { execFileSync } from "node:child_process";

...

-      execSync(`ebook-convert "${filePath}" "${outPath}" 2>/dev/null`, {
-        timeout: 60000,
-      });
+      execFileSync("ebook-convert", [filePath, outPath], {
+        timeout: 60000,
+        stdio: ["ignore", "ignore", "ignore"],
+      });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/providers/mobi.ts` around lines 50 - 53, The call to execSync
interpolates filePath into a shell string which allows command injection;
replace this with execFileSync (or spawnSync) and pass "ebook-convert" as the
executable and [filePath, outPath] as the args array, preserve the timeout
option and ensure shell is false, and remove any shell redirection (2>/dev/null)
so stderr can be captured via options.stdio or try/catch; update the code
locations referencing execSync, filePath and outPath in src/providers/mobi.ts to
use execFileSync/ spawnSync and argument arrays instead of string interpolation.
🧹 Nitpick comments (8)
tests/integration/url-live.test.ts (1)

40-48: Minor: Redundant kind assertion duplicated across tests.

The assertion expect(result.kind).toBe("url") appears in both this test (line 46) and the first test (line 25). Consider consolidating this into a single test or removing the duplicate to avoid redundant network calls during the live test run.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/url-live.test.ts` around lines 40 - 48, The test duplicates
the `expect(result.kind).toBe("url")` assertion; update tests to avoid redundant
network calls by removing the duplicate `expect(result.kind).toBe("url")` from
this "should report source_type as url" test (or consolidate it into the first
test that also calls convertToMarkdown), keeping the check once where
convertToMarkdown is already invoked; locate the assertions around the
convertToMarkdown call that uses TARGET_URL, DEFAULT_CONFIG.options and runtime,
and either delete the redundant `result.kind` assertion here or move any
additional expectations (e.g., usefulness_score) into the single test that
retains the kind assertion (ensuring convertToMarkdown is still awaited and
other expectations remain).
src/server.ts (1)

128-135: Side effect: console.log executes on module import.

The console.log statement at module level will execute whenever this module is imported, even if the server isn't being started (e.g., during testing). Consider moving startup logging inside a conditional or using Bun's serve callback:

♻️ Proposed fix to avoid import side effects
 const port = parseInt(process.env.PORT ?? "3000", 10);

+// Only log when running as main module
+if (import.meta.main) {
+  console.log(`md-anything API running on http://localhost:${port}`);
+}
+
 export default {
   port,
   fetch: app.fetch,
 };
-
-console.log(`md-anything API running on http://localhost:${port}`);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server.ts` around lines 128 - 135, The console.log at module scope causes
a startup side-effect on import; move that logging into the actual server-start
path so it only runs when the server is started. Specifically, remove the
module-level console.log and add the log inside the code that invokes the server
(e.g., where you call Bun.serve or the function that starts the server),
referencing the same port value (port) and app.fetch context; alternatively wrap
the log in a conditional that checks a runtime flag (e.g., if (require.main ===
module) or an environment variable) so console.log is not executed during
imports/tests.
src/core/route-input.ts (1)

43-46: Audio and video routing looks correct, but note the ffmpeg detection inconsistency.

The capability injection for hasWhisper is consistent with the pattern used for other providers. However, per src/providers/video.ts (context snippet 2), the video provider also performs its own hasFfmpeg() check internally using which ffmpeg. This creates an inconsistency: whisper availability is pre-detected and injected, while ffmpeg availability is checked at conversion time.

Consider adding hasFfmpeg to RuntimeCapabilities in src/core/runtime.ts and passing it to convertVideo() for consistency, especially if you need to mock or test capability combinations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/route-input.ts` around lines 43 - 46, The route handling injects
caps.hasWhisper into convertAudio/convertVideo but ffmpeg availability is still
checked inside src/providers/video.ts; add a hasFfmpeg boolean to the
RuntimeCapabilities type in src/core/runtime.ts, populate it where runtime
capabilities are built (using the same detection used now), and pass
caps.hasFfmpeg into convertVideo from src/core/route-input.ts (alongside
caps.hasWhisper) so convertVideo(received) uses the injected flag (or remove the
internal which ffmpeg check in convertVideo and rely on the injected hasFfmpeg)
to keep capability detection consistent and testable.
README.md (2)

332-346: Add language specifier to fenced code block.

Per markdownlint, fenced code blocks should have a language specified. For the directory structure, use txt or text:

📝 Proposed fix
-```
+```txt
 src/
   cli.ts          # CLI entry point
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 332 - 346, The fenced code block that lists the
project directory (the block starting with "src/" and the directory tree lines)
is missing a language specifier; update that block header from ``` to ```txt (or
```text) so markdownlint passes and the directory listing is treated as plain
text.

98-103: Add language specifier to fenced code block.

Per markdownlint, fenced code blocks should have a language specified. Since this is a plain text file format, use txt or text:

📝 Proposed fix
-```
+```txt
 # sources.txt — one source per line, # for comments
 https://www.youtube.com/watch?v=EqhKw0Oro_k
 https://hono.dev/docs/getting-started/basic
 https://example.com/some-article
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @README.md around lines 98 - 103, The fenced code block showing the example
"sources.txt — one source per line, # for comments" in README.md lacks a
language specifier; update that opening fence to include a plain-text language
(e.g., change "" to "txt" or "```text") so the block is marked as txt,
ensuring markdownlint compliance for the block containing the three example
URLs.


</details>

</blockquote></details>
<details>
<summary>src/mcp.ts (1)</summary><blockquote>

`154-158`: **Consider a more specific error type for unknown tools.**

The generic `Error` thrown for unknown tools is functional but could be improved with a more specific error that MCP clients can handle programmatically. This is a minor enhancement.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@src/mcp.ts` around lines 154 - 158, Replace the generic Error thrown for
unknown tools with a dedicated error class so callers can handle it
programmatically: add and export a new UnknownToolError (or ToolNotFoundError)
class in the module and throw new UnknownToolError(name) instead of throw new
Error(`Unknown tool: ${name}`) in the code that currently throws the generic
error; ensure the error includes the tool name/message and preserves the
original stack by extending Error so MCP clients can use instanceof
UnknownToolError to detect this condition.
```

</details>

</blockquote></details>
<details>
<summary>src/core/ingest.ts (2)</summary><blockquote>

`51-64`: **Escape markdown table cells in generated index.**

At Line 60, raw `doc.title` and `doc.source` are inserted into table cells. Values containing `|` or newlines will corrupt table rendering.



<details>
<summary>💡 Suggested fix (escape table cell content)</summary>

```diff
 function buildIndexMarkdown(docs: IngestDoc[], folderPath: string): string {
+  const esc = (v: string) => v.replace(/\|/g, "\\|").replace(/\r?\n/g, " ");
   const lines = [
@@
   for (const doc of docs) {
-    lines.push(`| [${doc.title}](${doc.fileName}) | ${doc.sourceType} | ${doc.source} |`);
+    lines.push(`| [${esc(doc.title)}](${doc.fileName}) | ${esc(doc.sourceType)} | ${esc(doc.source)} |`);
   }
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@src/core/ingest.ts` around lines 51 - 64, buildIndexMarkdown inserts raw
doc.title and doc.source into markdown table cells which breaks rendering if
values contain '|' or newlines; create and use a small helper (e.g.
escapeMarkdownCell) inside buildIndexMarkdown to replace pipe characters with
'\|' and convert newlines to a safe inline representation (e.g. space or '<br>')
and pass escaped values (escapeMarkdownCell(doc.title),
escapeMarkdownCell(doc.source)) when building the table rows so table cells
remain valid.
```

</details>

---

`110-124`: **Extract duplicated graph-mapping logic into a helper.**

The same graph extraction/mapping block is repeated for manifest sources and regular files. Consolidating this reduces drift risk.




Also applies to: 145-159

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @CLAUDE.md:

  • Around line 58-61: The troubleshooting note describing the CLI's convert
    failure is out of date: update the documentation so it reflects the current
    behavior where running the CLI with no input returns "Missing input for convert
    command." and exit code 1 (as produced by the convert command handler), and
    remove or replace the old reference to the parser fallback "Input: convert"
    which no longer applies; ensure the note mentions the exact error string
    ("Missing input for convert command.") and the correct remediation (provide
    input or use the appropriate flags) and delete the obsolete "Input: convert"
    troubleshooting hint.

In @src/cli.ts:

  • Around line 70-79: The current write loop uses doc.fileName directly and can
    silently overwrite files; update the loop that writes files (the block using
    outputDir, result.docs, formatMarkdown, and writeFile) to detect filename
    collisions and generate unique names before writing (e.g., maintain a Set of
    used names and, when a collision on doc.fileName occurs, append a deterministic
    suffix like "-1", "-2", etc., to the basename until unique), then compute
    outPath from that unique name and call writeFile; ensure you preserve file
    extensions and keep the uniqueness logic scoped to this write routine so
    recursive ingests or same basenames do not clobber earlier outputs.

In @src/core/graph.ts:

  • Around line 77-83: The current code in src/core/graph.ts builds relations by
    linking every person to the first three places and concepts (the nested loops
    over persons, places.slice(0,3) and concepts.slice(0,3)), which creates spurious
    assertions; remove or disable those loops so the function only emits entities
    and leaves the relations array empty (or only adds relations when explicit local
    co‑occurrence/provenance data is available), and add a clear TODO comment in the
    same function (where relations is built) pointing to future logic that will
    infer relations from true local context.

In @src/core/ingest.ts:

  • Around line 167-177: The current block that always pushes an index document
    using fileName "_index.md" (inside the options.index branch using
    buildIndexMarkdown, docs, basename(folderPath), folderPath) can collide with
    existing documents; change the logic to reserve a unique index filename by
    computing a candidate (e.g. "_index.md") and, before pushing, loop/check
    docs.some(d => d.fileName === candidate) and if it exists append a numeric
    suffix (e.g. "_index-1.md", "_index-2.md", ...) until you find a name that is
    not present, then use that unique candidate as the fileName when pushing the
    index document.

In @src/mcp.ts:

  • Around line 85-104: In the convert branch, calling resolve(input) before
    passing to convertToMarkdown mangles URL strings; change the code to only
    resolve filesystem paths and leave URL inputs untouched by detecting URLs (e.g.,
    via new URL(...) or checking http(s) prefix) and passing the original input
    string to convertToMarkdown when it's a URL; update the convert branch around
    the input handling (the input variable used in the convertToMarkdown call) so
    URL inputs bypass resolve while non-URL paths continue to be resolved using
    resolve(input).

In @src/providers/audio.ts:

  • Around line 41-64: Update the audio provider to check for ffmpeg availability
    like the video provider: call a hasFfmpeg() (or ffmpegAvailable) check before
    invoking runWhisper and combine it with hasWhisper (i.e., if (!hasWhisper ||
    !ffmpegAvailable)). If either is missing, return the same fallback object but
    adjust the user-facing content and metadata to indicate which tool(s) are
    missing (mention whisper and/or ffmpeg), set metadata flags like
    whisper_available and ffmpeg_available accordingly, and avoid calling runWhisper
    when ffmpeg is not available; keep using runWhisper(transcript) only when both
    checks pass.
  • Around line 12-14: The current execSync call that builds a shell string with
    the user-controlled filePath should be replaced with child_process.execFile
    (async) to avoid shell injection and event-loop blocking: import execFile from
    'child_process' (or require it), call execFile('whisper', ['--output_dir',
    tmpDir, '--output_format', 'txt', filePath], { timeout: 300000 }, (err, stdout,
    stderr) => { handle errors, log stderr, and proceed when complete }), and remove
    shell interpolation; ensure you pass filePath and tmpDir as separate arguments
    in the array (no shell), handle the callback errors, and keep the same timeout
    behavior so the Whisper process is spawned without a shell and does not block
    the Node event loop.

In @src/providers/image.ts:

  • Line 8: The current use of execSync with a shell-interpolated filePath
    (variable filePath) in providers/image.ts creates a command-injection and blocks
    the event loop; replace this with a non-blocking child_process.execFile (or
    spawn) call that passes tesseract and arguments as an array (e.g.,
    execFile('tesseract', [filePath, 'stdout'])) so no shell interpolation occurs,
    remove the shell redirection 2>/dev/null and instead handle/ignore stderr in
    the callback or promise, make the call asynchronous (promisify execFile or use
    spawn with stdout collection) and add proper error handling around the call
    where result is currently used.

In @src/providers/video.ts:

  • Around line 22-24: The ffmpeg invocation using execSync currently uses an
    invalid stream selector "-map a"; update the command built in the execSync call
    (the call where you construct the ffmpeg string for extracting audio) to use an
    explicit input and stream specifier such as "-map 0:a:0" and add "-vn" to
    disable video processing (e.g., replace "-map a" with "-map 0:a:0 -vn") so
    ffmpeg extracts the first audio stream reliably.
  • Around line 22-28: The synchronous, shell-interpolated calls to execSync using
    backticks with filePath/audioPath/tmpDir must be replaced by async execFile
    calls that pass the binary and its arguments as an array to avoid shell parsing
    and command injection; change the surrounding function to async,
    import/promisify execFile (or use child_process/promises.execFile), call
    execFile("ffmpeg", ["-i", filePath, "-q:a", "0", "-map", "a", audioPath, "-y"])
    and execFile("whisper", [audioPath, "--output_dir", tmpDir, "--output_format",
    "txt"]) instead of the backticked commands, handle stdout/stderr via the
    returned promise (or options.stdio) instead of using shell redirections like
    2>/dev/null, and enforce timeouts via execFile's timeout option or an
    AbortController so the event loop is not blocked and inputs from
    ConvertRequest.input are never shell-interpolated.

In @src/server.ts:

  • Around line 60-91: The code is incorrectly calling resolve(input) before
    passing to convertToMarkdown which mangles URLs; change the call in the POST
    /convert handler to pass the raw input string (the variable input from the
    request body) directly to convertToMarkdown (i.e., remove resolve(input) usage)
    so convertToMarkdown/detectInputKind can correctly detect URLs vs file paths;
    keep the options object ({ ...DEFAULT_CONFIG.options, frontmatter }) and runtime
    parameter unchanged.

Outside diff comments:
In @src/providers/mobi.ts:

  • Around line 50-53: The call to execSync interpolates filePath into a shell
    string which allows command injection; replace this with execFileSync (or
    spawnSync) and pass "ebook-convert" as the executable and [filePath, outPath] as
    the args array, preserve the timeout option and ensure shell is false, and
    remove any shell redirection (2>/dev/null) so stderr can be captured via
    options.stdio or try/catch; update the code locations referencing execSync,
    filePath and outPath in src/providers/mobi.ts to use execFileSync/ spawnSync and
    argument arrays instead of string interpolation.

Nitpick comments:
In @README.md:

  • Around line 332-346: The fenced code block that lists the project directory
    (the block starting with "src/" and the directory tree lines) is missing a
    language specifier; update that block header from totxt (or ```text) so
    markdownlint passes and the directory listing is treated as plain text.
  • Around line 98-103: The fenced code block showing the example "sources.txt —
    one source per line, # for comments" in README.md lacks a language specifier;
    update that opening fence to include a plain-text language (e.g., change "" to "txt" or "```text") so the block is marked as txt, ensuring markdownlint
    compliance for the block containing the three example URLs.

In @src/core/ingest.ts:

  • Around line 51-64: buildIndexMarkdown inserts raw doc.title and doc.source
    into markdown table cells which breaks rendering if values contain '|' or
    newlines; create and use a small helper (e.g. escapeMarkdownCell) inside
    buildIndexMarkdown to replace pipe characters with '|' and convert newlines to
    a safe inline representation (e.g. space or '
    ') and pass escaped values
    (escapeMarkdownCell(doc.title), escapeMarkdownCell(doc.source)) when building
    the table rows so table cells remain valid.

In @src/core/route-input.ts:

  • Around line 43-46: The route handling injects caps.hasWhisper into
    convertAudio/convertVideo but ffmpeg availability is still checked inside
    src/providers/video.ts; add a hasFfmpeg boolean to the RuntimeCapabilities type
    in src/core/runtime.ts, populate it where runtime capabilities are built (using
    the same detection used now), and pass caps.hasFfmpeg into convertVideo from
    src/core/route-input.ts (alongside caps.hasWhisper) so convertVideo(received)
    uses the injected flag (or remove the internal which ffmpeg check in
    convertVideo and rely on the injected hasFfmpeg) to keep capability detection
    consistent and testable.

In @src/mcp.ts:

  • Around line 154-158: Replace the generic Error thrown for unknown tools with a
    dedicated error class so callers can handle it programmatically: add and export
    a new UnknownToolError (or ToolNotFoundError) class in the module and throw new
    UnknownToolError(name) instead of throw new Error(Unknown tool: ${name}) in
    the code that currently throws the generic error; ensure the error includes the
    tool name/message and preserves the original stack by extending Error so MCP
    clients can use instanceof UnknownToolError to detect this condition.

In @src/server.ts:

  • Around line 128-135: The console.log at module scope causes a startup
    side-effect on import; move that logging into the actual server-start path so it
    only runs when the server is started. Specifically, remove the module-level
    console.log and add the log inside the code that invokes the server (e.g., where
    you call Bun.serve or the function that starts the server), referencing the same
    port value (port) and app.fetch context; alternatively wrap the log in a
    conditional that checks a runtime flag (e.g., if (require.main === module) or an
    environment variable) so console.log is not executed during imports/tests.

In @tests/integration/url-live.test.ts:

  • Around line 40-48: The test duplicates the expect(result.kind).toBe("url")
    assertion; update tests to avoid redundant network calls by removing the
    duplicate expect(result.kind).toBe("url") from this "should report source_type
    as url" test (or consolidate it into the first test that also calls
    convertToMarkdown), keeping the check once where convertToMarkdown is already
    invoked; locate the assertions around the convertToMarkdown call that uses
    TARGET_URL, DEFAULT_CONFIG.options and runtime, and either delete the redundant
    result.kind assertion here or move any additional expectations (e.g.,
    usefulness_score) into the single test that retains the kind assertion (ensuring
    convertToMarkdown is still awaited and other expectations remain).

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: defaults

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `2b83351e-45c2-487a-a9d9-4034010b8ac0`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 5fae2670a1413b874461b87e397a4ef5b821887f and d78ec37ee8aab1f7610f60032f5cf1131789fef5.

</details>

<details>
<summary>⛔ Files ignored due to path filters (1)</summary>

* `bun.lock` is excluded by `!**/*.lock`

</details>

<details>
<summary>📒 Files selected for processing (19)</summary>

* `.mcp.json`
* `CLAUDE.md`
* `README.md`
* `package.json`
* `src/cli.ts`
* `src/core/graph.ts`
* `src/core/ingest.ts`
* `src/core/route-input.ts`
* `src/mcp.ts`
* `src/providers/audio.ts`
* `src/providers/image.ts`
* `src/providers/mobi.ts`
* `src/providers/pdf.ts`
* `src/providers/video.ts`
* `src/server.ts`
* `tests/fixtures/sample.html`
* `tests/fixtures/sample.json`
* `tests/integration/url-live.test.ts`
* `tests/integration/youtube-live.test.ts`

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Comment on lines +58 to +61
If you run `bun run src/cli.ts convert` with no input, CLI returns:

- `Missing input for convert command.`
- Exit code `1`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

The troubleshooting note contradicts the documented convert behavior.

Lines 58-61 already say bun run src/cli.ts convert with no input returns Missing input for convert command., but Line 100 still documents the older Input: convert failure mode. If that parser fallback is gone now, this note will send users toward the wrong fix.

Also applies to: 100-100

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CLAUDE.md` around lines 58 - 61, The troubleshooting note describing the
CLI's convert failure is out of date: update the documentation so it reflects
the current behavior where running the CLI with no input returns "Missing input
for convert command." and exit code 1 (as produced by the `convert` command
handler), and remove or replace the old reference to the parser fallback "Input:
convert" which no longer applies; ensure the note mentions the exact error
string ("Missing input for convert command.") and the correct remediation
(provide input or use the appropriate flags) and delete the obsolete "Input:
convert" troubleshooting hint.

Comment on lines +70 to +79
if (outputDir) {
await mkdir(outputDir, { recursive: true });
for (const doc of result.docs) {
const md = formatMarkdown(
{ ...doc, metadata: doc.metadata },
{ frontmatter: values.frontmatter },
);
const outPath = join(outputDir, doc.fileName);
await writeFile(outPath, md, "utf-8");
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Prevent silent overwrite when writing ingest outputs.

At Line 77-78, files are written using doc.fileName without collision handling. Duplicate names (common with recursive ingest or same basename across folders) will overwrite earlier outputs and lose data.

💡 Suggested fix (deduplicate output filenames)
-import { resolve } from "node:path";
+import { resolve, parse } from "node:path";
@@
   if (outputDir) {
     await mkdir(outputDir, { recursive: true });
+    const nameCounts = new Map<string, number>();
     for (const doc of result.docs) {
       const md = formatMarkdown(
         { ...doc, metadata: doc.metadata },
         { frontmatter: values.frontmatter },
       );
-      const outPath = join(outputDir, doc.fileName);
+      const seen = nameCounts.get(doc.fileName) ?? 0;
+      nameCounts.set(doc.fileName, seen + 1);
+      const parsed = parse(doc.fileName);
+      const finalName =
+        seen === 0
+          ? doc.fileName
+          : `${parsed.name}-${seen + 1}${parsed.ext || ".md"}`;
+      const outPath = join(outputDir, finalName);
       await writeFile(outPath, md, "utf-8");
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (outputDir) {
await mkdir(outputDir, { recursive: true });
for (const doc of result.docs) {
const md = formatMarkdown(
{ ...doc, metadata: doc.metadata },
{ frontmatter: values.frontmatter },
);
const outPath = join(outputDir, doc.fileName);
await writeFile(outPath, md, "utf-8");
}
if (outputDir) {
await mkdir(outputDir, { recursive: true });
const nameCounts = new Map<string, number>();
for (const doc of result.docs) {
const md = formatMarkdown(
{ ...doc, metadata: doc.metadata },
{ frontmatter: values.frontmatter },
);
const seen = nameCounts.get(doc.fileName) ?? 0;
nameCounts.set(doc.fileName, seen + 1);
const parsed = parse(doc.fileName);
const finalName =
seen === 0
? doc.fileName
: `${parsed.name}-${seen + 1}${parsed.ext || ".md"}`;
const outPath = join(outputDir, finalName);
await writeFile(outPath, md, "utf-8");
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli.ts` around lines 70 - 79, The current write loop uses doc.fileName
directly and can silently overwrite files; update the loop that writes files
(the block using outputDir, result.docs, formatMarkdown, and writeFile) to
detect filename collisions and generate unique names before writing (e.g.,
maintain a Set of used names and, when a collision on doc.fileName occurs,
append a deterministic suffix like "-1", "-2", etc., to the basename until
unique), then compute outPath from that unique name and call writeFile; ensure
you preserve file extensions and keep the uniqueness logic scoped to this write
routine so recursive ingests or same basenames do not clobber earlier outputs.

Comment on lines +77 to +83
for (const person of persons) {
for (const place of places.slice(0, 3)) {
relations.push({ from: person.id, to: place.id, type: "located-in" });
}
for (const concept of concepts.slice(0, 3)) {
relations.push({ from: person.id, to: concept.id, type: "related-to" });
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don’t infer relations from document-wide cartesian products.

Lines 77-83 link every detected person to the first three places and concepts anywhere in the document. That turns unrelated mentions into asserted facts, so the graph becomes misleading as soon as a note contains multiple topics or people. If you don’t have local co-occurrence/provenance yet, it’s safer to emit entities only and leave relations empty.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/graph.ts` around lines 77 - 83, The current code in
src/core/graph.ts builds relations by linking every person to the first three
places and concepts (the nested loops over persons, places.slice(0,3) and
concepts.slice(0,3)), which creates spurious assertions; remove or disable those
loops so the function only emits entities and leaves the relations array empty
(or only adds relations when explicit local co‑occurrence/provenance data is
available), and add a clear TODO comment in the same function (where relations
is built) pointing to future logic that will infer relations from true local
context.

Comment on lines +167 to +177
if (options.index && docs.length > 0) {
const indexMarkdown = buildIndexMarkdown(docs, folderPath);
docs.push({
fileName: "_index.md",
title: `Index: ${basename(folderPath)}`,
source: folderPath,
sourceType: "text",
sections: [{ content: indexMarkdown }],
metadata: { extraction: "index", extraction_status: "ok" },
graph: { entities: [], relations: [], relatedNotes: [] },
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid _index.md filename collision with ingested documents.

Line 170 always emits _index.md. If the source set already includes a document that resolves to _index.md, this causes ambiguous outputs and potential overwrite downstream.

💡 Suggested fix (reserve a unique index filename)
   if (options.index && docs.length > 0) {
     const indexMarkdown = buildIndexMarkdown(docs, folderPath);
+    const taken = new Set(docs.map((d) => d.fileName));
+    const indexFileName = taken.has("_index.md") ? "_ingest_index.md" : "_index.md";
     docs.push({
-      fileName: "_index.md",
+      fileName: indexFileName,
       title: `Index: ${basename(folderPath)}`,
       source: folderPath,
       sourceType: "text",
       sections: [{ content: indexMarkdown }],
       metadata: { extraction: "index", extraction_status: "ok" },
       graph: { entities: [], relations: [], relatedNotes: [] },
     });
-    outputFiles.push("_index.md");
+    outputFiles.push(indexFileName);
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/ingest.ts` around lines 167 - 177, The current block that always
pushes an index document using fileName "_index.md" (inside the options.index
branch using buildIndexMarkdown, docs, basename(folderPath), folderPath) can
collide with existing documents; change the logic to reserve a unique index
filename by computing a candidate (e.g. "_index.md") and, before pushing,
loop/check docs.some(d => d.fileName === candidate) and if it exists append a
numeric suffix (e.g. "_index-1.md", "_index-2.md", ...) until you find a name
that is not present, then use that unique candidate as the fileName when pushing
the index document.

Comment on lines +41 to +64
if (!hasWhisper) {
return {
title: name,
source: filePath,
sourceType: "audio",
sections: [
{
heading: "Audio File",
content: `*File: ${name}*\n\nAudio transcription requires OpenAI Whisper.\n\nInstall with: \`pip install openai-whisper\``,
},
],
metadata: {
extraction: "unavailable",
extraction_status: "weak",
file_name: name,
file_size_bytes: fileSize,
format: ext,
whisper_available: false,
low_confidence_output: true,
},
};
}

const transcript = await runWhisper(filePath);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n src/providers/audio.ts | head -100

Repository: ojspace/md-anything

Length of output: 3590


🏁 Script executed:

fd -t f "video" src/providers/ | head -10

Repository: ojspace/md-anything

Length of output: 85


🏁 Script executed:

rg -A 5 -B 5 "ffmpeg" src/providers/

Repository: ojspace/md-anything

Length of output: 3849


Add ffmpeg availability check alongside hasWhisper in audio provider.

The audio provider only gates on hasWhisper, but OpenAI Whisper requires ffmpeg as an external runtime dependency. When ffmpeg is missing but Whisper is installed, runWhisper silently fails (caught at line 23) and returns null, resulting in a misleading "whisper-empty" message that suggests silent audio rather than a missing dependency. Implement a hasFfmpeg() check mirroring the video provider's approach, then combine the checks: if (!hasWhisper || !ffmpegAvailable) with messaging that specifies which tools are missing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/providers/audio.ts` around lines 41 - 64, Update the audio provider to
check for ffmpeg availability like the video provider: call a hasFfmpeg() (or
ffmpegAvailable) check before invoking runWhisper and combine it with hasWhisper
(i.e., if (!hasWhisper || !ffmpegAvailable)). If either is missing, return the
same fallback object but adjust the user-facing content and metadata to indicate
which tool(s) are missing (mention whisper and/or ffmpeg), set metadata flags
like whisper_available and ffmpeg_available accordingly, and avoid calling
runWhisper when ffmpeg is not available; keep using runWhisper(transcript) only
when both checks pass.

try {
const { execSync: exec } = await import("node:child_process");
const result = exec(`tesseract "${filePath}" stdout 2>/dev/null`, {
const result = execSync(`tesseract "${filePath}" stdout 2>/dev/null`, {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the file and surrounding context
cat -n src/providers/image.ts | head -30

Repository: ojspace/md-anything

Length of output: 1106


🏁 Script executed:

# Also check if the file exists and what the actual content is around line 8
wc -l src/providers/image.ts

Repository: ojspace/md-anything

Length of output: 88


🏁 Script executed:

# Look for all uses of execSync and execFile in the repository to understand the pattern
rg -n "execSync|execFile" --type ts --type js

Repository: ojspace/md-anything

Length of output: 1432


🏁 Script executed:

# Check if there's any input validation or sanitization for filePath before this call
rg -n "filePath" src/providers/image.ts -A 3 -B 3

Repository: ojspace/md-anything

Length of output: 1294


Use execFile() or properly escape the filePath argument instead of shell-interpolating.

Line 8 invokes tesseract via execSync() with a shell string. This creates a command injection vulnerability: a filePath containing shell metacharacters (quotes, backticks, semicolons) can escape and execute arbitrary commands. Additionally, execSync() blocks the entire Node.js event loop; with the new HTTP/MCP server entrypoints, a slow OCR execution will freeze the process and deny service to other requests. Use execFile() with argument arrays instead, or properly escape the path using a library like shell-escape.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/providers/image.ts` at line 8, The current use of execSync with a
shell-interpolated filePath (variable filePath) in providers/image.ts creates a
command-injection and blocks the event loop; replace this with a non-blocking
child_process.execFile (or spawn) call that passes tesseract and arguments as an
array (e.g., execFile('tesseract', [filePath, 'stdout'])) so no shell
interpolation occurs, remove the shell redirection `2>/dev/null` and instead
handle/ignore stderr in the callback or promise, make the call asynchronous
(promisify execFile or use spawn with stdout collection) and add proper error
handling around the call where result is currently used.

Comment on lines +22 to +24
execSync(`ffmpeg -i "${filePath}" -q:a 0 -map a "${audioPath}" -y 2>/dev/null`, {
timeout: 120000,
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

ffmpeg -map syntax valid input selector

💡 Result:

-map takes either an input-stream selector or a filtergraph output label:

1) Input selector form

-map [-]input_file_id[:stream_specifier][?][,sync_file_id[:stream_specifier]]
  • input_file_id = 0-based input number (-i order).
  • [:stream_specifier] is optional; if omitted, it means “all streams from that input” (e.g. -map 0).
  • Prefix - makes it a negative map (removes matches from previously-added maps).
  • Trailing ? makes it optional (ignored if it matches nothing; still fails if the input file index is invalid).
  • Optional ,sync_file_id:stream_specifier sets the PTS sync reference.
    [1]

2) Filtergraph label form

-map [linklabel]

Maps a labeled output from -filter_complex (e.g. -map "[outv]").
[1]


Valid stream_specifier forms (the part after :)

Common ones (can be combined like a:2, v, etc.):

  • By stream index: 0:1 (stream #1 in input #0)
  • By type: 0:v, 0:a, 0:s, 0:d, 0:t (video/audio/subtitle/data/attachment)
  • By type + Nth of that type: 0:a:2 (3rd audio stream)
  • By metadata tag: 0:m:language:eng
  • By disposition: 0:disp:forced (and you can combine dispositions with +)
  • Usable streams only: 0:u
    [1][2]

Examples

  • Map everything from first input: -map 0
  • Map all video + 3rd audio: -map 0:v -map 0:a:2
  • Map all except 2nd audio: -map 0 -map -0:a:1
  • Map audio if present (don’t fail if none): -map 0:a?
    [1]

Sources: FFmpeg manual (-map syntax + examples) and FFmpeg “Stream specifiers” documentation. [1][2]


Use an explicit ffmpeg stream selector here.

The -map a syntax is invalid. FFmpeg's -map requires an input file ID (0-based index) followed by an optional stream specifier, e.g., -map 0:a or -map 0:a:0. Using just -map a lacks the required input file ID and will fail. Use -map 0:a:0 to extract the first audio stream, optionally with -vn to disable video processing.

Suggested fix
-    execSync(`ffmpeg -i "${filePath}" -q:a 0 -map a "${audioPath}" -y 2>/dev/null`, {
+    execSync(`ffmpeg -i "${filePath}" -vn -q:a 0 -map 0:a:0 "${audioPath}" -y 2>/dev/null`, {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
execSync(`ffmpeg -i "${filePath}" -q:a 0 -map a "${audioPath}" -y 2>/dev/null`, {
timeout: 120000,
});
execSync(`ffmpeg -i "${filePath}" -vn -q:a 0 -map 0:a:0 "${audioPath}" -y 2>/dev/null`, {
timeout: 120000,
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/providers/video.ts` around lines 22 - 24, The ffmpeg invocation using
execSync currently uses an invalid stream selector "-map a"; update the command
built in the execSync call (the call where you construct the ffmpeg string for
extracting audio) to use an explicit input and stream specifier such as "-map
0:a:0" and add "-vn" to disable video processing (e.g., replace "-map a" with
"-map 0:a:0 -vn") so ffmpeg extracts the first audio stream reliably.

Comment on lines +22 to +28
execSync(`ffmpeg -i "${filePath}" -q:a 0 -map a "${audioPath}" -y 2>/dev/null`, {
timeout: 120000,
});

execSync(`whisper "${audioPath}" --output_dir "${tmpDir}" --output_format txt 2>/dev/null`, {
timeout: 300000,
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n src/providers/video.ts | head -50

Repository: ojspace/md-anything

Length of output: 2010


🏁 Script executed:

rg -n "convertVideo" --type ts --type js

Repository: ojspace/md-anything

Length of output: 348


🏁 Script executed:

cat -n src/core/route-input.ts | head -60

Repository: ojspace/md-anything

Length of output: 2405


🏁 Script executed:

rg -n "routeInput" --type ts --type js -B 2

Repository: ojspace/md-anything

Length of output: 1233


🏁 Script executed:

cat -n src/core/convert.ts

Repository: ojspace/md-anything

Length of output: 1394


🏁 Script executed:

rg -n "normalize|sanitize|validate" src/providers/video.ts

Repository: ojspace/md-anything

Length of output: 45


🌐 Web query:

Node.js execFile execSync event loop blocking behavior

💡 Result:

  • child_process.execFile() is asynchronous: it starts the child process and returns immediately, so it does not block the Node.js event loop while the child runs. The completion callback is queued when the process exits. [1]
  • child_process.execSync() (and execFileSync(), spawnSync()) are synchronous: they block the Node.js event loop until the child process exits—no other JS, timers, or I/O callbacks can run in that Node process during the wait. [1][2]
  • Even though a child runs in a separate OS process, using the *Sync APIs still blocks your Node process because you explicitly wait synchronously for the result. [3]

Rule of thumb: use execFile in servers/throughput-sensitive code; reserve execSync/execFileSync for short CLI scripts or startup-time one-offs. [1]

Sources:
[1] Node.js docs: child_process (sync methods block; spawn() async) (nodejs.org)
[2] Node.js docs (older but explicit): sync methods “WILL block the Node.js event loop” (nodejs.org)
[3] Explanation of *Sync waiting behavior (stackoverflow.com)

Citations:


🌐 Web query:

Node.js execFile shell injection arguments array protection

💡 Result:

  • execFile(file, args) is mostly “shell-injection safe” specifically because it does not spawn a shell by default (shell: false) and runs the executable directly. With no shell involved, metacharacters like ;, &&, |, > are not interpreted as operators—they’re just bytes inside a single argument. [1]
  • It does not “sanitize” your inputs; it just avoids shell parsing. Untrusted args can still be dangerous as argument/option injection (e.g., turning on dangerous flags in the target program), so you still need allowlists / strict validation for what arguments are permitted. [2]
  • If you enable shell: true, you lose that protection. Node explicitly warns not to pass unsanitized input when the shell option is enabled because shell metacharacters can trigger arbitrary command execution. [3]
  • Windows gotcha: you generally can’t run .bat / .cmd with execFile; Windows batch files require cmd.exe (i.e., effectively a shell path), which reintroduces shell-injection risk if any part is attacker-controlled. [4]

Sources
[1] Node.js child_process.execFile() docs (no shell by default; spawns file directly) (nodejs.org)
[2] Stack Overflow discussion showing metacharacters in the args array are not executed as shell operators (illustrating “no shell parsing,” not “sanitization”) (stackoverflow.com)
[3] Node.js docs warning about shell option + unsanitized input (nodejs.org)
[4] Node.js docs on Windows .bat/.cmd requiring cmd.exe/shell-style invocation (nodejs.org)

Citations:


Replace these shell-string execSync() calls with execFile() and async/await to prevent command injection.

Lines 22-28 interpolate filePath directly into backtick-quoted shell strings passed to execSync(). Since filePath originates from user input (ConvertRequest.input), a user can inject arbitrary shell commands—for example, passing "video.mp4; rm -rf /; echo" would execute the malicious portion. Additionally, execSync() blocks the Node.js event loop, stalling all other requests in your Hono/MCP server while transcription runs. Use execFile() with argument arrays and switch to async/await to avoid shell parsing entirely and restore event-loop responsiveness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/providers/video.ts` around lines 22 - 28, The synchronous,
shell-interpolated calls to execSync using backticks with
filePath/audioPath/tmpDir must be replaced by async execFile calls that pass the
binary and its arguments as an array to avoid shell parsing and command
injection; change the surrounding function to async, import/promisify execFile
(or use child_process/promises.execFile), call execFile("ffmpeg", ["-i",
filePath, "-q:a", "0", "-map", "a", audioPath, "-y"]) and execFile("whisper",
[audioPath, "--output_dir", tmpDir, "--output_format", "txt"]) instead of the
backticked commands, handle stdout/stderr via the returned promise (or
options.stdio) instead of using shell redirections like 2>/dev/null, and enforce
timeouts via execFile's timeout option or an AbortController so the event loop
is not blocked and inputs from ConvertRequest.input are never
shell-interpolated.

Comment on lines +60 to +91
app.post("/convert", async (c) => {
let body: { input?: string; frontmatter?: boolean };
try {
body = await c.req.json();
} catch {
return c.json({ error: "Invalid JSON body" }, 400);
}

const { input, frontmatter = true } = body;

if (!input) {
return c.json({ error: "Missing required field: input" }, 400);
}

try {
const result = await convertToMarkdown(
{
input: resolve(input),
options: { ...DEFAULT_CONFIG.options, frontmatter },
},
runtime,
);

return c.json({
input: result.input,
kind: result.kind,
markdown: result.markdown,
metadata: result.metadata,
});
} catch (err) {
return c.json({ error: String(err) }, 500);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Bug: resolve(input) mangles URL inputs.

The resolve(input) call on line 77 will convert URL strings like https://example.com into filesystem paths (e.g., /current/dir/https:/example.com), breaking URL-based conversions.

The core convertToMarkdown function expects the raw input string, as it internally detects whether the input is a URL or file path via detectInputKind.

🐛 Proposed fix
   try {
+    // Only resolve file paths, not URLs
+    const resolvedInput = input.startsWith("http://") || input.startsWith("https://")
+      ? input
+      : resolve(input);
     const result = await convertToMarkdown(
       {
-        input: resolve(input),
+        input: resolvedInput,
         options: { ...DEFAULT_CONFIG.options, frontmatter },
       },
       runtime,
     );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server.ts` around lines 60 - 91, The code is incorrectly calling
resolve(input) before passing to convertToMarkdown which mangles URLs; change
the call in the POST /convert handler to pass the raw input string (the variable
input from the request body) directly to convertToMarkdown (i.e., remove
resolve(input) usage) so convertToMarkdown/detectInputKind can correctly detect
URLs vs file paths; keep the options object ({ ...DEFAULT_CONFIG.options,
frontmatter }) and runtime parameter unchanged.

- Add htmlToMarkdown() utility: tables, headings, lists, code, links
- Add splitIntoSections() utility: heading-boundary section splitting
- Refactor HTML provider to use shared conversion utilities
- Refactor EPUB provider to use htmlToMarkdown per chapter
- Rewrite PDF provider: unpdf zero-dep primary, pdftotext fallback,
  detectStructure() heuristic for raw text (ALL-CAPS, numbered sections)
- Upgrade PDF support level: best-effort → strong
- Fix MCP URL handling: resolve() now skips http/https inputs
- Fix stale doctor output in mcp.ts, server.ts, and README
- Add CI workflow (ubuntu + macos, required tests + full suite)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ojspace ojspace merged commit 5c2d824 into main Mar 18, 2026
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant