[FE] Converts audio to markdown #147

Vaishnav2804 · 2025-11-17T20:36:35Z

closes #148

Example Usage:

from lexoid.api import parse

document_path = "inputs/harvard.wav"
parsed_md = parse(document_path, "AUTO", api="gemini")["raw"]
print(parsed_md)

TODOs:

Add google-genai via poetry

Copilot

Pull request overview

This PR adds support for converting audio files to markdown using the Gemini API. The implementation routes audio files through the LLM parser and creates a new audio-specific parsing function that uploads audio files to Gemini and transcribes them into well-structured markdown.

Key Changes

Added audio file type support to the file type checking and routing logic
Implemented parse_audio_with_gemini() function with audio-specific prompt template
Added example usage in the Colab notebook demonstrating audio transcription
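
As a rough illustration of the routing described above, the sketch below detects audio inputs by MIME type and rejects any API other than Gemini. The helper names `is_audio_file` and `route_audio` are hypothetical and are not taken from lexoid/core/utils.py; the actual checks in this PR may differ.

```python
import mimetypes


def is_audio_file(path: str) -> bool:
    """Return True when the file extension maps to an audio/* MIME type."""
    mime_type, _ = mimetypes.guess_type(path)
    return mime_type is not None and mime_type.startswith("audio/")


def route_audio(path: str, api: str) -> str:
    """Pick a parser name for an audio file (hypothetical routing helper)."""
    if not is_audio_file(path):
        raise ValueError(f"Not an audio file: {path}")
    if api != "gemini":
        # Assumption: mirrors the PR's restriction of audio input to Gemini.
        raise ValueError("Audio parsing is only supported with the Gemini API")
    return "parse_audio_with_gemini"
```

Detecting by MIME type rather than a hard-coded extension list is one possible design; a fixed allow-list of extensions would work equally well for a small set of formats.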

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 7 comments.

File	Description
lexoid/core/utils.py	Added audio file type to supported formats and routing logic to direct audio files to LLM parser
lexoid/core/prompt_templates.py	Added `AUDIO_TO_MARKDOWN_PROMPT` template with instructions for transcription and markdown formatting
lexoid/core/parse_type/llm_parser.py	Implemented audio parsing with Gemini, including validation to restrict audio to Gemini API and new `parse_audio_with_gemini()` function
examples/example_notebook_colab.ipynb	Added example demonstrating audio file parsing with output

Copilot · 2025-11-25T22:46:50Z

lexoid/core/parse_type/llm_parser.py

+    audio_file = client.files.upload(file=path)
+    system_prompt = kwargs.get("system_prompt", None)
+    if system_prompt == "" or system_prompt is None:
+        system_prompt = AUDIO_TO_MARKDOWN_PROMPT + "Audo file name is: {path}\n"


The string formatting placeholder {path} is not being replaced with the actual path value. This line should use f-string formatting: system_prompt = AUDIO_TO_MARKDOWN_PROMPT + f"Audio file name is: {path}\n" (note the f prefix and corrected spelling of "Audio").

Suggested change

-        system_prompt = AUDIO_TO_MARKDOWN_PROMPT + "Audo file name is: {path}\n"
+        system_prompt = AUDIO_TO_MARKDOWN_PROMPT + f"Audio file name is: {path}\n"
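
A minimal sketch of the difference this comment describes. The value assigned to `AUDIO_TO_MARKDOWN_PROMPT` here is a placeholder, not the real template from lexoid/core/prompt_templates.py.

```python
# Placeholder standing in for the real AUDIO_TO_MARKDOWN_PROMPT template.
AUDIO_TO_MARKDOWN_PROMPT = "Transcribe the audio into markdown.\n"
path = "inputs/harvard.wav"

# Without the f prefix the braces are kept literally in the prompt.
unformatted = AUDIO_TO_MARKDOWN_PROMPT + "Audio file name is: {path}\n"

# With the f prefix the value of `path` is interpolated.
formatted = AUDIO_TO_MARKDOWN_PROMPT + f"Audio file name is: {path}\n"

print(repr(unformatted))
print(repr(formatted))
```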

Copilot · 2025-11-25T22:46:50Z

lexoid/core/parse_type/llm_parser.py

 import requests
 import torch
 from anthropic import Anthropic
+from google import genai


The code imports from google import genai and uses genai.Client(), which appears to be from the google-genai package. However, the pyproject.toml file specifies google-generativeai (a different package) as the dependency. According to the PR description TODO, google-genai needs to be added via poetry. Either add the correct package to pyproject.toml, or update the code to use the existing google-generativeai package API.

Suggested change

- from google import genai
+ import google.generativeai as genai
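
To see which of the two distributions is actually present in an environment, a small check along these lines may help. It uses only the standard library; the helper name `installed_version` is illustrative. Swapping only the import line would likely not be enough on its own, since the code calls `genai.Client()`, which belongs to the google-genai interface.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Report the state of both candidate packages.
for name in ("google-genai", "google-generativeai"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```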

Copilot · 2025-11-25T22:46:51Z

examples/example_notebook_colab.ipynb

+   "source": [
+    "from lexoid.api import parse\n",
+    "\n",
+    "document_path =\"inputs\\harvard.wav\"\n",


The path string uses a single backslash which Python interprets as an escape sequence, causing a SyntaxWarning (visible in the output at lines 1988-1991). Use either a raw string (r"inputs\harvard.wav") or forward slashes ("inputs/harvard.wav") to avoid this warning.
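
A short sketch of why the forward-slash form is the safer choice. It assumes the notebook may run on a POSIX system (a Colab runtime is typically Linux), where a backslash is not a path separator at all.

```python
from pathlib import PurePosixPath, PureWindowsPath

raw_path = r"inputs\harvard.wav"      # raw string: backslash kept literally, no warning
portable_path = "inputs/harvard.wav"  # forward slash: no escape sequences involved

# On Windows both spellings name the same file.
same_on_windows = PureWindowsPath(raw_path) == PureWindowsPath(portable_path)

# On POSIX a backslash is an ordinary character, so the raw-string
# spelling is treated as a single file name rather than directory + file.
posix_name = PurePosixPath(raw_path).name

# A recognised escape such as "\t" silently changes the string instead of warning.
tab_example = "inputs\test.wav"

print(same_on_windows, repr(posix_name), repr(tab_example))
```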

Copilot · 2025-11-25T22:46:51Z

examples/example_notebook_colab.ipynb

+    "from lexoid.api import parse\n",
+    "\n",
+    "document_path =\"inputs\\harvard.wav\"\n",
+    "parsed_md = parse(document_path, \"AUTO\",api=\"gemini\")[\"raw\"]\n",


Missing space after comma in the function call. Should be: parse(document_path, "AUTO", api="gemini")

Copilot · 2025-11-25T22:46:51Z

lexoid/core/parse_type/llm_parser.py

+def parse_audio_with_gemini(path: str, **kwargs) -> Dict:
+    client = genai.Client()
+    audio_file = client.files.upload(file=path)
+    system_prompt = kwargs.get("system_prompt", None)
+    if system_prompt == "" or system_prompt is None:
+        system_prompt = AUDIO_TO_MARKDOWN_PROMPT + "Audo file name is: {path}\n"
+
+    response = client.models.generate_content(
+        model=kwargs["model"], contents=[system_prompt, audio_file]
+    )
+
+    return {
+        "raw": response.text,
+        "segments": [
+            {
+                "metadata": {"page": 0},
+                "content": response.text,
+            }
+        ],
+        "title": kwargs.get("title", ""),
+        "url": kwargs.get("url", ""),
+        "parent_title": kwargs.get("parent_title", ""),
+        "recursive_docs": [],
+        "token_usage": {
+            "input": response.usage_metadata.prompt_token_count,
+            "output": response.usage_metadata.candidates_token_count,
+            "total": (
+                response.usage_metadata.prompt_token_count
+                + response.usage_metadata.candidates_token_count
+            ),
+        },
+    }


The new audio parsing functionality lacks test coverage. Consider adding a test case similar to the existing test_llm_parse and test_jpg_parse functions to verify audio file parsing works correctly with the Gemini API. This would help ensure the feature works as expected and prevent regressions.
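
One way such a test could look, sketched without network access. `parse_audio` below is a simplified stand-in that takes the client as a parameter; the real `parse_audio_with_gemini` constructs `genai.Client()` internally, so an actual test would need to patch that instead. The model name and response values are made up for illustration.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def parse_audio(path, client, **kwargs):
    """Simplified stand-in for parse_audio_with_gemini with an injected client."""
    audio_file = client.files.upload(file=path)
    prompt = kwargs.get("system_prompt") or f"Audio file name is: {path}\n"
    response = client.models.generate_content(
        model=kwargs["model"], contents=[prompt, audio_file]
    )
    usage = response.usage_metadata
    return {
        "raw": response.text,
        "segments": [{"metadata": {"page": 0}, "content": response.text}],
        "token_usage": {
            "input": usage.prompt_token_count,
            "output": usage.candidates_token_count,
            "total": usage.prompt_token_count + usage.candidates_token_count,
        },
    }


def test_parse_audio_returns_transcript_and_usage():
    # Fake response carrying only the attributes the parser reads.
    fake_response = SimpleNamespace(
        text="# Transcript\n\nHello.",
        usage_metadata=SimpleNamespace(
            prompt_token_count=120, candidates_token_count=30
        ),
    )
    client = MagicMock()
    client.files.upload.return_value = "uploaded-audio-handle"
    client.models.generate_content.return_value = fake_response

    result = parse_audio("inputs/harvard.wav", client, model="hypothetical-model")

    client.files.upload.assert_called_once_with(file="inputs/harvard.wav")
    assert result["raw"] == "# Transcript\n\nHello."
    assert result["segments"][0]["content"] == result["raw"]
    assert result["token_usage"]["total"] == 150
```

Mocking the client keeps the test fast and deterministic; a separate integration test against the live API could cover the upload path end to end.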

Vaishnav2804 and others added 14 commits May 30, 2025 18:42

add functionality to parse based on schema

a4dd2f7

refactor system prompt

2c5051c

comment max_tokens for open_ai | set temperature to 0, as mini models…

24ad0d2

… in openAI does not support temperature value between 0-1

rename method to parse_with_schema

fc7523b

refactor pydoc for method: parse_with_schema

1c4f07d

Minor changes:

b282176

- Formatting and var type fixes - Handle image inputs - Split on "```json" for robustness - Add example to example_notebook.ipynb

add gemini support to parse pdf with schema

c32ad88

create separate function to parse images using gemini

Merge branch 'main' of github.com:oidlabs-com/Lexoid

864eeaf

Merge branch 'main' of github.com:oidlabs-com/Lexoid

cd2dfa6

add support for dataclass to parse_with_schema

804cc5b

remove parse schema code and organize code

260ec00

Merge branch 'main' of github.com:oidlabs-com/Lexoid

5721920

Merge branch 'main' of github.com:oidlabs-com/Lexoid

5d6df67

FE/audio2markdown initial commit

7a70a0d

Vaishnav2804 requested a review from dilithjay November 17, 2025 20:36

Vaishnav2804 self-assigned this Nov 17, 2025

Vaishnav2804 added the enhancement New feature or request label Nov 17, 2025

Merge branch 'main' into FE/audio2md

59a502e

pramitchoudhary requested a review from Copilot November 25, 2025 22:42

Copilot started reviewing on behalf of pramitchoudhary November 25, 2025 22:43

Copilot finished reviewing on behalf of pramitchoudhary November 25, 2025 22:46

Copilot AI reviewed Nov 25, 2025

Vaishnav2804 and others added 2 commits November 27, 2025 20:47

Update lexoid/core/parse_type/llm_parser.py

ce78f7e

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update lexoid/core/parse_type/llm_parser.py

82b80fa

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
