@Vaishnav2804
Collaborator

@Vaishnav2804 Vaishnav2804 commented Nov 17, 2025

closes #148

Example Usage:

from lexoid.api import parse

document_path = "inputs/harvard.wav"
parsed_md = parse(document_path, "AUTO", api="gemini")["raw"]
print(parsed_md)

TODOs:

  • Add google-genai via poetry

@Vaishnav2804 Vaishnav2804 self-assigned this Nov 17, 2025
@Vaishnav2804 Vaishnav2804 added the enhancement New feature or request label Nov 17, 2025

Copilot AI left a comment

Pull request overview

This PR adds support for converting audio files to markdown using the Gemini API. The implementation routes audio files through the LLM parser and creates a new audio-specific parsing function that uploads audio files to Gemini and transcribes them into well-structured markdown.

Key Changes

  • Added audio file type support to the file type checking and routing logic
  • Implemented parse_audio_with_gemini() function with audio-specific prompt template
  • Added example usage in the Colab notebook demonstrating audio transcription
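The routing described above could look roughly like the following minimal sketch. It is hypothetical and not lexoid's code: is_audio_file, choose_parser, and the "LLM_PARSE" label are illustrative names. This sketch detects audio from the file extension with Python's mimetypes module, which may differ from how lexoid/core/utils.py checks file types. Only the behavior comes from this PR: audio goes to the LLM parser, and only the Gemini API is accepted for audio.

```python
import mimetypes

# Hypothetical helpers illustrating the routing; these are not lexoid's API.


def is_audio_file(path: str) -> bool:
    """Guess the MIME type from the file extension and check for audio/*."""
    mime_type, _ = mimetypes.guess_type(path)
    return mime_type is not None and mime_type.startswith("audio/")


def choose_parser(path: str, parser_type: str, api: str) -> str:
    """Route audio to the LLM parser and reject non-Gemini APIs for audio."""
    if is_audio_file(path):
        if api != "gemini":
            raise ValueError(
                f"Audio parsing is only supported with api='gemini', got {api!r}"
            )
        return "LLM_PARSE"
    # Non-audio files keep whatever parser type the caller asked for.
    return parser_type
```

For example, choose_parser("inputs/harvard.wav", "AUTO", "gemini") returns "LLM_PARSE". The same call with api="openai" raises ValueError.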

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 7 comments.

File: Description

  • lexoid/core/utils.py: Added the audio file type to the supported formats, plus routing logic that directs audio files to the LLM parser
  • lexoid/core/prompt_templates.py: Added the AUDIO_TO_MARKDOWN_PROMPT template with instructions for transcription and markdown formatting
  • lexoid/core/parse_type/llm_parser.py: Implemented audio parsing with Gemini, including validation that restricts audio to the Gemini API and the new parse_audio_with_gemini() function
  • examples/example_notebook_colab.ipynb: Added an example demonstrating audio file parsing, with output

    audio_file = client.files.upload(file=path)
    system_prompt = kwargs.get("system_prompt", None)
    if system_prompt == "" or system_prompt is None:
        system_prompt = AUDIO_TO_MARKDOWN_PROMPT + "Audo file name is: {path}\n"

Copilot AI Nov 25, 2025

The string formatting placeholder {path} is not being replaced with the actual path value. This line should use f-string formatting: system_prompt = AUDIO_TO_MARKDOWN_PROMPT + f"Audio file name is: {path}\n" (note the f prefix and corrected spelling of "Audio").

Suggested change
system_prompt = AUDIO_TO_MARKDOWN_PROMPT + "Audo file name is: {path}\n"
system_prompt = AUDIO_TO_MARKDOWN_PROMPT + f"Audio file name is: {path}\n"
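The following sketch reproduces the bug this comment describes and shows one way to write the fallback. AUDIO_TO_MARKDOWN_PROMPT is replaced by a short placeholder string, and build_system_prompt is a hypothetical helper, not a function in the PR.

```python
from typing import Optional

# Placeholder: the real AUDIO_TO_MARKDOWN_PROMPT text lives in
# lexoid/core/prompt_templates.py and is not reproduced here.
AUDIO_TO_MARKDOWN_PROMPT = "Transcribe the audio into well-structured markdown.\n"

path = "inputs/harvard.wav"

# Without the f prefix, "{path}" is sent to the model literally.
unformatted = "Audio file name is: {path}\n"
# With the f prefix, the variable's value is interpolated.
formatted = f"Audio file name is: {path}\n"


def build_system_prompt(path: str, system_prompt: Optional[str] = None) -> str:
    """Fall back to the default audio prompt when none (or "") is given."""
    if not system_prompt:  # covers both None and ""
        system_prompt = AUDIO_TO_MARKDOWN_PROMPT + f"Audio file name is: {path}\n"
    return system_prompt


print(unformatted)  # Audio file name is: {path}
print(formatted)    # Audio file name is: inputs/harvard.wav
```

For values that are either a string or None, if not system_prompt behaves the same as the original == "" or is None check.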

Copilot uses AI. Check for mistakes.
import requests
import torch
from anthropic import Anthropic
from google import genai

Copilot AI Nov 25, 2025

The code imports genai with from google import genai and calls genai.Client(). Both come from the google-genai package. However, pyproject.toml lists google-generativeai (a different package) as the dependency. As the TODO in the PR description notes, google-genai still needs to be added via Poetry. Either add the correct package to pyproject.toml, or update the code to use the API of the existing google-generativeai package.

Suggested change
from google import genai
import google.generativeai as genai

"source": [
"from lexoid.api import parse\n",
"\n",
"document_path =\"inputs\\harvard.wav\"\n",

Copilot AI Nov 25, 2025

The path string uses a single backslash which Python interprets as an escape sequence, causing a SyntaxWarning (visible in the output at lines 1988-1991). Use either a raw string (r"inputs\harvard.wav") or forward slashes ("inputs/harvard.wav") to avoid this warning.
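You can reproduce the warning and check both fixes without running the notebook. The following sketch compiles each variant of the assignment and reports whether Python emitted a warning. emits_warning is a hypothetical helper, and the warning category depends on the Python version.

```python
import warnings
from pathlib import Path, PureWindowsPath


def emits_warning(source: str) -> bool:
    """Compile `source` and report whether the compiler emitted any warning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<notebook-cell>", "exec")
    return len(caught) > 0


# "\h" is not a valid escape sequence, so the first variant warns
# (SyntaxWarning on Python 3.12+, DeprecationWarning on 3.6 to 3.11).
print(emits_warning('document_path = "inputs\\harvard.wav"'))   # True
print(emits_warning('document_path = r"inputs\\harvard.wav"'))  # False
print(emits_warning('document_path = "inputs/harvard.wav"'))    # False

# Forward slashes, or pathlib, also work on Windows.
document_path = Path("inputs") / "harvard.wav"
print(PureWindowsPath("inputs/harvard.wav") == PureWindowsPath(r"inputs\harvard.wav"))  # True
```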

"from lexoid.api import parse\n",
"\n",
"document_path =\"inputs\\harvard.wav\"\n",
"parsed_md = parse(document_path, \"AUTO\",api=\"gemini\")[\"raw\"]\n",

Copilot AI Nov 25, 2025

Missing space after comma in the function call. Should be: parse(document_path, "AUTO", api="gemini")

Comment on lines 783 to 814
def parse_audio_with_gemini(path: str, **kwargs) -> Dict:
    client = genai.Client()
    audio_file = client.files.upload(file=path)
    system_prompt = kwargs.get("system_prompt", None)
    if system_prompt == "" or system_prompt is None:
        system_prompt = AUDIO_TO_MARKDOWN_PROMPT + "Audo file name is: {path}\n"

    response = client.models.generate_content(
        model=kwargs["model"], contents=[system_prompt, audio_file]
    )

    return {
        "raw": response.text,
        "segments": [
            {
                "metadata": {"page": 0},
                "content": response.text,
            }
        ],
        "title": kwargs.get("title", ""),
        "url": kwargs.get("url", ""),
        "parent_title": kwargs.get("parent_title", ""),
        "recursive_docs": [],
        "token_usage": {
            "input": response.usage_metadata.prompt_token_count,
            "output": response.usage_metadata.candidates_token_count,
            "total": (
                response.usage_metadata.prompt_token_count
                + response.usage_metadata.candidates_token_count
            ),
        },
    }
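The token_usage block above adds prompt_token_count and candidates_token_count directly. As far as we can tell, the google-genai SDK types these usage_metadata counts as optional. If one of them is None, the addition raises TypeError. The following sketch shows a defensive variant as a hypothetical helper. It uses SimpleNamespace stand-ins instead of a real API response.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional


def token_usage_from_metadata(usage: Optional[Any]) -> Dict[str, int]:
    """Build lexoid-style token counts, treating missing values as 0."""
    # getattr with a default also tolerates usage itself being None.
    prompt = getattr(usage, "prompt_token_count", None) or 0
    output = getattr(usage, "candidates_token_count", None) or 0
    return {"input": prompt, "output": output, "total": prompt + output}


# Stand-ins for response.usage_metadata (no API call is made here).
print(token_usage_from_metadata(SimpleNamespace(prompt_token_count=120, candidates_token_count=45)))
# {'input': 120, 'output': 45, 'total': 165}
print(token_usage_from_metadata(SimpleNamespace(prompt_token_count=120, candidates_token_count=None)))
# {'input': 120, 'output': 0, 'total': 120}
```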

Copilot AI Nov 25, 2025

The new audio parsing functionality lacks test coverage. Consider adding a test case similar to the existing test_llm_parse and test_jpg_parse functions to verify audio file parsing works correctly with the Gemini API. This would help ensure the feature works as expected and prevent regressions.
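A test along these lines could cover the new code path without calling the Gemini API. To keep the sketch self-contained, parse_audio is a simplified, hypothetical stand-in for parse_audio_with_gemini that receives the client as an argument. The model name is a placeholder. In lexoid's own test suite, one would more likely keep the real function and use unittest.mock.patch to patch the genai.Client name imported in lexoid/core/parse_type/llm_parser.py. That patch target is an assumption based on the file list above. An end-to-end test like test_llm_parse would presumably also need a real audio file and a Gemini API key.

```python
from typing import Any, Dict
from unittest.mock import MagicMock


# Simplified, hypothetical stand-in for parse_audio_with_gemini that takes the
# client as a parameter so it runs without lexoid or network access.
def parse_audio(path: str, client: Any, model: str) -> Dict:
    audio_file = client.files.upload(file=path)
    prompt = f"Audio file name is: {path}\n"
    response = client.models.generate_content(model=model, contents=[prompt, audio_file])
    return {
        "raw": response.text,
        "segments": [{"metadata": {"page": 0}, "content": response.text}],
    }


def test_audio_parse_with_mocked_client() -> None:
    client = MagicMock()
    client.models.generate_content.return_value.text = "# Transcript\n\nHello."

    result = parse_audio("inputs/harvard.wav", client, model="gemini-model-placeholder")

    # The file is uploaded once, and the upload handle is passed to the model.
    client.files.upload.assert_called_once_with(file="inputs/harvard.wav")
    _, kwargs = client.models.generate_content.call_args
    assert kwargs["contents"][1] is client.files.upload.return_value
    assert "inputs/harvard.wav" in kwargs["contents"][0]
    assert result["raw"] == "# Transcript\n\nHello."
    assert result["segments"][0]["content"] == result["raw"]


test_audio_parse_with_mocked_client()
```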

Vaishnav2804 and others added 2 commits November 27, 2025 20:47
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Audio-2-markdown-using-gemini

2 participants