
InterPARES-Audio is an automated, end-to-end pipeline that transforms complex, lengthy audio files into structured, insightful text


UBC-NLP/InterPARES_audio



Multilingual Audio Analysis

InterPARES-Audio is a system that transcribes, translates, and analyzes complex audio recordings. It handles files in which multiple speakers converse in different languages, which makes it well suited to processing archives of meetings, interviews, and panel discussions.

Demo

InterPARES-Audio is designed as a high-quality system that focuses on:

  • Speaker Diarization: To identify and separate different speakers in an audio recording, even when they switch languages.

  • Robust Transcription and Translation: To accurately transcribe spoken words and translate them into a target language, preserving meaning across linguistic boundaries.

  • Summarization: To generate concise summaries that capture the essence of multi-speaker conversations, highlighting key points and decisions made.

Key Features

  • End-to-End Processing: Ingests a long audio file and outputs a structured text report.

  • Speaker Diarization: Determines "who spoke when" by identifying and tagging different speakers.

  • Multilingual Transcription & Language Identification (LID): Accurately transcribes speech to text while automatically identifying the language being spoken.

  • Advanced LLM Analysis: Uses a large language model to perform high-level analysis, summarizing content and extracting key information.

  • Structured Multilingual Output: Generates clean, organized reports in Arabic, English, French, Spanish, German, and Italian. The reports include:

    • Speaker profiles with predicted names and roles

    • Main topics discussed

    • Decisions made during the conversation

    • Action items assigned to participants

    • Key insights and takeaways from the discussion
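As an illustration of what such a report could look like as data, here is a minimal Python sketch. The class and field names (`MeetingReport`, `SpeakerProfile`, `ActionItem`, `report_to_json`) are hypothetical and chosen only to mirror the bullet points above; they are not taken from the InterPARES-Audio codebase.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class SpeakerProfile:
    """One diarized speaker, with the name and role predicted by the LLM (hypothetical schema)."""
    speaker_id: str
    predicted_name: Optional[str] = None
    predicted_role: Optional[str] = None


@dataclass
class ActionItem:
    """A task mentioned in the conversation and the participant it was assigned to."""
    description: str
    assignee: Optional[str] = None


@dataclass
class MeetingReport:
    """Container mirroring the report sections listed above (assumed layout)."""
    language: str  # report language code, e.g. "ar", "en", "fr", "es", "de", "it"
    speakers: List[SpeakerProfile] = field(default_factory=list)
    main_topics: List[str] = field(default_factory=list)
    decisions: List[str] = field(default_factory=list)
    action_items: List[ActionItem] = field(default_factory=list)
    key_insights: List[str] = field(default_factory=list)


def report_to_json(report: MeetingReport) -> str:
    """Serialize a report; ensure_ascii=False keeps Arabic and accented text readable."""
    return json.dumps(asdict(report), ensure_ascii=False, indent=2)
```

A report built this way can be rendered to Markdown or PDF by a separate step; the JSON form is only one convenient intermediate representation.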

Workflow

The pipeline consists of four main stages:

[Figure: InterPARES-Audio workflow diagram]

  1. Speaker Diarization and Segmentation: The long audio input is processed to identify speaker changes and segment the audio into a sequence of individual utterances, each tagged with a speaker ID.

  2. Multilingual Speech Model: Each utterance is fed into a speech model that performs both transcription (speech-to-text) and language identification (LID).

  3. Transcription Manager: This component merges the individual transcribed utterances, saves the full transcript, and creates manageable, contextually coherent chunks of text optimized for the LLM.

  4. LLM Analysis: The structured text chunks are analyzed by an LLM to transform raw transcript data into a meaningful and actionable structured report.
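To make the Transcription Manager stage (step 3) more concrete, the following Python sketch merges consecutive utterances from the same speaker and language, then groups them into chunks under a word budget without splitting an utterance. It is a simplified illustration under stated assumptions: the `Utterance` fields, the function names, and the `max_words` limit are all hypothetical, and the actual project may chunk by tokens or by other criteria.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    """One diarized, transcribed segment (hypothetical structure)."""
    speaker: str    # speaker ID from diarization
    start: float    # start time in seconds
    end: float      # end time in seconds
    language: str   # language code from LID
    text: str       # transcription


def merge_adjacent(utterances: List[Utterance]) -> List[Utterance]:
    """Merge consecutive utterances that share both speaker and language."""
    merged: List[Utterance] = []
    for u in sorted(utterances, key=lambda x: x.start):
        last = merged[-1] if merged else None
        if last and last.speaker == u.speaker and last.language == u.language:
            merged[-1] = Utterance(
                last.speaker, last.start, max(last.end, u.end),
                last.language, f"{last.text} {u.text}",
            )
        else:
            merged.append(u)
    return merged


def format_line(u: Utterance) -> str:
    """Render one utterance as a transcript line."""
    return f"[{u.start:.1f}-{u.end:.1f}] {u.speaker} ({u.language}): {u.text}"


def chunk_transcript(utterances: List[Utterance], max_words: int = 200) -> List[str]:
    """Group transcript lines into chunks of at most max_words words.

    An utterance is never split, so a single utterance longer than
    max_words becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for u in utterances:
        n = len(u.text.split())
        if current and count + n > max_words:
            chunks.append("\n".join(current))
            current, count = [], 0
        current.append(format_line(u))
        count += n
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Keeping utterance boundaries intact means every chunk handed to the LLM carries complete speaker turns with their speaker ID and language tag, which is one plausible way to keep chunks contextually coherent.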

Online Demo

Live Demo: demos.dlnlp.ai/InterPARES/

Examples

Meeting report, Markdown

Meeting report, PDF
