This project leverages LangChain to process and generate multimedia content (audio, video, images) with plugin-based extensions and custom model integration.
- Text-to-Audio/Video/Image
- OCR
- Audio-to-Text
- Configurable Parameters Management
Supported model backends:

- Xinference
- OpenAI
- Stable Diffusion
- Clone the repository:

  ```shell
  git clone https://github.com/zhcn000000/langchain-multimedia.git
  cd langchain-multimedia
  ```

- Create and activate a virtual environment:

  ```shell
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install dependencies:

  ```shell
  pip install -r requirements.txt
  ```

- Edit `config.yaml` to set your API key, model parameters, etc.
- Run an example script:

  ```shell
  python examples/audio_to_text.py
  ```
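The repository snapshot does not show `config.yaml` itself; as a minimal sketch, assuming top-level keys for the service endpoint, credential, and default model (the key names and values below are illustrative, not the project's actual schema), it might look like:

```yaml
# Illustrative only -- the real schema is defined by the project, not this README.
base_url: "https://api.example.com"   # model service endpoint
api_key: "YOUR_API_KEY"               # credential for the service
model: "voice-1"                      # default model name
```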
```python
from langchain_multimedia import OpenAIAudioGenerator

model = OpenAIAudioGenerator(
    base_url="https://api.example.com",
    api_key="YOUR_API_KEY",
    model="voice-1",
)
model.voice = "en-US-Wavenet-D"  # Set the voice model

prompt = "Hello, world"
response = model.invoke(input=prompt)
# response == "/path/to/generated_audio.mp3"
```

```python
from langchain_multimedia import OpenAIImageGenerator

model = OpenAIImageGenerator(
    base_url="https://api.example.com",
    api_key="YOUR_API_KEY",
    model="vision-1",
)

prompt = "Generate a landscape photo with mountains and a river"
response = model.invoke(input=prompt)
# response == "/path/to/generated_image.png"
```

```python
from pathlib import Path

from langchain_multimedia import OpenAITranscriber

audio_file = "/path/to/audio.mp3"
audio_data = Path(audio_file).read_bytes()

model = OpenAITranscriber(
    base_url="https://api.example.com",
    api_key="YOUR_API_KEY",
    model="whisper-1",
)
response = model.invoke(input=audio_data)
# response == "Transcribed text from the audio file"
```

In `tests/api.json`, you can configure:

- `api_key`: API key for the model service
- `model_name`: Selected model name
- `timeout`: Request timeout in seconds
- Parameters for plugins and extensions
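As a sketch of how such a config file can be written and read back with the standard library (the key names mirror the list above; the values and the file path are placeholders, not real credentials):

```python
import json
from pathlib import Path

# Hypothetical contents for tests/api.json; key names follow the README,
# values are placeholders.
config = {
    "api_key": "YOUR_API_KEY",
    "model_name": "whisper-1",
    "timeout": 30,  # request timeout in seconds
}

path = Path("api.json")
path.write_text(json.dumps(config, indent=2))

# Reading it back the way a test harness might:
loaded = json.loads(path.read_text())
print(loaded["model_name"])  # -> whisper-1
print(loaded["timeout"])     # -> 30
```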
- Currently only the OpenAI and Xinference image and audio models have been tested; other models remain untested.
```
.
├── examples/               Example scripts
├── langchain_multimedia/   Core modules
├── tests/                  Unit tests
│   └── api.json            Test API configuration file
├── requirements.txt        Dependencies
└── README.md               Project documentation
```
This project is licensed under the MIT License.