This is the repository for LLaQo, a Large Language Query-based music coach that leverages audio language modeling to provide detailed and formative assessments of music performances.
Our environment `lam2` is downloadable from here. After downloading, simply run `source /path/to/your/envs/lam2/bin/activate`. Alternatively, install the dependencies via pip with `requirement.txt` or via conda with `environment.yaml`.
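The three setup options above can be sketched as shell commands. This is only an illustration: the environment path is a placeholder, and the environment name after `conda env create` is assumed to match the one declared in `environment.yaml`.

```shell
# Option 1: use the prebuilt lam2 environment (after downloading it)
source /path/to/your/envs/lam2/bin/activate

# Option 2: install the pinned dependencies with pip into a fresh virtualenv
python -m venv lam2
source lam2/bin/activate
pip install -r requirement.txt

# Option 3: recreate the environment with conda
conda env create -f environment.yaml
conda activate lam2   # assumes the YAML names the environment "lam2"
```

Whichever route you take, activate the environment before running any of the scripts below.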
Checkpoints: please access them from here. The archive contains the Vicuna-7b model, our LLaQo checkpoint, and the audio encoder.
For the Gradio inference demo, after setting up the environment and placing `ckpts/` under the repository root, run:

python LLaQo-chat.py
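A minimal sketch of the expected layout and launch, assuming the checkpoints archive unpacks into a `ckpts/` folder (the tree below is illustrative, not an exhaustive listing of the repository):

```shell
# Expected layout under the repository root:
# .
# ├── ckpts/           # Vicuna-7b, the LLaQo checkpoint, and the audio encoder
# ├── LLaQo-chat.py
# └── ...

source /path/to/your/envs/lam2/bin/activate   # or your own environment
python LLaQo-chat.py                          # launches the Gradio demo
```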
For our new NeuroPiano dataset, please refer to the HF repository as well as its analysis report. For the other datasets, please see the following table to access the audio data from its original source together with our metadata file, which contains the instruction-tuned QA pairs. Additionally, the `qagen/` directory contains the processing prompts for CROCUS and expert_novice.
The codebase is adapted from APT, which was in turn adapted from BLIP-2 and the LAVIS codebase.

If you find our work useful, please cite:
@article{zhang2024llaqoassessment,
title={{LLaQo: Towards a query-based coach in expressive performance assessment}},
author={Zhang, Huan and Cheung, Vincent and Nishioka, Hayato and Dixon, Simon and Furuya, Shinichi},
journal={arXiv preprint arXiv:2409.08795},
year={2024}
}