Official implementation of TISDiSS, a scalable framework for discriminative source separation that enables flexible model scaling at both training and inference time.
- [2025-10-18] We release the code and pre-trained models of TISDiSS on Hugging Face! 🚀
- State-of-the-art Performance: Achieves SOTA results on WSJ0-2mix, WHAMR!, and Libri2Mix datasets
- Dynamic Inference: Adjustable Reconstruction block repeat times (N_re) at inference stage for performance-efficiency trade-offs without retraining
- Effective Training Strategy for Low-Latency Separation: Training with more inference repetitions consistently improves shallow-inference performance, offering a practical solution for low-latency separation
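To make the dynamic-inference idea concrete, here is a minimal sketch (illustrative only, not the actual TISDiSS implementation; `refine_step`, `separate`, and `n_re` are hypothetical names): a single shared reconstruction step is applied a configurable number of times, so one set of trained weights supports both shallow (low-latency) and deep (higher-quality) inference.

```python
# Toy sketch of inference-time scaling via a repeated, weight-shared
# refinement step. Names and logic are illustrative, not the TISDiSS API.
def refine_step(estimate, mixture):
    # Stand-in "Reconstruction block": pull the estimate toward the mixture.
    return [0.5 * e + 0.5 * m for e, m in zip(estimate, mixture)]

def separate(mixture, n_re):
    # Larger n_re reuses the same step more times; changing the
    # inference depth requires no retraining.
    estimate = [0.0] * len(mixture)
    for _ in range(n_re):
        estimate = refine_step(estimate, mixture)
    return estimate

shallow = separate([1.0, -1.0], n_re=1)  # low-latency setting
deep = separate([1.0, -1.0], n_re=6)     # larger compute budget
```

In the real model the repeated unit is a learned Reconstruction block rather than this averaging step, but the control flow is the same: N_re is just a loop count chosen at inference time.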
arXiv: https://arxiv.org/abs/2509.15666
Status: Submitted to ICASSP 2026
```sh
git clone https://github.com/WingSingFung/TISDiSS.git
cd TISDiSS
```

Install the required dependencies:

```sh
pip install -r requirements.txt
```

Modify line 2 in `egs2/wsj0_2mix/enh1/enh.sh` for the espnet path:

```sh
export PYTHONPATH="path/to/your/TISDiSS:$PYTHONPATH"
```
Navigate to the example directory and run inference on your audio files:

```sh
cd egs2/wsj0_2mix/enh1
python separate.py \
    --model_path ./exp/enh_train_enh_tisdiss_tflocoformer_en-residual_en1x2_re1x6_l1+1x6_raw/valid.loss.ave_5best.pth \
    --audio_path /path/to/input_audio \
    --audio_output_dir /path/to/output_directory
```

Parameters:
- `--model_path`: Path to the pre-trained model checkpoint
- `--audio_path`: Path to the input audio file or directory
- `--audio_output_dir`: Directory where the separated audio will be saved
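To separate many files in one go, `separate.py` can be wrapped in a short driver script. The sketch below is an assumption-laden example: the flag names are taken from the usage above, but the paths are placeholders and `build_command`/`run_batch` are hypothetical helpers (check `python separate.py --help` for the authoritative options).

```python
# Batch-inference sketch: build one separate.py invocation per WAV file.
# Flag names mirror the README usage; paths are placeholders.
import subprocess
from pathlib import Path

MODEL = ("./exp/enh_train_enh_tisdiss_tflocoformer_en-residual"
         "_en1x2_re1x6_l1+1x6_raw/valid.loss.ave_5best.pth")

def build_command(wav_path, output_dir):
    # Assemble the command line as a list (no shell quoting issues).
    return [
        "python", "separate.py",
        "--model_path", MODEL,
        "--audio_path", str(wav_path),
        "--audio_output_dir", str(output_dir),
    ]

def run_batch(input_dir, output_dir):
    # Run separation on every WAV in input_dir, failing fast on errors.
    for wav in sorted(Path(input_dir).glob("*.wav")):
        subprocess.run(build_command(wav, output_dir), check=True)
```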
Navigate to the example directory:

```sh
cd egs2/wsj0_2mix/enh1
```

Note: You need to download the WSJ0 dataset separately (a commercial license is required).

If your WSJ0 dataset is already in WAV format, create a symbolic link:

```sh
mkdir -p ./data/wsj0
ln -s /path/to/your/WSJ0 ./data/wsj0/wsj0
```

Alternatively, modify line 24 in `./local/data.sh` to point to your WSJ0 path:

```sh
wsj_full_wav=/path/to/your/WSJ0/
```

If your dataset is in the original WSJ0 format:
- Uncomment lines 76-81 in `./egs2/wsj0_2mix/enh1/local/data.sh`
- Fill in the `WSJ0=` path in `db.sh`
Run data preparation and statistics collection:

```sh
./run.sh --stage 1 --stop_stage 5
```

Train the TISDiSS model with the following command:

```sh
CUDA_VISIBLE_DEVICES=0,1 ./run.sh \
    --stage 6 \
    --stop_stage 6 \
    --enh_config conf/efficient_train/tisdiss/train_enh_tisdiss_tflocoformer_en-residual_en1x2_re1x6_l1+1x6.yaml \
    --ngpu 2
```

Training Configuration:
- The model uses TF-Locoformer as the backbone
- Training configuration: 2 Encoder blocks + 6 Reconstruction blocks
- Adjust `--ngpu` to use multiple GPUs if available
Run inference with various Reconstruction block configurations (N_re):

```sh
./infer_run.sh
```

You can modify the script to test different N_re values:

```sh
for re in 3 6 8; do
    # Your inference commands here
done
```

This repository contains a streamlined version of ESPnet-Enh, designed for easier training and inference of TISDiSS. Since the full ESPnet framework can be complex for new users, we provide this simplified codebase focused specifically on our method.
For additional examples, features, and the complete ESPnet-Enh toolkit, please refer to the ESPnet-Enh repository.
If you find this work useful in your research, please consider citing:

```bibtex
@article{feng2025tisdiss,
  title={TISDiSS: A Training-Time and Inference-Time Scalable Framework for Discriminative Source Separation},
  author={Feng, Yongsheng and Xu, Yuetonghui and Luo, Jiehui and Liu, Hongjia and Li, Xiaobing and Yu, Feng and Li, Wei},
  journal={arXiv preprint arXiv:2509.15666},
  year={2025}
}
```

For questions or issues, please open an issue on GitHub or contact the authors.




