TISDiSS: Training-Time and Inference-Time Scalable Framework for Discriminative Source Separation


Official implementation of TISDiSS, a scalable framework for discriminative source separation that enables flexible model scaling at both training and inference time.

News

🏆 Highlights

  • State-of-the-art Performance: achieves SOTA results on the WSJ0-2mix, WHAMR!, and Libri2Mix benchmarks
  • Dynamic Inference: the number of Reconstruction block repetitions (N_re) is adjustable at inference time, trading performance against efficiency without retraining
  • Effective Training Strategy for Low-Latency Separation: training with more inference repetitions consistently improves shallow-inference performance, offering a practical recipe for low-latency separation
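The dynamic-inference idea can be sketched in a few lines: one weight-shared reconstruction step applied N_re times, with N_re chosen freely at inference. The toy Python example below is purely illustrative (the function names and the simple residual-refinement rule are assumptions, not the actual TISDiSS block); it only shows why repeating the same block more times can refine an estimate without retraining.

```python
# Illustrative sketch only: a weight-shared "reconstruction block" applied
# N_re times. The refinement rule here is a toy stand-in, not the real model.

def reconstruction_block(estimate, target, step=0.5):
    """One weight-shared refinement step: move the estimate toward the target."""
    residual = target - estimate
    return estimate + step * residual

def separate(target, n_re):
    """Apply the same block n_re times; n_re is picked at inference time."""
    estimate = 0.0
    for _ in range(n_re):
        estimate = reconstruction_block(estimate, target)
    return estimate

if __name__ == "__main__":
    # The same parameters, run at three different depths: error shrinks as
    # n_re grows, mirroring the performance-efficiency trade-off.
    for n_re in (1, 3, 6):
        print(n_re, abs(1.0 - separate(1.0, n_re)))
```

The key property is that every repetition reuses the same weights, so a single trained model serves every depth.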

🖼️ Architecture

Overall Framework

[Figure: TISDiSS overall framework]

Separation Block

[Figure: Separation Block architecture]

Reconstruction Block

[Figure: Reconstruction Block architecture]

📊 Performance Comparison

WSJ0-2mix Benchmark

[Figure: WSJ0-2mix benchmark results]

WHAMR! and Libri2Mix Benchmark

[Figure: WHAMR! and Libri2Mix benchmark results]

📄 Paper

arXiv: https://arxiv.org/abs/2509.15666

Status: Submitted to ICASSP 2026


🚀 Quick Start

git clone https://github.com/WingSingFung/TISDiSS.git
cd TISDiSS

Environment Setup

Install the required dependencies:

pip install -r requirements.txt

Then modify line 2 of egs2/wsj0_2mix/enh1/enh.sh so that PYTHONPATH points to your TISDiSS checkout:

export PYTHONPATH="path/to/your/TISDiSS:$PYTHONPATH"

Inference

Navigate to the example directory and run inference on your audio files:

cd egs2/wsj0_2mix/enh1

python separate.py \
    --model_path ./exp/enh_train_enh_tisdiss_tflocoformer_en-residual_en1x2_re1x6_l1+1x6_raw/valid.loss.ave_5best.pth \
    --audio_path /path/to/input_audio \
    --audio_output_dir /path/to/output_directory

Parameters:

  • --model_path: Path to the pre-trained model checkpoint
  • --audio_path: Path to input audio file or directory
  • --audio_output_dir: Directory where separated audio will be saved
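To process many recordings, the command above can be wrapped in a small driver. The sketch below is a hypothetical helper (only the separate.py flags shown in the command above are assumed); it builds one invocation per .wav file in a directory:

```python
# Hypothetical batch driver: build one separate.py command per .wav file.
# Only the flags shown in the README's example command are assumed to exist.
from pathlib import Path

def build_commands(audio_dir, output_dir, model_path):
    """Return a list of separate.py invocations, one per .wav in audio_dir."""
    cmds = []
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        cmds.append([
            "python", "separate.py",
            "--model_path", str(model_path),
            "--audio_path", str(wav),
            "--audio_output_dir", str(output_dir),
        ])
    return cmds

# Each command could then be run with subprocess.run(cmd, check=True).
```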

🔧 Training

1. Data Preparation

Navigate to the example directory:

cd egs2/wsj0_2mix/enh1

Note: You need to download the WSJ0 dataset separately (commercial license required).

Option A: WSJ0 in WAV Format

If your WSJ0 dataset is already in WAV format, create a symbolic link:

mkdir -p ./data/wsj0
ln -s /path/to/your/WSJ0 ./data/wsj0/wsj0

Alternatively, modify line 24 in ./local/data.sh to point to your WSJ0 path:

wsj_full_wav=/path/to/your/WSJ0/

Option B: WSJ0 in Original Format

If your dataset is in the original WSJ0 format:

  1. Uncomment lines 76-81 in ./egs2/wsj0_2mix/enh1/local/data.sh
  2. Fill in the WSJ0= path in db.sh

2. Preprocessing

Run data preparation and statistics collection:

./run.sh --stage 1 --stop_stage 5

3. Model Training

Train the TISDiSS model with the following command:

CUDA_VISIBLE_DEVICES=0,1 ./run.sh \
    --stage 6 \
    --stop_stage 6 \
    --enh_config conf/efficient_train/tisdiss/train_enh_tisdiss_tflocoformer_en-residual_en1x2_re1x6_l1+1x6.yaml \
    --ngpu 2

Training Configuration:

  • The model uses TF-Locoformer as the backbone
  • Training configuration: 2 Encoder blocks + 6 Reconstruction blocks
  • Adjust --ngpu to use multiple GPUs if available

4. Inference with Different Scalability Settings

Run inference with various Reconstruction block configurations (N_re):

./infer_run.sh

You can modify the script to test different N_re values:

for re in 3 6 8; do
    # Your inference commands here
done

📝 Note

This repository contains a streamlined version of ESPnet-Enh, designed for easier training and inference of TISDiSS. Since the full ESPnet framework can be complex for new users, we provide this simplified codebase focused specifically on our method.

For additional examples, features, and the complete ESPnet-Enh toolkit, please refer to the ESPnet-Enh repository.


📚 Citation

If you find this work useful in your research, please consider citing:

@article{feng2025tisdiss,
  title={TISDiSS: A Training-Time and Inference-Time Scalable Framework for Discriminative Source Separation},
  author={Feng, Yongsheng and Xu, Yuetonghui and Luo, Jiehui and Liu, Hongjia and Li, Xiaobing and Yu, Feng and Li, Wei},
  journal={arXiv preprint arXiv:2509.15666},
  year={2025}
}

📧 Contact

For questions or issues, please open an issue on GitHub or contact the authors.
