Official implementation of TISDiSS, a scalable framework for discriminative source separation that enables flexible model scaling at both training and inference time.
- [2025-10-18] We release the code and pre-trained models of TISDiSS on Hugging Face! 🚀
- State-of-the-art Performance: Achieves SOTA results on WSJ0-2mix, WHAMR!, and Libri2Mix datasets
- Dynamic Inference: Adjustable Reconstruction block repeat times (N_re) at inference stage for performance-efficiency trade-offs without retraining
- Effective Training Strategy for Low-Latency Separation: Training with more inference repetitions consistently improves shallow-inference performance, offering a practical solution for low-latency separation
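To make the dynamic-inference idea concrete, here is a minimal sketch (illustrative only, not the actual TISDiSS implementation; `refine_step`, `separate`, and `n_re` are hypothetical names): a single shared reconstruction step is applied a configurable number of times, so one set of trained weights supports both shallow (low-latency) and deep (higher-quality) inference.

```python
# Toy sketch of inference-time scaling via a repeated, weight-shared
# refinement step. Names and logic are illustrative, not the TISDiSS API.
def refine_step(estimate, mixture):
    # Stand-in "Reconstruction block": pull the estimate toward the mixture.
    return [0.5 * e + 0.5 * m for e, m in zip(estimate, mixture)]

def separate(mixture, n_re):
    # Larger n_re reuses the same step more times; changing the
    # inference depth requires no retraining.
    estimate = [0.0] * len(mixture)
    for _ in range(n_re):
        estimate = refine_step(estimate, mixture)
    return estimate

shallow = separate([1.0, -1.0], n_re=1)  # low-latency setting
deep = separate([1.0, -1.0], n_re=6)     # larger compute budget
```

In the real model the repeated unit is a learned Reconstruction block rather than this averaging step, but the control flow is the same: N_re is just a loop count chosen at inference time.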
arXiv: https://arxiv.org/abs/2509.15666
Status: Submitted to ICASSP 2026
```sh
git clone https://github.com/WingSingFung/TISDiSS.git
cd TISDiSS
```

Install the required dependencies:

```sh
pip install -r requirements.txt
```

Modify line 2 in `egs2/wsj0_2mix/enh1/enh.sh` for the espnet path:

```sh
export PYTHONPATH="path/to/your/TISDiSS:$PYTHONPATH"
```
Navigate to the example directory and run inference on your audio files:

```sh
cd egs2/wsj0_2mix/enh1
python separate.py \
    --model_path ./exp/enh_train_enh_tisdiss_tflocoformer_en-residual_en1x2_re1x6_l1+1x6_raw/valid.loss.ave_5best.pth \
    --audio_path /path/to/input_audio \
    --audio_output_dir /path/to/output_directory
```

Parameters:
- `--model_path`: Path to the pre-trained model checkpoint
- `--audio_path`: Path to the input audio file or directory
- `--audio_output_dir`: Directory where the separated audio will be saved
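To separate many files in one go, `separate.py` can be wrapped in a short driver script. The sketch below is an assumption-laden example: the flag names are taken from the usage above, but the paths are placeholders and `build_command`/`run_batch` are hypothetical helpers (check `python separate.py --help` for the authoritative options).

```python
# Batch-inference sketch: build one separate.py invocation per WAV file.
# Flag names mirror the README usage; paths are placeholders.
import subprocess
from pathlib import Path

MODEL = ("./exp/enh_train_enh_tisdiss_tflocoformer_en-residual"
         "_en1x2_re1x6_l1+1x6_raw/valid.loss.ave_5best.pth")

def build_command(wav_path, output_dir):
    # Assemble the command line as a list (no shell quoting issues).
    return [
        "python", "separate.py",
        "--model_path", MODEL,
        "--audio_path", str(wav_path),
        "--audio_output_dir", str(output_dir),
    ]

def run_batch(input_dir, output_dir):
    # Run separation on every WAV in input_dir, failing fast on errors.
    for wav in sorted(Path(input_dir).glob("*.wav")):
        subprocess.run(build_command(wav, output_dir), check=True)
```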
Navigate to the example directory:

```sh
cd egs2/wsj0_2mix/enh1
```

Note: You need to download the WSJ0 dataset separately (a commercial license is required).

If your WSJ0 dataset is already in WAV format, create a symbolic link:

```sh
mkdir -p ./data/wsj0
ln -s /path/to/your/WSJ0 ./data/wsj0/wsj0
```

Alternatively, modify line 24 in `./local/data.sh` to point to your WSJ0 path:

```sh
wsj_full_wav=/path/to/your/WSJ0/
```

If your dataset is in the original WSJ0 format:
- Uncomment lines 76-81 in `./egs2/wsj0_2mix/enh1/local/data.sh`
- Fill in the `WSJ0=` path in `db.sh`
Run data preparation and statistics collection:

```sh
./run.sh --stage 1 --stop_stage 5
```

Train the TISDiSS model with the following command:

```sh
CUDA_VISIBLE_DEVICES=0,1 ./run.sh \
    --stage 6 \
    --stop_stage 6 \
    --enh_config conf/efficient_train/tisdiss/train_enh_tisdiss_tflocoformer_en-residual_en1x2_re1x6_l1+1x6.yaml \
    --ngpu 2
```

Training Configuration:
- The model uses TF-Locoformer as the backbone
- Training configuration: 2 Encoder blocks + 6 Reconstruction blocks
- Adjust `--ngpu` to use multiple GPUs if available
Run inference with various Reconstruction block configurations (N_re):

```sh
./infer_run.sh
```

You can modify the script to test different N_re values:

```sh
for re in 3 6 8; do
    # Your inference commands here
done
```

This repository contains a streamlined version of ESPnet-Enh, designed for easier training and inference of TISDiSS. Since the full ESPnet framework can be complex for new users, we provide this simplified codebase focused specifically on our method.
For additional examples, features, and the complete ESPnet-Enh toolkit, please refer to the ESPnet-Enh repository.
If you find this work useful in your research, please consider citing:

```bibtex
@article{feng2025tisdiss,
  title={TISDiSS: A Training-Time and Inference-Time Scalable Framework for Discriminative Source Separation},
  author={Feng, Yongsheng and Xu, Yuetonghui and Luo, Jiehui and Liu, Hongjia and Li, Xiaobing and Yu, Feng and Li, Wei},
  journal={arXiv preprint arXiv:2509.15666},
  year={2025}
}
```

For questions or issues, please open an issue on GitHub or contact the authors.




