Welcome to the repository for the paper "Wireless Channel Modeling for Machine Learning - A Critical View on Standardized Channel Models".
Here you can find a pre-print of our work. If you use this code for your research, please cite our pre-print:
```bibtex
@misc{boeck2025wireless_ch_mod4ml,
  title={Wireless Channel Modeling for Machine Learning - A Critical View on Standardized Channel Models},
  author={Benedikt B\"ock and Amar Kasibovic and Wolfgang Utschick},
  year={2025},
  eprint={2510.12279},
  archivePrefix={arXiv},
  primaryClass={eess.SP},
  note={arXiv:2510.12279},
  url={https://arxiv.org/abs/2510.12279},
}
```

The provided code is split into different parts:
- Code for generating link-level (TDL and CDL) channel data using the 5G Toolbox of MATLAB (MATLAB)
- Code for generating scenario-level (QuaDRiGa) channel data using the QuaDRiGa source code (MATLAB)
- Code for the autoencoder applied to CSI compression (Python)
- Code for the linear methods, i.e., the PCA, the LMMSE estimator, and the sample covariance Gaussian sampling (Python)
- Run the scripts `link_level_data/TDLCDL/generate_cdl.m` or `link_level_data/TDLCDL/generate_tdl.m` to generate link-level channel data. You can customize the type of model (TDL-A, TDL-B, ...) as well as configuration parameters such as the number of subcarriers within the scripts. The datasets are stored in `link_level_data/TDLCDL`. The file names contain the type of model as well as the number of generated samples.
- To transform the `.mat` datasets to `.npy`, run the `mat_to_py.py` file in `link_level_data`. It takes the particular system (`ofdm` or `mimo`), the number of samples in the dataset, and the particular dataset (`tdl_a`, `tdl_b`, ...) as parser arguments. An example would be `python mat_to_py.py -system ofdm -n_samples 80000 -ds tdl_a` or `python mat_to_py.py -system mimo -n_samples 2000 -ds cdl_e`. Note that you must have generated a MATLAB dataset with the matching configuration beforehand.
- We included toy datasets with 2000 samples from the TDL-E and CDL-E link-level channel models as `.npy` files.
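Once converted, the `.npy` datasets can be inspected with plain NumPy. The sketch below is a minimal example; the file name `ofdm_tdl_e_2000.npy` and the array dimensions are assumptions, and a stand-in array is used in place of a real dataset (the actual shape depends on your MATLAB configuration):

```python
import numpy as np

# Stand-in for a generated dataset: 2000 complex channel vectors with 64
# subcarriers each (the real shape depends on your MATLAB configuration).
rng = np.random.default_rng(0)
channels = rng.standard_normal((2000, 64)) + 1j * rng.standard_normal((2000, 64))

np.save("ofdm_tdl_e_2000.npy", channels)  # hypothetical file name

# Load and inspect the dataset as the Python scripts would.
loaded = np.load("ofdm_tdl_e_2000.npy")
print(loaded.shape, loaded.dtype)  # (2000, 64) complex128
```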
### Requirements
To run these scripts you need the 5G Toolbox of MATLAB; we used MATLAB R2025b. In addition to standard Python packages, you also need the `h5py` package.
### Comment
Note that generating 50000 or more samples can take quite some time. We ran the code on a regular CPU and generated 80000 samples per channel model; each dataset took about a day to generate.
- Run the scripts `scenario_level_data/QuaDRiGa/generate_channels_with_structured_layout_rural.m` or `scenario_level_data/QuaDRiGa/generate_channels_with_structured_layout_urban.m` to generate OFDM scenario-level channel data. You can customize configuration parameters such as the number of subcarriers within the scripts. The datasets are stored in `scenario_level_data/QuaDRiGa`. The file names contain the type of model as well as the number of generated samples.
- To transform the `.mat` datasets to `.npy`, run the `mat_to_py.py` file in `scenario_level_data`. It takes the number of samples in the dataset and the particular dataset (`rural` or `urban`) as parser arguments. An example would be `python mat_to_py.py -n_samples 80000 -ds rural`. Note that you must have generated a MATLAB dataset with the matching configuration beforehand.
- We included a toy dataset with 2000 samples from the QuaDRiGa rural scenario-level channel model as a `.npy` file.
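The conversion relies on `h5py` because MATLAB saves large arrays in the HDF5-based v7.3 `.mat` format, where complex data is stored as a compound type with `real` and `imag` fields. The following is a simplified, hypothetical sketch of such a conversion, not the actual code in `mat_to_py.py`; the variable name `H` and the demo file layout are assumptions:

```python
import h5py
import numpy as np

def mat_to_npy(mat_path, var_name, npy_path):
    """Read one variable from an HDF5-based (v7.3) .mat file and save it as .npy."""
    with h5py.File(mat_path, "r") as f:
        raw = f[var_name][()]
    # MATLAB stores complex arrays as a compound dtype with 'real'/'imag' fields.
    if raw.dtype.names and {"real", "imag"} <= set(raw.dtype.names):
        raw = raw["real"] + 1j * raw["imag"]
    # A real converter may also need to transpose, since MATLAB is column-major.
    np.save(npy_path, raw)
    return raw

# Build a small demo file that mimics the v7.3 layout (stand-in data).
cplx = np.dtype([("real", "<f8"), ("imag", "<f8")])
demo = np.zeros((4, 8), dtype=cplx)
demo["real"] = 1.0
with h5py.File("demo_v73.mat", "w") as f:
    f.create_dataset("H", data=demo)

converted = mat_to_npy("demo_v73.mat", "H", "demo.npy")
print(converted.shape, converted.dtype)  # (4, 8) complex128
```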
### Requirements
To run these scripts you need the `quadriga_src` code in `scenario_level_data/QuaDRiGa`. Note that we already uploaded the source code of QuaDRiGa with its license included (version 2021.07.12_v2.6.1-0, only non-commercial use allowed!). You can also download the source code here. In addition to standard Python packages, you also need the `h5py` package.
### Comment
Note that generating 50000 or more samples can take quite some time. We ran the code on a regular CPU and generated 80000 samples per channel model; each dataset took about a day to generate. These QuaDRiGa scripts have also been used in our ICML25 paper (see there for more details about the datasets).
To run the scripts covering the autoencoder, the PCA, the LMMSE estimator, and the Gaussian sampling, execute `main_compression_autoencoder.py`, `main_compression_pca.py`, `main_estimation_lmmse.py`, and `main_generation_sCov.py`, respectively. Each of these scripts takes parser arguments as input. The autoencoder architecture has also been used in our ICML25 paper (see there for more details about the architecture).
- `main_compression_autoencoder.py` takes the dataset (e.g., `quadriga_rural`, `tdl_a`, ...), the latent dimension, the number of training samples, the number of test samples, and the device (e.g., `cpu`, `cuda:0`, `cuda:1`, ...) as parser arguments (example: `python main_compression_autoencoder.py -ds quadriga_rural -latent_dim 64 -ntrain 60000 -ntest 10000 -device cuda:0`).
- `main_compression_pca.py` takes the dataset (e.g., `quadriga_rural`, `tdl_a`, ...), the latent dimension, the number of training samples, and the number of test samples as parser arguments. Note that the latent dimension is meant complex-valued, which is why it is twice the latent dimension (degrees of freedom) in our work (example: `python main_compression_pca.py -ds quadriga_rural -latent_dim 64 -ntrain 60000 -ntest 10000`).
- `main_estimation_lmmse.py` takes the dataset (e.g., `cdl_a`, `cdl_b`, ...), the number of training samples, the number of test samples, and the SNR in dB as parser arguments (example: `python main_estimation_lmmse.py -ds cdl_a -ntrain 60000 -ntest 10000 -snr_db 10`).
- `main_generation_sCov.py` takes the dataset (e.g., `cdl_a`, `cdl_b`, ...), the number of training samples, the number of samples to be generated, and the SNR in dB (for the spectral efficiency evaluation) as parser arguments (example: `python main_generation_sCov.py -ds cdl_a -ntrain 60000 -n_samples 10000 -snr_db 10`).
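To illustrate the linear baseline, PCA-based CSI compression can be sketched in a few lines: estimate the sample covariance from training channels, keep the top eigenvectors, and project/reconstruct test channels. This is a minimal sketch on synthetic stand-in data, not the exact implementation in `main_compression_pca.py`:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, latent_dim = 64, 8

# Stand-in training/test channels with a low-rank structure (rank 16).
mix = rng.standard_normal((dim, 16)) + 1j * rng.standard_normal((dim, 16))
def draw(n):
    return (rng.standard_normal((n, 16)) + 1j * rng.standard_normal((n, 16))) @ mix.T

h_train, h_test = draw(5000), draw(500)

# PCA: eigenvectors of the sample covariance of the training channels.
cov = h_train.conj().T @ h_train / h_train.shape[0]
eigvals, eigvecs = np.linalg.eigh(cov)
U = eigvecs[:, -latent_dim:]   # top 'latent_dim' eigenvectors (eigh sorts ascending)

z = h_test @ U                 # compress: project onto the PCA subspace
h_hat = z @ U.conj().T         # decompress: back-project to full dimension

nmse = np.mean(np.abs(h_test - h_hat) ** 2) / np.mean(np.abs(h_test) ** 2)
print(f"NMSE: {nmse:.3f}")
```

Here the complex latent dimension of 8 corresponds to 16 real degrees of freedom, matching the convention noted above for `-latent_dim`.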
Note that you need to have generated the matching dataset in `.npy` format beforehand. The `src` directory contains the dataset configuration file `configs/dataset.ini`, which stores the paths to the datasets. Note that if you adapt the number of generated samples, you also need to adapt the path names within this file (e.g., `data_path = link_level_data/ofdm_tdl_a_5000.npy` if you have generated 5000 samples with the TDL-A dataset). The corresponding section headers (e.g., `tdl_a`) are used as the `ds` parser argument in the main files. The `src` directory also contains utility functions in `utils` that comprise the linear methods, evaluation methods, and general organization methods such as the generation of directories for saving the results. It also contains the modules storing the autoencoder architecture.
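For instance, a section in `configs/dataset.ini` could look like the following sketch (the path assumes 80000 generated TDL-A samples; adapt it to your own sample count, and note that any further keys in the actual file are not shown here):

```ini
[tdl_a]
data_path = link_level_data/ofdm_tdl_a_80000.npy
```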
All main files store the results in newly generated directories inside the `results` directory. The results contain the `experiment_config.json` file as well as an `.npz` file that stores important results.
### Requirements
In addition to standard Python packages, you also need `torch`. We used Python 3.10, PyTorch 2.5.1, and pytorch-cuda 12.1.
We have uploaded toy datasets. To test whether the code works for you, you should immediately be able to run experiments with the following commands:

```shell
python main_compression_pca.py -ds tdl_e -latent_dim 2 -ntrain 1500 -ntest 500
python main_compression_pca.py -ds quadriga_rural -latent_dim 64 -ntrain 1500 -ntest 500
python main_estimation_lmmse.py -ds cdl_e -ntrain 1500 -ntest 500 -snr_db 10
```
Afterwards, new directories containing the results as `.npz` files should appear in the `results` directory.
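The stored `.npz` results can then be inspected with NumPy. A small sketch, assuming nothing about the key names inside the archive (it is demonstrated on a stand-in file, since the actual keys depend on the experiment):

```python
import numpy as np

# Stand-in for a results file as written by the main scripts; the actual
# key names and shapes depend on the experiment.
np.savez("results_demo.npz", nmse=np.array([0.12, 0.08]), snr_db=np.array([5, 10]))

with np.load("results_demo.npz") as results:
    for key in results.files:          # list all stored arrays
        print(key, results[key].shape)
```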