Welcome to the repository for the paper "Wireless Channel Modeling for Machine Learning - A Critical View on Standardized Channel Models".
Here you can find a pre-print of our work. If you use this code for your research, please cite our pre-print:
```bibtex
@misc{boeck2025wireless_ch_mod4ml,
  title={Wireless Channel Modeling for Machine Learning - A Critical View on Standardized Channel Models},
  author={Benedikt B\"ock and Amar Kasibovic and Wolfgang Utschick},
  year={2025},
  eprint={2510.12279},
  archivePrefix={arXiv},
  primaryClass={eess.SP},
  note={arXiv:2510.12279},
  url={https://arxiv.org/abs/2510.12279},
}
```

The provided code is split into different parts:
- Code for generating link-level (TDL and CDL) channel data using the 5G Toolbox of MATLAB (MATLAB)
- Code for generating scenario-level (QuaDRiGa) channel data using the QuaDRiGa source code (MATLAB)
- Code for the autoencoder applied to CSI compression (Python)
- Code for the linear methods, i.e., the PCA, the LMMSE estimator, and the sample covariance Gaussian sampling (Python)
- Run the scripts `link_level_data/TDLCDL/generate_cdl.m` or `link_level_data/TDLCDL/generate_tdl.m` to generate link-level channel data. You can customize the type of model (TDL-A, TDL-B, ...) as well as configuration parameters such as the number of subcarriers within the scripts. The datasets are stored in `link_level_data/TDLCDL`. The file names contain the type of model as well as the number of generated samples.
- To transform the `.mat` datasets to `.npy`, run the `mat_to_py.py` file in `link_level_data`. It takes the particular system (`ofdm` or `mimo`), the number of samples in the dataset, and the particular dataset (`tdl_a`, `tdl_b`, ...) as parser arguments. An example would be `python mat_to_py.py -system ofdm -n_samples 80000 -ds tdl_a` or `python mat_to_py.py -system mimo -n_samples 2000 -ds cdl_e`. Note that you must have generated a MATLAB dataset with the matching configuration beforehand.
- We included toy datasets with 2000 samples from the TDL-E and CDL-E link-level channel models as `.npy` files.
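Once converted, the `.npy` datasets can be inspected with plain NumPy. The sketch below is a minimal example; the file name `ofdm_tdl_e_2000.npy` and the array dimensions are assumptions, and a stand-in array is used in place of a real dataset (the actual shape depends on your MATLAB configuration):

```python
import numpy as np

# Stand-in for a generated dataset: 2000 complex channel vectors with 64
# subcarriers each (the real shape depends on your MATLAB configuration).
rng = np.random.default_rng(0)
channels = rng.standard_normal((2000, 64)) + 1j * rng.standard_normal((2000, 64))

np.save("ofdm_tdl_e_2000.npy", channels)  # hypothetical file name

# Load and inspect the dataset as the Python scripts would.
loaded = np.load("ofdm_tdl_e_2000.npy")
print(loaded.shape, loaded.dtype)  # (2000, 64) complex128
```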
### Requirements
To run these scripts you need the 5G Toolbox of MATLAB; we used MATLAB R2025b. In addition to standard Python packages, you also need the `h5py` package.
### Comment
Note that generating 50000 or more samples can take quite some time. We ran the code on a regular CPU and generated 80000 samples per channel model; each dataset took about a day to generate.
- Run the scripts `scenario_level_data/QuaDRiGa/generate_channels_with_structured_layout_rural.m` or `scenario_level_data/QuaDRiGa/generate_channels_with_structured_layout_urban.m` to generate OFDM scenario-level channel data. You can customize configuration parameters such as the number of subcarriers within the scripts. The datasets are stored in `scenario_level_data/QuaDRiGa`. The file names contain the type of model as well as the number of generated samples.
- To transform the `.mat` datasets to `.npy`, run the `mat_to_py.py` file in `scenario_level_data`. It takes the number of samples in the dataset and the particular dataset (`rural` or `urban`) as parser arguments. An example would be `python mat_to_py.py -n_samples 80000 -ds rural`. Note that you must have generated a MATLAB dataset with the matching configuration beforehand.
- We included a toy dataset with 2000 samples from the QuaDRiGa rural scenario-level channel model as a `.npy` file.
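The conversion relies on `h5py` because MATLAB saves large arrays in the HDF5-based v7.3 `.mat` format, where complex data is stored as a compound type with `real` and `imag` fields. The following is a simplified, hypothetical sketch of such a conversion, not the actual code in `mat_to_py.py`; the variable name `H` and the demo file layout are assumptions:

```python
import h5py
import numpy as np

def mat_to_npy(mat_path, var_name, npy_path):
    """Read one variable from an HDF5-based (v7.3) .mat file and save it as .npy."""
    with h5py.File(mat_path, "r") as f:
        raw = f[var_name][()]
    # MATLAB stores complex arrays as a compound dtype with 'real'/'imag' fields.
    if raw.dtype.names and {"real", "imag"} <= set(raw.dtype.names):
        raw = raw["real"] + 1j * raw["imag"]
    # A real converter may also need to transpose, since MATLAB is column-major.
    np.save(npy_path, raw)
    return raw

# Build a small demo file that mimics the v7.3 layout (stand-in data).
cplx = np.dtype([("real", "<f8"), ("imag", "<f8")])
demo = np.zeros((4, 8), dtype=cplx)
demo["real"] = 1.0
with h5py.File("demo_v73.mat", "w") as f:
    f.create_dataset("H", data=demo)

converted = mat_to_npy("demo_v73.mat", "H", "demo.npy")
print(converted.shape, converted.dtype)  # (4, 8) complex128
```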
### Requirements
To run these scripts you need the `quadriga_src` code in `scenario_level_data/QuaDRiGa`. Note that we already uploaded the source code of QuaDRiGa with its license included (version 2021.07.12_v2.6.1-0, only non-commercial use allowed!). You can also download the source code here. In addition to standard Python packages, you also need the `h5py` package.
### Comment
Note that generating 50000 or more samples can take quite some time. We ran the code on a regular CPU and generated 80000 samples per channel model; each dataset took about a day to generate. These QuaDRiGa scripts have also been used in our ICML25 paper (see there for more details about the datasets).
To run the scripts covering the autoencoder, the PCA, the LMMSE estimator, and the Gaussian sampling, execute `main_compression_autoencoder.py`, `main_compression_pca.py`, `main_estimation_lmmse.py`, and `main_generation_sCov.py`, respectively. Each of these scripts takes parser arguments as input. The autoencoder architecture has also been used in our ICML25 paper (see there for more details about the architecture).
- `main_compression_autoencoder.py` takes the dataset (e.g., `quadriga_rural`, `tdl_a`, ...), the latent dimension, the number of training samples, the number of test samples, and the device (e.g., `cpu`, `cuda:0`, `cuda:1`, ...) as parser arguments (example: `python main_compression_autoencoder.py -ds quadriga_rural -latent_dim 64 -ntrain 60000 -ntest 10000 -device cuda:0`).
- `main_compression_pca.py` takes the dataset (e.g., `quadriga_rural`, `tdl_a`, ...), the latent dimension, the number of training samples, and the number of test samples as parser arguments. Note that the latent dimension is meant complex-valued, which is why it is twice the latent dimension (degrees of freedom) in our work (example: `python main_compression_pca.py -ds quadriga_rural -latent_dim 64 -ntrain 60000 -ntest 10000`).
- `main_estimation_lmmse.py` takes the dataset (e.g., `cdl_a`, `cdl_b`, ...), the number of training samples, the number of test samples, and the SNR in dB as parser arguments (example: `python main_estimation_lmmse.py -ds cdl_a -ntrain 60000 -ntest 10000 -snr_db 10`).
- `main_generation_sCov.py` takes the dataset (e.g., `cdl_a`, `cdl_b`, ...), the number of training samples, the number of samples to be generated, and the SNR in dB (for the spectral efficiency evaluation) as parser arguments (example: `python main_generation_sCov.py -ds cdl_a -ntrain 60000 -n_samples 10000 -snr_db 10`).
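To illustrate the linear baseline, PCA-based CSI compression can be sketched in a few lines: estimate the sample covariance from training channels, keep the top eigenvectors, and project/reconstruct test channels. This is a minimal sketch on synthetic stand-in data, not the exact implementation in `main_compression_pca.py`:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, latent_dim = 64, 8

# Stand-in training/test channels with a low-rank structure (rank 16).
mix = rng.standard_normal((dim, 16)) + 1j * rng.standard_normal((dim, 16))
def draw(n):
    return (rng.standard_normal((n, 16)) + 1j * rng.standard_normal((n, 16))) @ mix.T

h_train, h_test = draw(5000), draw(500)

# PCA: eigenvectors of the sample covariance of the training channels.
cov = h_train.conj().T @ h_train / h_train.shape[0]
eigvals, eigvecs = np.linalg.eigh(cov)
U = eigvecs[:, -latent_dim:]   # top 'latent_dim' eigenvectors (eigh sorts ascending)

z = h_test @ U                 # compress: project onto the PCA subspace
h_hat = z @ U.conj().T         # decompress: back-project to full dimension

nmse = np.mean(np.abs(h_test - h_hat) ** 2) / np.mean(np.abs(h_test) ** 2)
print(f"NMSE: {nmse:.3f}")
```

Here the complex latent dimension of 8 corresponds to 16 real degrees of freedom, matching the convention noted above for `-latent_dim`.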
Note that you need to have generated the matching dataset in `.npy` format beforehand. The `src` directory contains the dataset configuration file `configs/dataset.ini`, which stores the paths to the datasets. Note that if you adapt the number of generated samples, you also need to adapt the path names within this file (e.g., `data_path = link_level_data/ofdm_tdl_a_5000.npy` if you have generated 5000 samples with the TDL-A dataset). The corresponding section headers (e.g., `tdl_a`) are used as the `ds` parser argument in the main files. The `src` directory also contains utility functions in `utils` that comprise the linear methods, evaluation methods, and general organization methods such as the generation of directories for saving the results. It also contains the modules storing the autoencoder architecture.
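For instance, a section in `configs/dataset.ini` could look like the following sketch (the path assumes 80000 generated TDL-A samples; adapt it to your own sample count, and note that any further keys in the actual file are not shown here):

```ini
[tdl_a]
data_path = link_level_data/ofdm_tdl_a_80000.npy
```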
All main files store the results in newly generated directories inside the `results` directory. The results contain the `experiment_config.json` file as well as an `.npz` file that stores important results.
### Requirements
In addition to standard Python packages, you also need `torch`. We used Python 3.10, PyTorch 2.5.1, and pytorch-cuda 12.1.
We have uploaded toy datasets. To test whether the code works for you, you should immediately be able to run experiments with the following commands:

```shell
python main_compression_pca.py -ds tdl_e -latent_dim 2 -ntrain 1500 -ntest 500
python main_compression_pca.py -ds quadriga_rural -latent_dim 64 -ntrain 1500 -ntest 500
python main_estimation_lmmse.py -ds cdl_e -ntrain 1500 -ntest 500 -snr_db 10
```
Afterwards, new directories containing the results as `.npz` files should appear in the `results` directory.
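The stored `.npz` results can then be inspected with NumPy. A small sketch, assuming nothing about the key names inside the archive (it is demonstrated on a stand-in file, since the actual keys depend on the experiment):

```python
import numpy as np

# Stand-in for a results file as written by the main scripts; the actual
# key names and shapes depend on the experiment.
np.savez("results_demo.npz", nmse=np.array([0.12, 0.08]), snr_db=np.array([5, 10]))

with np.load("results_demo.npz") as results:
    for key in results.files:          # list all stored arrays
        print(key, results[key].shape)
```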