MARS-VFL: A Unified Benchmark for Vertical Federated Learning with Realistic Evaluation

The official code of MARS-VFL, by the MARS Group at Wuhan University, led by Prof. Mang Ye.

Wei Shen, Weiqi Liu, Mingde Chen, Wenke Huang, Mang Ye

Wuhan University

Abstract: Vertical Federated Learning (VFL) has emerged as a critical privacy-preserving learning paradigm, enabling collaborative model training by leveraging distributed features across clients. However, due to privacy concerns, there are few publicly available real-world datasets for evaluating VFL methods, which poses significant challenges to related research. To bridge this gap, we propose MARS-VFL, a unified benchmark for realistic VFL evaluation. It integrates data from practical applications involving collaboration across different features, maintaining compatibility with the VFL setting. Based on this, we standardize the evaluation of VFL methods from the mainstream aspects of efficiency, robustness, and security. We conduct comprehensive experiments to assess different VFL approaches, providing references for unified evaluation. Furthermore, we are the first to unify the evaluation of robustness challenges in VFL and introduce a new method for addressing robustness challenges, establishing standard baselines for future research.

Last Update

2025/5/13: We have released the code.

Guidelines

1. Setup

All experiments were conducted on a server with 8 NVIDIA GeForce RTX 4090 GPUs. Create an Anaconda environment, clone the repository, and install the dependencies from requirements.txt:

conda create -n marsvfl python=3.9
conda activate marsvfl
git clone 'https://github.com/shentt67/MARS-VFL.git'
cd MARS-VFL
pip install -r requirements.txt

2. Datasets

We use 12 different public datasets across five real-world applications:

Human Activity Recognition

UCI-HAR: UCI-HAR.
KU-HAR: KU-HAR.

Robotics

MUJOCO: Download the data from links: gentle_push_10.hdf5, gentle_push_300.hdf5, gentle_push_1000.hdf5.
VISION&TOUCH: VISION&TOUCH.

Healthcare

MIMIC-III: MIMIC-III.
PTB-XL: PTB-XL.

Emotion Analysis

UR-FUNNY: UR-FUNNY.
MUSTARD: MUSTARD.
CMU-MOSI: CMU-MOSI.
CMU-MOSEI: CMU-MOSEI.

Multimedia

NUS-WIDE: NUS-WIDE.
MM-IMDB: MM-IMDB.

3. Evaluation

Base usage

For instance, to run the basic pipeline of VFL on the UCI-HAR dataset:

python main.py --dataset UCIHAR --device 0 --epoch 150 --batch_size 256 --lr 0.01 --client_num 2 --aggregation concat --method_name base --optimizer sgd --seeds 100
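To make the `--client_num 2 --aggregation concat` setting more concrete, here is a minimal, dependency-free sketch of a vertical forward pass: each client holds a slice of the features, computes a local embedding, and the server concatenates the embeddings. It is an illustration only and not the code in this repository; the function names (`split_features`, `client_embed`, `concat_aggregate`), the contiguous feature split, and the one-layer local model are all assumptions made for the example.

```python
import random


def split_features(sample, client_num):
    """Split one feature vector into contiguous, near-equal slices, one per client.

    Assumption: a contiguous split. The benchmark may partition features differently.
    """
    base, extra = divmod(len(sample), client_num)
    parts, start = [], 0
    for k in range(client_num):
        end = start + base + (1 if k < extra else 0)
        parts.append(sample[start:end])
        start = end
    return parts


def client_embed(features, weights):
    """Hypothetical local model: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, features))) for row in weights]


def concat_aggregate(embeddings):
    """Server-side 'concat' aggregation: join the client embeddings end to end."""
    return [v for emb in embeddings for v in emb]


if __name__ == "__main__":
    rng = random.Random(100)
    sample = [rng.uniform(-1, 1) for _ in range(6)]   # one sample with 6 features
    parts = split_features(sample, client_num=2)      # 3 features per client
    # Each client maps its slice to a 2-dimensional embedding.
    weights = [[[rng.uniform(-1, 1) for _ in part] for _ in range(2)] for part in parts]
    fused = concat_aggregate([client_embed(p, w) for p, w in zip(parts, weights)])
    print(len(parts[0]), len(parts[1]), len(fused))   # prints: 3 3 4
```

The server would then feed the fused vector to its own model; only embeddings, never raw features, cross the client boundary in this sketch.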

Efficiency

We include three different metrics to evaluate the efficiency of various methods. To run the basic pipeline and perform efficiency evaluation:

python main.py --dataset UCIHAR --device 0 --epoch 150 --batch_size 256 --lr 0.01 --client_num 2 --aggregation concat --eval_mode efficiency --method_name base --optimizer sgd --seeds 100, 200, 300, 400, 500

Change the --method_name argument to evaluate different methods (fedbcd, cvfl and efvfl).
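When comparing several methods, it can be convenient to generate the commands rather than edit them by hand. The sketch below is a small helper that is not part of this repository: it copies the efficiency command shown above verbatim and varies only `--method_name`. The names `TEMPLATE`, `METHODS`, and `build_commands` are made up for this example.

```python
import shlex

# Command copied from the efficiency example above; only --method_name varies.
TEMPLATE = (
    "python main.py --dataset UCIHAR --device 0 --epoch 150 --batch_size 256 "
    "--lr 0.01 --client_num 2 --aggregation concat --eval_mode efficiency "
    "--method_name {method} --optimizer sgd --seeds 100, 200, 300, 400, 500"
)

# Method names listed in this README for the efficiency evaluation.
METHODS = ["base", "fedbcd", "cvfl", "efvfl"]


def build_commands(methods=METHODS, template=TEMPLATE):
    """Return one argument list per method, split the way a POSIX shell would."""
    return [shlex.split(template.format(method=m)) for m in methods]


if __name__ == "__main__":
    for argv in build_commands():
        print(" ".join(argv))
        # To launch a run from the repository root, one could instead call:
        # subprocess.run(argv, check=True)
```

Printing first and running second makes it easy to check the generated commands before committing GPU time to them.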

Robustness

To run the base pipeline and evaluate robustness:

python main.py --dataset UCIHAR --device 0 --epoch 150 --batch_size 256 --lr 0.01 --client_num 2 --aggregation concat --eval_mode robustness --perturb_type missing --perturb_rate_train 0 --perturb_rate_test 0 --method_name base --optimizer sgd --seeds 100

Change the --perturb_type argument to select a perturbation (missing, corrupted, misaligned), and change the --method_name argument to select a method (leefvfl, laservfl, rvflaug, rvflalign).
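As a rough intuition for the three perturbation types and the perturbation rate, the sketch below applies each one to a single client's batch. The semantics are an assumed reading of the names (zeroed features for missing, additive Gaussian noise for corrupted, features taken from a different sample for misaligned) and may differ from what the benchmark implements. The function `perturb` and its `seed` parameter are hypothetical.

```python
import random


def perturb(client_batch, perturb_type, rate, seed=100):
    """Perturb one client's batch (a list of feature vectors) at the given rate.

    Assumed semantics, for illustration only:
      missing    -> the sample's features are replaced by zeros
      corrupted  -> Gaussian noise is added to the sample's features
      misaligned -> the sample's features are replaced by another sample's
    """
    rng = random.Random(seed)
    out = [list(x) for x in client_batch]
    n = len(out)
    for i in range(n):
        if rng.random() >= rate:
            continue  # this sample stays clean
        if perturb_type == "missing":
            out[i] = [0.0] * len(out[i])
        elif perturb_type == "corrupted":
            out[i] = [v + rng.gauss(0.0, 1.0) for v in out[i]]
        elif perturb_type == "misaligned":
            if n > 1:
                j = rng.randrange(n - 1)  # pick an index other than i
                if j >= i:
                    j += 1
            else:
                j = i
            out[i] = list(client_batch[j])
        else:
            raise ValueError(f"unknown perturb_type: {perturb_type}")
    return out


if __name__ == "__main__":
    batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(perturb(batch, "missing", rate=1.0))     # every row zeroed
    print(perturb(batch, "misaligned", rate=1.0))  # every row taken from another sample
```

Under this reading, a rate of 0 (as in the command above) leaves the data untouched, so that run would serve as the clean reference point.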

Security

For instance, to evaluate the pmc method on the UCI-HAR dataset:

python main.py --dataset UCIHAR --device 0 --epoch 150 --batch_size 64 --lr 0.01 --client_num 2 --aggregation concat --eval_mode security --method_name pmc --optimizer sgd --seeds 100

Change the --method_name argument for evaluating different methods (pmc, amc, grna, mia, tecb, lfba).

4. Integrate New Methods

The implementations of all evaluated methods are provided in .\method, and can be easily extended to include new methods.

Contact

weishen@whu.edu.cn
