This project implements a Federated Learning simulation using the Flower framework and TensorFlow. It includes a simulated blockchain-based audit trail (Model Registry, Inference Log, Feedback Log), SHAP for explainable AI, and EZKL for generating Zero-Knowledge Proofs of the model inferences.
- src/: Contains the source code for the simulation.
  - client.py: FlowerClient implementations (Honest and Malicious).
  - strategy.py: Custom server strategy to capture aggregated model weights.
  - model.py: Keras model definition.
  - data_processing.py: Data loading, preprocessing, and partitioning.
  - blockchain_sim.py: Simulation of smart contracts/ledgers using Pandas.
  - explainability.py: Logic for SHAP analysis and summary plots.
  - zk_proof.py: Logic for converting models to ONNX and generating ZKPs using EZKL.
- data/: Directory to store the datasets.
- notebooks/: Original Jupyter notebooks.
- main.py: The entry point to run the full simulation workflow.
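As an illustration of what a simulated audit ledger can look like, here is a minimal sketch of a hash-chained log. It is not the code in blockchain_sim.py: the project uses Pandas, while this sketch uses plain Python lists, and the names `entry_hash`, `append_entry`, `verify_ledger`, and the record fields are hypothetical. Whether the project chains hashes this way is an assumption.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(record: dict, prev_hash: str) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(ledger: list, record: dict) -> None:
    """Append a record, linking it to the previous entry's hash."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS_HASH
    ledger.append({
        "record": record,
        "prev_hash": prev_hash,
        "hash": entry_hash(record, prev_hash),
    })


def verify_ledger(ledger: list) -> bool:
    """Return True if no entry was modified, removed, or reordered."""
    prev_hash = GENESIS_HASH
    for entry in ledger:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(entry["record"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical Model Registry entries, one per federated round.
model_registry = []
append_entry(model_registry, {"round": 1, "model_id": "global-r1"})
append_entry(model_registry, {"round": 2, "model_id": "global-r2"})
```

Because each entry's hash covers the previous hash, editing any earlier record makes `verify_ledger` return `False`, which is the property an audit trail relies on.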
- Federated Learning Simulation: Simulates multiple clients (honest and malicious) training a model collaboratively without sharing raw data.
- Blockchain Audit Trail: Logs model updates and inference events to a simulated tamper-proof ledger.
- Explainable AI (XAI): Uses SHAP (SHapley Additive exPlanations) to explain global model predictions.
- Zero-Knowledge Proofs (ZKP): Uses EZKL to generate validity proofs for model inferences, ensuring computational integrity without revealing weights.
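The aggregation step behind the federated simulation can be sketched as a weighted average of client weights, where each client counts in proportion to its number of training samples. This is a simplified, pure-Python illustration of Federated Averaging, not the Flower strategy in strategy.py; the function name `federated_average` and the toy weights are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average per-layer weights, weighting each client by its sample count.

    client_weights: one entry per client; each entry is a list of layers,
                    and each layer is a flat list of floats.
    client_sizes:   number of training samples held by each client.
    """
    total = sum(client_sizes)
    averaged = []
    for layer in range(len(client_weights[0])):
        merged = [0.0] * len(client_weights[0][layer])
        for weights, size in zip(client_weights, client_sizes):
            for i, value in enumerate(weights[layer]):
                merged[i] += value * size / total
        averaged.append(merged)
    return averaged


# Two toy clients, each with two "layers" of weights.
clients = [
    [[1.0, 2.0], [0.5]],
    [[3.0, 4.0], [1.5]],
]
global_weights = federated_average(clients, client_sizes=[100, 300])
```

Only the weights and sample counts reach the server in this scheme; the raw training rows stay with each client.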
- Python 3.9+
- pip
- Clone the repository:
    git clone <repository_url>
    cd trustflow-task
- Install the required dependencies:
    pip install -r requirements.txt
This project supports two datasets: Heart Disease and Breast Cancer. You need to download them manually and place them in the data/ directory.
- Source: Kaggle - Heart Disease Dataset
- Action: Download heart.csv and place it in data/heart.csv.
- Source: Kaggle - Breast Cancer Wisconsin (Diagnostic) Data
- Action: Download data.csv and place it in data/data.csv.
To switch datasets, modify the DATASET_NAME constant in main.py (default is 'heart_disease').
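One way such a switch could map to the files described above is sketched here. Only `DATASET_NAME`, its default `'heart_disease'`, and the two file paths come from this README; the `'breast_cancer'` key, the `DATASET_FILES` mapping, and `resolve_dataset_path` are assumptions made for illustration, so check main.py for the actual accepted values.

```python
from pathlib import Path

DATASET_NAME = "heart_disease"  # default named in this README

# Hypothetical mapping; the "breast_cancer" key is an assumed value.
DATASET_FILES = {
    "heart_disease": Path("data") / "heart.csv",
    "breast_cancer": Path("data") / "data.csv",
}


def resolve_dataset_path(name: str) -> Path:
    """Return the CSV path for a dataset name, or fail with a clear error."""
    if name not in DATASET_FILES:
        known = ", ".join(sorted(DATASET_FILES))
        raise ValueError(f"Unknown dataset {name!r}; expected one of: {known}")
    return DATASET_FILES[name]


dataset_path = resolve_dataset_path(DATASET_NAME)
if not dataset_path.exists():
    print(f"Missing {dataset_path}: download it and place it in data/ first.")
```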
To start the full workflow simulation:
    python main.py

This will:
- Initialize the simulated blockchain ledgers.
- Load and partition the data.
- Start a simulation of Federated Averaging (default 5 rounds).
- Log the final global model to the Model Registry.
- Run SHAP analysis to generate explanations for test set predictions.
- Run EZKL to generate a Zero-Knowledge Proof for a sample inference.
- Save metrics and charts (e.g., simulation_results.png, zkp_overhead_chart.png).
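The "Load and partition the data" step in the list above is not spelled out in this README. A common approach, assumed here purely for illustration, is to shuffle the row indices and deal them out evenly so that each simulated client receives a disjoint, roughly equal share. The function `partition_indices` and its parameters are hypothetical and may differ from what data_processing.py does.

```python
import random


def partition_indices(num_samples, num_clients, seed=42):
    """Split row indices into disjoint, near-equal shares, one per client."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducible splits
    # Deal indices round-robin so share sizes differ by at most one.
    return [indices[i::num_clients] for i in range(num_clients)]


parts = partition_indices(num_samples=10, num_clients=3)
for client_id, rows in enumerate(parts):
    print(f"client {client_id}: {len(rows)} samples")
```

Each client would then train only on the rows in its own share, which is what lets the simulation mimic training without pooling raw data.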