204 changes: 98 additions & 106 deletions README.md
@@ -1,147 +1,139 @@
# Mobility HFL/VFL Simulator (Beta)
# Federated Learning Simulator for Urban Traffic

The Mobility Federated Learning Simulator is a Flower-powered research and experimentation harness that exercises **horizontal (HFL)**, **vertical (VFL)**, and **hybrid** federation flows over realistic mobility datasets. The beta release emphasises reproducibility, observability, and short feedback loops so that new scenarios can be designed, executed, and compared within minutes.
## Overview

---
This project implements a *federated learning* (FL) simulator/benchmark for urban traffic forecasting. It includes:

## Key capabilities
* **Federation strategies** and a complete training cycle that can be evaluated round by round (`src/federation.py`, `src/runtime.py`).
* **Data pipeline** for reading, preparing, and partitioning data across clients (`src/data_pipeline.py`).
* **Models and baselines**, with auxiliary encoders (`src/baselines.py`, `src/codificadores.py`).
* **Metrics, evaluation, and reporting**, integrated (`src/metrics.py`, `src/evaluation.py`, `src/reporting.py`, `src/analysis.py`).
* **Configuration validation** and **versioning** (`src/config_loader.py`, `src/schema_validation.py`, `src/versioning.py`).
* **Tests** for federation components (`tests/test_federation_components.py`).

- **Unified configuration-as-code** – A single YAML file (see [`configs/`](configs/)) declaratively defines datasets, client partitions, model families, and federation strategies. Variants are expanded deterministically and inherit global defaults.
- **Multi-paradigm federation** – Run HFL, VFL, or hybrid experiments locally (fast, dependency-light) or hand off to Flower/Ray when distributed simulation is required. Supported aggregators include `fedavg`, `fedprox`, `fedopt` (Adam/Yogi/Adagrad), `feddyn`, `fednova`, `scaffold`, and `rfa`.
- **Mobility-aware preprocessing** – The data pipeline normalises multi-resolution time series, derives rolling windows, synthesises vertical feature views, and preserves metadata such as segment length, free-flow speed, and congestion thresholds for downstream metrics.
- **Rich reporting artefacts** – Every run exports JSON summaries, CSV/Markdown tables, training curves, and optional raw time-series dumps. Metrics span mobility (TTI, delay, buffer and reliability indices), model quality (MAE, RMSE, MAPE, loss), and federation cost/robustness.
- **Safety rails for research** – Deterministic seeding, dataset validation, secure aggregation toggles, differential privacy hooks, communication-compression profiles, and CLI checks keep experiments auditable and repeatable.
Architecture diagrams and descriptions are in `docs/`.

---
## Installation

## Repository layout

```
federated_learning_traffic/
├── configs/ # YAML scenarios (baseline, benchmarks, full simulations)
├── data/raw/ # Synthetic but structured mobility datasets bundled with the simulator
├── docs/ # Architecture notes, academic summary, Mermaid diagrams
├── src/ # Python packages for configuration, data, modelling, federation, metrics, and reporting
├── tests/ # Pytest suites with parametrised scenarios and regression checks
├── requirements.txt / pyproject # Dependency manifests (pip or Poetry)
└── artifacts/ # Created at runtime – holds experiment outputs grouped by run/variant
```

---

## Installation

Choose one of the following workflows:

<details>
<summary><strong>Pip (lightweight)</strong></summary>
Prerequisites: Python 3.x and (optionally) CUDA.

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

</details>
# clone
git clone <this-repo>.git
cd <this-repo>

<details>
<summary><strong>Poetry (managed)</strong></summary>

```bash
# with Poetry (recommended)
poetry install
poetry shell

# or with pip
python -m venv .venv && source .venv/bin/activate # (Linux/macOS)
# .venv\Scripts\activate # (Windows)
pip install -r requirements.txt # if present; otherwise, adapt from pyproject
```

</details>
## Folder structure (essential for navigating and modifying)

The packaged datasets sit under `data/raw/`. If you plan to use your own data, mimic the column layout described in the [Datasets](#bundled-datasets) section.
```
src/
  analysis.py
  baselines.py
  codificadores.py
  config_loader.py
  data_pipeline.py
  evaluation.py
  federation.py
  main.py
  metrics.py
  privacy.py
  reporting.py
  runtime.py
  schema_validation.py
  tracking.py
  versioning.py
configs/
  config.yaml
  cenarios_simulacao.yaml
docs/
  arquitetura_do_software.md
  Descricao_do_artigo_IEEE.md
  Mobility_HFL_VFL_Simulator_architecture.mmd
tests/
  test_federation_components.py
data/
  raw/TaxiBJ21.npy # example raw data file
```

---
## How to run (minimum viable)

## Running simulations
1. **Configure the experiment** in `configs/config.yaml` (or pick a scenario from `configs/cenarios_simulacao.yaml`).
2. **Run the *runner***:

1. **Pick or craft a configuration** – Start with [`configs/config.yaml`](configs/config.yaml) or the more exhaustive [`configs/full_simulation.yaml`](configs/full_simulation.yaml). Both files describe `global` defaults, `datasets`, and `experiment_sets`.
2. **Execute via the CLI**:
```bash
python -m src.main --config configs/config.yaml --mode hfl
```
- `--mode` overrides the `partitions.mode` declared per experiment (`hfl`, `vfl`, `hybrid`).
- `--variant` can restrict execution to a specific label defined under `experiment_sets[].variants`.
- Use `--list` to print a dry-run summary of the experiments without executing them.
3. **Inspect artefacts** – Results land in `<output_root>/<experiment_name>/<variant>/` (defaults to `artifacts/`). Expect:
- `summary.json` – consolidated metrics for the variant.
- `metrics_mobility.csv|md`, `metrics_learning.*`, `metrics_federated.*` – tabulated breakdowns.
- `training_curves.png` – loss/metric trajectories.
- Optional raw exports (`raw_timeseries.parquet`, `client_metrics.json`) when enabled via `reporting.export`.
```bash
# with Poetry
poetry run python -m src.main --config configs/config.yaml

To hand experiments off to Flower, set `global.execution.engine: flower` or override per experiment using `federation.engine`. Flower/Ray must be available in your environment.
# or without Poetry
python -m src.main --config configs/config.yaml
```

---
Typical outputs:

## Configuration reference
* Progress *logs* per round/client.
* Evaluation artefacts (aggregated metrics) and reports generated by `src/evaluation.py` and `src/reporting.py`.
* Schema and experiment-version checks (via `schema_validation.py` and `versioning.py`).

Each experiment entry inside `experiment_sets` inherits from `global` defaults and then layers its own settings. The most relevant blocks are:
## How data is prepared and partitioned

- **`datasets`** – Maps dataset identifiers to loaders defined in `src/data_pipeline.py`. Required keys: `path`, `loader`, `time_column`, `feature_columns`, `target_column`, and optional `metadata` for richer reporting.
- **`partitions`** – Controls federation layout.
- `mode`: `hfl`, `vfl`, or `hybrid`.
- `num_clients` / `num_parties`: participant counts.
- `distribution`: `iid`, `non_iid`, or `dirichlet` (paired with `dirichlet_alpha`).
- `participation_rate`, `client_balance`, and optional `straggler_simulation` emulate real-world behaviour.
- **`model`** – Declares architecture (`gru`, `lstm`, `transformer`, `mlp`, or `gnn`) with architecture-specific parameters plus `input_window` and `forecast_horizon`.
- **`training`** – Sets optimiser (`adam`, `sgd`, `rmsprop`), `local_epochs`, `batch_size`, `learning_rate`, gradient clipping, dropout, and optional communication/differential-privacy hooks.
- **`federation`** – Chooses aggregator, number of rounds, execution engine, and strategy-specific knobs (`proximal_mu`, server optimisers for FedOpt, minimum clients, etc.). Communication compression and secure aggregation are nested here.
- **`reporting`** – Selects which metric families to compute and which artefacts to emit.
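
As an illustration of what `distribution: dirichlet` with `dirichlet_alpha` could mean in practice, the sketch below splits sample indices across clients using Dirichlet-distributed shares (quantity skew). This is only one common interpretation; the function name `dirichlet_partition` and its signature are hypothetical and are not taken from `src/data_pipeline.py`, whose actual behaviour may differ.

```python
import random


def dirichlet_partition(num_samples, num_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet-distributed shares.

    Hypothetical helper for illustration; not the simulator's own API.
    """
    rng = random.Random(seed)
    # A Dirichlet draw is a vector of independent Gamma(alpha, 1) draws, normalised.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
    total = sum(gammas)
    shares = [g / total for g in gammas]

    indices = list(range(num_samples))
    rng.shuffle(indices)

    partitions, start, cumulative = [], 0, 0.0
    for client_id, share in enumerate(shares):
        cumulative += share
        # The last client takes the remainder so every index is assigned exactly once.
        if client_id == num_clients - 1:
            end = num_samples
        else:
            end = round(cumulative * num_samples)
        partitions.append(sorted(indices[start:end]))
        start = end
    return partitions


parts = dirichlet_partition(num_samples=1000, num_clients=5, alpha=0.5, seed=42)
print([len(p) for p in parts])  # client sizes; smaller alpha gives more skew
```

Smaller values of `alpha` concentrate most samples on a few clients, while large values approach an even (IID-like) split. Fixing `seed` keeps the partition reproducible across runs.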
The file `src/data_pipeline.py` centralises:

For heavily customised studies you can include `evaluation` blocks (offline test sets), `tracking` (MLflow integration via `src/tracking.py`), and `privacy.dp` settings for gradient perturbation.
* Reading the supported *datasets*;
* Normalisation/scaling and time windows;
* **Partitioning into clients** (to simulate FL).
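
The scaling and windowing steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions: `min_max_scale` and `make_windows` are hypothetical names, and min-max scaling is an assumed choice; the real pipeline in `src/data_pipeline.py` may use a different scaler and tensor layout. The parameter names `input_window` and `forecast_horizon` mirror the configuration keys.

```python
def min_max_scale(series):
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # avoid division by zero for constant series
    return [(v - lo) / span for v in series]


def make_windows(series, input_window, forecast_horizon):
    """Return (inputs, targets) pairs cut from a 1-D series with stride 1."""
    samples = []
    last_start = len(series) - input_window - forecast_horizon
    for start in range(last_start + 1):
        split = start + input_window
        samples.append((series[start:split], series[split:split + forecast_horizon]))
    return samples


# Hypothetical speed readings (km/h) for a single road segment.
speeds = [60.0, 58.0, 55.0, 40.0, 35.0, 50.0]
windows = make_windows(min_max_scale(speeds), input_window=3, forecast_horizon=1)
for inputs, targets in windows:
    print(inputs, "->", targets)
```

With six readings, an input window of 3, and a horizon of 1, this yields three training pairs. A series shorter than `input_window + forecast_horizon` yields none.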

---
Running `src.main` loads the datasets declared in the YAML, applies schema validation, and writes the temporal partitions before training. Rerun the main command whenever you change `config.yaml` so that the blocks are regenerated.

## Bundled datasets
> Tip: keep *hashes* and *seeds* inside the YAML for reproducibility. The project already provides schema validation in `src/schema_validation.py`.

| Key | File | Loader | Description |
| --- | --- | ------ | ----------- |
| `metr_la` | `data/raw/metr-la.csv` | `metr_la_speed` | Five-minute speed readings from freeway loop detectors with congestion annotations.|
| `sunt_boardings` | `data/raw/boarding-2024-03-01.parquet` | `sunt_boardings` | Daily bus boardings summarised per stop order, capturing demand fluctuations and boarding deltas.|
| `taxi_beijing` | `data/raw/TaxiBJ21.npy` | `taxi_beijing_grid` | Hourly taxi grid intensities aggregated into mean/sum/max/min features and target flow volumes.|
## Federation cycle (what actually happens)

To plug in external data, implement a loader in `src/data_pipeline.py` (see `load_metr_la`, `load_sunt_boardings`, `load_taxi_beijing`) and declare it in the configuration. The loader must yield normalised tensors along with mobility metadata for metrics.
The logic for rounds, client selection, and aggregation lives in:

---
* **`src/federation.py`**: defines the federation components, including weight/gradient aggregation and *round* control.
* **`src/runtime.py`**: orchestrates execution (order of steps, *hooks*, and checkpoints where applicable).

## Reporting & analysis
To change client selection, aggregation, or the number of *rounds*, modify the YAML and/or the methods in `src/federation.py`.
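
To make the aggregation step concrete, here is a minimal sketch of FedAvg-style weighted averaging, where each client's parameters are weighted by its number of training examples. The function `fedavg` and the flat-list weight representation are assumptions made for illustration; the classes and signatures actually used in `src/federation.py` are not shown in this README.

```python
def fedavg(client_updates):
    """Average client weights, weighted by each client's example count.

    client_updates: list of (weights, num_examples) pairs, where weights is a
    flat list of floats. Hypothetical helper; not the simulator's own API.
    """
    total_examples = sum(n for _, n in client_updates)
    if total_examples == 0:
        raise ValueError("at least one client must report training examples")

    num_params = len(client_updates[0][0])
    aggregated = [0.0] * num_params
    for weights, n in client_updates:
        if len(weights) != num_params:
            raise ValueError("all clients must send the same number of parameters")
        for i, w in enumerate(weights):
            aggregated[i] += w * (n / total_examples)
    return aggregated


# Two clients: the second holds three times as much data, so it dominates.
updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300)]
print(fedavg(updates))  # [2.5, 3.5]
```

Other aggregators change this step rather than the surrounding loop, so a new strategy typically only needs to replace the averaging rule.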

`src/reporting.py` collates metrics from the federation layer and from post-hoc evaluation:
## Metrics and evaluation

- **Mobility** – Travel Time Index, delay, congestion ratio, buffer index, reliability index, throughput, and custom KPIs derived from dataset metadata.
- **Learning** – RMSE, MAE, MAPE, MSE, final/average loss, coefficient of determination (where available).
- **Federated** – Communication cost (bytes with/without compression), participation rate, robustness (late/failed clients), aggregator selection, and DP/secure aggregation status.
* **`src/metrics.py`** implements metrics such as MAE, RMSE, MAPE, and R².
* **`src/evaluation.py`** runs the standardised evaluation after training, saving indicators and, when configured, comparisons between scenarios.
* **`src/reporting.py`** and **`src/analysis.py`** generate consolidated tables and charts.
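
For reference, the learning metrics named here follow their standard definitions, sketched below in plain Python. The function names are illustrative and are not claimed to match those in `src/metrics.py`; the `eps` guard in `mape` is an assumption about how zero targets might be handled.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred, eps=1e-8):
    """Mean absolute percentage error; eps guards against division by zero."""
    ratios = [abs(t - p) / max(abs(t), eps) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


def r2(y_true, y_pred):
    """Coefficient of determination; undefined (NaN) for constant targets."""
    mean_true = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return float("nan") if ss_tot == 0 else 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted speeds.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred), r2(y_true, y_pred))
```

MAPE is sensitive to near-zero targets, which matters for traffic data with empty intervals, so it is usually read alongside MAE and RMSE rather than on its own.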

All metrics are logged to stdout, saved to tables, and summarised in JSON. If MLflow tracking is configured, these values are also recorded under the specified experiment name for cross-run comparisons.
To re-evaluate models from checkpoints, import `src.evaluation` and use `reconstruct_model`/`predict_splits` in a script of your own, reusing the original YAML to load the experiment via `config_loader`.

---
## How to modify: common extensions (with file pointers)

## Development workflow
* **Add or change a *baseline***: edit `src/baselines.py`.
* **Create a new encoder/transformation**: `src/codificadores.py`.
* **Add or adjust a metric**: `src/metrics.py` (and register it in `src/evaluation.py`).
* **New federation strategy**: `src/federation.py` (add a class/method and expose it in `config.yaml`).
* **Track extra information during training**: `src/tracking.py`.

1. **Run tests** – Execute the end-to-end suite with coverage to ensure configuration, data, and federation changes remain stable:
```bash
pytest --cov=src --cov-report=term-missing
```
2. **Static checks** – Optional helpers include `ruff` (style) and `mypy` (typing) should you wish to extend the simulator with stricter guarantees.
3. **Regenerate docs** – Update the Markdown files inside `docs/` and the Mermaid diagram whenever architecture or workflows evolve.
> Always keep the **YAML** aligned with `src/schema_validation.py` so that validation catches errors early.

---
## Tests

## Troubleshooting
There are tests for federation components in `tests/test_federation_components.py`. Run:

- **Configuration errors** – The loader validates dataset presence, supported aggregators, and experiment completeness. Check the CLI output for highlighted blocks when a run aborts early.
- **Slow experiments** – Reduce `training.local_epochs`, cap `training.max_batches_per_epoch`, or switch `global.execution.engine` to `local` for quick iteration.
- **Flower dependency issues** – When running purely local simulations, the Flower and Ray extras are optional. Only install them when using distributed engines.
- **Artifacts missing** – Confirm `global.output_root` exists or is creatable and that the process has write permissions.
```bash
pytest -q
```

---
Accompany new features with minimal tests (e.g., client selection, aggregation, metrics returning expected values).
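
A minimal test for client selection might look like the sketch below. It is self-contained for illustration: `select_clients` is a hypothetical stand-in defined inline, not a function imported from `src/federation.py`, and the existing suite in `tests/test_federation_components.py` may be organised differently.

```python
import random


def select_clients(client_ids, participation_rate, seed):
    """Sample a reproducible subset of clients for one round (hypothetical helper)."""
    rng = random.Random(seed)
    # Always pick at least one client so a round can never be empty.
    k = max(1, round(participation_rate * len(client_ids)))
    return sorted(rng.sample(client_ids, k))


def test_selection_is_deterministic_and_sized():
    ids = list(range(10))
    first = select_clients(ids, participation_rate=0.3, seed=7)
    # Same seed, same selection: required for reproducible experiments.
    assert first == select_clients(ids, participation_rate=0.3, seed=7)
    # 30% of 10 clients is 3, all drawn from the known population, no repeats.
    assert len(first) == 3
    assert set(first) <= set(ids)
    assert len(set(first)) == len(first)


test_selection_is_deterministic_and_sized()
```

Under pytest the final call is unnecessary, since any function named `test_*` is collected automatically; it is included here so the snippet also runs as a plain script.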

## Citing or extending the simulator
## Further documentation

If you use this simulator for academic work, cite the accompanying IEEE-style summary in [`docs/Descricao_do_artigo_IEEE.md`](docs/Descricao_do_artigo_IEEE.md). Contributions are welcome: open an issue outlining the dataset, model, or aggregator you intend to add so that maintainers can help scope the integration.
* **Software architecture**: `docs/arquitetura_do_software.md`
* **Description accompanying the IEEE article**: `docs/Descricao_do_artigo_IEEE.md`
* **Mermaid diagram (HFL/VFL)**: `docs/Mobility_HFL_VFL_Simulator_architecture.mmd`
