MatteusStranger · MatteusStranger · Oct 28, 2025
diff --git a/README.md b/README.md
@@ -1,147 +1,139 @@
-# Mobility HFL/VFL Simulator (Beta)
+# Federated Learning Simulator for Urban Traffic
 
-The Mobility Federated Learning Simulator is a Flower-powered research and experimentation harness that exercises **horizontal (HFL)**, **vertical (VFL)**, and **hybrid** federation flows over realistic mobility datasets. The beta release emphasises reproducibility, observability, and short feedback loops so that new scenarios can be designed, executed, and compared within minutes.
+## Overview
 
----
+This project implements a *federated learning* (FL) simulator/benchmark for urban traffic forecasting. It includes:
 
-## Key capabilities
+* **Federation strategies** and a complete training cycle that can be evaluated round by round (`src/federation.py`, `src/runtime.py`).
+* **Data pipeline** for reading, preparing, and partitioning data across clients (`src/data_pipeline.py`).
+* **Models and baselines**, with auxiliary encoders (`src/baselines.py`, `src/codificadores.py`).
+* **Metrics, evaluation, and reporting**, integrated (`src/metrics.py`, `src/evaluation.py`, `src/reporting.py`, `src/analysis.py`).
+* **Configuration validation** and **versioning** (`src/config_loader.py`, `src/schema_validation.py`, `src/versioning.py`).
+* **Tests** for federation components (`tests/test_federation_components.py`).
 
-- **Unified configuration-as-code** – A single YAML file (see [`configs/`](configs/)) declaratively defines datasets, client partitions, model families, and federation strategies. Variants are expanded deterministically and inherit global defaults.
-- **Multi-paradigm federation** – Run HFL, VFL, or hybrid experiments locally (fast, dependency-light) or hand off to Flower/Ray when distributed simulation is required. Supported aggregators include `fedavg`, `fedprox`, `fedopt` (Adam/Yogi/Adagrad), `feddyn`, `fednova`, `scaffold`, and `rfa`.
-- **Mobility-aware preprocessing** – The data pipeline normalises multi-resolution time series, derives rolling windows, synthesises vertical feature views, and preserves metadata such as segment length, free-flow speed, and congestion thresholds for downstream metrics.
-- **Rich reporting artefacts** – Every run exports JSON summaries, CSV/Markdown tables, training curves, and optional raw time-series dumps. Metrics span mobility (TTI, delay, buffer and reliability indices), model quality (MAE, RMSE, MAPE, loss), and federation cost/robustness.
-- **Safety rails for research** – Deterministic seeding, dataset validation, secure aggregation toggles, differential privacy hooks, communication-compression profiles, and CLI checks keep experiments auditable and repeatable.
+Architectural diagrams and descriptions are in `docs/`.
 
----
+## Installation
 
-## Repository layout
-
-```
-federated_learning_traffic/
-├── configs/                      # YAML scenarios (baseline, benchmarks, full simulations)
-├── data/raw/                     # Synthetic but structured mobility datasets bundled with the simulator
-├── docs/                         # Architecture notes, academic summary, Mermaid diagrams
-├── src/                          # Python packages for configuration, data, modelling, federation, metrics, and reporting
-├── tests/                        # Pytest suites with parametrised scenarios and regression checks
-├── requirements.txt / pyproject  # Dependency manifests (pip or Poetry)
-└── artifacts/                    # Created at runtime – holds experiment outputs grouped by run/variant
-```
-
----
-
-## Installation
-
-Choose one of the following workflows:
-
-<details>
-<summary><strong>Pip (lightweight)</strong></summary>
+Prerequisites: Python 3.x and (optionally) CUDA.
 
 ```bash
-python -m venv .venv
-source .venv/bin/activate
-pip install -r requirements.txt
-```
-
-</details>
+# clone the repository
+git clone <this-repo>.git
+cd <this-repo>
 
-<details>
-<summary><strong>Poetry (managed)</strong></summary>
-
-```bash
+# with Poetry (recommended)
 poetry install
-poetry shell
+
+# or with pip
+python -m venv .venv && source .venv/bin/activate  # (Linux/macOS)
+# .venv\Scripts\activate                           # (Windows)
+pip install -r requirements.txt  # if present; otherwise, adapt from pyproject
 ```
 
-</details>
+## Folder structure (essential for navigating and modifying)
 
-The packaged datasets sit under `data/raw/`. If you plan to use your own data, mimic the column layout described in the [Datasets](#bundled-datasets) section.
+```
+src/
+  analysis.py
+  baselines.py
+  codificadores.py
+  config_loader.py
+  data_pipeline.py
+  evaluation.py
+  federation.py
+  main.py
+  metrics.py
+  privacy.py
+  reporting.py
+  runtime.py
+  schema_validation.py
+  tracking.py
+  versioning.py
+configs/
+  config.yaml
+  cenarios_simulacao.yaml
+docs/
+  arquitetura_do_software.md
+  Descricao_do_artigo_IEEE.md
+  Mobility_HFL_VFL_Simulator_architecture.mmd
+tests/
+  test_federation_components.py
+data/
+  raw/TaxiBJ21.npy   # example of raw data
+```
 
----
+## How to run (minimal setup)
 
-## Running simulations
+1. **Configure the experiment** in `configs/config.yaml` (or pick a scenario from `configs/cenarios_simulacao.yaml`).
+2. **Run the *runner***:
 
-1. **Pick or craft a configuration** – Start with [`configs/config.yaml`](configs/config.yaml) or the more exhaustive [`configs/full_simulation.yaml`](configs/full_simulation.yaml). Both files describe `global` defaults, `datasets`, and `experiment_sets`.
-2. **Execute via the CLI**:
-   ```bash
-   python -m src.main --config configs/config.yaml --mode hfl
-   ```
-   - `--mode` overrides the `partitions.mode` declared per experiment (`hfl`, `vfl`, `hybrid`).
-   - `--variant` can restrict execution to a specific label defined under `experiment_sets[].variants`.
-   - Use `--list` to print a dry-run summary of the experiments without executing them.
-3. **Inspect artefacts** – Results land in `<output_root>/<experiment_name>/<variant>/` (defaults to `artifacts/`). Expect:
-   - `summary.json` – consolidated metrics for the variant.
-   - `metrics_mobility.csv|md`, `metrics_learning.*`, `metrics_federated.*` – tabulated breakdowns.
-   - `training_curves.png` – loss/metric trajectories.
-   - Optional raw exports (`raw_timeseries.parquet`, `client_metrics.json`) when enabled via `reporting.export`.
+```bash
+# with Poetry
+poetry run python -m src.main --config configs/config.yaml
 
-To hand experiments off to Flower, set `global.execution.engine: flower` or override per experiment using `federation.engine`. Flower/Ray must be available in your environment.
+# or without Poetry
+python -m src.main --config configs/config.yaml
+```
 
----
+Typical outputs:
 
-## Configuration reference
+* Progress *logs* per round and per client.
+* Evaluation artifacts (aggregated metrics) and reports generated by `src/evaluation.py` and `src/reporting.py`.
+* Schema and experiment-version checks (via `schema_validation.py` and `versioning.py`).
 
-Each experiment entry inside `experiment_sets` inherits from `global` defaults and then layers its own settings. The most relevant blocks are:
+## How data is prepared and partitioned
 
-- **`datasets`** – Maps dataset identifiers to loaders defined in `src/data_pipeline.py`. Required keys: `path`, `loader`, `time_column`, `feature_columns`, `target_column`, and optional `metadata` for richer reporting.
-- **`partitions`** – Controls federation layout.
-  - `mode`: `hfl`, `vfl`, or `hybrid`.
-  - `num_clients` / `num_parties`: participant counts.
-  - `distribution`: `iid`, `non_iid`, or `dirichlet` (paired with `dirichlet_alpha`).
-  - `participation_rate`, `client_balance`, and optional `straggler_simulation` emulate real-world behaviour.
-- **`model`** – Declares architecture (`gru`, `lstm`, `transformer`, `mlp`, or `gnn`) with architecture-specific parameters plus `input_window` and `forecast_horizon`.
-- **`training`** – Sets optimiser (`adam`, `sgd`, `rmsprop`), `local_epochs`, `batch_size`, `learning_rate`, gradient clipping, dropout, and optional communication/differential-privacy hooks.
-- **`federation`** – Chooses aggregator, number of rounds, execution engine, and strategy-specific knobs (`proximal_mu`, server optimisers for FedOpt, minimum clients, etc.). Communication compression and secure aggregation are nested here.
-- **`reporting`** – Selects which metric families to compute and which artefacts to emit.
+The file `src/data_pipeline.py` centralizes:
 
-For heavily customised studies you can include `evaluation` blocks (offline test sets), `tracking` (MLflow integration via `src/tracking.py`), and `privacy.dp` settings for gradient perturbation.
+* Reading the supported *datasets*;
+* Normalization/scaling and time windowing;
+* **Partitioning into clients** (to simulate FL).
 
----
+Running `src.main` loads the data declared in the YAML, applies schema validation, and writes the temporal partitions before training. Rerun the main command whenever you change `config.yaml` so that the blocks are regenerated.
 
-## Bundled datasets
+> Tip: keep *hashes* and *seeds* inside the YAML for reproducibility. The project already provides schema validation in `src/schema_validation.py`.
 
-| Key | File | Loader | Description |
-| --- | --- | ------ | ----------- |
-| `metr_la` | `data/raw/metr-la.csv` | `metr_la_speed` | Five-minute speed readings from freeway loop detectors with congestion annotations.|
-| `sunt_boardings` | `data/raw/boarding-2024-03-01.parquet` | `sunt_boardings` | Daily bus boardings summarised per stop order, capturing demand fluctuations and boarding deltas.|
-| `taxi_beijing` | `data/raw/TaxiBJ21.npy` | `taxi_beijing_grid` | Hourly taxi grid intensities aggregated into mean/sum/max/min features and target flow volumes.|
+## Federation cycle (what actually happens)
 
-To plug in external data, implement a loader in `src/data_pipeline.py` (see `load_metr_la`, `load_sunt_boardings`, `load_taxi_beijing`) and declare it in the configuration. The loader must yield normalised tensors along with mobility metadata for metrics.
+The logic for rounds, client selection, and aggregation lives in:
 
----
+* **`src/federation.py`**: defines the federation components, including weight/gradient aggregation and *round* control.
+* **`src/runtime.py`**: orchestrates execution (order of the steps, *hooks*, checkpoints where applicable).
 
-## Reporting & analysis
+To change client selection, aggregation, or the number of *rounds*, modify the YAML and/or the methods in `src/federation.py`.
 
-`src/reporting.py` collates metrics from the federation layer and from post-hoc evaluation:
+## Metrics and evaluation
 
-- **Mobility** – Travel Time Index, delay, congestion ratio, buffer index, reliability index, throughput, and custom KPIs derived from dataset metadata.
-- **Learning** – RMSE, MAE, MAPE, MSE, final/average loss, coefficient of determination (where available).
-- **Federated** – Communication cost (bytes with/without compression), participation rate, robustness (late/failed clients), aggregator selection, and DP/secure aggregation status.
+* **`src/metrics.py`** implements metrics such as MAE, RMSE, MAPE, and R².
+* **`src/evaluation.py`** runs the standardized evaluation after training, saving indicators and, when configured, comparisons between scenarios.
+* **`src/reporting.py`** and **`src/analysis.py`** generate consolidated tables and plots.
 
-All metrics are logged to stdout, saved to tables, and summarised in JSON. If MLflow tracking is configured, these values are also recorded under the specified experiment name for cross-run comparisons.
+To re-evaluate models from checkpoints, import `src.evaluation` and use `reconstruct_model`/`predict_splits` in your own script, reusing the original YAML to load the experiment via `config_loader`.
 
----
+## How to modify: common extensions (with file pointers)
 
-## Development workflow
+* **Add or change a *baseline***: edit `src/baselines.py`.
+* **Create a new encoder/transformation**: `src/codificadores.py`.
+* **Add or adjust a metric**: `src/metrics.py` (and register it in `src/evaluation.py`).
+* **New federation strategy**: `src/federation.py` (add the class/method and expose it in `config.yaml`).
+* **Track extra information during training**: `src/tracking.py`.
 
-1. **Run tests** – Execute the end-to-end suite with coverage to ensure configuration, data, and federation changes remain stable:
-   ```bash
-   pytest --cov=src --cov-report=term-missing
-   ```
-2. **Static checks** – Optional helpers include `ruff` (style) and `mypy` (typing) should you wish to extend the simulator with stricter guarantees.
-3. **Regenerate docs** – Update the Markdown files inside `docs/` and the Mermaid diagram whenever architecture or workflows evolve.
+> Always keep the **YAML** aligned with `src/schema_validation.py` so that validation catches errors early.
 
----
+## Tests
 
-## Troubleshooting
+Tests for the federation components are in `tests/test_federation_components.py`. Run:
 
-- **Configuration errors** – The loader validates dataset presence, supported aggregators, and experiment completeness. Check the CLI output for highlighted blocks when a run aborts early.
-- **Slow experiments** – Reduce `training.local_epochs`, cap `training.max_batches_per_epoch`, or switch `global.execution.engine` to `local` for quick iteration.
-- **Flower dependency issues** – When running purely local simulations, the Flower and Ray extras are optional. Only install them when using distributed engines.
-- **Artifacts missing** – Confirm `global.output_root` exists or is creatable and that the process has write permissions.
+```bash
+pytest -q
+```
 
----
+Accompany new features with minimal tests (e.g., client selection, aggregation, metrics returning expected values).
 
-## Citing or extending the simulator
+## Further documentation
 
-If you use this simulator for academic work, cite the accompanying IEEE-style summary in [`docs/Descricao_do_artigo_IEEE.md`](docs/Descricao_do_artigo_IEEE.md). Contributions are welcome: open an issue outlining the dataset, model, or aggregator you intend to add so that maintainers can help scope the integration.
+* **Software architecture**: `docs/arquitetura_do_software.md`
+* **Description associated with the IEEE article**: `docs/Descricao_do_artigo_IEEE.md`
+* **Mermaid diagram (HFL/VFL)**: `docs/Mobility_HFL_VFL_Simulator_architecture.mmd`
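
The new README says that `src/federation.py` handles weight/gradient aggregation but does not show what that step looks like. As an illustration only, the sketch below assumes FedAvg-style averaging weighted by each client's sample count (`fedavg` was listed among the supported aggregators in the removed README text). The function name `fedavg`, the `Weights` alias, and the flat dict-of-lists representation are hypothetical and are not taken from the repository.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical flat representation: parameter name -> list of values.
Weights = Dict[str, List[float]]


def fedavg(updates: Sequence[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its sample count.

    Illustrative sketch; not the implementation in src/federation.py.
    """
    if not updates:
        raise ValueError("at least one client update is required")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    reference = updates[0][0]
    aggregated: Weights = {}
    for name, values in reference.items():
        acc = [0.0] * len(values)
        for weights, n in updates:
            # Each client contributes in proportion to its share of the data.
            for i, v in enumerate(weights[name]):
                acc[i] += v * n / total
        aggregated[name] = acc
    return aggregated


if __name__ == "__main__":
    client_a = ({"w": [1.0, 2.0], "b": [0.0]}, 10)
    client_b = ({"w": [3.0, 4.0], "b": [1.0]}, 30)
    print(fedavg([client_a, client_b]))
```

A test in the spirit of `tests/test_federation_components.py` could check that a client holding three times as much data pulls the average three times as strongly, which is the kind of minimal aggregation test the README asks contributors to add.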