feat(survivability): add Phase 2 infrastructure for survivability v1 extensions #132
✨ Feature Pull Request
Related Feature Request:
Implements Phase 2 Infrastructure as specified in docs/survivability-v1/phase2-infrastructure/
Feature Summary:
This PR implements the Phase 2 infrastructure foundation for FUSION's survivability v1 extensions. It adds four critical modules that enable network failure simulation, protection schemes, offline RL policies, and reproducible experiments.
🎯 Feature Implementation
Components Added/Modified:
- fusion/configs/
- fusion/core/
- fusion/modules/routing/
- fusion/modules/failures/
- tests/

New Modules Created:

Failures Module (fusion/modules/failures/)
- errors.py: Custom exception hierarchy (FailureError, FailureConfigError, etc.)
- failure_types.py: F1-F4 failure implementations (link, node, SRLG, geographic)
- failure_manager.py: Core FailureManager class for injection, tracking, and feasibility
- registry.py: Failure handler lookup system

K-Path Cache (fusion/modules/routing/)
- k_path_cache.py: Pre-computed K shortest paths with feature extraction

Configuration System Extensions (fusion/configs/)
- templates/survivability_experiment.ini: Configuration template for survivability experiments
- schemas/survivability.json: JSON Schema validation for survivability configs
- validate.py: Extended with survivability-specific validation functions

Determinism & Seed Management (fusion/core/)
- simulation.py: Extended with seed_all_rngs(), validate_seed(), generate_seed_from_time()
- batch_runner.py: Extended with run_multi_seed_experiment()

New Dependencies:
None - all modules use existing dependencies (networkx, numpy, pytest)
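For reviewers unfamiliar with the seed helpers named above, here is a minimal sketch of what seed_all_rngs(), validate_seed(), and generate_seed_from_time() might look like. Only the function names come from this PR; the bodies, the MAX_SEED bound, and the optional NumPy handling are assumptions for illustration, not the code in simulation.py.

```python
import random
import time

# Assumed upper bound: NumPy's legacy np.random.seed() accepts 0..2**32 - 1.
MAX_SEED = 2**32 - 1


def validate_seed(seed):
    """Return the seed unchanged if it is an int in [0, MAX_SEED], else raise."""
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise TypeError(f"seed must be an int, got {type(seed).__name__}")
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be in [0, {MAX_SEED}], got {seed}")
    return seed


def generate_seed_from_time():
    """Derive a seed from the clock (intentionally not reproducible)."""
    return time.time_ns() % (MAX_SEED + 1)


def seed_all_rngs(seed):
    """Seed Python's RNG and, when it is installed, NumPy's global RNG."""
    seed = validate_seed(seed)
    random.seed(seed)
    try:
        import numpy as np
    except ImportError:
        return seed  # NumPy missing: only the stdlib RNG was seeded
    np.random.seed(seed)
    return seed


# Re-seeding with the same value reproduces the same random stream.
seed_all_rngs(42)
first = [random.random() for _ in range(3)]
seed_all_rngs(42)
second = [random.random() for _ in range(3)]
```

Returning the validated seed lets a batch runner log exactly which seed each run used, which is useful when a time-derived seed has to be replayed later.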
Configuration Changes:
```ini
# New survivability experiment template
[failure_settings]
failure_type = none # none, link, node, srlg, geo
t_fail_arrival_index = -1
t_repair_after_arrivals = 1000
failed_link_src = 0
failed_link_dst = 1
geo_center_node = 5
geo_hop_radius = 2
[protection_settings]
protection_mode = none # none, 1plus1
protection_switchover_ms = 50.0
restoration_latency_ms = 100.0
[offline_rl_settings]
policy_type = ksp_ff # ksp_ff, one_plus_one, bc, iql
device = cpu
fallback_policy = ksp_ff
[dataset_logging]
log_offline_dataset = false
dataset_output_path = datasets/offline_data.jsonl
```
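The template above places allowed values in trailing `#` comments. If the file is read with Python's standard configparser (an assumption; this PR does not say which loader FUSION uses), inline comments are only stripped when `inline_comment_prefixes` is set. The sketch below shows the difference on an excerpt of the template; `load_failure_settings` is a hypothetical helper, not part of validate.py.

```python
import configparser

# Excerpt of the template above, including its inline comments.
TEMPLATE = """
[failure_settings]
failure_type = none  # none, link, node, srlg, geo
t_fail_arrival_index = -1
geo_hop_radius = 2
"""

ALLOWED_FAILURE_TYPES = {"none", "link", "node", "srlg", "geo"}


def load_failure_settings(text):
    """Parse [failure_settings] and reject unknown failure types."""
    # inline_comment_prefixes makes configparser drop the trailing "# ..." notes.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    section = parser["failure_settings"]
    failure_type = section.get("failure_type", "none")
    if failure_type not in ALLOWED_FAILURE_TYPES:
        raise ValueError(f"unknown failure_type: {failure_type!r}")
    return {
        "failure_type": failure_type,
        "t_fail_arrival_index": section.getint("t_fail_arrival_index"),
        "geo_hop_radius": section.getint("geo_hop_radius"),
    }


settings = load_failure_settings(TEMPLATE)

# With the default parser the comment becomes part of the value.
plain = configparser.ConfigParser()
plain.read_string(TEMPLATE)
raw_value = plain["failure_settings"]["failure_type"]
```

With the default parser, `raw_value` still carries the comment text, so a check such as `failure_type == "none"` would fail unless the loader strips it.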
🧪 Feature Testing
New Test Coverage:
Test Breakdown:
Test Configuration Used:
```ini
[general_settings]
max_iters = 5
num_requests = 2000
seed = 42
[topology_settings]
network = NSFNet
cores_per_link = 7
[failure_settings]
failure_type = link
failed_link_src = 0
failed_link_dst = 1
t_fail_arrival_index = -1
t_repair_after_arrivals = 1000
```
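The configuration above fails a single link. For the geo failure type in the template (geo_center_node, geo_hop_radius), one plausible reading is "fail every link inside a hop-count neighbourhood of the center node". The sketch below illustrates that reading with a breadth-first search. The rule that both endpoints must lie inside the region, and the function names, are assumptions and may differ from failure_types.py.

```python
from collections import deque


def nodes_within_hops(adjacency, center, radius):
    """Breadth-first search: every node at most `radius` hops from `center`."""
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # do not expand past the radius
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)


def links_in_region(adjacency, center, radius):
    """Undirected links whose two endpoints both lie in the region (assumed rule)."""
    region = nodes_within_hops(adjacency, center, radius)
    return {
        tuple(sorted((u, v)))
        for u in region
        for v in adjacency[u]
        if v in region
    }


# Hypothetical 6-node line topology: 0 - 1 - 2 - 3 - 4 - 5
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
failed = links_in_region(line, center=5, radius=2)
```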
Manual Testing Steps:
📊 Performance Impact
Benchmarks:
Performance Test Results:
All modules are infrastructure-only and add minimal overhead when not actively used. K-path pre-computation is a one-time cost at simulation start.
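To illustrate why the pre-computation is a one-time cost, here is a small stand-in for a K-path cache: all paths are computed in the constructor, and later lookups are plain dictionary reads. `TinyKPathCache` is hypothetical, ranks paths by hop count, and uses brute-force enumeration that only suits small topologies; the KPathCache in this PR may use a different algorithm and weighting.

```python
from itertools import permutations


def simple_paths(adjacency, src, dst, path=None):
    """Yield every loop-free path from src to dst via depth-first search."""
    path = [src] if path is None else path
    if src == dst:
        yield list(path)
        return
    for neighbor in adjacency[src]:
        if neighbor not in path:
            path.append(neighbor)
            yield from simple_paths(adjacency, neighbor, dst, path)
            path.pop()


class TinyKPathCache:
    """Hypothetical stand-in: compute the K fewest-hop paths once, then look up."""

    def __init__(self, adjacency, k):
        self._paths = {}
        # One-time cost: enumerate and rank paths for every ordered node pair.
        for src, dst in permutations(adjacency, 2):
            candidates = sorted(simple_paths(adjacency, src, dst), key=len)
            self._paths[(src, dst)] = candidates[:k]

    def get_k_paths(self, src, dst):
        """Constant-time lookup of the pre-computed paths."""
        return self._paths[(src, dst)]


# Hypothetical 4-node ring: 0 - 1 - 2 - 3 - 0
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
cache = TinyKPathCache(ring, k=2)
paths_0_to_2 = cache.get_k_paths(0, 2)
```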
📚 Documentation Updates
Documentation Added/Updated:
Usage Examples:
```python
# Failure injection
# (topology, network_spectrum, and load_config are assumed to come from the
# caller's existing simulation setup; they are not defined in this snippet)
from fusion.modules.failures import FailureManager

manager = FailureManager(topology)
event = manager.inject_failure(
    failure_type='link',
    t_fail=10.0,
    t_repair=20.0,
    link_id=(0, 1),
)

# Check path feasibility
path = [0, 1, 2, 3]
is_feasible = manager.is_path_feasible(path)

# K-path cache with features
from fusion.modules.routing import KPathCache

cache = KPathCache(topology, k=4, weight='weight')
paths = cache.get_k_paths(src=0, dst=5)
features = cache.get_path_features(paths[0], network_spectrum, manager)

# Multi-seed experiments
from fusion.sim.batch_runner import run_multi_seed_experiment

config = load_config('survivability_experiment.ini')
results = run_multi_seed_experiment(
    config,
    seed_list=[42, 43, 44, 45, 46],
    output_dir='results/',
)
```
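The feasibility check in the usage example can be pictured with a self-contained sketch: a path is infeasible if any of its hops crosses a link that is failed at the time of the query. `TinyFailureTracker`, the explicit `now` argument, and the undirected-link normalization are assumptions for illustration; the real FailureManager.is_path_feasible() takes only the path and may track time differently.

```python
def normalize(link):
    """Treat links as undirected: (1, 0) and (0, 1) are the same link."""
    u, v = link
    return (u, v) if u <= v else (v, u)


class TinyFailureTracker:
    """Hypothetical stand-in for FailureManager's tracking and feasibility role."""

    def __init__(self):
        self._failed = {}  # normalized link -> (t_fail, t_repair)

    def inject_link_failure(self, link_id, t_fail, t_repair):
        if t_repair <= t_fail:
            raise ValueError("t_repair must be later than t_fail")
        self._failed[normalize(link_id)] = (t_fail, t_repair)

    def is_path_feasible(self, path, now):
        """Return False if any hop of the path is failed at time `now`."""
        for hop in zip(path, path[1:]):
            window = self._failed.get(normalize(hop))
            if window is not None and window[0] <= now < window[1]:
                return False
        return True


tracker = TinyFailureTracker()
tracker.inject_link_failure((0, 1), t_fail=10.0, t_repair=20.0)
during = tracker.is_path_feasible([0, 1, 2, 3], now=15.0)
after_repair = tracker.is_path_feasible([0, 1, 2, 3], now=25.0)
```

Keeping the failure window half-open (`t_fail <= now < t_repair`) means a link counts as repaired exactly at t_repair, which avoids ambiguity when a request arrives at the repair instant.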
🔄 Backward Compatibility
Compatibility Impact:
All survivability features are disabled by default (failure_type = none, protection_mode = none). Existing simulations continue to work without modification.
🚀 Feature Checklist
Core Implementation:
Integration:
Quality Assurance:
🎉 Feature Demo
Before/After Comparison:
Before: FUSION could simulate optical networks but had no:
After: FUSION can now:
📝 Reviewer Notes
Focus Areas for Review:
Known Limitations:
Future Enhancements:
📋 Commit History
This PR follows the atomic commit strategy per 03-version-control.md:
```
|\
| * 26a4895 feat(survivability): add determinism and seed management
|/
|\
| * ce2f468 feat(survivability): add configuration system
|/
|\
| * 52a3d97 feat(survivability): add K-path cache
|/
|\
| * 617580e feat(survivability): add failures module
|/
```
Each module was developed in a sub-branch, committed atomically, and merged back to the phase2 branch.
🔍 Additional Context
Specification Documents:
Test Results:
All 68 tests pass locally (68/68). Some tests import numpy, which is available in the main environment.
Next Steps (out of scope for this PR):
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>