This repository contains the full implementation of the framework presented in:
"Optimal Solution for Entanglement Rate and Fidelity Maximization under Requests Expiration Constraint for Free-Space Quantum Networks"
Muhammad Tauseef Mushtaq, Vito Guida, Nicola Cordeschi
Department of Electrical and Information Engineering, Politecnico di Bari
We address the problem of scheduling entanglement requests in Low Earth Orbit (LEO) satellite quantum networks using a physics-grounded Markov Decision Process (MDP) solved with Proximal Policy Optimization (PPO) and dynamic action masking. The environment integrates a complete physical-layer model covering free-space diffraction, atmospheric attenuation, iterative entanglement purification, probabilistic entanglement swapping for end-to-end connections, and time-dependent fidelity decay due to decoherence. The simulator also retrieves Starlink orbital ephemeris data and uses it to test and validate the optimal solutions under the constraint of available satellites.
Against a fidelity-greedy deterministic baseline, the trained PPO agent achieves:
- ↑ 84% higher throughput (served requests per second) at Time-To-Live (TTL) = 1 s
- ↓ 6.3× fewer expired requests at TTL = 1 s
- Stable 110–122 req/s across all TTL regimes (1 s – 5 s)
- > 99.9% success rate and ~94% average fidelity in all configurations
The Greedy baseline degrades as TTL increases (queue congestion), while the PPO agent exploits longer coherence windows. This demonstrates that AI-driven scheduling is a necessary complement to hardware improvements in future quantum network infrastructure.
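The purification and decoherence steps of the physical-layer model can be illustrated with textbook formulas. The sketch below is not taken from this repository: it assumes Werner-state pairs, one BBPSSW purification round, and a depolarizing memory with a hypothetical coherence time `t_coh`. The function names are illustrative only.

```python
import math


def purify_werner(f: float) -> tuple:
    """One BBPSSW round on two Werner pairs of fidelity f.

    Returns (output_fidelity, success_probability).
    Assumption: both input pairs are Werner states with the same fidelity.
    """
    e = (1.0 - f) / 3.0  # weight of each of the three unwanted Bell states
    p_success = f * f + 2.0 * f * e + 5.0 * e * e
    f_out = (f * f + e * e) / p_success
    return f_out, p_success


def decayed_fidelity(f0: float, t: float, t_coh: float) -> float:
    """Fidelity after storing a pair for time t in a depolarizing memory.

    Assumption: exponential relaxation toward the maximally mixed value 1/4.
    """
    return 0.25 + (f0 - 0.25) * math.exp(-t / t_coh)


if __name__ == "__main__":
    f, p = purify_werner(0.80)
    print(f"purified fidelity: {f:.4f}, success probability: {p:.4f}")
    print(f"after 1 s with t_coh = 5 s: {decayed_fidelity(f, 1.0, 5.0):.4f}")
```

Purification raises fidelity only for inputs above 0.5 and succeeds probabilistically. That is one reason a longer TTL gives a scheduler more room to trade waiting time against link quality.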
| Symbol | Description |
|---|---|
| $N_{\text{served}}(t)$ | Requests that completed all 3 entangled pairs at timestep $t$ |
| $N_{\text{total}}$ | Total requests in queue (fixed at 3) |
| $\bar{F}(t)$ | Average post-purification fidelity of links established at $t$ |
| $N_{\text{expired}}(t)$ | Requests dropped at $t$ |
| $w_{\text{thr}}, w_{\text{fid}}, w_{\text{exp}}$ | Normalized weights for throughput, fidelity, and expiry penalty |
| $\pi$ | Scheduling policy (satellite selection logic) |
| TTL | Time-To-Live coherence limit (seconds) |
The scheduler maximizes, over an episode, the cumulative weighted sum of entanglement throughput and quantum link quality, minus a weighted penalty for requests that exceed their TTL.
The per-timestep term of this sum is used directly as the instantaneous reward.
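A minimal sketch of such a per-timestep reward follows. The exact functional form is an assumption: served and expired counts are normalized by the queue size of 3, and the expiry term enters with a negative sign. `RewardWeights`, `step_reward`, and `episode_return` are hypothetical names, not identifiers from this repository.

```python
from dataclasses import dataclass

N_TOTAL = 3  # queue size, fixed at 3 in the symbol table


@dataclass(frozen=True)
class RewardWeights:
    """Normalized weights for throughput, fidelity, and expiry penalty."""
    throughput: float
    fidelity: float
    expiry: float

    def __post_init__(self):
        total = self.throughput + self.fidelity + self.expiry
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"weights must sum to 1, got {total}")


def step_reward(n_served, avg_fidelity, n_expired, w):
    """Instantaneous reward for one timestep (assumed form, see lead-in)."""
    return (
        w.throughput * n_served / N_TOTAL
        + w.fidelity * avg_fidelity
        - w.expiry * n_expired / N_TOTAL
    )


def episode_return(steps, w):
    """Sum of per-timestep rewards; `steps` holds (served, fidelity, expired)."""
    return sum(step_reward(s, f, e, w) for s, f, e in steps)


if __name__ == "__main__":
    w = RewardWeights(throughput=0.30, fidelity=0.45, expiry=0.25)
    print(step_reward(3, 0.94, 0, w))  # all three requests served
    print(step_reward(0, 0.0, 3, w))   # all three requests expired
```

Keeping every term in a comparable range (roughly 0 to 1) means the three weights express relative priority directly. That is what makes comparing weight configurations meaningful.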
State
Action
Reward
The micro-step architecture freezes the physical clock while the three concurrent requests are evaluated sequentially, reducing the action space from a joint choice over all three requests to a single per-request choice at each micro-step.
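The micro-step idea combined with action masking can be sketched as follows. This is an illustration under stated assumptions, not the repository's environment: an action is taken to be the index of a satellite, the mask marks satellites that are currently usable, and a satellite chosen in one micro-step is masked out for the remaining ones. With `n` candidate satellites, a joint decision for three requests would span `n**3` combinations, while each micro-step chooses among at most `n`. All function names are hypothetical.

```python
import math
import random


def masked_sample(logits, mask, rng):
    """Sample an action index from softmax(logits), restricted to mask == True."""
    valid = [i for i, ok in enumerate(mask) if ok]
    if not valid:
        raise ValueError("no valid action available")
    top = max(logits[i] for i in valid)  # subtract max for numerical stability
    weights = [math.exp(logits[i] - top) for i in valid]
    return rng.choices(valid, weights=weights, k=1)[0]


def schedule_timestep(logits_per_request, visible, rng):
    """Pick one satellite per request in sequence without advancing the clock.

    Assumption: a satellite assigned in an earlier micro-step is unavailable
    to later requests within the same physical timestep.
    """
    available = list(visible)
    choices = []
    for logits in logits_per_request:  # one micro-step per queued request
        action = masked_sample(logits, available, rng)
        choices.append(action)
        available[action] = False  # update the mask for the next micro-step
    return choices


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [[0.2, 1.5, 0.0, -0.3]] * 3   # placeholder policy outputs
    visible = [True, True, False, True]    # satellite 2 is out of view
    print(schedule_timestep(logits, visible, rng))
```

Masking at sampling time keeps the policy from spending probability mass on infeasible satellites, so it never has to learn to avoid them through negative reward.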
PPO maintains stable throughput across all TTL regimes while the Greedy baseline degrades as TTL increases due to queue clogging.
Figure: Throughput comparison across TTL = 1–5 s.
Figure: Expired requests comparison across TTL = 1–5 s.
At TTL = 1 s, all four PPO weight configurations achieve ≥ 99.9% success rate and ~94% average fidelity. The Greedy baseline reaches 97.5% success rate and 93.7% average fidelity at the same TTL.
| Agent | Throughput weight | Fidelity weight | Expiry weight | Throughput (req/s) | Req Generated | Req Served | Expired |
|---|---|---|---|---|---|---|---|
| PPO | 0.45 | 0.30 | 0.25 | 93 | 67,875 | 67,491 | 384 |
| **PPO** | **0.30** | **0.45** | **0.25** | **120** | **86,888** | **86,705** | **183** |
| PPO | 0.45 | 0.25 | 0.30 | 111 | 80,186 | 79,909 | 277 |
| PPO | 0.25 | 0.30 | 0.45 | 114 | 82,645 | 82,420 | 225 |
| Greedy | — | — | — | 65 | 46,920 | 45,757 | 1,163 |
The best PPO configuration (bold) delivers 84% higher throughput and 6.3× fewer expired requests than Greedy at TTL = 1 s.
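The headline ratios can be recomputed from the table rows. The snippet below copies the table values and checks that served plus expired equals generated in every row. The dictionary keys and helper names are illustrative, and the best configuration is taken to be the row with the highest throughput.

```python
# Rows copied from the results table:
# (throughput in req/s, requests generated, requests served, requests expired)
RESULTS = {
    "PPO (0.45, 0.30, 0.25)": (93, 67_875, 67_491, 384),
    "PPO (0.30, 0.45, 0.25)": (120, 86_888, 86_705, 183),
    "PPO (0.45, 0.25, 0.30)": (111, 80_186, 79_909, 277),
    "PPO (0.25, 0.30, 0.45)": (114, 82_645, 82_420, 225),
    "Greedy": (65, 46_920, 45_757, 1_163),
}


def relative_gain(value, baseline):
    """Fractional improvement of value over baseline (0.5 means +50%)."""
    return value / baseline - 1.0


def reduction_factor(baseline, value):
    """How many times smaller value is than baseline."""
    return baseline / value


# Internal consistency: every generated request is either served or expired.
for name, (_, generated, served, expired) in RESULTS.items():
    assert served + expired == generated, name

# Best PPO row = highest throughput among the PPO configurations.
best_name = max((k for k in RESULTS if k != "Greedy"), key=lambda k: RESULTS[k][0])
best, greedy = RESULTS[best_name], RESULTS["Greedy"]

throughput_gain = relative_gain(best[0], greedy[0])
expiry_factor = reduction_factor(greedy[3], best[3])
greedy_served_fraction = greedy[2] / greedy[1]

if __name__ == "__main__":
    print(best_name)
    print(f"throughput gain vs Greedy: {throughput_gain:.3f}")
    print(f"expired-request reduction: {expiry_factor:.2f}x")
    print(f"Greedy served fraction:    {greedy_served_fraction:.4f}")
```

The ratios come out to roughly 0.846 and 6.36, in line with the quoted 84% and 6.3×. The Greedy served fraction of about 0.975 matches the 97.5% success rate reported above.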