Actions as Operators on State Space
Generalized Reinforcement Learning (GRL) redefines the concept of "action" in reinforcement learning. Instead of treating actions as discrete indices or fixed-dimensional vectors, GRL models actions as parametric operators that transform the state space.
flowchart TB
subgraph TRL["🔵 Traditional RL"]
direction LR
S1["<b>State</b><br/>s"] --> P1["<b>Policy</b><br/>π"]
P1 --> A1["<b>Action Symbol</b><br/>a ∈ A"]
A1 --> NS1["<b>Next State</b><br/>s'"]
end
TRL --> GRL
subgraph GRL["✨ Generalized RL"]
direction LR
S2["<b>State</b><br/>s"] --> P2["<b>Policy</b><br/>π"]
P2 --> AP["<b>Operator Params</b><br/>θ"]
AP --> OP["<b>Operator</b><br/>Ô<sub>θ</sub>"]
OP --> ST["<b>State Transform</b><br/>s' = Ô<sub>θ</sub>(s)"]
end
style S1 fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000
style NS1 fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000
style A1 fill:#fff9c4,stroke:#f57c00,stroke-width:3px,color:#000
style P1 fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px,color:#000
style S2 fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000
style ST fill:#c8e6c9,stroke:#388e3c,stroke-width:3px,color:#000
style AP fill:#fff59d,stroke:#fbc02d,stroke-width:3px,color:#000
style OP fill:#ffcc80,stroke:#f57c00,stroke-width:3px,color:#000
style P2 fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px,color:#000
style TRL fill:#fafafa,stroke:#666,stroke-width:2px
style GRL fill:#fafafa,stroke:#666,stroke-width:2px
linkStyle 3 stroke:#666,stroke-width:2px
This formulation, inspired by the least-action principle in physics, leads to policies that are not only optimal but also physically grounded—preferring smooth, efficient transformations over abrupt changes.
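To make the operator view concrete, here is a toy sketch (hypothetical class name, not the grl library API) in which an action is a parametric rotation operator acting on a 2-D state rather than a symbol drawn from a finite set:

```python
import numpy as np

class RotationOperator:
    """Toy parametric operator: rotate a 2-D state by angle theta.

    Illustrative stand-in for the operators O_theta in the diagram above,
    not the actual grl API.
    """

    def __init__(self, theta: float):
        self.theta = theta

    def __call__(self, state: np.ndarray) -> np.ndarray:
        c, s = np.cos(self.theta), np.sin(self.theta)
        rotation_matrix = np.array([[c, -s], [s, c]])
        return rotation_matrix @ state

# Traditional RL: the policy emits a symbol from a finite set, e.g. a in {0, 1, 2}.
# GRL: the policy emits continuous operator parameters theta; the action is the transform itself.
state = np.array([1.0, 0.0])
theta = 0.3                                   # operator parameter chosen by the policy
next_state = RotationOperator(theta)(state)   # s' = O_theta(s)
```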
Status: 🔄 In progress (9/10 chapters complete)
Particle-based belief representation, energy landscapes, and functional learning over augmented state-action space.
Start Learning → | Research Roadmap →
| Section | Chapters | Topics |
|---|---|---|
| Foundations | 0, 1, 2, 3 | Augmented space, particles, RKHS, energy |
| Field & Memory | 4, 4a, 5, 6, 6a | Functional fields, Riesz theorem, belief states, MemoryUpdate, advanced memory |
| Algorithms | 7 | RF-SARSA (next) |
| Interpretation | 8-10 | Soft transitions, POMDP, synthesis |
Status: 📋 Planned (after Part I)
Spectral discovery of hierarchical concepts through functional clustering in RKHS.
| Section | Chapters | Topics |
|---|---|---|
| Functional Clustering | 11 | Clustering in function space |
| Spectral Concepts | 12 | Concepts as eigenmodes |
| Hierarchical Control | 13 | Multi-level abstraction |
Based on: Section V of the original paper
Reading time: ~10 hours total (both parts)
Status: 🔬 Advanced topics (9 chapters complete)
Mathematical connections to quantum mechanics and novel probability formulations for ML.
| Theme | Chapters | Topics |
|---|---|---|
| Foundations | 01, 01a, 02 | RKHS-QM parallel, state vs. wavefunction, amplitude interpretation |
| Complex RKHS | 03 | Complex-valued kernels, interference, phase semantics |
| Projections | 04, 05, 06 | Action/state fields, concept subspaces, belief dynamics |
| Learning & Memory | 07, 08 | Beyond GP, memory dynamics, principled consolidation |
Novel Contributions:
- Amplitude-based RL: Complex-valued value functions with phase semantics
- MDL consolidation: Information-theoretic memory management
- Concept-based MoE: Hierarchical RL via subspace projections
| Aspect | Classical RL | GRL |
|---|---|---|
| Action | Discrete index or vector | Parametric operator |
| Action Space | Finite or bounded | Continuous manifold |
| Value Function | Q-table or Q-network | Reinforcement field |
| Experience | Replay buffer | Particle memory in RKHS |
| Policy | Learned function | Inferred from energy landscape |
| Uncertainty | External (dropout, ensembles) | Emergent from particle sparsity |
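The last few rows of this table can be made concrete with a minimal sketch, assuming a simple RBF kernel and a heuristic uncertainty rule (the function names and weighting below are illustrative, not the MemoryUpdate/RF-SARSA machinery from the tutorial): the reinforcement field at a query point is a kernel-weighted average over stored particles, and low local kernel mass signals high epistemic uncertainty.

```python
import numpy as np

def rbf(x, y, lengthscale=1.0):
    """RBF kernel on augmented state-action points x = (s, theta)."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * lengthscale ** 2))

def field_value(query, particles, values, lengthscale=1.0):
    """Kernel-weighted estimate of the reinforcement field at `query`.

    `particles` are stored (s, theta) points, `values` their associated returns.
    Few nearby particles means low kernel mass, which we read as high uncertainty.
    """
    weights = np.array([rbf(query, p, lengthscale) for p in particles])
    mass = weights.sum()
    if mass < 1e-8:                       # no support at all: maximally uncertain
        return 0.0, 1.0
    estimate = float(weights @ values) / mass
    uncertainty = 1.0 / (1.0 + mass)      # heuristic: more mass, less uncertainty
    return estimate, uncertainty

# Three experience particles in a 3-D augmented space (2-D state + 1-D action parameter)
particles = [np.array([0.0, 0.0, 0.1]),
             np.array([1.0, 0.0, 0.2]),
             np.array([0.0, 1.0, -0.1])]
values = np.array([1.0, 0.5, -0.2])
q_plus, u = field_value(np.array([0.1, 0.1, 0.1]), particles, values)
```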
Key Insight: Traditional RL algorithms (Q-learning, DQN, PPO, SAC, RLHF for LLMs) are special cases of GRL!
When you:
- Discretize actions → GRL recovers Q-learning (see the sketch below)
- Use neural networks → GRL recovers DQN
- Apply Boltzmann policies → GRL recovers REINFORCE/Actor-Critic
- Fine-tune LLMs → GRL generalizes RLHF
See: Recovering Classical RL from GRL →
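As a quick illustration of the first reduction above, reusing the toy `field_value`, `particles`, and `values` from the previous sketch (again an illustration, not the formal reduction in the linked chapter): once the operator parameter theta is restricted to a finite grid, evaluating the field and taking the argmax collapses to the greedy step of tabular Q-learning.

```python
import numpy as np

def greedy_discrete_action(state, theta_grid, particles, values):
    """With theta restricted to a finite grid, the field plays the role of Q(s, a)
    and action selection collapses to the familiar argmax over a discrete set."""
    scores = []
    for theta in theta_grid:
        x = np.concatenate([state, [theta]])      # augmented (s, theta) point
        q, _ = field_value(x, particles, values)  # field value stands in for Q(s, a)
        scores.append(q)
    return theta_grid[int(np.argmax(scores))]

best_theta = greedy_discrete_action(np.array([0.1, 0.1]),
                                    theta_grid=[-0.2, -0.1, 0.0, 0.1, 0.2],
                                    particles=particles, values=values)
```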
- Generalization: Subsumes existing methods as special cases
- Continuous actions: No discretization, full precision
- Smooth interpolation: Nearby parameters → similar behavior
- Compositional: Operators can be composed (operator algebra)
- Uncertainty: Sparse particles = high uncertainty (no ensembles needed)
- Interpretability: Energy landscapes, particle inspection
- Modern applications: Applies to RLHF, prompt optimization, neural architecture search
# Clone the repository
git clone https://github.com/pleiadian53/GRL.git
cd GRL
# Create environment with mamba/conda
mamba env create -f environment.yml
mamba activate grl
# Install in development mode
pip install -e .
# Verify installation (auto-detects CPU/GPU/MPS)
python scripts/verify_installation.py

See INSTALL.md for detailed instructions.
- Read the tutorial: Start with Chapter 0: Overview
- Explore concepts: Work through Chapter 1: Core Concepts
- Understand algorithms: See the algorithm chapters (coming soon)
- Implement: Follow the implementation guide
GRL/
├── src/grl/ # Core library
│ ├── core/ # Particle memory, kernels
│ ├── algorithms/ # MemoryUpdate, RF-SARSA
│ ├── envs/ # Environments
│ └── visualization/ # Plotting tools
├── docs/ # 📚 Public documentation
│ └── GRL0/ # Tutorial paper (Reinforcement Fields)
│ ├── tutorials/ # Tutorial chapters (6/10 complete)
│ ├── paper/ # Paper-ready sections
│ └── implementation/ # Implementation specs
├── notebooks/ # Jupyter notebooks
│ └── vector_field.ipynb # Vector field demonstrations
├── examples/ # Runnable examples
├── scripts/ # Utility scripts
├── tests/ # Unit tests
└── configs/ # Configuration files
Part I: Particle-Based Learning (6/10 chapters complete)
- Start Here — Overview
- Tutorials — Chapter-by-chapter learning
- Implementation — Technical specifications
Part II: Emergent Structure & Spectral Abstraction (Planned)
- Installation Guide — Detailed setup instructions
- Interactive Notebooks — Jupyter demos with visualizations (best viewed on Pages)
- View source — Raw notebooks in repository
Po-Hsiang Chiu, Manfred Huber
arXiv:2208.04822 (2022) — 37 pages, 15 figures
The foundational work introducing particle-based belief states, reinforcement fields, and concept-driven learning.
Reinforcement Fields Framework — Enhanced exposition with modern formalization
Part I: Particle-Based Learning
- Functional fields over augmented state-action space
- Particle memory as belief state in RKHS
- MemoryUpdate and RF-SARSA algorithms
- Emergent soft state transitions, POMDP interpretation
Status: 🔄 Tutorial in progress (6/10 chapters complete)
Part II: Emergent Structure & Spectral Abstraction
- Functional clustering (clustering functions, not points)
- Spectral methods on kernel matrices (sketched below)
- Concepts as coherent subspaces of the reinforcement field
- Hierarchical policy organization
Status: 📋 Planned (after Part I)
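A minimal sketch of the spectral idea using a standard RBF Gram matrix (a generic kernel eigendecomposition, not the Part II algorithm itself): leading eigenvectors of the particle Gram matrix define coherent subspaces, and assigning each particle to its dominant eigenmode gives crude concept candidates.

```python
import numpy as np

def concept_candidates(particles, lengthscale=1.0, n_concepts=2):
    """Eigendecompose the RBF Gram matrix of stored particles and label each
    particle by its dominant leading eigenmode, a rough proxy for the
    functional-clustering view of concepts as coherent subspaces."""
    P = np.stack(particles)
    sq_dists = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2.0 * lengthscale ** 2))   # Gram matrix K_ij = k(x_i, x_j)
    eigvals, eigvecs = np.linalg.eigh(K)               # eigenvalues in ascending order
    leading = eigvecs[:, -n_concepts:]                 # top eigenmodes
    return np.argmax(np.abs(leading), axis=1)          # dominant mode per particle

rng = np.random.default_rng(0)
toy_particles = [rng.normal(size=3) for _ in range(12)]
labels = concept_candidates(toy_particles, n_concepts=2)
```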
| Paper | Title | Status | Progress |
|---|---|---|---|
| Paper A | Generalized Reinforcement Learning — Actions as Operators | 🟢 Draft Complete | ~70% |
| | Operator algebra, generalized Bellman equation, energy regularization | Complete draft, 3/7 figures, proofs outlined | |
| Paper B | Operator Policies — Learning State-Space Operators with Neural Operator Networks (tentative) | ⏳ Planned | ~0% |
| | Neural operators, scalable training, operator-actor-critic | After Paper A | |
| Paper C | Applications of GRL to Physics, Robotics, and Differentiable Control (tentative) | ⏳ Planned | ~0% |
| | Physics-based control, compositional behaviors, transfer learning | After Paper B | |
Timeline:
- Paper A: Target submission April 2026 (NeurIPS/ICML)
- Paper B: Target submission June 2026 (ICML/NeurIPS)
- Paper C: Target submission July 2026 (CoRL)
See: Research Roadmap for detailed timeline and additional research directions.
flowchart LR
A["🌍 <b>State</b><br/>s"] --> B["💾 <b>Query</b><br/>Memory Ω"]
B --> C["📊 <b>Compute</b><br/>Field Q⁺"]
C --> D["🎯 <b>Infer</b><br/>Action θ"]
D --> E["⚡ <b>Execute</b><br/>Operator"]
E --> F["👁️ <b>Observe</b><br/>s', r"]
F --> G["✨ <b>Create</b><br/>Particle"]
G --> H["🔄 <b>Update</b><br/>Memory"]
H -->|Loop| B
style A fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000
style B fill:#fff9c4,stroke:#f57c00,stroke-width:3px,color:#000
style C fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px,color:#000
style D fill:#fff59d,stroke:#fbc02d,stroke-width:3px,color:#000
style E fill:#ffcc80,stroke:#f57c00,stroke-width:3px,color:#000
style F fill:#c8e6c9,stroke:#388e3c,stroke-width:3px,color:#000
style G fill:#f8bbd0,stroke:#c2185b,stroke-width:3px,color:#000
style H fill:#b2dfdb,stroke:#00796b,stroke-width:3px,color:#000
from grl.core import ParticleMemory
from grl.core import RBFKernel
from grl.algorithms import MemoryUpdate, RFSarsa
# Create particle memory (the agent's belief state)
memory = ParticleMemory()
# Define similarity kernel
kernel = RBFKernel(lengthscale=1.0)
# Learning loop
for episode in range(num_episodes):
    state = env.reset()

    for step in range(max_steps):
        # Infer action from particle memory
        action = infer_action(memory, state, kernel)

        # Execute and observe
        next_state, reward, done = env.step(action)

        # Update particle memory (belief transition)
        memory = memory_update(memory, state, action, reward, kernel)

        state = next_state
        if done:
            break

The foundational work is available on arXiv:
Chiu, P.-H., & Huber, M. (2022). Generalized Reinforcement Learning: Experience Particles, Action Operator, Reinforcement Field, Memory Association, and Decision Concepts. arXiv:2208.04822.
@article{chiu2022generalized,
title={Generalized Reinforcement Learning: Experience Particles, Action Operator,
Reinforcement Field, Memory Association, and Decision Concepts},
author={Chiu, Po-Hsiang and Huber, Manfred},
journal={arXiv preprint arXiv:2208.04822},
year={2022},
url={https://arxiv.org/abs/2208.04822}
}

The tutorial series provides enhanced exposition and modern formalization:
Part I: Particle-Based Learning (In progress)
@article{chiu2026part1,
title={Reinforcement Fields: Particle-Based Learning},
author={Chiu, Po-Hsiang and Huber, Manfred},
journal={In preparation},
year={2026}
}

Part II: Emergent Structure & Spectral Abstraction (Planned)
@article{chiu2026part2,
title={Reinforcement Fields: Emergent Structure and Spectral Abstraction},
author={Chiu, Po-Hsiang and Huber, Manfred},
journal={In preparation},
year={2026}
}

@article{chiu2026operators,
title={Generalized Reinforcement Learning — Actions as Operators},
author={Chiu, Po-Hsiang},
journal={In preparation},
year={2026+}
}

This project is licensed under the MIT License - see the LICENSE file for details.
GRL (Generalized Reinforcement Learning) is a family of methods that rethink how actions are represented and learned.
Original paper: arXiv:2208.04822 (Chiu & Huber, 2022)
Two-Part Tutorial Series:
Part I: Particle-Based Learning
- Actions as continuous parameters in augmented state-action space
- Particle memory as belief state, kernel-induced value functions
- Learning through energy landscape navigation
Part II: Emergent Structure & Spectral Abstraction
- Concepts emerge from functional clustering in RKHS
- Spectral methods discover hierarchical structure
- Multi-level policy organization
Key Innovation: Learning emerges from particle dynamics in function space, not explicit policy optimization.
Core Idea: Actions as parametric operators that transform state space, with operator algebra providing compositional structure.
Key Innovation: Operator manifolds replace fixed action spaces, enabling compositional behaviors and physical interpretability.
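Because operators map states to states, they compose. The toy sketch below (hypothetical helper names, in the spirit of the rotation example earlier) shows the compositional structure that a fixed action set cannot express; here the algebra is simply the 2-D rotation group.

```python
import numpy as np

def rotation(theta):
    """Toy operator factory: returns O_theta as a function on 2-D states."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return lambda state: R @ state

def compose(op_a, op_b):
    """Operator composition: (op_a . op_b)(s) = op_a(op_b(s))."""
    return lambda state: op_a(op_b(state))

s = np.array([1.0, 0.0])
# Two small rotations composed equal one larger rotation: a simple instance
# of an operator algebra (here SO(2)) acting on the state space.
combined = compose(rotation(0.2), rotation(0.3))
assert np.allclose(combined(s), rotation(0.5)(s))
```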
Core Framework:
- Formulated in Reproducing Kernel Hilbert Spaces (RKHS) — the functional framework for particle-based belief states
- Kernel methods define the geometry and similarity structure of augmented state-action space
- Inspired by the least-action principle in classical mechanics
Quantum-Inspired Probability:
- Probability amplitudes instead of direct probabilities — RKHS inner products as amplitude overlaps (see the sketch below)
- Complex-valued RKHS enabling interference effects and phase semantics for temporal/contextual dynamics
- Wave function analogy — The reinforcement field as a superposition of particle basis states
- This formulation is largely new to mainstream ML and opens new directions for probabilistic reasoning
See: Quantum-Inspired Extensions for technical details.
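The flavor of the amplitude view can be sketched with a hypothetical complex-valued kernel (an RBF envelope times a phase factor); contributions are superposed as amplitudes and squared to obtain unnormalized probabilities. This is an illustration of the idea, not the construction developed in the linked chapters.

```python
import numpy as np

def complex_kernel(x, y, lengthscale=1.0, freq=2.0):
    """Hypothetical complex-valued kernel: RBF envelope times a phase factor.

    The phase depends on the relative displacement, so overlapping
    contributions can interfere constructively or destructively."""
    diff = x - y
    envelope = np.exp(-np.sum(diff ** 2) / (2.0 * lengthscale ** 2))
    phase = np.exp(1j * freq * np.sum(diff))
    return envelope * phase

def amplitude(query, particles, coeffs):
    """Superpose particle contributions as amplitudes, then square the
    magnitude (Born-rule style) to get an unnormalized probability."""
    psi = sum(c * complex_kernel(query, p) for c, p in zip(coeffs, particles))
    return psi, np.abs(psi) ** 2

particles = [np.array([0.0, 0.0, 0.1]),
             np.array([1.0, 0.0, 0.2]),
             np.array([0.0, 1.0, -0.1])]
psi, prob = amplitude(np.array([0.1, 0.1, 0.0]),
                      particles, coeffs=[1.0, 0.5 + 0.5j, -0.3])
```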
- Energy-based models (EBMs) — Control as energy landscape navigation
- POMDPs and belief-based control — Particle ensembles as implicit belief states
- Score-based methods — Energy gradients guide policy inference
- Gaussian process regression can model scalar energy fields (but is not essential to the framework)
- Neural operators for learning parametric action transformations
- Diffusion models share the gradient-field perspective
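Tying the energy and score-based bullets together, here is a minimal inference sketch built on the toy `field_value` and particles from earlier: treat the negative field as an energy and ascend the field in the operator parameter theta with a finite-difference gradient. This is illustrative only, not RF-SARSA or the paper's inference rule.

```python
import numpy as np

def infer_theta(state, particles, values, theta0=0.0, lr=0.1, steps=50, eps=1e-3):
    """Score-based action inference: follow the gradient of the reinforcement
    field (i.e. descend the energy landscape) with respect to theta."""
    theta = theta0
    for _ in range(steps):
        hi, _ = field_value(np.concatenate([state, [theta + eps]]), particles, values)
        lo, _ = field_value(np.concatenate([state, [theta - eps]]), particles, values)
        grad = (hi - lo) / (2.0 * eps)    # finite-difference dQ+/d(theta)
        theta += lr * grad                # gradient ascent on the field
    return theta

theta_star = infer_theta(np.array([0.1, 0.1]), particles, values)
```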