Forger is a Reinforcement Learning (RL) library in Rust, offering a robust and efficient framework for implementing RL algorithms. It features a modular design with components for agents, environments, policies, and utilities, facilitating easy experimentation and development of RL models.
- Modular Components: Includes agents, environments, and policies as separate modules.
- Efficient and Safe: Built in Rust, ensuring high performance and safety.
- Customizable Environments: Provides a framework to create and manage different RL environments.
- Flexible Agent Implementations: Supports various agent strategies and learning algorithms.
- Extensible Policy Framework: Allows for the implementation of diverse action selection policies.
-
Policy (
policy
):- Defines the interface for action selection policies.
- Includes an implementation of Epsilon Greedy (with Decay) Policy.
-
Agent (
agent
):- Outlines the structure for RL agents.
- Implements Value Iteration - Every Visit Monte Carlo (
VEveryVisitMC
) and Q-Learning - Every Visit Monte Carlo (QEveryVisitMC
).
-
Environment (
env
):- Provides the
Env
trait to define RL environments. - Contains
LineWorld
, a simple linear world environment for experimentation.
- Provides the
-
Prelude (
prelude
):- Exports commonly used items from the
env
,agent
, andpolicy
modules for convenient access.
- Exports commonly used items from the
- Rust Programming Environment
In your project directory, run the following command:
cargo add forger
use forger::prelude::*;
use forger::env::lineworld::{LineWorld, LineWorldAction};
pub type S = usize; // State
pub type A = LineWorldAction; // Action
pub type P = EGreedyPolicy<A>; // Policy
pub type E = LineWorld; // Environment
fn main() {
let env = LineWorld::new(
5, // number of states
1, // initial state
4, // goal state
vec![0] // terminal states
);
let mut agent = QEveryVisitMC::<S, A, P, E>::new(0.9); // Q-learning (Everyvisit MC, gamma = 0.9)
let mut policy = EGreedyPolicy::new(0.5, 0.95); // Epsilon Greedy Policy (epsilon = 0.5, decay = 0.95)
for _ in 0 .. 200 {
let mut episode = vec![];
let mut state = env.get_init_state();
loop {
let action = agent.select_action(&state, &mut policy, &env);
let (next_state, reward) = env.transition(&state, &action);
episode.push((state, action.unwrap(), reward));
match next_state {
Some(s) => state = s,
None => break,
}
}
agent.update(&episode);
policy.decay_epsilon();
}
}
-
Monte Carlo with Epsilon Decay in
LineWorld
:- Demonstrates the use of the Q-Learning Every Visit Monte Carlo (
QEveryVisitMC
) agent with an Epsilon Greedy Policy (with decay) in theLineWorld
environment. - Illustrates the process of running multiple episodes, selecting actions, updating the agent, and decaying the epsilon value over time.
- Updates the agent after each episode.
- Demonstrates the use of the Q-Learning Every Visit Monte Carlo (
-
TD0 with Epsilon Decay in
GridWorld
:- Demonstrates the use of the TD0 (
TD0
) agent with an Epsilon Greedy Policy (with decay) in theGridWorld
environment. - Illustrates the process of running multiple episodes, selecting actions, updating the agent, and decaying the epsilon value over time.
- Updates the agent every steps in each episode.
- Include test process of trained agent.
- Demonstrates the use of the TD0 (
Contributions to Forger are welcome! If you'd like to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.
Forger is licensed under the MIT License or the Apache 2.0 License.