1 | | -from typing import List, Dict, Any |
2 | | - |
3 | | -from rlgym.api import RewardFunction, AgentID |
4 | | -from rlgym.rocket_league.api import GameState |
| 1 | +from rlgym_tools.rocket_league.renderers.rocketsimvis_renderer import RocketSimVisRenderer |
5 | 2 | import os |
6 | | -import json |
7 | | -import socket |
8 | | -from typing import Dict, Any |
9 | | - |
10 | 3 | import numpy as np |
11 | | -from rlgym.api import Renderer |
12 | | -from rlgym.rocket_league.api import GameState, Car |
13 | 4 |
14 | | -DEFAULT_UDP_IP = "127.0.0.1" |
15 | | -DEFAULT_UDP_PORT = 9273 # Default RocketSimVis port |
16 | 5 | project_name="ExampleBot" |
17 | 6 |
18 | | -BUTTON_NAMES = ("throttle", "steer", "pitch", "yaw", "roll", "jump", "boost", "handbrake") |
19 | | - |
20 | | - |
21 | | -class RocketSimVisRenderer(Renderer[GameState]): |
22 | | - """ |
23 | | - A renderer that sends game state information to RocketSimVis. |
24 | | -
25 | | - This is just the client side, you need to run RocketSimVis to see the visualization. |
26 | | - Code is here: https://github.com/ZealanL/RocketSimVis |
27 | | - """ |
28 | | - def __init__(self, udp_ip=DEFAULT_UDP_IP, udp_port=DEFAULT_UDP_PORT): |
29 | | - self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) # UDP |
30 | | - self.udp_ip = udp_ip |
31 | | - self.udp_port = udp_port |
32 | | - |
33 | | - @staticmethod |
34 | | - def write_physobj(physobj): |
35 | | - j = { |
36 | | - 'pos': physobj.position.tolist(), |
37 | | - 'forward': physobj.forward.tolist(), |
38 | | - 'up': physobj.up.tolist(), |
39 | | - 'vel': physobj.linear_velocity.tolist(), |
40 | | - 'ang_vel': physobj.angular_velocity.tolist() |
41 | | - } |
42 | | - |
43 | | - return j |
44 | | - |
45 | | - @staticmethod |
46 | | - def write_car(car: Car, controls=None): |
47 | | - j = { |
48 | | - 'team_num': int(car.team_num), |
49 | | - 'phys': RocketSimVisRenderer.write_physobj(car.physics), |
50 | | - 'boost_amount': car.boost_amount, |
51 | | - 'on_ground': bool(car.on_ground), |
52 | | - "has_flipped_or_double_jumped": bool(car.has_flipped or car.has_double_jumped), |
53 | | - 'is_demoed': bool(car.is_demoed), |
54 | | - 'has_flip': bool(car.can_flip) |
55 | | - } |
56 | | - |
57 | | - if controls is not None: |
58 | | - if isinstance(controls, np.ndarray): |
59 | | - controls = { |
60 | | - k: float(v) |
61 | | - for k, v in zip(BUTTON_NAMES, controls) |
62 | | - } |
63 | | - j['controls'] = controls |
64 | | - |
65 | | - return j |
66 | | - |
67 | | - def render(self, state: GameState, shared_info: Dict[str, Any]) -> Any: |
68 | | - if "controls" in shared_info: |
69 | | - controls = shared_info["controls"] |
70 | | - else: |
71 | | - controls = {} |
72 | | - j = { |
73 | | - 'ball_phys': self.write_physobj(state.ball), |
74 | | - 'cars': [ |
75 | | - self.write_car(car, controls.get(agent_id)) |
76 | | - for agent_id, car in state.cars.items() |
77 | | - ], |
78 | | - 'boost_pad_states': (state.boost_pad_timers <= 0).tolist() |
79 | | - } |
80 | | - |
81 | | - self.sock.sendto(json.dumps(j).encode('utf-8'), (self.udp_ip, self.udp_port)) |
82 | | - |
83 | | - def close(self): |
84 | | - pass |
85 | | - |
86 | | -from typing import List, Dict, Any |
87 | | -from rlgym.api import RewardFunction, AgentID |
88 | | -from rlgym.rocket_league.api import GameState |
89 | | -from rlgym.rocket_league import common_values |
90 | | -import numpy as np |
91 | | - |
92 | | -from typing import Any, Dict, List |
93 | | -import numpy as np |
94 | | -from rlgym.rocket_league.common_values import BALL_MAX_SPEED |
95 | | - |
96 | | -class AdvancedTouchReward(RewardFunction[AgentID, GameState, float]): |
97 | | - def __init__(self, touch_reward: float = 0.0, acceleration_reward: float = 1, use_touch_count: bool = False): |
98 | | - self.touch_reward = touch_reward |
99 | | - self.acceleration_reward = acceleration_reward |
100 | | - self.use_touch_count = use_touch_count |
101 | | - |
102 | | - self.prev_ball = None |
103 | | - |
104 | | - def reset(self, agents: List[AgentID], initial_state: GameState, shared_info: Dict[str, Any]) -> None: |
105 | | - self.prev_ball = initial_state.ball |
106 | | - |
107 | | - def get_rewards(self, agents: List[AgentID], state: GameState, is_terminated: Dict[AgentID, bool], |
108 | | - is_truncated: Dict[AgentID, bool], shared_info: Dict[str, Any]) -> Dict[AgentID, float]: |
109 | | - rewards = {agent: 0 for agent in agents} |
110 | | - ball = state.ball |
111 | | - for agent in agents: |
112 | | - touches = state.cars[agent].ball_touches |
113 | | - |
114 | | - if touches > 0: |
115 | | - if not self.use_touch_count: |
116 | | - touches = 1 |
117 | | - acceleration = np.linalg.norm(ball.linear_velocity - self.prev_ball.linear_velocity) / BALL_MAX_SPEED |
118 | | - rewards[agent] += self.touch_reward * touches |
119 | | - rewards[agent] += acceleration * self.acceleration_reward |
120 | | - |
121 | | - self.prev_ball = ball |
122 | | - |
123 | | - return rewards |
124 | | - |
125 | | -class FaceBallReward(RewardFunction): |
126 | | - """Rewards the agent for facing the ball""" |
127 | | - def reset(self, agents: List[AgentID], initial_state: GameState, shared_info: Dict[str, Any]) -> None: |
128 | | - pass |
129 | | - |
130 | | - |
131 | | - def get_rewards(self, agents: List[AgentID], state: GameState, is_terminated: Dict[AgentID, bool], |
132 | | - is_truncated: Dict[AgentID, bool], shared_info: Dict[str, Any]) -> Dict[AgentID, float]: |
133 | | - rewards = {} |
134 | | - |
135 | | - for agent in agents: |
136 | | - car = state.cars[agent] |
137 | | - ball = state.ball |
138 | | - |
139 | | - car_pos = car.physics.position |
140 | | - ball_pos = ball.position |
141 | | - direction_to_ball = ball_pos - car_pos |
142 | | - norm = np.linalg.norm(direction_to_ball) |
143 | | - |
144 | | - if norm > 0: |
145 | | - direction_to_ball /= norm |
146 | | - |
147 | | - car_forward = car.physics.forward |
148 | | - dot_product = np.dot(car_forward, direction_to_ball) |
149 | | - |
150 | | - reward = dot_product # Dot product directly indicates alignment (-1 to 1) |
151 | | - rewards[agent] = reward |
152 | | - |
153 | | - return rewards |
154 | | - |
155 | | -class SpeedTowardBallReward(RewardFunction[AgentID, GameState, float]): |
156 | | - """Rewards the agent for moving quickly toward the ball""" |
157 | | - |
158 | | - def reset(self, agents: List[AgentID], initial_state: GameState, shared_info: Dict[str, Any]) -> None: |
159 | | - pass |
160 | | - |
161 | | - def get_rewards(self, agents: List[AgentID], state: GameState, is_terminated: Dict[AgentID, bool], |
162 | | - is_truncated: Dict[AgentID, bool], shared_info: Dict[str, Any]) -> Dict[AgentID, float]: |
163 | | - rewards = {} |
164 | | - for agent in agents: |
165 | | - car = state.cars[agent] |
166 | | - car_physics = car.physics if car.is_orange else car.inverted_physics |
167 | | - ball_physics = state.ball if car.is_orange else state.inverted_ball |
168 | | - player_vel = car_physics.linear_velocity |
169 | | - pos_diff = (ball_physics.position - car_physics.position) |
170 | | - dist_to_ball = np.linalg.norm(pos_diff) |
171 | | - dir_to_ball = pos_diff / dist_to_ball |
172 | | - |
173 | | - speed_toward_ball = np.dot(player_vel, dir_to_ball) |
174 | | - |
175 | | - rewards[agent] = max(speed_toward_ball / common_values.CAR_MAX_SPEED, 0.0) |
176 | | - return rewards |
177 | | - |
178 | | -class InAirReward(RewardFunction[AgentID, GameState, float]): |
179 | | - """Rewards the agent for being in the air""" |
180 | | - |
181 | | - def reset(self, agents: List[AgentID], initial_state: GameState, shared_info: Dict[str, Any]) -> None: |
182 | | - pass |
183 | | - |
184 | | - def get_rewards(self, agents: List[AgentID], state: GameState, is_terminated: Dict[AgentID, bool], |
185 | | - is_truncated: Dict[AgentID, bool], shared_info: Dict[str, Any]) -> Dict[AgentID, float]: |
186 | | - return {agent: float(not state.cars[agent].on_ground) for agent in agents} |
187 | | - |
188 | | -class VelocityBallToGoalReward(RewardFunction[AgentID, GameState, float]): |
189 | | - """Rewards the agent for hitting the ball toward the opponent's goal""" |
190 | | - |
191 | | - def reset(self, agents: List[AgentID], initial_state: GameState, shared_info: Dict[str, Any]) -> None: |
192 | | - pass |
193 | | - |
194 | | - def get_rewards(self, agents: List[AgentID], state: GameState, is_terminated: Dict[AgentID, bool], |
195 | | - is_truncated: Dict[AgentID, bool], shared_info: Dict[str, Any]) -> Dict[AgentID, float]: |
196 | | - rewards = {} |
197 | | - for agent in agents: |
198 | | - car = state.cars[agent] |
199 | | - ball = state.ball |
200 | | - if car.is_orange: |
201 | | - goal_y = -common_values.BACK_NET_Y |
202 | | - else: |
203 | | - goal_y = common_values.BACK_NET_Y |
204 | | - |
205 | | - ball_vel = ball.linear_velocity |
206 | | - pos_diff = np.array([0, goal_y, 0]) - ball.position |
207 | | - dist = np.linalg.norm(pos_diff) |
208 | | - dir_to_goal = pos_diff / dist |
209 | | - |
210 | | - vel_toward_goal = np.dot(ball_vel, dir_to_goal) |
211 | | - rewards[agent] = max(vel_toward_goal / common_values.BALL_MAX_SPEED, 0) |
212 | | - return rewards |
213 | | - |
214 | | - |
215 | | -class TouchReward(RewardFunction[AgentID, GameState, float]): |
216 | | - """ |
217 | | - A RewardFunction that gives a reward of 1 if the agent touches the ball, 0 otherwise. |
218 | | - """ |
219 | | - |
220 | | - def reset(self, agents: List[AgentID], initial_state: GameState, shared_info: Dict[str, Any]) -> None: |
221 | | - pass |
222 | | - |
223 | | - def get_rewards(self, agents: List[AgentID], state: GameState, is_terminated: Dict[AgentID, bool], |
224 | | - is_truncated: Dict[AgentID, bool], shared_info: Dict[str, Any]) -> Dict[AgentID, float]: |
225 | | - return {agent: self._get_reward(agent, state) for agent in agents} |
226 | | - |
227 | | - def _get_reward(self, agent: AgentID, state: GameState) -> float: |
228 | | - return 1. if state.cars[agent].ball_touches > 0 else 0. |
229 | | - |
230 | | - |
231 | | - |
232 | 7 | def build_rlgym_v2_env(): |
233 | 8 | import numpy as np |
234 | 9 | from rlgym.api import RLGym |
@@ -258,9 +33,9 @@ def build_rlgym_v2_env(): |
258 | 33 |
259 | 34 | reward_fn = CombinedReward( |
260 | 35 | (InAirReward(), 0.15), |
261 | | - (SpeedTowardBallReward(), 5), |
262 | | - (VelocityBallToGoalReward(), 10), |
263 | | - (TouchReward(), 50), |
| 36 | + (SpeedTowardBallReward(), 5.0), |
| 37 | + (VelocityBallToGoalReward(), 10.0), |
| 38 | + (TouchReward(), 50.0), |
264 | 39 | (SpeedTowardBallReward(), 5.0), |
265 | 40 | (FaceBallReward(), 1.0), |
266 | 41 | (VelocityBallToGoalReward(), 10.0), |
@@ -315,27 +90,27 @@ def build_rlgym_v2_env(): |
315 | 90 | learner = Learner(build_rlgym_v2_env, |
316 | 91 | n_proc=n_proc, |
317 | 92 | min_inference_size=min_inference_size, |
318 | | - metrics_logger=None, # Leave this empty for now. |
| 93 | + metrics_logger=None, # leave this as None for now, or pass a metrics logger to report concrete game information from training, depending on what you add (see the sketch below the diff).
319 | 94 | ppo_batch_size=100_000, # batch size - much higher than 300K doesn't seem to help most people |
320 | | - policy_layer_sizes=[512, 512, 512], # policy network |
321 | | - critic_layer_sizes=[512, 512, 512], # critic network |
| 95 | + policy_layer_sizes=[512, 512, 512], # policy network layer sizes |
| 96 | + critic_layer_sizes=[512, 512, 512], # critic network layer sizes |
322 | 97 | ts_per_iteration=100_000, # timesteps per training iteration - set this equal to the batch size |
323 | 98 | exp_buffer_size=300_000, # size of experience buffer - keep this 2 - 3x the batch size |
324 | 99 | ppo_minibatch_size=50_000, # minibatch size - set this as high as your GPU can handle |
325 | 100 | ppo_ent_coef=0.01, # entropy coefficient - this determines the impact of exploration
326 | 101 | render=True, |
327 | | - render_delay=0.047, |
| 102 | + render_delay=0, # real-time seconds to wait between rendered steps; set this to TICK_SKIP / TICK_RATE (120 physics ticks per second) so one rendered step matches real time (see the sketch below the diff).
328 | 103 | add_unix_timestamp=False, |
329 | 104 | checkpoint_load_folder=checkpoint_load_folder, |
330 | 105 | checkpoints_save_folder=checkpoint_folder, # folder to save checkpoints to
331 | | - policy_lr=2e-4, # policy learning rate |
| 106 | + policy_lr=2e-4, # policy learning rate; keep this equal to the critic learning rate.
332 | 107 | device="auto", # device to use
333 | | - critic_lr=2e-4, # critic learning rate |
| 108 | + critic_lr=2e-4, # critic learning rate; keep this equal to the policy learning rate.
334 | 109 | ppo_epochs=2, # number of PPO epochs |
335 | 110 | standardize_returns=True, # Don't touch these. |
336 | 111 | standardize_obs=False, # Don't touch these. |
337 | 112 | save_every_ts=10_000_000, # save every 10M steps
338 | 113 | timestep_limit=50_000_000_000, # Train for 50B steps
339 | | - log_to_wandb=False # Set this to True if you want to use Weights & Biases for logging. |
| 114 | + log_to_wandb=False # Set this to True to log with Weights & Biases, the most widely used logging option.
340 | 115 | ) |
341 | 116 | learner.learn() |
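A minimal sketch of the render_delay arithmetic referred to in the comment above, assuming Rocket League's physics rate of 120 ticks per second; TICK_SKIP is an illustrative constant that must match however many physics ticks each agent action is repeated for in your environment:

# Assumed constants, not taken from the diff.
TICK_RATE = 120  # Rocket League physics ticks per second
TICK_SKIP = 8    # physics ticks covered by one agent step (illustrative value)

# One agent step spans TICK_SKIP ticks, so it lasts TICK_SKIP / TICK_RATE seconds
# of real time; passing that as render_delay makes rendering play at real speed.
render_delay = TICK_SKIP / TICK_RATE  # 8 / 120 ~= 0.067 seconds per step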
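For metrics_logger, a minimal sketch of a custom logger, assuming the rlgym_ppo MetricsLogger base class with its _collect_metrics/_report_metrics hooks; the metric tracked here (ball height) is purely illustrative:

import numpy as np
from rlgym_ppo.util import MetricsLogger


class ExampleLogger(MetricsLogger):
    # Called on each collected game state; return the raw numbers worth tracking.
    def _collect_metrics(self, game_state) -> list:
        return [game_state.ball.position[2]]  # ball height

    # Called once per report with everything collected since the last report.
    def _report_metrics(self, collected_metrics, wandb_run, cumulative_timesteps):
        avg_ball_height = float(np.mean([m[0] for m in collected_metrics]))
        if wandb_run is not None:  # wandb_run is only set when log_to_wandb=True
            wandb_run.log({"ball_height": avg_ball_height,
                           "cumulative_timesteps": cumulative_timesteps})

An instance would then be passed to the Learner as metrics_logger=ExampleLogger() instead of None.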