RL experiments using a lightweight Minecraft environment
https://github.com/yhs0602/CraftGround
Using protocol buffers for communication between the agent and the game, we have built a reinforcement learning environment tailored to Minecraft. Below are the specifications of the environment's architecture and the experiments, wrappers, algorithms, and models it provides.
See code/experiments for the settings of the various experiments:
- Escape a husk in a superflat world using sound information
- Escape three husks in a superflat world using sound information
- Escape a warden in a superflat world using sound information
- Escape a husk in a superflat world using visual information
- Escape three husks in a superflat world using visual information
- Fish in a normal world
- Find a village in a normal world using flying
- Fish using an enchanted fishing rod
Algorithms (a sketch of the DQN network components follows the list):

- DQN, Double DQN, Dueling DQN
  - CNN (stride=2, kernel_size=5)
  - Fully connected network (hidden_dim=128)
- Dueling DRQN
- A2C
- TODO: PPO, SAC, DDPG, etc.
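The CNN and fully connected components above can be combined into a dueling Q-network. Below is a minimal PyTorch sketch, not the repository's exact model: the channel counts, the 64x64 frame size, and the action count are illustrative assumptions; only stride=2, kernel_size=5, and hidden_dim=128 come from the list above.

```python
# A hedged sketch of a dueling vision DQN built from the listed components.
import torch
import torch.nn as nn


class DuelingVisionDQN(nn.Module):
    def __init__(self, in_channels: int = 3, num_actions: int = 6,
                 hidden_dim: int = 128, frame_hw: tuple = (64, 64)):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=5, stride=2),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2),
            nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened feature size from a dummy frame.
        with torch.no_grad():
            feat_dim = self.conv(torch.zeros(1, in_channels, *frame_hw)).shape[1]
        self.value = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1))
        self.advantage = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, num_actions))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.conv(x)
        v, a = self.value(f), self.advantage(f)
        # Dueling aggregation: Q = V + (A - mean(A))
        return v + a - a.mean(dim=1, keepdim=True)
```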
Observation space (a hedged sketch of these components as gymnasium spaces follows the list):

- Vision: a 3-channel RGB array of shape (3, width, height)
- Audio: [dx, dz, [dy]] per detected sound (dy is optional, cf. coord_dim), followed by a player-hit-sound flag, cos(yaw), and sin(yaw)
- Token: a user-defined float32 array with values in [0, 1]
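For concreteness, here is how these components could be expressed as gymnasium spaces. The Dict layout and the `max_sounds` and `token_dim` values are illustrative assumptions, not the repository's actual space definitions.

```python
# A hedged sketch of the observation space; exact shapes and bounds may differ.
import numpy as np
from gymnasium import spaces

width, height = 256, 256    # matches size_x / size_y in the example config below
max_sounds, coord_dim = 8, 2  # assumed cap on simultaneous sounds
token_dim = 4                 # user-defined token length (assumption)

observation_space = spaces.Dict({
    # 3-channel RGB frame
    "vision": spaces.Box(low=0, high=255, shape=(3, width, height), dtype=np.uint8),
    # per-sound relative offsets, plus hit-sound flag, cos(yaw), sin(yaw)
    "audio": spaces.Box(low=-1.0, high=1.0,
                        shape=(max_sounds * coord_dim + 3,), dtype=np.float32),
    # user-defined token features in [0, 1]
    "token": spaces.Box(low=0.0, high=1.0, shape=(token_dim,), dtype=np.float32),
})
```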
First, there are a few concepts you need to know to create a custom experiment (a sketch of how they fit together follows this list):

- Environment: the environment the agent interacts with. Choose one of the predefined environments; an environment is created from a name and params.
- Wrapper: connects your models and algorithms to the environment, for example by providing custom rewards or converting the observation space.
- Algorithm: runs the training and test loops and logs the results. Because an algorithm is tightly coupled to its model, you may need to implement your own algorithm to use a custom model.
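Roughly, the three pieces compose like this. ActionWrapper and SoundWrapper are wrapper names from the example config below; `make_env`, `DQNAlgorithm`, and all constructor signatures are illustrative guesses, not the repository's actual identifiers.

```python
env = make_env(name="fishing", params={"hud": True})        # Environment: created from a name and params
env = ActionWrapper(env, enabled_actions=["NO_OP", "USE"])  # Wrappers are applied in order...
env = SoundWrapper(env, coord_dim=2)                        # ...each reshaping actions, rewards, or observations
algorithm = DQNAlgorithm(env, num_episodes=1000)            # The algorithm owns the train/test loops
algorithm.train()
```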
To create a custom experiment, add a new file named `{your_experiment_name}.yml` to the code/experiments directory, then fill in the fields below. The exact fields may vary depending on the algorithm you use.
```yaml
seed: null # The seed used to generate the world. If null, a random seed is used.
env_path: null # The path to the MinecraftEnv project. If null, the default path is used.
group: jsrl_fish # The experiment group name; experiments are grouped by this name in the wandb console.
record_video: true # Whether to record a video of the experiment.
device: null # The device to run the experiment on. If null, the default device is used.
env: # The environment to run the experiment in.
  name: # The name of the pre-defined environment. Choose one from the list in environments.py.
  params: # The parameters used to create the environment. See environments.py for details.
    hud: true # Whether to show the HUD in the environment.
    verbose: false # Whether to print verbose information.
    port: 8000 # The port over which the environment communicates with this agent.
    render_action: true # Whether to include the agent's action in the video.
    size_x: 256 # The screen width in pixels.
    size_y: 256 # The screen height in pixels.
wrappers: # The wrappers around the environment; they are applied in order.
  - name: 'ActionWrapper' # Convenience wrapper for discrete actions.
    enabled_actions: # The actions enabled by this wrapper.
      - NO_OP # No-operation action.
      - USE # Use action.
  - name: 'SoundWrapper' # Adds sound-based features to the environment.
    coord_dim: 2 # The dimension of sound coordinates.
  - name: 'FishCodWrapper' # Additional wrapper providing the fishing reward.
algorithm: # Details of the algorithm to use.
  name: "SoundJSRLDQNAlgorithm" # The name of the algorithm.
  params: # Parameters for the algorithm.
    num_episodes: 1000 # Total number of episodes to run.
    warmup_episodes: 10 # Episodes to run before the main training starts.
    steps_per_episode: 400 # Steps taken per episode.
    test_frequency: 10 # How often (in episodes) tests are run.
    solved_criterion: # Criterion that decides whether the environment is solved.
      name: 'ScoreCriterion'
      params:
        min_episode: 100
        min_avg_score: 195
        min_test_score: 195
        min_avg_test_score: 195
    hidden_dim: 128 # Dimension of the hidden layers.
    epsilon_init: 1.0 # Initial epsilon for the epsilon-greedy policy.
    epsilon_decay: 0.99 # Epsilon decay rate.
    epsilon_min: 0.01 # Minimum epsilon.
    update_frequency: 1000 # Frequency of updates.
    train_frequency: 1 # Training frequency.
    replay_buffer_size: 1000000 # Size of the replay buffer.
    batch_size: 256 # Batch size for training.
    gamma: 0.99 # Discount factor.
    learning_rate: 0.00001 # Learning rate for the optimizer.
    weight_decay: 0.00001 # Weight decay for regularization.
    tau: 1.0 # Target network update rate.
    guide_policy: # Policy that guides the agent (for jump-start RL).
      name: 'FishingGuide'
      params:
        min_episode: 100
        decrease_guide_step_threshold: 0.5 # Threshold for decreasing the number of guided steps.
```
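If you want to inspect such a file programmatically, here is a minimal sketch assuming PyYAML; the file name is a placeholder, and the repository's own loader and entry point may differ.

```python
import yaml

with open("code/experiments/my_experiment.yml") as f:
    cfg = yaml.safe_load(f)

env_name = cfg["env"]["name"]             # a pre-defined environment name
env_params = cfg["env"]["params"]         # parameters for creating it
wrapper_specs = cfg.get("wrappers", [])   # wrappers, applied in the order listed
algo_name = cfg["algorithm"]["name"]      # e.g. "SoundJSRLDQNAlgorithm"
algo_params = cfg["algorithm"]["params"]  # algorithm hyperparameters
```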
See code/environments for the implementations of the environments.
| Env name | Description |
|---|---|
| husk | Escaping from a single husk in a superflat world. The husk's position is fixed. |
| husks | Escaping from multiple husks in a superflat world. Their positions are fixed. |
| husk-noisy | Escaping from a husk, with many other animals around. |
| husks-noisy | Escaping from husks, with many other animals around. |
| husk-darkness | Escaping from a husk under the darkness effect. |
| husks-darkness | Escaping from husks under the darkness effect. |
| find-animal | Searching for randomly placed animals in an animal pen. |
| husk-random | Escaping from a randomly positioned husk. |
| husks-random | Escaping from randomly positioned husks. |
| husks-random-darkness | Escaping from randomly positioned husks under the darkness effect. |
| husks-continuous | Husks are continuously summoned near the player. |
| husk-random-terrain | Escaping from a husk in normal terrain. |
| husk-random-forest | Escaping from a husk in a forest. |
| husk-hunt | Hunting a husk in a superflat world using a diamond sword. |
| mansion | Escaping from a mansion. |
| skeleton-random | Escaping from a skeleton. |
| find-village | Searching for a village. |
| flat-night | Escaping from all threats in a superflat world at night. |
| fishing | Fishing for cod on a beach. |
| Wrapper name | Description |
|---|---|
| CleanUpFastResetWrapper | A wrapper for fast environment resetting; every wrapper should inherit from this. |
| action | Defines discrete action spaces and operations for the agent. |
| continuous_action | Enables the agent to take actions in a continuous action space. |
| fly_helper | Assists the agent in flying operations within the environment. |
| mineclip | Allows the agent to use the MineCLIP reward. |
| surrounding_sound | Adds auditory feedback from the environment, indicating surrounding entities. |
| attack_kill | Enables the agent to execute attack and kill operations. |
| avoid_damage | Helps the agent with strategies to prevent or minimize damage. |
| bimodal | Provides vision and audio input for the agent. |
| find_animal | Assists the agent in locating specific animals in the environment. |
| find_village | Aids the agent in discovering villages within the environment. |
| fish_cod | Enables the agent to perform fishing operations. |
| go_up | Assists the agent in upward movement or climbing actions. |
| go_up_2 | A variant of go_up offering more functionality. |
| jump_helper | Aids the agent in performing jumping actions correctly. |
| simple_navigation | Provides basic navigation functionality for the agent. |
| simplest_navigation | A more streamlined version of simple_navigation with minimal functionality. |
| sound | Provides sound-based feedback or actions for the agent. |
| survival | Enables survival strategies and behaviors for the agent. |
| terminate_on_death | Ends the episode upon the agent's death. |
| vision | Incorporates visual feedback or vision-based actions for the agent. |
| reward_token_change | Provides token-based rewards for the agent. |
| token_providers | Provides the tokens for the agent. |
| bimodal_token | Provides bimodal observations plus tokens for the agent. |
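Since every wrapper inherits from CleanUpFastResetWrapper, a new reward wrapper might look roughly like the sketch below. The import path, the constructor signature, and the "caught_fish" info key are assumptions following the standard gymnasium wrapper pattern; the repository's hooks may differ.

```python
from wrappers import CleanUpFastResetWrapper  # import path is an assumption


class BonusRewardWrapper(CleanUpFastResetWrapper):
    def __init__(self, env, bonus: float = 1.0):
        super().__init__(env)
        self.bonus = bonus

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if info.get("caught_fish", False):  # hypothetical success flag
            reward += self.bonus            # shape the reward on success
        return obs, reward, terminated, truncated, info
```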
| Algorithm name | Description |
|---|---|
| a2c | Advantage Actor-Critic algorithm for policy and value function approximation. |
| bimodal_dqn | DQN variant designed for environments with a bimodal observation space. |
| dqn | Deep Q-Network algorithm for Q-value approximation using deep neural networks. |
| epsilon_greedy | A simple exploration strategy that takes a random action with probability epsilon. |
| jsrl_dqn | DQN variant base tailored for JSRL (Jump-Start Reinforcement Learning). |
| sound_a2c | A2C algorithm with sound-based inputs. |
| sound_dqn | DQN variant that uses sound-based observations. |
| sound_drqn | Deep Recurrent Q-Network with sound inputs for environments with temporal dependencies. |
| sound_jsrl_dqn | JSRL DQN variant leveraging sound-based observations. |
| vision_a2c | A2C algorithm with visual inputs. |
| vision_dqn | DQN variant that uses visual observations. |
| vision_drqn | Deep Recurrent Q-Network with visual inputs for environments with temporal dependencies. |
| bimodal_a2c | A2C variant designed for environments with bimodal observations. |
| bimodal_drqn | DRQN variant for environments with bimodal observations. |
| drqn | Deep Recurrent Q-Network for environments with temporal dependencies. |
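As a reference point for the epsilon_greedy entry, here is a minimal standalone sketch of the strategy, not the repository's implementation; the parameter names mirror the config fields above.

```python
import random

def select_action(q_values, epsilon: float, num_actions: int) -> int:
    # With probability epsilon, explore with a uniformly random action.
    if random.random() < epsilon:
        return random.randrange(num_actions)
    # Otherwise exploit the greedy (highest-Q) action.
    return int(max(range(num_actions), key=lambda a: q_values[a]))

# Decay schedule matching the example config:
# epsilon = max(epsilon_min, epsilon * epsilon_decay) after each episode.
```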
| Model name | Description |
|---|---|
| dqn | Basic Deep Q-Network model for value function approximation. |
| dueling_bimodal_attention_dqn | Dueling DQN with an attention mechanism for bimodal inputs. |
| dueling_bimodal_dqn | Dueling DQN architecture for environments with bimodal observations. |
| dueling_sound_dqn | Dueling DQN model that uses sound-based observations. |
| dueling_sound_drqn | Dueling Deep Recurrent Q-Network with sound inputs for environments with temporal dependencies. |
| dueling_vision_dqn | Dueling DQN model that uses visual observations. |
| dueling_vision_drqn | Dueling Deep Recurrent Q-Network with visual inputs for environments with temporal dependencies. |
| per | Prioritized experience replay mechanism that weights experiences by their TD error. |
| ppo | Proximal Policy Optimization, a policy gradient method for reinforcement learning. |
| recurrent_replay_buffer | Replay buffer designed for recurrent models, storing sequences of experiences. |
| replay_buffer | Basic replay buffer to store and sample experiences. |
| sound_a2c | Advantage Actor-Critic model tailored for sound-based observations. |
| vision_a2c | Advantage Actor-Critic model tailored for visual observations. |
| bimodal_a2c | A2C model designed for environments with bimodal observations. |
| bimodal_replay_buffer | Replay buffer tailored for environments with bimodal observations. |
| sumtree | Data structure for efficient sampling in prioritized experience replay. |
| transition | Data structure representing state transitions in the environment. |
| bimodal_token_replay_buffer | Replay buffer for bimodal observations with tokens. |
| dueling_bimodal_token_dqn | Dueling DQN model for bimodal observations with tokens. |
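To illustrate the per and sumtree entries, here is a generic textbook sum tree, not the repository's code: internal nodes hold the sum of their children's priorities, so priority-proportional sampling costs O(log n).

```python
import random

class SumTree:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)  # 1-based heap; leaves at [capacity, 2*capacity)
        self.write = 0

    def update(self, index: int, priority: float) -> None:
        i = index + self.capacity
        self.tree[i] = priority
        i //= 2
        while i >= 1:  # propagate the new sum up to the root
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def add(self, priority: float) -> int:
        index = self.write
        self.update(index, priority)
        self.write = (self.write + 1) % self.capacity  # overwrite oldest slot
        return index

    def sample(self) -> int:
        # Walk down from the root, choosing a child proportionally to its sum.
        s = random.uniform(0.0, self.tree[1])
        i = 1
        while i < self.capacity:
            left = 2 * i
            if s <= self.tree[left]:
                i = left
            else:
                s -= self.tree[left]
                i = left + 1
        return i - self.capacity  # leaf index = experience slot
```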