Using RNN #292

Open
anirjoshi opened this issue Feb 27, 2024 · 4 comments

Comments

anirjoshi commented Feb 27, 2024

Is there any example that shows the use of an RNN with RL?

anirjoshi (Author) commented Feb 27, 2024

In particular, I have the following custom-made gym environment and would like to use some RL-based algorithm to solve it. Is that possible? Some help in this regard would be appreciated. Also note that this environment has variable-size inputs.

import gymnasium as gym
from gymnasium.spaces import Discrete, MultiDiscrete, Sequence


class ModuloComputationEnv(gym.Env):
    """Environment in which an agent must learn to output the sum of all
    observations seen so far, mod 2, 3, and 4.

    Observations are variable-length sequences of integers,
    e.g. (1, 3, 4, 5).

    The action is a vector of 3 values: the running sum % 2, % 3, and % 4.

    The reward on every step is
    r = -abs(self.ac1 - action[0]) - abs(self.ac2 - action[1]) - abs(self.ac3 - action[2]).
    """

    def __init__(self, config):
        # The input sequence can contain any integers in [0, 99].
        self.observation_space = Sequence(Discrete(100), seed=2)

        # The action is a vector of 3 values: [%2, %3, %4] of the running sum.
        self.action_space = MultiDiscrete([2, 3, 4])

        self.cur_obs = None

        # Number of steps taken in the current episode.
        self.episode_len = 0

        # Running sum mod 2, mod 3, and mod 4, respectively.
        self.ac1 = 0
        self.ac2 = 0
        self.ac3 = 0

    def reset(self, *, seed=None, options=None):
        """Resets the episode and returns the initial observation of the new one."""
        super().reset(seed=seed)

        # Reset the episode length.
        self.episode_len = 0

        # Sample a random sequence from our observation space.
        self.cur_obs = self.observation_space.sample()

        # The targets are the sum of the initial observation mod 2, 3, and 4.
        sum_obs = sum(self.cur_obs)
        self.ac1 = sum_obs % 2
        self.ac2 = sum_obs % 3
        self.ac3 = sum_obs % 4

        # Return the initial observation and an empty info dict.
        return self.cur_obs, {}

    def step(self, action):
        """Takes a single step in the episode given `action`.

        Returns:
            New observation, reward, terminated flag, truncated flag, info dict (empty).
        """
        # Set the `truncated` flag after 10 steps (time limit).
        self.episode_len += 1
        terminated = False
        truncated = self.episode_len >= 10

        # The reward is the negative total distance from the three target values.
        reward = abs(self.ac1 - action[0]) + abs(self.ac2 - action[1]) + abs(self.ac3 - action[2])
        reward = -reward

        # Set a new observation (random sample).
        self.cur_obs = self.observation_space.sample()

        # Update the running sums mod 2, 3, and 4; the observation is a
        # sequence, so sum it before adding it to the running values.
        sum_obs = sum(self.cur_obs)
        self.ac1 = (sum_obs + self.ac1) % 2
        self.ac2 = (sum_obs + self.ac2) % 3
        self.ac3 = (sum_obs + self.ac3) % 4

        return self.cur_obs, reward, terminated, truncated, {}
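
For reference, a minimal sanity-check loop for the environment above (a sketch, assuming the class and imports as given; the random policy is only there to exercise the API):

env = ModuloComputationEnv(config={})
obs, info = env.reset(seed=0)
terminated = truncated = False
while not (terminated or truncated):
    # A random action; a trained policy would go here instead.
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    print(f"obs={obs} reward={reward}")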


alex-petrenko (Owner) commented:

Hey @anirjoshi!

RNN policies are first-class citizens in Sample Factory. In fact, with the default configuration you will train an RNN (GRU) policy.

See these parameter descriptions in cfg.py or at https://www.samplefactory.dev/02-configuration/cfg-params/:

[--use_rnn USE_RNN] [--rnn_size RNN_SIZE]
[--rnn_type {gru,lstm}]
[--rnn_num_layers RNN_NUM_LAYERS]
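
For example, a typical launch looks like this (a sketch: sf_examples.train_gym_env is the simple gym-env launcher from the Sample Factory repo, and CartPole-v1 is just a placeholder environment):

python -m sf_examples.train_gym_env --env=CartPole-v1 --experiment=rnn_test --use_rnn=True --rnn_type=lstm --rnn_size=512 --rnn_num_layers=1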

anirjoshi (Author) commented:

@alex-petrenko Thank you for your response! Is there any example that uses this, so that I can directly incorporate it with my environment?

alex-petrenko (Owner) commented:

Hi @anirjoshi

Literally any example would work since, again, this is the default configuration.

You can start by reading these tutorials:

https://www.samplefactory.dev/03-customization/custom-environments/
https://samplefactory.dev/03-customization/custom-models/
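
To make the thread self-contained, here is a rough sketch of what the custom-environments tutorial boils down to for the environment in this issue (the entry-point structure follows the tutorial; make_modulo_env and the env name "modulo_env" are illustrative, and whether the variable-length Sequence observation space works with the default encoder is a separate question that the custom-models tutorial addresses):

import sys

from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
from sample_factory.envs.env_utils import register_env
from sample_factory.train import run_rl


def make_modulo_env(full_env_name, cfg=None, env_config=None, render_mode=None):
    # Factory function that Sample Factory calls to create each env instance.
    return ModuloComputationEnv(config=env_config)


def main():
    # Register the env under the name that will be passed via --env.
    register_env("modulo_env", make_modulo_env)

    # Parse the standard Sample Factory args, including the RNN flags above.
    parser, _ = parse_sf_args()
    cfg = parse_full_cfg(parser)

    # With defaults this already trains a GRU policy (use_rnn is on by default).
    return run_rl(cfg)


if __name__ == "__main__":
    sys.exit(main())

This would be launched with --env=modulo_env plus any of the RNN flags from the earlier comment.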
