Using RNN #292

Open
anirjoshi opened this issue Feb 27, 2024 · 4 comments

Comments

anirjoshi commented Feb 27, 2024

Is there any example that shows the use of an RNN with RL?

anirjoshi (Author) commented Feb 27, 2024

In particular, I have the following custom-made gym environment and would like to use some RL-based algorithm to solve it. Is that possible? Some help in this regard would be appreciated. Also note that this environment has variable-size inputs.

import gymnasium as gym
from gymnasium.spaces import Discrete, MultiDiscrete, Sequence


class ModuloComputationEnv(gym.Env):
    """Environment in which an agent must learn to output the sum of all
    observations seen so far, mod 2, 3, and 4.

    Observations are variable-length sequences of integers,
    e.g. (1, 3, 4, 5).

    The action is a vector of 3 values: the running sum % 2, % 3, and % 4.

    The reward on every step is
    r = -abs(self.ac1 - action[0]) - abs(self.ac2 - action[1]) - abs(self.ac3 - action[2]).
    """

    def __init__(self, config):
        # The input sequence can contain any integers in [0, 99].
        self.observation_space = Sequence(Discrete(100), seed=2)

        # The action is a vector of 3 values: [%2, %3, %4] of the running sum.
        self.action_space = MultiDiscrete([2, 3, 4])

        self.cur_obs = None

        # Number of steps taken in the current episode.
        self.episode_len = 0

        # Running sum mod 2, mod 3, and mod 4, respectively.
        self.ac1 = 0
        self.ac2 = 0
        self.ac3 = 0

    def reset(self, *, seed=None, options=None):
        """Resets the episode and returns the initial observation of the new one."""
        super().reset(seed=seed)

        # Reset the episode length.
        self.episode_len = 0

        # Sample a random sequence from our observation space.
        self.cur_obs = self.observation_space.sample()

        # The targets are the sum of the initial observation mod 2, 3, and 4.
        sum_obs = sum(self.cur_obs)
        self.ac1 = sum_obs % 2
        self.ac2 = sum_obs % 3
        self.ac3 = sum_obs % 4

        # Return the initial observation and an empty info dict.
        return self.cur_obs, {}

    def step(self, action):
        """Takes a single step in the episode given `action`.

        Returns:
            New observation, reward, terminated flag, truncated flag, info dict (empty).
        """
        # Set the `truncated` flag after 10 steps (time limit).
        self.episode_len += 1
        terminated = False
        truncated = self.episode_len >= 10

        # The reward is the negative total distance from the three target values.
        reward = abs(self.ac1 - action[0]) + abs(self.ac2 - action[1]) + abs(self.ac3 - action[2])
        reward = -reward

        # Set a new observation (random sample).
        self.cur_obs = self.observation_space.sample()

        # Update the running sums mod 2, 3, and 4; the observation is a
        # sequence, so sum it before adding it to the running values.
        sum_obs = sum(self.cur_obs)
        self.ac1 = (sum_obs + self.ac1) % 2
        self.ac2 = (sum_obs + self.ac2) % 3
        self.ac3 = (sum_obs + self.ac3) % 4

        return self.cur_obs, reward, terminated, truncated, {}
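
For reference, a minimal sanity-check loop for the environment above (a sketch, assuming the class and imports as given; the random policy is only there to exercise the API):

env = ModuloComputationEnv(config={})
obs, info = env.reset(seed=0)
terminated = truncated = False
while not (terminated or truncated):
    # A random action; a trained policy would go here instead.
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    print(f"obs={obs} reward={reward}")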


alex-petrenko (Owner) commented:

Hey @anirjoshi!

RNN policies are first-class citizens in Sample Factory. In fact, with the default configuration you will train an RNN (GRU) policy.

See these parameter descriptions in cfg.py or at https://www.samplefactory.dev/02-configuration/cfg-params/:

[--use_rnn USE_RNN] [--rnn_size RNN_SIZE]
[--rnn_type {gru,lstm}]
[--rnn_num_layers RNN_NUM_LAYERS]
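
For example, a typical launch looks like this (a sketch: sf_examples.train_gym_env is the simple gym-env launcher from the Sample Factory repo, and CartPole-v1 is just a placeholder environment):

python -m sf_examples.train_gym_env --env=CartPole-v1 --experiment=rnn_test --use_rnn=True --rnn_type=lstm --rnn_size=512 --rnn_num_layers=1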

anirjoshi (Author) commented:

@alex-petrenko Thank you for your response! Is there any example that uses this, so that I can directly incorporate it with my environment?

alex-petrenko (Owner) commented:

Hi @anirjoshi

Literally any example would work since, again, this is the default configuration.

You can start by reading these tutorials:

https://www.samplefactory.dev/03-customization/custom-environments/
https://samplefactory.dev/03-customization/custom-models/
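
To make the thread self-contained, here is a rough sketch of what the custom-environments tutorial boils down to for the environment in this issue (the entry-point structure follows the tutorial; make_modulo_env and the env name "modulo_env" are illustrative, and whether the variable-length Sequence observation space works with the default encoder is a separate question that the custom-models tutorial addresses):

import sys

from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
from sample_factory.envs.env_utils import register_env
from sample_factory.train import run_rl


def make_modulo_env(full_env_name, cfg=None, env_config=None, render_mode=None):
    # Factory function that Sample Factory calls to create each env instance.
    return ModuloComputationEnv(config=env_config)


def main():
    # Register the env under the name that will be passed via --env.
    register_env("modulo_env", make_modulo_env)

    # Parse the standard Sample Factory args, including the RNN flags above.
    parser, _ = parse_sf_args()
    cfg = parse_full_cfg(parser)

    # With defaults this already trains a GRU policy (use_rnn is on by default).
    return run_rl(cfg)


if __name__ == "__main__":
    sys.exit(main())

This would be launched with --env=modulo_env plus any of the RNN flags from the earlier comment.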
