In baselines/common/distributions.py, CategoricalPd.sample seems to have a bug. #1219

I found that when calling CategoricalPd.sample(), the sampling results are very biased. After inspection, I found that _self.logits_ should be _tf.log(self.logits)_, according to this page: https://en.wikipedia.org/wiki/Categorical_distribution

```
def sample(self):
     u = tf.random_uniform(tf.shape(self.logits), dtype=self.logits.dtype)
     return tf.argmax(self.logits - tf.log(-tf.log(u)), axis=-1)
```
→
```
def sample(self):
     u = tf.random_uniform(tf.shape(self.logits), dtype=self.logits.dtype)
     return tf.argmax(tf.log(self.logits) - tf.log(-tf.log(u)), axis=-1)
```

I also ran experiments to verify this. After adding _tf.log_, the sampled data conforms to the given distribution.
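
A minimal sketch of this kind of check, in plain Python rather than TensorFlow, assuming the values stored in _self.logits_ are already probabilities (if they are unnormalized log-probabilities, the original `sample()` is the standard Gumbel-max trick and needs no change). The names `gumbel_max_sample`, `empirical_frequencies`, and `probs` are hypothetical and not part of baselines.

```python
import math
import random
from collections import Counter


def gumbel_max_sample(scores, rng):
    """Return argmax_i(scores[i] + g_i), with g_i = -log(-log(u_i)) Gumbel(0, 1) noise."""
    best_index, best_value = 0, -math.inf
    for i, score in enumerate(scores):
        u = rng.random()
        # rng.random() can return exactly 0.0, where log(u) is undefined; redraw.
        while u == 0.0:
            u = rng.random()
        value = score - math.log(-math.log(u))
        if value > best_value:
            best_index, best_value = i, value
    return best_index


def empirical_frequencies(scores, n_samples, seed=0):
    """Draw n_samples indices and return the observed frequency of each index."""
    rng = random.Random(seed)
    counts = Counter(gumbel_max_sample(scores, rng) for _ in range(n_samples))
    return [counts[i] / n_samples for i in range(len(scores))]


# Hypothetical target distribution, used only for illustration.
probs = [0.7, 0.2, 0.1]

# Case 1: probabilities passed directly as scores (mirrors self.logits holding probabilities).
freq_from_probs = empirical_frequencies(probs, 100_000)

# Case 2: log-probabilities passed as scores (mirrors tf.log(self.logits)).
freq_from_logs = empirical_frequencies([math.log(p) for p in probs], 100_000)

# Gumbel-max on scores s samples from softmax(s), so case 1 should follow softmax(probs).
exp_probs = [math.exp(p) for p in probs]
softmax_of_probs = [e / sum(exp_probs) for e in exp_probs]

print("target:          ", probs)
print("scores = probs:  ", freq_from_probs)
print("scores = log(p): ", freq_from_logs)
print("softmax(probs):  ", softmax_of_probs)
```

Under that assumption, the frequencies from log-probabilities should track `probs`, while passing probabilities directly gives samples distributed roughly as `softmax(probs)`, about (0.46, 0.28, 0.25) here, which would explain the bias.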
