Description
In the DDPG implementation, in models.py, note that the **network_kwargs passed in
self.network_builder = get_network_builder(network)(**network_kwargs)
does not contain layer_norm=True/False. As a result, when the critic uses this network builder to build its MLP, layer normalization is never applied. This causes the model to fail on many environments, such as HalfCheetah.
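For reference, this is roughly what the skipped step computes. A minimal NumPy sketch of layer normalization (not the TensorFlow implementation baselines uses); gamma and beta correspond to the LayerNorm/gamma:0 and LayerNorm/beta:0 variables listed below:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-6):
    # Normalize each row of activations to zero mean and unit variance,
    # then rescale with the learned gamma and shift with the learned beta.
    # This is the step the critic silently skips when layer_norm is
    # missing from network_kwargs.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# Example: a batch of 4 badly scaled hidden activations of width 64
x = np.random.randn(4, 64) * 5.0 + 3.0
y = layer_norm(x, gamma=np.ones(64), beta=np.zeros(64))
```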
Variable names of the critic in the current (buggy) code:
critic/mlp_fc0/w:0
critic/mlp_fc0/b:0
critic/mlp_fc1/w:0
critic/mlp_fc1/b:0
critic/output/kernel:0
critic/output/bias:0
Variable names the critic should have:
critic/mlp_fc0/w:0
critic/mlp_fc0/b:0
critic/LayerNorm/beta:0
critic/LayerNorm/gamma:0
critic/mlp_fc1/w:0
critic/mlp_fc1/b:0
critic/LayerNorm_1/beta:0
critic/LayerNorm_1/gamma:0
critic/output/kernel:0
critic/output/bias:0
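To make the difference between the two variable lists concrete, here is a sketch (in plain NumPy, not TensorFlow) of the parameter set a 2-layer critic MLP would create with and without a layer_norm flag. The layer sizes are placeholders (17 is HalfCheetah's observation width), not baselines defaults:

```python
import numpy as np

def build_critic_params(sizes=(17, 64, 64), layer_norm=True, rng=None):
    # Sketch of the parameters a 2-layer critic MLP would create,
    # mirroring the TF variable names listed above. Illustrative only.
    if rng is None:
        rng = np.random.default_rng(0)
    params = {}
    for i in range(len(sizes) - 1):
        params[f"critic/mlp_fc{i}/w"] = rng.standard_normal((sizes[i], sizes[i + 1]))
        params[f"critic/mlp_fc{i}/b"] = np.zeros(sizes[i + 1])
        if layer_norm:
            # These are exactly the variables missing from the buggy build.
            suffix = "" if i == 0 else f"_{i}"
            params[f"critic/LayerNorm{suffix}/gamma"] = np.ones(sizes[i + 1])
            params[f"critic/LayerNorm{suffix}/beta"] = np.zeros(sizes[i + 1])
    params["critic/output/kernel"] = rng.standard_normal((sizes[-1], 1))
    params["critic/output/bias"] = np.zeros(1)
    return params
```

With layer_norm=False the parameter set matches the first (buggy) list; with layer_norm=True it gains the four LayerNorm gamma/beta variables from the second list.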
However, even after fixing this, DDPG still performs poorly on HalfCheetah: after 2M time steps the reward is below 1000, whereas many papers report ~3000+. There may be other bugs.