DDPG bug: layer norm not really applied when initializing the critic (Q) network #913

Open
@xuanlinli17

Description

In the DDPG implementation, in models.py, note that the `**network_kwargs` passed in
`self.network_builder = get_network_builder(network)(**network_kwargs)`
does not contain `layer_norm=True/False`. As a result, when the critic uses this network builder to build its MLP, layer normalization is never applied. This causes the model to fail on many environments, such as HalfCheetah.
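To illustrate the forwarding bug, here is a minimal, self-contained sketch. The `mlp` and `get_network_builder` below are toy stand-ins (not the real Baselines implementations) that only record the kwargs they receive; the point is that a `layer_norm` flag which is never placed into `network_kwargs` silently falls back to the builder's default:

```python
# Toy stand-ins for baselines' network builders (hypothetical, simplified):
# the real mlp builds a TF graph; this one just returns the config it saw.
def mlp(num_hidden=64, layer_norm=False):
    return {"num_hidden": num_hidden, "layer_norm": layer_norm}

def get_network_builder(name):
    return {"mlp": mlp}[name]

# What the current code effectively does: layer_norm is NOT in the kwargs,
# so the critic's MLP is built without layer normalization.
network_kwargs = {"num_hidden": 64}
critic_net = get_network_builder("mlp")(**network_kwargs)
assert critic_net["layer_norm"] is False  # bug: default silently used

# A possible fix: forward the critic's layer_norm flag explicitly
# before calling the builder.
network_kwargs["layer_norm"] = True
critic_net_fixed = get_network_builder("mlp")(**network_kwargs)
assert critic_net_fixed["layer_norm"] is True
```

The actual fix in Baselines would need to thread the DDPG critic's `layer_norm` setting into `network_kwargs` (or pass it to the builder directly) at the point where `self.network_builder` is created.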

Variable names of the critic in the original code:
critic/mlp_fc0/w:0
critic/mlp_fc0/b:0
critic/mlp_fc1/w:0
critic/mlp_fc1/b:0
critic/output/kernel:0
critic/output/bias:0

Variable names of the critic should be:
critic/mlp_fc0/w:0
critic/mlp_fc0/b:0
critic/LayerNorm/beta:0
critic/LayerNorm/gamma:0
critic/mlp_fc1/w:0
critic/mlp_fc1/b:0
critic/LayerNorm_1/beta:0
critic/LayerNorm_1/gamma:0
critic/output/kernel:0
critic/output/bias:0
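One way to confirm whether the bug is present is to scan the model's variable names for `LayerNorm` parameters under the `critic` scope. This is a sketch using the name lists above (in a real session the names would come from something like TF1's `tf.trainable_variables()`):

```python
# Sanity check: layer norm was actually applied iff LayerNorm
# beta/gamma variables exist under the critic scope.
def has_layer_norm(var_names, scope="critic"):
    return any(name.startswith(scope) and "LayerNorm" in name
               for name in var_names)

buggy = ["critic/mlp_fc0/w:0", "critic/mlp_fc0/b:0",
         "critic/mlp_fc1/w:0", "critic/mlp_fc1/b:0",
         "critic/output/kernel:0", "critic/output/bias:0"]
fixed = (buggy[:2]
         + ["critic/LayerNorm/beta:0", "critic/LayerNorm/gamma:0"]
         + buggy[2:4]
         + ["critic/LayerNorm_1/beta:0", "critic/LayerNorm_1/gamma:0"]
         + buggy[4:])

assert not has_layer_norm(buggy)  # current code: no layer norm params
assert has_layer_norm(fixed)      # after the fix: params present
```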

However, even after fixing this, DDPG still performs poorly on HalfCheetah: after 2M time steps the reward is below 1000, whereas many papers report ~3000+. There may be additional bugs.
