Slow Convergence for CUDNN RNN #104

Open
lissahyacinth opened this issue Jun 24, 2020 · 1 comment

The network being used is an LSTM -> Linear -> Sigmoid stack, trained on the Mackey-Glass time series.

    net_cfg.add_layer(LayerConfig::new(
        // Layer name is only used internally - can be changed to anything
        "LSTMInitial",
        RnnConfig {
            hidden_size: 5,
            num_layers: 1,
            dropout_seed: 123,
            dropout_probability: 0.5,
            rnn_type: RnnNetworkMode::LSTM,
            input_mode: RnnInputMode::LinearInput,
            direction_mode: DirectionMode::UniDirectional,
        },
    ));
    net_cfg.add_layer(LayerConfig::new("linear1", LinearConfig { output_size: 1 }));
    net_cfg.add_layer(LayerConfig::new("sigmoid", LayerType::Sigmoid));

With a batch size of 12 and a learning rate of 0.1, the network begins to converge. During training, the predictions from the RNN -> Linear stage were observed to be nearly identical even when the RNN outputs differed, which is believed to be an issue with the weight initialisation.
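
One way this symptom can arise is when the Linear weights are very small, so that different hidden states map to almost the same pre-activation. The sketch below illustrates that with made-up numbers. The hidden states, weights, and function names are all hypothetical and are not taken from Juice or from the actual trained network.

```rust
/// Logistic sigmoid.
fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

/// Linear layer with a single output, followed by a sigmoid.
fn linear_sigmoid(hidden: &[f64], weights: &[f64], bias: f64) -> f64 {
    let z = hidden.iter().zip(weights).map(|(h, w)| h * w).sum::<f64>() + bias;
    sigmoid(z)
}

/// Absolute difference between the predictions for two hidden states.
fn prediction_gap(a: &[f64], b: &[f64], weights: &[f64], bias: f64) -> f64 {
    (linear_sigmoid(a, weights, bias) - linear_sigmoid(b, weights, bias)).abs()
}

fn main() {
    // Two clearly different, made-up LSTM outputs (hidden_size = 5).
    let h1 = [0.8, -0.5, 0.3, 0.9, -0.7];
    let h2 = [-0.6, 0.4, -0.9, 0.1, 0.5];

    // Hypothetical Linear weights: one tiny set, one of order 1.
    let small = [0.01, -0.02, 0.015, 0.01, -0.01];
    let large = [1.0, -1.0, 0.5, 0.8, -0.9];

    println!("gap with small weights: {:.4}", prediction_gap(&h1, &h2, &small, 0.0));
    println!("gap with large weights: {:.4}", prediction_gap(&h1, &h2, &large, 0.0));
}
```

Logging the same gap for real batches, alongside the spread of the raw RNN outputs, would show whether the Linear layer is the stage that collapses the predictions.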

Ideally this network could be trained with a larger batch size in a single epoch and reach an MSE of 0.05, as the function being approximated is fairly simple.
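
For anyone trying to reproduce this without the original data, a rough sketch of a Mackey-Glass generator and an MSE helper follows. The parameters (tau = 17, beta = 0.2, gamma = 0.1, n = 10), the constant initial history of 1.2, the unit-step Euler discretisation, and the min-max scaling to [0, 1] are all assumptions and may not match the data used here. None of the function names are part of Juice.

```rust
/// Generate `len` samples of the Mackey-Glass series with a unit-step Euler scheme:
/// x[t+1] = x[t] + beta * x[t-tau] / (1 + x[t-tau]^n) - gamma * x[t]
fn mackey_glass(len: usize, tau: usize, beta: f64, gamma: f64, n: i32) -> Vec<f64> {
    // Constant history for the first `tau + 1` samples (an arbitrary but common choice).
    let mut x = vec![1.2_f64; tau + 1];
    while x.len() < len + tau + 1 {
        let t = x.len() - 1;
        let delayed = x[t - tau];
        let next = x[t] + beta * delayed / (1.0 + delayed.powi(n)) - gamma * x[t];
        x.push(next);
    }
    // Drop the constant history and keep the `len` generated samples.
    x.split_off(tau + 1)
}

/// Rescale values to [0, 1] so they are reachable by a sigmoid output.
fn min_max_scale(xs: &[f64]) -> Vec<f64> {
    let min = xs.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    xs.iter()
        .map(|x| if range > 0.0 { (x - min) / range } else { 0.0 })
        .collect()
}

/// Mean squared error between predictions and targets of equal length.
fn mse(pred: &[f64], target: &[f64]) -> f64 {
    assert_eq!(pred.len(), target.len());
    pred.iter().zip(target).map(|(p, t)| (p - t).powi(2)).sum::<f64>() / pred.len() as f64
}

fn main() {
    let series = min_max_scale(&mackey_glass(1000, 17, 0.2, 0.1, 10));
    // Baseline: always predict the mean of the series.
    let mean = series.iter().sum::<f64>() / series.len() as f64;
    let baseline = mse(&vec![mean; series.len()], &series);
    println!("MSE of always predicting the mean: {:.4}", baseline);
}
```

Comparing the network's MSE against the mean-prediction baseline gives a sense of how much of the 0.05 target is actually down to the model.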

Current theories

  • SGD is not suited to this problem; RMSProp may be.
  • Weight initialisation is done incorrectly somewhere, or Glorot is unsuitable for the LSTM we're using.
  • The LSTM is improperly set up, which is hurting performance.
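
To make the first two theories concrete, here is a minimal sketch of the Glorot uniform bound and of how a single RMSProp step differs from a plain SGD step. It uses only the standard textbook formulas. The function names, the decay of 0.9, and the epsilon of 1e-8 are assumptions, and nothing here reflects how Juice implements its solvers or fillers.

```rust
/// Glorot (Xavier) uniform bound: weights are drawn from U(-limit, limit).
fn glorot_limit(fan_in: usize, fan_out: usize) -> f64 {
    (6.0 / (fan_in + fan_out) as f64).sqrt()
}

/// Plain SGD: each weight moves in proportion to its gradient.
fn sgd_step(weights: &mut [f64], grads: &[f64], lr: f64) {
    for (w, g) in weights.iter_mut().zip(grads) {
        *w -= lr * g;
    }
}

/// RMSProp: each gradient is divided by a running root-mean-square of its
/// own history, so step sizes are roughly independent of gradient scale.
fn rmsprop_step(
    weights: &mut [f64],
    grads: &[f64],
    cache: &mut [f64],
    lr: f64,
    decay: f64,
    eps: f64,
) {
    for ((w, g), c) in weights.iter_mut().zip(grads).zip(cache.iter_mut()) {
        *c = decay * *c + (1.0 - decay) * g * g;
        *w -= lr * g / (c.sqrt() + eps);
    }
}

fn main() {
    // Linear layer from the config above: 5 inputs (hidden_size), 1 output.
    println!("Glorot limit for 5 -> 1: {}", glorot_limit(5, 1));

    // Made-up gradients that differ by three orders of magnitude.
    let grads = [1e-3, 1.0];

    let mut w_sgd = [0.0, 0.0];
    sgd_step(&mut w_sgd, &grads, 0.1);

    let mut w_rms = [0.0, 0.0];
    let mut cache = [0.0, 0.0];
    rmsprop_step(&mut w_rms, &grads, &mut cache, 0.1, 0.9, 1e-8);

    println!("SGD update:     {:?}", w_sgd);
    println!("RMSProp update: {:?}", w_rms);
}
```

If the gradients reaching the LSTM weights are much smaller than those reaching the Linear layer, SGD at a single learning rate would update the two at very different speeds, which is the scenario where a per-parameter method such as RMSProp could plausibly help.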

drahnr self-assigned this Jun 24, 2020

drahnr commented Oct 21, 2020

@lissahyacinth could you attach the gist with the data you used? Unfortunately, the links posted in chat have already expired.
