We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
Authors of the original paper use learning rate of 0.05 in their CNTK implementation. Was there a particular reason you used 0.025 instead?