
Commit 996b58b

paper links
1 parent ff0d5c0 commit 996b58b


70 files changed, +92 -92 lines changed
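
Every change in this commit follows the same mechanical pattern: a link of the form `https://arxiv.org/abs/<id>` becomes `https://papers.labml.ai/paper/<id>`. A minimal sketch of how such a bulk rewrite could be scripted is shown below; the `rewrite_links` helper, the `labml_nn` root default, and the `.py`/`.md` file filter are illustrative assumptions, not part of the commit itself.

```python
# Hypothetical helper (not from this commit): rewrite arXiv abstract links to
# papers.labml.ai links, matching the pattern visible in the diffs below:
#   https://arxiv.org/abs/<id>  ->  https://papers.labml.ai/paper/<id>
import re
from pathlib import Path

ARXIV_LINK = re.compile(r"https://arxiv\.org/abs/(\d{4}\.\d{4,5}(?:v\d+)?)")


def rewrite_links(root: str = "labml_nn") -> int:
    """Rewrite links in every .py and .md file under `root`; return the number of files changed."""
    changed = 0
    for path in Path(root).rglob("*"):
        if path.suffix not in {".py", ".md"}:
            continue
        text = path.read_text(encoding="utf-8")
        new_text = ARXIV_LINK.sub(r"https://papers.labml.ai/paper/\1", text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed += 1
    return changed


if __name__ == "__main__":
    print(f"{rewrite_links()} files updated")
```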

labml_nn/capsule_networks/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@
 # Capsule Networks
 
 This is a [PyTorch](https://pytorch.org) implementation/tutorial of
-[Dynamic Routing Between Capsules](https://arxiv.org/abs/1710.09829).
+[Dynamic Routing Between Capsules](https://papers.labml.ai/paper/1710.09829).
 
 Capsule network is a neural network architecture that embeds features
 as capsules and routes them with a voting mechanism to next layer of capsules.

labml_nn/capsule_networks/mnist.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
 This is an annotated PyTorch code to classify MNIST digits with PyTorch.
 
 This paper implements the experiment described in paper
-[Dynamic Routing Between Capsules](https://arxiv.org/abs/1710.09829).
+[Dynamic Routing Between Capsules](https://papers.labml.ai/paper/1710.09829).
 """
 from typing import Any

labml_nn/capsule_networks/readme.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # [Capsule Networks](https://nn.labml.ai/capsule_networks/index.html)
 
 This is a [PyTorch](https://pytorch.org) implementation/tutorial of
-[Dynamic Routing Between Capsules](https://arxiv.org/abs/1710.09829).
+[Dynamic Routing Between Capsules](https://papers.labml.ai/paper/1710.09829).
 
 Capsule network is a neural network architecture that embeds features
 as capsules and routes them with a voting mechanism to next layer of capsules.

labml_nn/gan/cycle_gan/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
 # Cycle GAN
 
 This is a [PyTorch](https://pytorch.org) implementation/tutorial of the paper
-[Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://arxiv.org/abs/1703.10593).
+[Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://papers.labml.ai/paper/1703.10593).
 
 I've taken pieces of code from [eriklindernoren/PyTorch-GAN](https://github.com/eriklindernoren/PyTorch-GAN).
 It is a very good resource if you want to checkout other GAN variations too.

labml_nn/gan/cycle_gan/readme.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # [Cycle GAN](https://nn.labml.ai/gan/cycle_gan/index.html)
 
 This is a [PyTorch](https://pytorch.org) implementation/tutorial of the paper
-[Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://arxiv.org/abs/1703.10593).
+[Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://papers.labml.ai/paper/1703.10593).

labml_nn/gan/dcgan/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 # Deep Convolutional Generative Adversarial Networks (DCGAN)
 
 This is a [PyTorch](https://pytorch.org) implementation of paper
-[Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/abs/1511.06434).
+[Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://papers.labml.ai/paper/1511.06434).
 
 This implementation is based on the [PyTorch DCGAN Tutorial](https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html).
 """

labml_nn/gan/dcgan/readme.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # [Deep Convolutional Generative Adversarial Networks - DCGAN](https://nn.labml.ai/gan/dcgan/index.html)
 
 This is a [PyTorch](https://pytorch.org) implementation of paper
-[Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/abs/1511.06434).
+[Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://papers.labml.ai/paper/1511.06434).

labml_nn/gan/original/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 # Generative Adversarial Networks (GAN)
 
 This is an implementation of
-[Generative Adversarial Networks](https://arxiv.org/abs/1406.2661).
+[Generative Adversarial Networks](https://papers.labml.ai/paper/1406.2661).
 
 The generator, $G(\pmb{z}; \theta_g)$ generates samples that match the
 distribution of data, while the discriminator, $D(\pmb{x}; \theta_g)$

labml_nn/gan/original/readme.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # [Generative Adversarial Networks - GAN](https://nn.labml.ai/gan/original/index.html)
 
 This is an annotated implementation of
-[Generative Adversarial Networks](https://arxiv.org/abs/1406.2661).
+[Generative Adversarial Networks](https://papers.labml.ai/paper/1406.2661).

labml_nn/gan/stylegan/__init__.py

Lines changed: 6 additions & 6 deletions
@@ -8,12 +8,12 @@
 # StyleGAN 2
 
 This is a [PyTorch](https://pytorch.org) implementation of the paper
-[Analyzing and Improving the Image Quality of StyleGAN](https://arxiv.org/abs/1912.04958)
+[Analyzing and Improving the Image Quality of StyleGAN](https://papers.labml.ai/paper/1912.04958)
 which introduces **StyleGAN 2**.
 StyleGAN 2 is an improvement over **StyleGAN** from the paper
-[A Style-Based Generator Architecture for Generative Adversarial Networks](https://arxiv.org/abs/1812.04948).
+[A Style-Based Generator Architecture for Generative Adversarial Networks](https://papers.labml.ai/paper/1812.04948).
 And StyleGAN is based on **Progressive GAN** from the paper
-[Progressive Growing of GANs for Improved Quality, Stability, and Variation](https://arxiv.org/abs/1710.10196).
+[Progressive Growing of GANs for Improved Quality, Stability, and Variation](https://papers.labml.ai/paper/1710.10196).
 All three papers are from the same authors from [NVIDIA AI](https://twitter.com/NVIDIAAI).
 
 *Our implementation is a minimalistic StyleGAN 2 model training code.
@@ -650,7 +650,7 @@ class DownSample(nn.Module):
     The down-sample operation [smoothens](#smooth) each feature channel and
     scale $2 \times$ using bilinear interpolation.
     This is based on the paper
-    [Making Convolutional Networks Shift-Invariant Again](https://arxiv.org/abs/1904.11486).
+    [Making Convolutional Networks Shift-Invariant Again](https://papers.labml.ai/paper/1904.11486).
     """
 
     def __init__(self):
@@ -672,7 +672,7 @@ class UpSample(nn.Module):
 
     The up-sample operation scales the image up by $2 \times$ and [smoothens](#smooth) each feature channel.
     This is based on the paper
-    [Making Convolutional Networks Shift-Invariant Again](https://arxiv.org/abs/1904.11486).
+    [Making Convolutional Networks Shift-Invariant Again](https://papers.labml.ai/paper/1904.11486).
     """
 
     def __init__(self):
@@ -824,7 +824,7 @@ class GradientPenalty(nn.Module):
     ## Gradient Penalty
 
     This is the $R_1$ regularization penality from the paper
-    [Which Training Methods for GANs do actually Converge?](https://arxiv.org/abs/1801.04406).
+    [Which Training Methods for GANs do actually Converge?](https://papers.labml.ai/paper/1801.04406).
 
     $$R_1(\psi) = \frac{\gamma}{2} \mathbb{E}_{p_\mathcal{D}(x)}
     \Big[\Vert \nabla_x D_\psi(x)^2 \Vert\Big]$$
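
The hunk above quotes the $R_1$ regularization used by StyleGAN 2's `GradientPenalty` module. For reference, here is a minimal sketch of that penalty; it is an illustrative reimplementation under assumed names (`discriminator`, `x`, `gamma`), not the code from this repository.

```python
# Illustrative sketch of the R1 gradient penalty, not the repository's implementation.
# R1(psi) = gamma/2 * E_{x ~ p_D}[ ||grad_x D_psi(x)||^2 ], evaluated on real images.
import torch


def r1_penalty(discriminator: torch.nn.Module, x: torch.Tensor, gamma: float = 10.0) -> torch.Tensor:
    x = x.detach().requires_grad_(True)      # real images, with gradients w.r.t. the inputs enabled
    d_out = discriminator(x)                 # discriminator scores for the real batch
    grad, = torch.autograd.grad(d_out.sum(), x, create_graph=True)
    grad_sq = grad.reshape(grad.shape[0], -1).pow(2).sum(dim=1)   # squared gradient norm per sample
    return 0.5 * gamma * grad_sq.mean()
```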

labml_nn/gan/stylegan/readme.md

Lines changed: 3 additions & 3 deletions
@@ -1,10 +1,10 @@
 # [StyleGAN 2](https://nn.labml.ai/gan/stylegan/index.html)
 
 This is a [PyTorch](https://pytorch.org) implementation of the paper
-[Analyzing and Improving the Image Quality of StyleGAN](https://arxiv.org/abs/1912.04958)
+[Analyzing and Improving the Image Quality of StyleGAN](https://papers.labml.ai/paper/1912.04958)
 which introduces **StyleGAN2**.
 StyleGAN 2 is an improvement over **StyleGAN** from the paper
-[A Style-Based Generator Architecture for Generative Adversarial Networks](https://arxiv.org/abs/1812.04948).
+[A Style-Based Generator Architecture for Generative Adversarial Networks](https://papers.labml.ai/paper/1812.04948).
 And StyleGAN is based on **Progressive GAN** from the paper
-[Progressive Growing of GANs for Improved Quality, Stability, and Variation](https://arxiv.org/abs/1710.10196).
+[Progressive Growing of GANs for Improved Quality, Stability, and Variation](https://papers.labml.ai/paper/1710.10196).
 All three papers are from the same authors from [NVIDIA AI](https://twitter.com/NVIDIAAI).

labml_nn/gan/wasserstein/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 # Wasserstein GAN (WGAN)
 
 This is an implementation of
-[Wasserstein GAN](https://arxiv.org/abs/1701.07875).
+[Wasserstein GAN](https://papers.labml.ai/paper/1701.07875).
 
 The original GAN loss is based on Jensen-Shannon (JS) divergence
 between the real distribution $\mathbb{P}_r$ and generated distribution $\mathbb{P}_g$.

labml_nn/gan/wasserstein/gradient_penalty/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -9,7 +9,7 @@
 # Gradient Penalty for Wasserstein GAN (WGAN-GP)
 
 This is an implementation of
-[Improved Training of Wasserstein GANs](https://arxiv.org/abs/1704.00028).
+[Improved Training of Wasserstein GANs](https://papers.labml.ai/paper/1704.00028).
 
 [WGAN](../index.html) suggests clipping weights to enforce Lipschitz constraint
 on the discriminator network (critic).
@@ -19,7 +19,7 @@
 1. Limiting the capacity of the discriminator
 2. Exploding and vanishing gradients (without [Batch Normalization](../../../normalization/batch_norm/index.html)).
 
-The paper [Improved Training of Wasserstein GANs](https://arxiv.org/abs/1704.00028)
+The paper [Improved Training of Wasserstein GANs](https://papers.labml.ai/paper/1704.00028)
 proposal a better way to improve Lipschitz constraint, a gradient penalty.
 
 $$\mathcal{L}_{GP} = \lambda \underset{\hat{x} \sim \mathbb{P}_{\hat{x}}}{\mathbb{E}}
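
The docstring above introduces the WGAN-GP penalty $\mathcal{L}_{GP}$ (the formula is cut off by the diff context). For reference, a minimal sketch of the standard formulation, which penalizes $(\Vert \nabla_{\hat{x}} D(\hat{x}) \Vert_2 - 1)^2$ at random interpolations $\hat{x}$ of real and generated samples; `critic`, `real`, `fake`, and `lambda_gp` are assumed names, and this is not the repository's code.

```python
# Illustrative sketch of the standard WGAN-GP gradient penalty, not the repository's
# implementation. Assumes 4-D image batches of shape (N, C, H, W).
import torch


def gradient_penalty(critic: torch.nn.Module, real: torch.Tensor, fake: torch.Tensor,
                     lambda_gp: float = 10.0) -> torch.Tensor:
    eps = torch.rand(real.shape[0], 1, 1, 1, device=real.device)           # per-sample mixing coefficient
    x_hat = (eps * real + (1 - eps) * fake.detach()).requires_grad_(True)  # interpolated samples
    d_hat = critic(x_hat)
    grad, = torch.autograd.grad(d_hat.sum(), x_hat, create_graph=True)
    grad_norm = grad.reshape(grad.shape[0], -1).norm(2, dim=1)             # gradient norm per sample
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()
```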

labml_nn/gan/wasserstein/gradient_penalty/readme.md

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 # [Gradient Penalty for Wasserstein GAN (WGAN-GP)](https://nn.labml.ai/gan/wasserstein/gradient_penalty/index.html)
 
 This is an implementation of
-[Improved Training of Wasserstein GANs](https://arxiv.org/abs/1704.00028).
+[Improved Training of Wasserstein GANs](https://papers.labml.ai/paper/1704.00028).
 
 [WGAN](https://nn.labml.ai/gan/wasserstein/index.html) suggests
 clipping weights to enforce Lipschitz constraint
@@ -12,5 +12,5 @@ L1, L2 weight decay have problems:
 1. Limiting the capacity of the discriminator
 2. Exploding and vanishing gradients (without [Batch Normalization](https://nn.labml.ai/normalization/batch_norm/index.html)).
 
-The paper [Improved Training of Wasserstein GANs](https://arxiv.org/abs/1704.00028)
+The paper [Improved Training of Wasserstein GANs](https://papers.labml.ai/paper/1704.00028)
 proposal a better way to improve Lipschitz constraint, a gradient penalty.

labml_nn/gan/wasserstein/readme.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # [Wasserstein GAN - WGAN](https://nn.labml.ai/gan/wasserstein/index.html)
 
 This is an implementation of
-[Wasserstein GAN](https://arxiv.org/abs/1701.07875).
+[Wasserstein GAN](https://papers.labml.ai/paper/1701.07875).

labml_nn/graphs/gat/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 # Graph Attention Networks (GAT)
 
 This is a [PyTorch](https://pytorch.org) implementation of the paper
-[Graph Attention Networks](https://arxiv.org/abs/1710.10903).
+[Graph Attention Networks](https://papers.labml.ai/paper/1710.10903).
 
 GATs work on graph data.
 A graph consists of nodes and edges connecting nodes.

labml_nn/graphs/gat/readme.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # [Graph Attention Networks (GAT)](https://nn.labml.ai/graphs/gat/index.html)
 
 This is a [PyTorch](https://pytorch.org) implementation of the paper
-[Graph Attention Networks](https://arxiv.org/abs/1710.10903).
+[Graph Attention Networks](https://papers.labml.ai/paper/1710.10903).
 
 GATs work on graph data.
 A graph consists of nodes and edges connecting nodes.

labml_nn/graphs/gatv2/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 ---
 # Graph Attention Networks v2 (GATv2)
 This is a [PyTorch](https://pytorch.org) implementation of the GATv2 operator from the paper
-[How Attentive are Graph Attention Networks?](https://arxiv.org/abs/2105.14491).
+[How Attentive are Graph Attention Networks?](https://papers.labml.ai/paper/2105.14491).
 
 GATv2s work on graph data similar to [GAT](../gat/index.html).
 A graph consists of nodes and edges connecting nodes.

labml_nn/graphs/gatv2/readme.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # [Graph Attention Networks v2 (GATv2)](https://nn.labml.ai/graphs/gatv2/index.html)
 
 This is a [PyTorch](https://pytorch.org) implementation of the GATv2 operator from the paper
-[How Attentive are Graph Attention Networks?](https://arxiv.org/abs/2105.14491).
+[How Attentive are Graph Attention Networks?](https://papers.labml.ai/paper/2105.14491).
 
 GATv2s work on graph data.
 A graph consists of nodes and edges connecting nodes.

labml_nn/hypernetworks/hyper_lstm.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 # HyperNetworks - HyperLSTM
 
 We have implemented HyperLSTM introduced in paper
-[HyperNetworks](https://arxiv.org/abs/1609.09106), with annotations
+[HyperNetworks](https://papers.labml.ai/paper/1609.09106), with annotations
 using [PyTorch](https://pytorch.org).
 [This blog post](https://blog.otoro.net/2016/09/28/hyper-networks/)
 by David Ha gives a good explanation of HyperNetworks.

labml_nn/normalization/batch_channel_norm/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 # Batch-Channel Normalization
 
 This is a [PyTorch](https://pytorch.org) implementation of Batch-Channel Normalization from the paper
-[Micro-Batch Training with Batch-Channel Normalization and Weight Standardization](https://arxiv.org/abs/1903.10520).
+[Micro-Batch Training with Batch-Channel Normalization and Weight Standardization](https://papers.labml.ai/paper/1903.10520).
 We also have an [annotated implementation of Weight Standardization](../weight_standardization/index.html).
 
 Batch-Channel Normalization performs batch normalization followed

labml_nn/normalization/batch_norm/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 # Batch Normalization
 
 This is a [PyTorch](https://pytorch.org) implementation of Batch Normalization from paper
-[Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167).
+[Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://papers.labml.ai/paper/1502.03167).
 
 ### Internal Covariate Shift

labml_nn/normalization/batch_norm/readme.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # [Batch Normalization](https://nn.labml.ai/normalization/batch_norm/index.html)
 
 This is a [PyTorch](https://pytorch.org) implementation of Batch Normalization from paper
-[Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167).
+[Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://papers.labml.ai/paper/1502.03167).
 
 ### Internal Covariate Shift

labml_nn/normalization/group_norm/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 # Group Normalization
 
 This is a [PyTorch](https://pytorch.org) implementation of
-the [Group Normalization](https://arxiv.org/abs/1803.08494) paper.
+the [Group Normalization](https://papers.labml.ai/paper/1803.08494) paper.
 
 [Batch Normalization](../batch_norm/index.html) works well for large enough batch sizes
 but not well for small batch sizes, because it normalizes over the batch.

labml_nn/normalization/group_norm/readme.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # [Group Normalization](https://nn.labml.ai/normalization/group_norm/index.html)
 
 This is a [PyTorch](https://pytorch.org) implementation of
-the [Group Normalization](https://arxiv.org/abs/1803.08494) paper.
+the [Group Normalization](https://papers.labml.ai/paper/1803.08494) paper.
 
 [Batch Normalization](https://nn.labml.ai/normalization/batch_norm/index.html) works well for large enough batch sizes
 but not well for small batch sizes, because it normalizes over the batch.

labml_nn/normalization/instance_norm/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 # Instance Normalization
 
 This is a [PyTorch](https://pytorch.org) implementation of
-[Instance Normalization: The Missing Ingredient for Fast Stylization](https://arxiv.org/abs/1607.08022).
+[Instance Normalization: The Missing Ingredient for Fast Stylization](https://papers.labml.ai/paper/1607.08022).
 
 Instance normalization was introduced to improve [style transfer](https://paperswithcode.com/task/style-transfer).
 It is based on the observation that stylization should not depend on the contrast of the content image.

labml_nn/normalization/instance_norm/readme.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # [Instance Normalization](https://nn.labml.ai/normalization/instance_norm/index.html)
 
 This is a [PyTorch](https://pytorch.org) implementation of
-[Instance Normalization: The Missing Ingredient for Fast Stylization](https://arxiv.org/abs/1607.08022).
+[Instance Normalization: The Missing Ingredient for Fast Stylization](https://papers.labml.ai/paper/1607.08022).
 
 Instance normalization was introduced to improve [style transfer](https://paperswithcode.com/task/style-transfer).
 It is based on the observation that stylization should not depend on the contrast of the content image.

labml_nn/normalization/layer_norm/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 # Layer Normalization
 
 This is a [PyTorch](https://pytorch.org) implementation of
-[Layer Normalization](https://arxiv.org/abs/1607.06450).
+[Layer Normalization](https://papers.labml.ai/paper/1607.06450).
 
 ### Limitations of [Batch Normalization](../batch_norm/index.html)

labml_nn/normalization/layer_norm/readme.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # [Layer Normalization](https://nn.labml.ai/normalization/layer_norm/index.html)
 
 This is a [PyTorch](https://pytorch.org) implementation of
-[Layer Normalization](https://arxiv.org/abs/1607.06450).
+[Layer Normalization](https://papers.labml.ai/paper/1607.06450).
 
 ### Limitations of [Batch Normalization](https://nn.labml.ai/normalization/batch_norm/index.html)

labml_nn/normalization/weight_standardization/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@
 # Weight Standardization
 
 This is a [PyTorch](https://pytorch.org) implementation of Weight Standardization from the paper
-[Micro-Batch Training with Batch-Channel Normalization and Weight Standardization](https://arxiv.org/abs/1903.10520).
+[Micro-Batch Training with Batch-Channel Normalization and Weight Standardization](https://papers.labml.ai/paper/1903.10520).
 We also have an [annotated implementation of Batch-Channel Normalization](../batch_channel_norm/index.html).
 
 Batch normalization **gives a smooth loss landscape** and
@@ -36,7 +36,7 @@
 This avoids outputs of nodes from always falling beyond the active range of the activation function
 (e.g. always negative input for a ReLU).
 
-*[Refer to the paper for proofs](https://arxiv.org/abs/1903.10520)*.
+*[Refer to the paper for proofs](https://papers.labml.ai/paper/1903.10520)*.
 
 Here is [the training code](experiment.html) for training
 a VGG network that uses weight standardization to classify CIFAR-10 data.

labml_nn/normalization/weight_standardization/readme.md

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 # [Weight Standardization](https://nn.labml.ai/normalization/weight_standardization/index.html)
 
 This is a [PyTorch](https://pytorch.org) implementation of Weight Standardization from the paper
-[Micro-Batch Training with Batch-Channel Normalization and Weight Standardization](https://arxiv.org/abs/1903.10520).
+[Micro-Batch Training with Batch-Channel Normalization and Weight Standardization](https://papers.labml.ai/paper/1903.10520).
 We also have an
 [annotated implementation of Batch-Channel Normalization](https://nn.labml.ai/normalization/batch_channel_norm/index.html).

labml_nn/optimizers/ada_belief.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
 This is based from AdaBelief
 [official implementation](https://github.com/juntang-zhuang/Adabelief-Optimizer)
 of the paper
-[AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients](https://arxiv.org/abs/2010.07468).
+[AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients](https://papers.labml.ai/paper/2010.07468).
 
 This is implemented in [PyTorch](https://pytorch.org) as an extension to [RAdam](radam.html).

labml_nn/optimizers/adam.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 # Adam Optimizer
 
 This is a [PyTorch](https://pytorch.org) implementation of popular optimizer *Adam* from paper
-[Adam: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980v9).
+[Adam: A Method for Stochastic Optimization](https://papers.labml.ai/paper/1412.6980v9).
 
 *Adam* update is,

labml_nn/optimizers/amsgrad.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 # AMSGrad
 
 This is a [PyTorch](https://pytorch.org) implementation of the paper
-[On the Convergence of Adam and Beyond](https://arxiv.org/abs/1904.09237).
+[On the Convergence of Adam and Beyond](https://papers.labml.ai/paper/1904.09237).
 
 We implement this as an extension to our [Adam optimizer implementation](adam.html).
 The implementation it self is really small since it's very similar to Adam.

labml_nn/optimizers/noam.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
 # Noam Optimizer
 
 This is the [PyTorch](https://pytorch.org) implementation of optimizer introduced in the paper
-[Attention Is All You Need](https://arxiv.org/abs/1706.03762).
+[Attention Is All You Need](https://papers.labml.ai/paper/1706.03762).
 """
 from typing import Dict

labml_nn/optimizers/radam.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
 This implementation is based on
 [the official implementation](https://github.com/LiyuanLucasLiu/RAdam)
 of the paper
-[On the Variance of the Adaptive Learning Rate and Beyond](https://arxiv.org/abs/1908.03265).
+[On the Variance of the Adaptive Learning Rate and Beyond](https://papers.labml.ai/paper/1908.03265).
 
 We have implemented it in [PyTorch](https://pytorch.org)
 as an extension to [our AMSGrad implementation](amsgrad.html)

labml_nn/recurrent_highway_networks/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 
 # Recurrent Highway Networks
 
-This is a [PyTorch](https://pytorch.org) implementation of [Recurrent Highway Networks](https://arxiv.org/abs/1607.03474).
+This is a [PyTorch](https://pytorch.org) implementation of [Recurrent Highway Networks](https://papers.labml.ai/paper/1607.03474).
 """
 from typing import Optional

labml_nn/resnet/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 # Deep Residual Learning for Image Recognition (ResNet)
 
 This is a [PyTorch](https://pytorch.org) implementation of the paper
-[Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385).
+[Deep Residual Learning for Image Recognition](https://papers.labml.ai/paper/1512.03385).
 
 ResNets train layers as residual functions to overcome the
 *degradation problem*.
