Description
A list of PyTorch 1.7 features.
Items are checked if we have something more or less equivalent in Flux or elsewhere in the Julia ecosystem that works with Flux.
This list is not complete; it comes from a rough scan of PyTorch's documentation. Please feel free to add anything I missed in the comments, and whoever has write access is welcome to modify the list.
Related issue https://github.com/FluxML/ML-Coordination-Tracker/issues/16, and more generally anything in https://github.com/FluxML/ML-Coordination-Tracker/issues
Pytorch Features
Conv Layers
- Conv1d, Conv2d, Conv3d
- ConvTranspose1d, ConvTranspose2d, ConvTranspose3d
- groups in convolution layers
- Fold, Unfold. In progress: Add fold and unfold NNlib.jl#444
Pooling Layers
- MaxPool1d, MaxPool2d, MaxPool3d
- MaxUnPool1d, MaxUnPool2d, MaxUnPool3d
- AvgPool1d, AvgPool2d, AvgPool3d
- FractionalMaxPool2d
- LPPool1d, LPPool2d
- AdaptiveAvgPool1d, AdaptiveAvgPool2d, AdaptiveAvgPool3d
- AdaptiveMaxPool1d, AdaptiveMaxPool2d, AdaptiveMaxPool3d
Padding Layers
- ReflectionPad (1d,2d)
- ReplicationPad (1d,2d,3d) ( NNlib.pad_repeat)
- ZeroPad (2d)
- ConstantPad (1d,2d,3d)
- Add corresponding layers wrapping the NNlib functions for all of the above, or keep them as functions; either way they need to be added to Flux's docs. (A possible layer wrapper is sketched below.)
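A minimal sketch of such a wrapper (not an existing Flux layer), assuming NNlib's `pad_reflect(x, pad; dims)` signature; the same pattern would apply to `pad_repeat`, `pad_constant` and `pad_zeros`:

```julia
using NNlib

struct ReflectionPad
    pad::Int   # pixels added on both sides of each spatial dim
end

# Pad only the spatial dimensions, following Flux's WHCN convention
# (the last two dims are channel and batch).
(p::ReflectionPad)(x::AbstractArray) =
    NNlib.pad_reflect(x, p.pad; dims = 1:ndims(x) - 2)

x = rand(Float32, 32, 32, 3, 1)
size(ReflectionPad(2)(x))   # (36, 36, 3, 1)
```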
Activations
- ... NNlib has an extensive collection of activations, and on top of that any plain Julia function can be used.
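A quick illustration (assuming a recent Flux, which re-exports NNlib's activations): built-in activations and arbitrary Julia functions can be mixed freely.

```julia
using Flux

myact(x) = x * tanh(softplus(x))   # hand-written Mish-like activation

model = Chain(
    Dense(10 => 32, gelu),    # NNlib activation re-exported by Flux
    Dense(32 => 32, myact),   # any Julia scalar function works too
    Dense(32 => 1),
)

model(rand(Float32, 10, 4))   # 1×4 output
```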
Normalization Layers
- BatchNorm1d, BatchNorm2d, BatchNorm3d
- LayerNorm
- GroupNorm
- InstanceNorm1d, InstanceNorm2d, InstanceNorm3d
- SyncBatchNorm
- LocalResponseNorm. Very old unfinished PR Local Response Normalization [W.I.P] #312. It is an outdated technique; we can probably live without it.
- Move the functional implementations to NNlib.jl (BatchNorm and Dropout NNlib.jl#19)
Recurrent Layers
- RNN
- GRU
- LSTM
Attention Layers
- Transformer. Well maintained implementations in Transformers.jl.
- MultiHeadAttention. Should be moved from Transformers.jl to Flux.jl (ensuring it hits the cudnn kernels). PR MultiHeadAttention implementation #2146
Linear Layers
- Identity
- Linear
- Bilinear
Dropout Layers
- Dropout
- Dropout2d, Dropout3d (Make Dropout docstring clear w.r.t. N-D dropout #1490)
- AlphaDropout
Sparse Layers
- Embedding. PR add Embedding layer #1516
- EmbeddingBag. PR Add EmbeddingBag #2031
Distance Functions
- CosineSimilarity. We have this in Distances.jl; it is also easy to handcode (see the sketch after this list). TODO: check whether it is AD and GPU friendly.
- PairwiseDistance. We have this in Distances.jl. TODO: check whether it is AD and GPU friendly (Tullio.jl could be used to achieve both).
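For the "easy to handcode" point above, a sketch written only with broadcasting and reductions, which should keep it both Zygote-differentiable and CUDA-friendly (illustrative only, not an existing Flux/Distances function):

```julia
# Cosine similarity along dimension `dims` between matching slices of x and y.
function cosine_similarity(x, y; dims = 1, ϵ = 1f-8)
    sum(x .* y; dims = dims) ./
        (sqrt.(sum(abs2, x; dims = dims)) .* sqrt.(sum(abs2, y; dims = dims)) .+ ϵ)
end

x, y = rand(Float32, 128, 16), rand(Float32, 128, 16)
cosine_similarity(x, y)   # 1×16 matrix of per-column similarities

# Differentiability check (requires Zygote):
# using Zygote
# Zygote.gradient(x -> sum(cosine_similarity(x, y)), x)
```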
Loss Functions
- ... We should be well covered here.
- CTCLoss. Being implemented in Add CTC loss to new Losses module #1287 (TODO: remove the separate GPU case, integrate with cudnn)
Vision Layers
- PixelShuffle. add Upsample and PixelShuffle layers #1468
- Upsample (for 1d, 2d, and 3d). Partially done in add Upsample and PixelShuffle layers #1468; a usage sketch follows this list.
  - 'nearest'
  - 'linear' (CPU version merged in NNlib, CUDA PR still to come)
  - 'bilinear'
  - 'bicubic'
  - 'trilinear' (CPU version merged in NNlib, CUDA PR still open)
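Usage sketch, assuming the constructors from PR #1468 as they landed in Flux:

```julia
using Flux

x = rand(Float32, 16, 16, 4, 1)            # WHCN image batch

up = Upsample(:bilinear, scale = (2, 2))   # 16×16 → 32×32, same channels
size(up(x))                                # (32, 32, 4, 1)

ps = PixelShuffle(2)                       # trades channels for resolution
size(ps(x))                                # (32, 32, 1, 1)
```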
Initialization
- xavier_uniform, xavier_normal. Called glorot_uniform and glorot_normal here (example after this list).
- kaiming_normal, kaiming_uniform
- sparse
- orthogonal (Add Orthogonal initialization feature. #1496)
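The PyTorch names map onto Flux's `init` keyword roughly as follows (sketch; `glorot_*` and `kaiming_*` are exported by Flux):

```julia
using Flux

Dense(128 => 64; init = Flux.glorot_uniform)     # xavier_uniform
Dense(128 => 64; init = Flux.kaiming_normal)     # kaiming_normal
Conv((3, 3), 3 => 16; init = Flux.glorot_normal) # xavier_normal for conv weights
```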
Parallelism and Distributed
- DataParallel
- DistributedDataParallel (solved by https://github.com/DhairyaLGandhi/DaggerFlux.jl)
- set_num_threads, set_num_interop_threads. Not sure which operations are parallelized in pytorch. Here we have parallelization only in BLAS operations.
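For reference, the Julia-side knobs: BLAS threads are set explicitly, while Julia-level threads come from `julia --threads N` / the `JULIA_NUM_THREADS` environment variable.

```julia
using LinearAlgebra

BLAS.set_num_threads(4)   # threads used by BLAS kernels (matmul etc.)
Threads.nthreads()        # Julia threads available to e.g. data loading
```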
Distributions
- diff rules for logpdf offered by DistributionsAD.jl
- rsample. Differentiability of the parameters through sampling is supported by many distributions: gradient(mu -> rand(Normal(mu, 1)), 0) == (1,)
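The reparameterization-style gradient quoted above, as a runnable sketch (assumes Zygote plus Distributions, with DistributionsAD providing rules where needed): `rand(Normal(mu, 1))` behaves like `mu + randn()`, so the derivative w.r.t. `mu` is 1.

```julia
using Distributions, Zygote

Zygote.gradient(mu -> rand(Normal(mu, 1)), 0.0)    # (1.0,)

# Pathwise (one-sample Monte Carlo) gradient of an expectation,
# here E[x^2] with x ~ Normal(mu, 1), whose true derivative is 2mu:
Zygote.gradient(mu -> rand(Normal(mu, 1))^2, 0.5)
```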
ONNX
- Current best support in ONNXmutable. See this discussion
FFT
- ... Zygote has the adjoints for AbstractFFTs.
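Quick sketch of differentiating through an FFT via Zygote's AbstractFFTs adjoints (FFTW provides the actual transform):

```julia
using FFTW, Zygote

x = randn(8)
spectral_power(x) = sum(abs2, fft(x))
Zygote.gradient(spectral_power, x)   # by Parseval's theorem, 2 * length(x) .* x
```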
Quantization
- ...
Pruning
- WIP pruning package here
Optim
- schedulers Add simple Schedulers #1434 and Add basic scheduling policies and a scheduler #1506, also see ParameterSchedulers.jl (a hand-rolled sketch follows this section's lists)
- Integrate with Flux's optimizers? (See RFC: basic sketch of scheduling Optimisers.jl#15)
- Document in Flux (see Add ParameterSchedulers.jl to docs #1511 and Update for latest ParameterSchedulers.jl release #1513)
- Reexport in Flux (see Add basic scheduling policies and a scheduler #1506) (TBD)
- LambdaLR (handled in ParameterSchedulers.jl)
- MultiplicativeLR (handled in ParameterSchedulers.jl)
- optimizers
- SGD (+ momentum)
- Adam
- AdaGrad
- AdaDelta
- RMSprop
- LBFGS. Integration with Optim.jl
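A minimal hand-rolled version of the scheduler-plus-optimiser interaction discussed above, assuming the pre-Optimisers.jl interface where the optimiser stores its learning rate in a mutable `eta` field; ParameterSchedulers.jl packages this pattern into reusable policies.

```julia
using Flux

opt = ADAM(1e-3)
for epoch in 1:10
    # Flux.train!(loss, Flux.params(model), data, opt)   # usual training step
    opt.eta *= 0.9                                       # exponential LR decay
end
```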
LinAlg
- det
- norm
Tensorboard
- integration offered by TensorBoardLogger.jl
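Typical TensorBoardLogger.jl usage (sketch): it plugs into Julia's standard logging system, so training metrics can be sent with plain `@info` calls.

```julia
using TensorBoardLogger, Logging

lg = TBLogger("runs/example")
with_logger(lg) do
    for step in 1:100
        @info "train" loss = 1 / step log_step_increment = 1
    end
end
```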
XLA
- Some work in XLA.jl
Misc
- Pytorch has both layers and their functional counterparts.
- einsum. AD- and CUDA-compatible Einstein summation given by Tullio.jl and other packages (see the sketch after this list); documentation should be added to Flux.jl.
- LazyModuleMixin (pytorch 1.8). PR Add @autosize #2078
- weight_norm. Attempt in Added WeightNorm #1005, PR Add WeightNorm layer #2053
- modules iterator. define modules function #1444
- spectral_norm. Old attempt in Fixed the spectral normalization #115
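The einsum-style programming mentioned above, via Tullio.jl (sketch; load Zygote / CUDA / KernelAbstractions alongside it for AD and GPU support):

```julia
using Tullio

A, B = rand(32, 64), rand(64, 16)

# C[i,j] = Σ_k A[i,k] * B[k,j], i.e. einsum "ik,kj->ij"
@tullio C[i, j] := A[i, k] * B[k, j]

C ≈ A * B   # true
```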
Pytorch Extras
Torchvision
- datasets. Some are implemented in DLDatasets.jl (unreleased), some in FastAI.jl, some in MLDatasets.jl, many are missing.
- Will consolidate in MLDatasets.jl (see merge into MLDatasets? lorenzoh/DLDatasets.jl#1)
- models. Some are implemented in Metalhead.jl, but it is a bit stale and not comprehensive.
- Metalhead's PR should add a bunch of models and generally revive the repo
- We should expose the possibility to load pretrained weights
- io
- transforms. Some unreleased work in DataAugmentation.jl
Torchaudio
...
Torchtext
...