v0.11.0 - Cuda support, mixed const/runtime tensors, and device rewrite
What's Changed
- AddInto by @Dimev in #256
- added 5d & 6d tensors by @M1ngXU in #283
- Remove phantom by @M1ngXU in #282
- remove tensor bound by @Dimev in #297
- Adding nightly to cargo-test by @JYudelson1 in #294
- Devices/Dyn dimensions refactor by @coreylowman in #304
- Add instructions for running the mnist example. by @infalmo in #310
- Removes Dyn. Use usize directly by @coreylowman in #315
- Making f32 default dtype for Tensor, updating examples/docstrings by @coreylowman in #316
- Only running gha on push by @coreylowman in #317
- Adding Unit and HasUnitType. Reducing bounds for Dtype by @coreylowman in #313
- Removing build_test_device. Using TestDevice everywhere by @coreylowman in #324
- Adding SampleTensor, Removing RandTensor/RandnTensor by @coreylowman in #327
- Removing usages of tensor aliases by @coreylowman in #328
- Moving intel-mkl stuff into sub module in build.rs by @coreylowman in #329
- Adding Cuda device and skeleton cuda kernel impls by @coreylowman in #322
- Implementing abs/exp/div/sum_to cuda kernels by @coreylowman in #331
- permute_to and broadcast_to cuda kernels by @coreylowman in #343
- Add cuda implementations for unary and binary tensor operations in #341 and #334 by @nkoppel in #346
- Using atomicAdd in binary op backwards to properly handle strides by @coreylowman in #350
- Resolve #352 and #347 by @nkoppel in #354
- Implement reshape cuda kernel (resolves #336) by @nkoppel in #356
- Add missing device generic in transformer test by @ViliamVadocz in #358
- Add select and gather cuda kernels. by @nkoppel in #359
- Upgrade to cudarc 0.6.0 by @coreylowman in #361
- Add tests for binary broadcasted add and fix bugs to allow them to pass. by @nkoppel in #357
- run GHA on pull_request by @coreylowman in #364
- matmul cuda kernels by @coreylowman in #342
- Adding dynamic example. by @Narsil in #368
- Add cuda kernels for min_to/max_to by @coreylowman in #370
- Adding dropout cuda kernel by @coreylowman in #372
- Adding ConstDim and ConstShape for tensor creation by @coreylowman in #373
- Fixing computation of lda/ldb/ldc with cblas by @coreylowman in #375
- Modify sum_to cuda kernel to not need atomic adds in backwards by @nkoppel in #367
- Simplifying `trait Conv2DKernel` and Cpu implementation by @coreylowman in #376
- (#344) Implement cuda kernels for optimizers by @nkoppel in #378
- Fix max_to and min_to edge case with negative zero by @ViliamVadocz in #380
- Add cuda kernels for conv2d by @coreylowman in #369
- Rework pool2d internals & add pool2d cuda kernels by @coreylowman in #384
- Implement Shape for arrays (#377) by @nkoppel in #385
- Efficient cuda kernels for reductions by @nkoppel in #382
- Improving compilation times of deeply nested const generic modules by @coreylowman in #391
- Fixing remainder of cuda tests & fixing cblas/cublas matmul with strides [1,1] by @coreylowman in #393
- Adding Cuda device usage to mnist example by @coreylowman in #396
- Adding GeLU operator (used in Gpt2) by @Narsil in #397
- Removing codecov from workflows/readme by @coreylowman in #403
- Reorganize tensor_ops, and add cuda_utils.cuh by @nkoppel in #398
- Some small optimizations for conv2d on cpu by @coreylowman in #404
- Removing Device generic from Gradients & optimizers by @coreylowman in #402
- Add ToDevice and OnDevice to simplify nn api (#388) by @nkoppel in #394
- Removes `ModuleBuilder`, Adds `BuildModule` & `BuildOnDevice` by @coreylowman in #405 (see the model-building sketch after this list)
- Enable multi-core matmul by @infalmo in #417
- Fix GELU CUDA kernel compilation by @ViliamVadocz in #409
- Adding nn.Embedding layer. by @Narsil in #406
- Removing defaults for Tensor Dtype & Device generic parameters by @coreylowman in #418
- Removing Default for optimizers & adding &M to constructors by @coreylowman in #422
- Adding runtime assertion in `try_binary_op` that shapes are equal by @coreylowman in #428
- Add boolean operations and choose. by @nkoppel in #415 (sketched after this list)
- Add TensorFrom trait to create tensors from both vectors and arrays. by @nkoppel in #414
- Adding nn builder structs, dtype generics, and remove device defaults. by @coreylowman in #433
- Upgrade to cudarc==0.7.0 and use alloc_async instead of alloc_zeros_async by @coreylowman in #440
- Add comparison tensor operations by @ViliamVadocz in #386
- Add synchronize method to Cuda device by @ViliamVadocz in #442
- f64 kernels by @coreylowman in #421
- Add stack tensors method by @coreylowman in #449
- cargo check cuda & run f64 tests in CI by @coreylowman in #447
- Fix bug in #451 by @nkoppel in #453
- Add more runtime shape checks by @coreylowman in #454
- Adding ReshapeTo::reshape_like by @coreylowman in #456
- Adding SampleTensor::sample_uniform_like and SampleTensor::sample_normal_like by @coreylowman in #457
- Improve examples (add Cuda) by @TimerErTim in #452
- Dataset iterators - adds batching, collating for iterators by @coreylowman in #462
- Fixing issue with to_device and broadcasted tensors by @coreylowman in #465
- Bump cudarc 0.7.2 by @coreylowman in #466
- Adding index out of bounds checks to select/gather kernels by @coreylowman in #467
- Rename to `add_dim`. by @infalmo in #471
- impl BuildModule for ZeroSizedModule by @coreylowman in #470
- Adds TensorCollection by @coreylowman in #469
- Fixing cargo doc warnings by @coreylowman in #473
- Using `--gpu-architecture native` with nvcc by @coreylowman in #474
- using TensorFromVec for OneHotEncode and Arange by @coreylowman in #477
- Small batchnorm optimizations by @coreylowman in #478
- nvcc: fixed type bug by @M1ngXU in #480
- Adds fast_alloc feature and binary kernel optimizations by @coreylowman in #481
- Adding some "benchmarking" scripts by @coreylowman in #483
- Add try_forward and try_forward_mut to Module and ModuleMut. by @nkoppel in #482
- Optimizing cpu kernels for reductions by @coreylowman in #484
- Using alloc_zeros_async and memset_zeros for cuda by @coreylowman in #489
- Making Conv2D unbiased by default, and adding Bias2D module by @coreylowman in #494
- Using image/filter stride in cuda kernel for conv by @coreylowman in #495
- bump cudarc version by @coreylowman in #498
- Adding attention_reshape (inference only) kernels. by @Narsil in #497
- Adding lifetime to gat in ExactSizeDataset by @coreylowman in #501
- added stack to device trait bound by @M1ngXU in #502
- Allowing `nn::Embedding` to be dynamic in shape. by @Narsil in #503
- Adding `UnbiasedLinear` (linear without bias). by @Narsil in #504
- Making K dimension of matmul dynamic. by @Narsil in #505
- Tensors the whole way down by @coreylowman in #508
- Sorting tapes by unique_id to ensure proper operation order by @coreylowman in #510
- cudarc 0.8.0 by @coreylowman in #512
- Adding axpy tensor op & ModelEMA module walker by @coreylowman in #511
- [Spring cleaning] Removes GradientTape & impl Clone for Gradients by @coreylowman in #514
- Optimizer now takes &Gradients by @coreylowman in #515
- Adding `tensor.trace_with(grads)` by @coreylowman in #517
- Adding `model.zero_grads(&mut gradients)` by @coreylowman in #518
- Adding gradient accumulation example by @coreylowman in #519 (a training sketch follows this list)
- Don't clone tensor data when permuting or broadcasting by @nkoppel in #522
- Use chunk_sum in cuda kernels of backward binary operations. by @nkoppel in #520
- Adding `no-std` feature flag, matrixmultiply/threading behind feature flag. numpy no longer default by @coreylowman in #528
- Adding `model.alloc_grads()`, removing `Default` for `Gradients` by @coreylowman in #524
- feat: adds BatchNorm1D by @kstavro in #513
- Safetensors support. by @Narsil in #381
- Fixing bool tests with safetensors (serde compatibility) by @coreylowman in #529
- Hotfixing the safetensors impl. by @Narsil in #531
- Adds `Tensor::concat` by @coreylowman in #530 (see the tensor sketch after this list)
- Changing stack to be method of array/vec instead of device by @coreylowman in #533
- Easier preprocessing by @coreylowman in #534
- Create tensor from usize (for e.g. select) on any Device by @M1ngXU in #535
- Handle path for TensorVisitors using a TensorViewer by @nkoppel in #538
- Adding nice error message when MHA num heads doesn't divide K/H by @coreylowman in #542
- Moving Reshape to use stable compile time asserts by @coreylowman in #543
- Finalizing nn exports by @coreylowman in #544
- Moving src/unique_id & src/gradients into src/tensor by @coreylowman in #545
- Docs update by @coreylowman in #549
- Bump to cudarc 0.9.0 by @coreylowman in #551
- chore: remove `cblas` feature in favor of `intel-mkl` by @Alexandcoats in #552
- Adds ReduceShapeSelf::LastAxis to Shape by @coreylowman in #555
- Letting batch & seq dimensions of matmul be dyn by @coreylowman in #556
- Moving transformers to stable, and accepting dyn dimensions for transformer input by @coreylowman in #557
- Reshape skip kernels with a contiguous tensor by @coreylowman in #558
- Removing double computation of mean in normalize by @coreylowman in #559
- feat: add realize shape by @Alexandcoats in #561
- Querying nvidia-smi for compute capability instead of native by @coreylowman in #564
- Adding features to cargo doc on ci by @coreylowman in #569
- Updating 01-tensor by @coreylowman in #570
- Fixing no-std support by @coreylowman in #571
- Removes .trace_into(), .trace() now requires Gradients object by @coreylowman in #566
- Adds `trait Trace` and generic training example by @coreylowman in #572
- matrixmultiply optional. Adds `cpu-seq-matmul`, `cpu-par-matmul`, `cpu-mkl-matmul` features by @coreylowman in #576
- Allow Modules to be constructed with the TensorCollection trait by @nkoppel in #548
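A few hedged sketches of the new APIs follow. They are illustrative only (made-up layer sizes and data), not code lifted from the linked PRs.

With the device rewrite (#304, #322) and the `BuildModule`/`BuildOnDevice` traits (#405, #433), a module is built for a specific device, and the same code runs on `Cpu` or `Cuda`:

```rust
use dfdx::prelude::*;

fn main() {
    // Pick a device; with the `cuda` feature this could be `Cuda` instead (#322, #396).
    let dev: Cpu = Default::default();

    // Architectures are builder types; the dtype is an explicit generic (#418, #433).
    type Model = (Linear<4, 8>, ReLU, Linear<8, 2>);
    let model = dev.build_module::<Model, f32>();

    // SampleTensor (#327) replaces RandTensor/RandnTensor.
    let x: Tensor<Rank1<4>, f32, _> = dev.sample_normal();
    let y = model.forward(x);
    println!("{:?}", y.array());
}
```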
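Tensor creation and combination also changed: `TensorFrom` (#414) builds tensors from arrays, `stack` moved from the device onto arrays/vecs of tensors (#449, #533), and `Tensor::concat` joins along the first dimension (#530). A minimal sketch, assuming `concat` yields a runtime (`usize`) first dimension:

```rust
use dfdx::prelude::*;

fn main() {
    let dev: Cpu = Default::default();

    // TensorFrom (#414): build a tensor directly from a nested array.
    let a: Tensor<Rank2<2, 3>, f32, _> = dev.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]);
    let b: Tensor<Rank2<2, 3>, f32, _> = dev.sample_uniform();

    // stack is now a method on an array of tensors instead of the device (#533).
    let stacked: Tensor<Rank3<2, 2, 3>, f32, _> = [a.clone(), b.clone()].stack();

    // concat joins along the first dimension (#530); the result's first
    // dimension is a runtime usize.
    let cat = a.concat(b);
    assert_eq!(cat.shape().0, 4);
    println!("{:?}", stacked.array());
}
```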
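The gradients rework (#514, #515, #517, #518, #524, #566) makes the `Gradients` object explicit: allocate it with `model.alloc_grads()`, feed it into `.trace(...)`, pass `&Gradients` to the optimizer, and clear it with `model.zero_grads(...)`. A sketch of a gradient-accumulation loop in that style (random data stands in for a real dataset):

```rust
use dfdx::prelude::*;

fn main() {
    let dev: Cpu = Default::default();
    type Model = (Linear<4, 8>, ReLU, Linear<8, 1>);
    let mut model = dev.build_module::<Model, f32>();

    // Optimizers take &M in their constructor (#422) and &Gradients in update (#515).
    let mut opt = Sgd::new(&model, SgdConfig::default());

    // Default for Gradients was removed; allocate explicitly (#524).
    let mut grads = model.alloc_grads();

    for _step in 0..4 {
        // Accumulate over two micro-batches before each update (#519).
        for _micro in 0..2 {
            let x: Tensor<Rank2<8, 4>, f32, _> = dev.sample_normal();
            let y: Tensor<Rank2<8, 1>, f32, _> = dev.sample_normal();
            // .trace() now takes the Gradients object (#517, #566).
            let loss = mse_loss(model.forward_mut(x.trace(grads)), y);
            grads = loss.backward();
        }
        opt.update(&mut model, &grads).unwrap();
        model.zero_grads(&mut grads);
    }
}
```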
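Finally, the comparison ops (#386) and boolean `choose` (#415): comparisons produce a `bool` tensor, which `choose` uses to select elementwise between two tensors. This sketch assumes the comparison methods borrow both operands; treat the exact signatures as unverified:

```rust
use dfdx::prelude::*;

fn main() {
    let dev: Cpu = Default::default();
    let a: Tensor<Rank1<4>, f32, _> = dev.tensor([1.0, 2.0, 3.0, 4.0]);
    let b: Tensor<Rank1<4>, f32, _> = dev.tensor([4.0, 3.0, 2.0, 1.0]);

    // Comparison ops (#386) return a boolean mask.
    let mask = a.lt(&b);

    // choose (#415) takes from `a` where the mask is true, else from `b`,
    // which computes an elementwise minimum here.
    let min = mask.choose(a, b);
    assert_eq!(min.array(), [1.0, 2.0, 2.0, 1.0]);
}
```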
New Contributors
- @Dimev made their first contribution in #256
- @JYudelson1 made their first contribution in #294
- @Narsil made their first contribution in #368
- @TimerErTim made their first contribution in #452
- @kstavro made their first contribution in #513
- @Alexandcoats made their first contribution in #552
- @ViliamVadocz made their first contribution in #358
Full Changelog: v0.10.0...v0.11.0