chore: Prepare 1.0.0~alpha2

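Note on the `Cache_dir` changelog entry in this commit: it describes cache directory resolution that respects `RAVEN_CACHE_ROOT`, `XDG_CACHE_HOME`, and a `HOME` fallback. The OCaml sketch below shows one plausible reading of that precedence using only the standard library. It is not the actual implementation: the function name `resolve_cache_root`, the injectable `getenv` parameter, and the `raven` and `.cache` path components are assumptions made for illustration.

```ocaml
(* Hypothetical sketch, not the real Cache_dir module. *)

(* Treat an unset variable and an empty one the same way. *)
let non_empty = function
  | None | Some "" -> None
  | Some v -> Some v

(* Resolve a cache root with the precedence named in the changelog:
   RAVEN_CACHE_ROOT first, then XDG_CACHE_HOME, then HOME.
   ASSUMPTION: the "raven" and ".cache" path components are illustrative. *)
let resolve_cache_root ?(getenv = Sys.getenv_opt) () =
  match non_empty (getenv "RAVEN_CACHE_ROOT") with
  | Some root -> Some root
  | None -> (
      match non_empty (getenv "XDG_CACHE_HOME") with
      | Some xdg -> Some (Filename.concat xdg "raven")
      | None -> (
          match non_empty (getenv "HOME") with
          | Some home ->
              Some (Filename.concat (Filename.concat home ".cache") "raven")
          | None -> None))

let () =
  match resolve_cache_root () with
  | Some dir -> print_endline dir
  | None -> prerr_endline "no cache directory could be determined"
```

Passing `getenv` explicitly makes the precedence checkable without modifying the process environment, for example with a lookup backed by an association list.
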
tmattio · tmattio · commit 36f1926ea9b9 · 2025-11-03T14:44:20.000+05:30
diff --git a/CHANGES.md b/CHANGES.md
@@ -5,12 +5,35 @@ All notable changes to this project will be documented in this file.
 - Only document user-facing changes (features, bug fixes, performance improvements, API changes, etc.)
 - Add new entries at the top of the appropriate section (most recent first)
 
-## [1.0.0~alpha2] - TBD
+## [1.0.0~alpha2] - 2025-11-03
+
+We're excited to announce the release of Raven 1.0.0~alpha2! Coming less than a month after alpha1, this release notably includes contributions from Outreachy applicants, made in preparation for the _two_ upcoming internships.
+
+Some highlights from this release include:
+
+- NumPy-compatible text I/O with `Nx_io.{save,load}_txt`.
+- Many new functions in Nx/Rune, including neural-network operations (`dropout`, `log_softmax`, `batch_norm`, `layer_norm`), activation functions such as `celu`, and generic functions such as `conjugate` and `index_put`.
+- Addition of `.top` libraries for `nx`, `rune`, and `hugin` that auto-install pretty-printers in the OCaml toplevel. You can run e.g. `#require "nx.top"`.
+- Addition of a visualization API in Fehu via the new `fehu.visualize` library, supporting video recording.
+- Redesign of Kaun's core data structure and checkpointing subsystem for complete snapshotting.
+- Many, many bug fixes and correctness improvements.
+
+We've also made numerous performance improvements across the board:
+
+- Nx elementwise ops: 5–50× faster (e.g., Add 50×50 f32 88.81 µs → 1.83 µs, **48×**; Mul 100×100 f32 78.51 µs → 2.41 µs, **33×**).
+- Nx conv2d: **4–5×** faster on common shapes; up to **115×** on heavy f64 batched cases (e.g., B16 C64→128 16×16 K3 f64 1.61 s → 13.96 ms).
+- Rune autodiff: **1.2–3.7×** faster on core grads (e.g., MatMulGrad Medium 34.04 ms → 11.91 ms, **2.86×**; Large 190.19 ms → 50.97 ms, **3.73×**).
+- Talon dataframes: big wins in joins and group-bys (Join 805.35 ms → 26.10 ms, **31×**; Group-by 170.80 ms → 19.03 ms, **9×**; Filter 9.93 ms → 3.39 ms, **3×**).
+- Saga tokenizers: realistic workloads **4–17%** faster (e.g., WordPiece encode single 136.05 µs → 115.92 µs, **1.17×**; BPE batch_32 24.52 ms → 22.27 ms, **1.10×**).
+
+This release closes 8 user-reported issues or feature requests and includes 30 contributions from 15 unique contributors.
 
 ### Nx
 
-- Add `Nx_core.Cache_dir` module with consolidated cache directory utilities respecting `RAVEN_CACHE_ROOT`, `XDG_CACHE_HOME`, and `HOME` fallback, replacing project-specific cache logic across the whole raven ecosystem (#133, @Arsalaan-Alam)
+- Add `Nx_io.Cache_dir` module with consolidated cache directory utilities respecting `RAVEN_CACHE_ROOT`, `XDG_CACHE_HOME`, and `HOME` fallback, replacing project-specific cache logic across the whole Raven ecosystem (#134, @Arsalaan-Alam)
 - Add `Nx_io.save_txt` / `Nx_io.load_txt` with NumPy-compatible formatting, comments, and dtype support (#120, @six-shot)
+- Optimize `multi_dot` for matrix chains, reducing intermediate allocations and improving performance (@tmattio)
+- Add public `index_put` function for indexed updates (@tmattio)
 - Clarify `reshape` documentation to match its view-only semantics (@tmattio)
 - Provide `nx.top`, `rune.top`, and `hugin.top` libraries that auto-install pretty printers in the OCaml toplevel and update Quill to load them (@tmattio)
 - Add `ifill` for explicit in-place fills and make `fill` return a copied tensor (@tmattio)
@@ -19,7 +42,7 @@ All notable changes to this project will be documented in this file.
 - Speed up float reductions with contiguous multi-axis fast paths (@tmattio)
 - Fast-path padding-free `unfold` to lower conv2d overhead (@tmattio)
 - Move neural-network operations (softmax, log_softmax, relu, gelu, silu, sigmoid, tanh) from Kaun to Nx (@tmattio)
-- Add public `conjugate` function for complex number conjugation (#123, @Arsalaan-Alam)
+- Add public `conjugate` function for complex number conjugation (#125, @Arsalaan-Alam)
 - Fix complex vdot to conjugate first tensor before multiplication, ensuring correct mathematical behavior (#123, @Arsalaan-Alam)
 - Update comparison and conditional operations to use boolean tensors (#115, @nirnayroy)
 - Add support for rcond parameter and underdetermined systems to `lstsq` (#102, @Shocker444)
@@ -52,18 +75,22 @@ All notable changes to this project will be documented in this file.
 ### Kaun
 
 - Added Similarity and Polysemy analysis to the BERT example (#137, @nirnayroy)
+- Support attention masks via the new `Kaun.Attention` module (@tmattio)
+- Support loading sharded Hugging Face safetensors (@tmattio)
+- Fix BERT and GPT-2 model loading (@tmattio)
 - API simplification: removed type parameters from public types; `Ptree` now supports mixed‑dtype trees via packed tensors with typed getters. (@tmattio)
 - Checkpointing overhaul: versioned `Train_state` with schema tagging, explicit `Checkpoint.{Snapshot,Artifact,Manifest,Repository}` (retention, tags, metadata), and simple save/load helpers for snapshots and params. (@tmattio)
 - Overhaul dataset combinators: derive tensor specs from Rune dtype, fix sampling/window bugs, validate weighted sampling, and respect `drop_remainder` (@tmattio)
 - Make dataset `prefetch` truly asynchronous with background domains and allow reusing an external Domainslib pool via `parallel_map ~pool` (@tmattio)
+- Use `Dataset.iter` for epoch batches to reduce overhead (@tmattio)
 - Update BERT and GPT-2 tokenizer cache to use `Nx.Cache` for consistent cache directory resolution (#133, @Arsalaan-Alam)
 - Honor text dataset encodings via incremental Uutf decoding (#122, @Satarupa22-SD).
 - Preserve empty sequential modules when unflattening so indices stay aligned for checkpoint round-tripping (@tmattio)
 - Prevent `Training.fit`/`evaluate` from consuming entire datasets eagerly and fail fast when a dataset yields no batches, avoiding hangs and division-by-zero crashes (@tmattio)
 - Allow metric history to tolerate metrics that appear or disappear between epochs so dynamic metric sets no longer raise during training (@tmattio)
 - Make `Optimizer.clip_by_global_norm` robust to zero gradients and empty parameter trees to avoid NaNs during training (@tmattio)
 - Split CSV loader into `from_csv` and `from_csv_with_labels` to retain labels when requested (#114, @Satarupa22-SD)
-- Implement AUC-ROC and AUC-PR in Kaun metrics and simplify their signatures (#109, #131, @Shocker444)
+- Implement AUC-ROC and AUC-PR in Kaun metrics and simplify their signatures (#124, #131, @Shocker444)
 - Add mean absolute percentage error, explained variance, R² (with optional adjustment), KL-divergence, and top-k accuracy to Kaun metrics (@tmattio)
 - Add NDCG, MAP, and MRR ranking metrics to Kaun metrics (@tmattio)
 - Add BLEU, ROUGE, and METEOR metrics to Kaun for pre-tokenized sequences, removing tokenizer dependencies (@tmattio)
diff --git a/dune-project b/dune-project
@@ -23,7 +23,7 @@
 
 (using mdx 0.4)
 
-(version 1.0.0~alpha1)
+(version 1.0.0~alpha2)
 
 (implicit_transitive_deps false)
 
diff --git a/fehu.opam b/fehu.opam
@@ -1,6 +1,6 @@
 # This file is generated by dune, edit dune-project instead
 opam-version: "2.0"
-version: "1.0.0~alpha1"
+version: "1.0.0~alpha2"
 synopsis: "Reinforcement learning framework for OCaml"
 description:
   "Fehu is a reinforcement learning framework built on Raven's ecosystem, providing environments, algorithms, and training utilities"
diff --git a/hugin.opam b/hugin.opam
@@ -1,6 +1,6 @@
 # This file is generated by dune, edit dune-project instead
 opam-version: "2.0"
-version: "1.0.0~alpha1"
+version: "1.0.0~alpha2"
 synopsis: "Visualization library for OCaml"
 description:
   "Hugin is a powerful visualization library for OCaml that produces publication-quality plots and charts. It integrates with the Raven ecosystem to provide visualization of Nx data."
diff --git a/kaun.opam b/kaun.opam
@@ -1,6 +1,6 @@
 # This file is generated by dune, edit dune-project instead
 opam-version: "2.0"
-version: "1.0.0~alpha1"
+version: "1.0.0~alpha2"
 synopsis: "Flax-inspired neural network library for OCaml"
 description:
   "Kaun brings modern deep learning to OCaml with a flexible, type-safe API for building and training neural networks. It leverages Rune for automatic differentiation and computation graph optimization while maintaining OCaml's functional programming advantages."
diff --git a/nx-datasets.opam b/nx-datasets.opam
@@ -1,6 +1,6 @@
 # This file is generated by dune, edit dune-project instead
 opam-version: "2.0"
-version: "1.0.0~alpha1"
+version: "1.0.0~alpha2"
 synopsis: "Common datasets for machine learning"
 description:
   "A collection of common datasets for machine learning tasks, including image classification, regression, and more. This package provides easy access to popular datasets in a format compatible with Nx."
diff --git a/nx.opam b/nx.opam
@@ -1,6 +1,6 @@
 # This file is generated by dune, edit dune-project instead
 opam-version: "2.0"
-version: "1.0.0~alpha1"
+version: "1.0.0~alpha2"
 synopsis: "High-performance N-dimensional array library for OCaml"
 description:
   "Nx is the core component of the Raven ecosystem providing efficient numerical computation with multi-device support. It offers NumPy-like functionality with the benefits of OCaml's type system."
diff --git a/quill.opam b/quill.opam
@@ -1,6 +1,6 @@
 # This file is generated by dune, edit dune-project instead
 opam-version: "2.0"
-version: "1.0.0~alpha1"
+version: "1.0.0~alpha2"
 synopsis: "Interactive notebook for OCaml data science"
 description:
   "Quill is an interactive notebook application for data exploration, prototyping, and knowledge sharing in OCaml. It provides a Jupyter-like experience with rich visualization and documentation capabilities."
diff --git a/raven.opam b/raven.opam
@@ -1,6 +1,6 @@
 # This file is generated by dune, edit dune-project instead
 opam-version: "2.0"
-version: "1.0.0~alpha1"
+version: "1.0.0~alpha2"
 synopsis: "Meta package for the Raven ML ecosystem"
 description:
   "Raven is a comprehensive machine learning ecosystem for OCaml. This meta package installs all Raven components including Nx (tensors), Hugin (plotting), Quill (notebooks), Rune (autodiff), Kaun (neural networks), and Sowilo (computer vision)."
diff --git a/rune.opam b/rune.opam
@@ -1,6 +1,6 @@
 # This file is generated by dune, edit dune-project instead
 opam-version: "2.0"
-version: "1.0.0~alpha1"
+version: "1.0.0~alpha2"
 synopsis: "Automatic differentiation and JIT compilation for OCaml"
 description:
   "Rune provides automatic differentiation capabilities and experimental JIT compilation for the Raven ecosystem. It enables gradient-based optimization and supports functional transformations like grad, value_and_grad, and vmap, making it the foundation for deep learning in OCaml."
diff --git a/saga.opam b/saga.opam
@@ -1,6 +1,6 @@
 # This file is generated by dune, edit dune-project instead
 opam-version: "2.0"
-version: "1.0.0~alpha1"
+version: "1.0.0~alpha2"
 synopsis: "Text processing and NLP extensions for Nx"
 description:
   "Text processing library that extends Nx with natural language processing capabilities. Provides tokenization, encoding, and text manipulation functionality compatible with the Nx tensor library."
diff --git a/sowilo.opam b/sowilo.opam
@@ -1,6 +1,6 @@
 # This file is generated by dune, edit dune-project instead
 opam-version: "2.0"
-version: "1.0.0~alpha1"
+version: "1.0.0~alpha2"
 synopsis: "Computer vision extensions for Rune"
 description:
   "Computer vision operations and algorithms built on top of the Rune library. Provides image processing, feature extraction, and other vision-related functionality."
diff --git a/talon.opam b/talon.opam
@@ -1,6 +1,6 @@
 # This file is generated by dune, edit dune-project instead
 opam-version: "2.0"
-version: "1.0.0~alpha1"
+version: "1.0.0~alpha2"
 synopsis: "A dataframe library for OCaml"
 description:
   "Talon provides efficient tabular data manipulation with heterogeneous column types, inspired by pandas and polars."