PythonOT · rflamary · May 3, 2023 · Apr 29, 2023 · Apr 29, 2023 · Apr 29, 2023
diff --git a/README.md b/README.md
@@ -212,7 +212,7 @@ You can also post bug reports and feature requests in Github issues. Make sure t
 
 [3] Benamou, J. D., Carlier, G., Cuturi, M., Nenna, L., & Peyré, G. (2015). [Iterative Bregman projections for regularized transportation problems](https://arxiv.org/pdf/1412.5154.pdf). SIAM Journal on Scientific Computing, 37(2), A1111-A1138.
 
-[4] S. Nakhostin, N. Courty, R. Flamary, D. Tuia, T. Corpetti, [Supervised planetary unmixing with optimal transport](https://hal.archives-ouvertes.fr/hal-01377236/document), Whorkshop on Hyperspectral Image and Signal Processing : Evolution in Remote Sensing (WHISPERS), 2016.
+[4] S. Nakhostin, N. Courty, R. Flamary, D. Tuia, T. Corpetti, [Supervised planetary unmixing with optimal transport](https://hal.archives-ouvertes.fr/hal-01377236/document), Workshop on Hyperspectral Image and Signal Processing : Evolution in Remote Sensing (WHISPERS), 2016.
 
 [5] N. Courty; R. Flamary; D. Tuia; A. Rakotomamonjy, [Optimal Transport for Domain Adaptation](https://arxiv.org/pdf/1507.00504.pdf), in IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.PP, no.99, pp.1-1
 
@@ -250,7 +250,7 @@ You can also post bug reports and feature requests in Github issues. Make sure t
 
 [22] J. Altschuler, J.Weed, P. Rigollet, (2017) [Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration](https://papers.nips.cc/paper/6792-near-linear-time-approximation-algorithms-for-optimal-transport-via-sinkhorn-iteration.pdf), Advances in Neural Information Processing Systems (NIPS) 31
 
-[23] Aude, G., Peyré, G., Cuturi, M., [Learning Generative Models with Sinkhorn Divergences](https://arxiv.org/abs/1706.00292), Proceedings of the Twenty-First International Conference on Artficial Intelligence and Statistics, (AISTATS) 21, 2018
+[23] Genevay, A., Peyré, G., Cuturi, M., [Learning Generative Models with Sinkhorn Divergences](https://arxiv.org/abs/1706.00292), Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, (AISTATS) 21, 2018
 
 [24] Vayer, T., Chapel, L., Flamary, R., Tavenard, R. and Courty, N. (2019). [Optimal Transport for structured data with application on graphs](http://proceedings.mlr.press/v97/titouan19a.html) Proceedings of the 36th International Conference on Machine Learning (ICML).
 

diff --git a/RELEASES.md b/RELEASES.md
@@ -9,6 +9,8 @@
 
 - Fix circleci-redirector action and codecov (PR #460)
 - Fix issues with cuda for ot.binary_search_circle and with gradients for ot.sliced_wasserstein_sphere (PR #457)
+- Major documentation cleanup (PR #462, #467)
+- Fix gradients for "Wasserstein2 Minibatch GAN" example (PR #466)
 
 ## 0.9.0
 
@@ -87,7 +89,7 @@ big. More details below.
 
 
 #### New features
-- Added feature to (Fused) Gromov-Wasserstein solvers herited from `ot.optim` to support relative and absolute loss variations as stopping criterions (PR #431)
+- Added feature to (Fused) Gromov-Wasserstein solvers inherited from `ot.optim` to support relative and absolute loss variations as stopping criteria (PR #431)
 - Added feature to (Fused) Gromov-Wasserstein solvers to handle asymmetric matrices (PR #431)
 - Added semi-relaxed (Fused) Gromov-Wasserstein solvers in `ot.gromov` + examples (PR #431)
 - Added the spherical sliced-Wasserstein discrepancy in `ot.sliced.sliced_wasserstein_sphere` and `ot.sliced.sliced_wasserstein_sphere_unif` + examples (PR #434)
@@ -279,7 +281,7 @@ a [Generative Network
 (GAN)](https://PythonOT.github.io/auto_examples/backends/plot_wass2_gan_torch.html),
 for a  [sliced Wasserstein gradient
 flow](https://PythonOT.github.io/auto_examples/backends/plot_sliced_wass_grad_flow_pytorch.html)
-and [optimizing the Gromov-Wassersein distance](https://PythonOT.github.io/auto_examples/backends/plot_optim_gromov_pytorch.html). Note that the Jax backend is still in early development and quite
+and [optimizing the Gromov-Wasserstein distance](https://PythonOT.github.io/auto_examples/backends/plot_optim_gromov_pytorch.html). Note that the Jax backend is still in early development and quite
 slow at the moment, we strongly recommend for Jax users to use the [OTT
 toolbox](https://github.com/google-research/ott)  when possible.
  As a result of this new feature,
@@ -291,7 +293,7 @@ Pointwise Gromov
 Wasserstein](https://PythonOT.github.io/auto_examples/gromov/plot_gromov.html#compute-gw-with-a-scalable-stochastic-method-with-any-loss-function),
 Sinkhorn in log space with `method='sinkhorn_log'`, [Projection Robust
 Wasserstein](https://PythonOT.github.io/gen_modules/ot.dr.html?highlight=robust#ot.dr.projection_robust_wasserstein),
-ans [deviased Sinkorn barycenters](https://PythonOT.github.ioauto_examples/barycenters/plot_debiased_barycenter.html).
+and [debiased Sinkhorn barycenters](https://PythonOT.github.io/auto_examples/barycenters/plot_debiased_barycenter.html).
 
 This release will also simplify the installation process. We have now a
 `pyproject.toml` that defines the build dependency and POT should now build even
@@ -432,15 +434,15 @@ are coming for the next versions.
 
 #### Closed issues
 
-- Add JMLR paper to the readme and Mathieu Blondel to the Acknoledgments (PR
+- Add JMLR paper to the readme and Mathieu Blondel to the Acknowledgments (PR
   #231, #232)
 - Bug in Unbalanced OT example (Issue #127)
 - Clean Cython output when calling setup.py clean (Issue #122)
 - Various Macosx compilation problems (Issue #113, Issue #118, PR#130)
 - EMD dimension mismatch (Issue #114, Fixed in PR #116)
 - 2D barycenter bug for non square images (Issue #124, fixed in PR #132)
 - Bad value in EMD 1D (Issue #138, fixed in PR #139)
-- Log bugs for Gromov-Wassertein solver (Issue #107, fixed in PR #108)
+- Log bugs for Gromov-Wasserstein solver (Issue #107, fixed in PR #108)
 - Weight issues in barycenter function (PR #106)
 
 ## 0.6.0
@@ -471,9 +473,9 @@ a solver for [Unbalanced OT
 barycenters](https://github.com/rflamary/POT/blob/master/notebooks/plot_UOT_barycenter_1D.ipynb).
 A new variant of Gromov-Wasserstein divergence called [Fused
 Gromov-Wasserstein](https://pot.readthedocs.io/en/latest/all.html?highlight=fused_#ot.gromov.fused_gromov_wasserstein)
-has been also contributed with exemples of use on [structured
+has also been contributed with examples of use on [structured
 data](https://github.com/rflamary/POT/blob/master/notebooks/plot_fgw.ipynb) and
-computing [barycenters of labeld
+computing [barycenters of labeled
 graphs](https://github.com/rflamary/POT/blob/master/notebooks/plot_barycenter_fgw.ipynb).
 
 
@@ -534,7 +536,7 @@ and [free support](https://github.com/rflamary/POT/blob/master/notebooks/plot_fr
 implementation of entropic OT.
 
 POT 0.5 also comes with a rewriting of ot.gpu using the cupy framework instead of
-the unmaintained cudamat. Note that while we tried to keed changes to the
+the unmaintained cudamat. Note that while we tried to keep changes to the
 minimum, the OTDA classes were deprecated. If you are happy with the cudamat
 implementation, we recommend you stay with stable release 0.4 for now.
 
@@ -558,7 +560,7 @@ and new POT contributors (you can see the list in the [readme](https://github.co
 * Stochastic OT in the dual and semi-dual (PR #52 and PR #62)
 * Free support barycenters (PR #56)
 * Speed-up Sinkhorn function (PR #57 and PR #58)
-* Add convolutional Wassersein barycenters for 2D images (PR #64)
+* Add convolutional Wasserstein barycenters for 2D images (PR #64)
 * Add Greedy Sinkhorn variant (Greenkhorn) (PR #66)
 * Big ot.gpu update with cupy implementation (instead of un-maintained cudamat) (PR #67)
 
@@ -609,15 +611,15 @@ This release contains a lot of contribution from new contributors.
 * new notebooks for emd computation and Wasserstein Discriminant Analysis
 * relocate notebooks
 * update documentation
-* clean_zeros(a,b,M) for removimg zeros in sparse distributions
+* clean_zeros(a,b,M) for removing zeros in sparse distributions
 * GPU implementations for sinkhorn and group lasso regularization
 
 
 ## V0.2
 *7 Apr 2017*
 
 * New dimensionality reduction method (WDA)
-* Efficient method emd2 returns only tarnsport (in paralell if several histograms given)
+* Efficient method emd2 returns only the transport cost (in parallel if several histograms are given)
 
 
 

diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst
@@ -151,7 +151,7 @@ case you are only solving an approximation of the Wasserstein distance because
 the 1-Lipschitz constraint on the dual cannot be enforced exactly (approximated
 through filter thresholding or regularization). Finally note that in order to
 avoid solving large scale OT problems, a number of recent approached minimized
-the expected Wasserstein distance on minibtaches that is different from the
+the expected Wasserstein distance on minibatches that is different from the
 Wasserstein but has better computational and
 `statistical properties <https://arxiv.org/pdf/1910.04091.pdf>`_.
 
@@ -164,8 +164,8 @@ Optimal transport and Wasserstein distance
     In POT, most functions that solve OT or regularized OT problems have two
     versions that return the OT matrix or the value of the optimal solution. For
     instance :any:`ot.emd` returns the OT matrix and :any:`ot.emd2` returns the
-    Wassertsein distance. This approach has been implemented in practice for all
-    solvers that return an OT matrix (even Gromov-Wasserstsein).
+    Wasserstein distance. This approach has been implemented in practice for all
+    solvers that return an OT matrix (even Gromov-Wasserstein).
 
 .. _kantorovitch_solve:
 
@@ -349,9 +349,9 @@ More details about the algorithms used are given in the following note.
       classic algorithm [2]_.
     + :code:`method='sinkhorn_log'` calls :any:`ot.bregman.sinkhorn_log`  the
       sinkhorn algorithm in log space [2]_ that is more stable but can be
-      slower in numpy since `logsumexp` is not implmemented in parallel. 
+      slower in numpy since `logsumexp` is not implemented in parallel.
       It is the recommended solver for applications that requires
-      differentiability with a  small number of iterations.
+      differentiability with a small number of iterations.
     + :code:`method='sinkhorn_stabilized'` calls :any:`ot.bregman.sinkhorn_stabilized`  the
       log stabilized version of the algorithm [9]_.
     + :code:`method='sinkhorn_epsilon_scaling'` calls
@@ -368,7 +368,7 @@ More details about the algorithms used are given in the following note.
     function to solve the smooth problem with :code:`L-BFGS-B` algorithm. Tu use
     this solver, use functions :any:`ot.smooth.smooth_ot_dual` or
     :any:`ot.smooth.smooth_ot_semi_dual` with parameter :code:`reg_type='kl'` to
-    choose entropic/Kullbach Leibler regularization.
+    choose entropic/Kullback-Leibler regularization.
 
     **Choosing a Sinkhorn solver**
 
@@ -378,7 +378,7 @@ More details about the algorithms used are given in the following note.
     :any:`ot.bregman.sinkhorn_stabilized` solver that will avoid numerical
     errors. This last solver can be very slow in practice and might not even
     converge to a reasonable OT matrix in a finite time. This is why
-    :any:`ot.bregman.sinkhorn_epsilon_scaling` that relie on iterating the value
+    :any:`ot.bregman.sinkhorn_epsilon_scaling` that relies on iterating the value
     of the regularization (and using warm start) sometimes leads to better
     solutions. Note that the greedy version of the Sinkhorn
     :any:`ot.bregman.greenkhorn` can also lead to a speedup and the screening
@@ -546,7 +546,7 @@ where :math:`b_k` are also weights in the simplex. In the non-regularized case,
 the problem above is a classical linear program. In this case we propose a
 solver :meth:`ot.lp.barycenter` that relies on generic LP solvers. By default the
 function uses :any:`scipy.optimize.linprog`, but more efficient LP solvers from
-cvxopt can be also used by changing parameter :code:`solver`. Note that this problem
+`cvxopt` can also be used by changing the parameter :code:`solver`. Note that this problem
 requires to solve a very large linear program and can be very slow in
 practice.
 
@@ -812,7 +812,7 @@ Gromov Wasserstein(GW)
 Gromov Wasserstein (GW) is a generalization of OT to distributions that do not lie in
 the same space [13]_. In this case one cannot compute distance between samples
 from the two distributions. [13]_ proposed instead to realign the metric spaces
-by computing a transport between distance matrices. The Gromow Wasserstein
+by computing a transport between distance matrices. The Gromov Wasserstein
 alignment between two distributions can be expressed as the one minimizing:
 
 .. math::
@@ -837,7 +837,7 @@ There also exists an entropic regularized variant of GW that has been proposed i
     :heading-level: "
 
 Gromov Wasserstein barycenters
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Note that similarly to Wasserstein distance GW allows for the definition of GW
 barycenters that can be expressed as
@@ -1134,7 +1134,7 @@ References
 
 .. [23] Genevay, A., Peyré, G., Cuturi, M., `Learning Generative Models with
     Sinkhorn Divergences <https://arxiv.org/abs/1706.00292>`__, Proceedings
-    of the Twenty-First International Conference on Artficial Intelligence
+    of the Twenty-First International Conference on Artificial Intelligence
     and Statistics, (AISTATS) 21, 2018
 
 .. [24] Vayer, T., Chapel, L., Flamary, R., Tavenard, R. and Courty, N.
@@ -1187,18 +1187,18 @@ References
     In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10648-10656).
 
 .. [36] Liutkus, A., Simsekli, U., Majewski, S., Durmus, A., & Stöter, F. R. 
-       (2019, May). `Sliced-Wasserstein flows: Nonparametric generative modeling via
-        optimal transport and diffusions
-        <http://proceedings.mlr.press/v97/liutkus19a/liutkus19a.pdf>`_. In International
-        Conference on Machine Learning (pp. 4104-4113). PMLR.
+    (2019, May). `Sliced-Wasserstein flows: Nonparametric generative modeling via
+    optimal transport and diffusions
+    <http://proceedings.mlr.press/v97/liutkus19a/liutkus19a.pdf>`_. In International
+    Conference on Machine Learning (pp. 4104-4113). PMLR.
 
 .. [37] Janati, H., Cuturi, M., Gramfort, A. `Debiased sinkhorn barycenters 
     <http://proceedings.mlr.press/v119/janati20a/janati20a.pdf>`_ Proceedings of
     the 37th International Conference on Machine Learning, PMLR 119:4692-4701, 2020
 
 .. [38] C. Vincent-Cuaz, T. Vayer, R. Flamary, M. Corneli, N. Courty, `Online
-       Graph Dictionary Learning <https://arxiv.org/pdf/2102.06555.pdf>`_\ , 
-       International Conference on Machine Learning (ICML), 2021.
+    Graph Dictionary Learning <https://arxiv.org/pdf/2102.06555.pdf>`_\ , 
+    International Conference on Machine Learning (ICML), 2021.
 
 .. [39] Gozlan, N., Roberto, C., Samson, P. M., & Tetali, P. (2017).
     `Kantorovich duality for general transport costs and applications

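The quickstart note above describes `method='sinkhorn_log'` as the Sinkhorn algorithm run in log space. It is more stable, but slower in NumPy because every iteration goes through `logsumexp`. The NumPy sketch below illustrates the idea. `sinkhorn_log_sketch` is a hypothetical helper, not POT's `ot.bregman.sinkhorn_log`, and it assumes strictly positive weights `a` and `b`. The scalings u and v are stored as f = log u and g = log v, so the kernel exp(-M / reg) is only ever handled through its logarithm:

```python
import numpy as np


def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along `axis`."""
    x_max = np.max(x, axis=axis, keepdims=True)
    out = x_max + np.log(np.sum(np.exp(x - x_max), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)


def sinkhorn_log_sketch(a, b, M, reg, n_iter=5000, tol=1e-9):
    """Entropic OT plan via Sinkhorn iterations on log-scalings.

    Hypothetical helper for illustration; assumes a > 0 and b > 0.
    """
    log_a, log_b = np.log(a), np.log(b)
    log_K = -M / reg  # log of the Gibbs kernel K = exp(-M / reg)
    f = np.zeros_like(a)  # f = log u (row scaling)
    g = np.zeros_like(b)  # g = log v (column scaling)
    for _ in range(n_iter):
        # enforce row marginals: u_i * sum_j K_ij v_j = a_i
        f = log_a - logsumexp(log_K + g[None, :], axis=1)
        # enforce column marginals: v_j * sum_i K_ij u_i = b_j
        g = log_b - logsumexp(log_K + f[:, None], axis=0)
        # columns are now exact; stop once the row marginals match too
        log_G = f[:, None] + log_K + g[None, :]
        if np.max(np.abs(np.exp(logsumexp(log_G, axis=1)) - a)) < tol:
            break
    return np.exp(f[:, None] + log_K + g[None, :])


# Toy usage: 5 source and 6 target samples, cost normalized to [0, 1]
rng = np.random.RandomState(0)
xs, xt = rng.randn(5, 2), rng.randn(6, 2)
a, b = np.full(5, 1 / 5), np.full(6, 1 / 6)
M = ((xs[:, None, :] - xt[None, :, :]) ** 2).sum(axis=-1)
M = M / M.max()
G = sinkhorn_log_sketch(a, b, M, reg=0.1)
```

With a small `reg` such as `1e-3` on this normalized cost, the plain kernel `np.exp(-M / reg)` already contains exact zeros, while the log-domain iterates stay finite. The price is the extra `exp`/`log` work inside each `logsumexp` reduction.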
diff --git a/examples/backends/plot_dual_ot_pytorch.py b/examples/backends/plot_dual_ot_pytorch.py
@@ -100,7 +100,7 @@
 Ge = ot.stochastic.plan_dual_entropic(u, v, xs, xt, reg=reg)
 
 # %%
-# Plot teh estimated entropic OT plan
+# Plot the estimated entropic OT plan
 # -----------------------------------
 
 pl.figure(3, (10, 5))
@@ -114,7 +114,7 @@
 
 # %%
 # Estimating dual variables for quadratic OT
-# -----------------------------------------
+# ------------------------------------------
 
 u = torch.randn(n_source_samples, requires_grad=True)
 v = torch.randn(n_source_samples, requires_grad=True)
@@ -157,7 +157,7 @@
 
 # %%
 # Plot the estimated quadratic OT plan
-# -----------------------------------
+# ------------------------------------
 
 pl.figure(5, (10, 5))
 pl.clf()

diff --git a/examples/backends/plot_optim_gromov_pytorch.py b/examples/backends/plot_optim_gromov_pytorch.py
@@ -1,7 +1,7 @@
 r"""
-=================================
+=======================================================
 Optimizing the Gromov-Wasserstein distance with PyTorch
-=================================
+=======================================================
 
 In this example, we use the pytorch backend to optimize the Gromov-Wasserstein
 (GW) loss between two graphs expressed as empirical distribution.
@@ -11,7 +11,7 @@
 We can see that this actually recovers the proportion of classes in the SBM
 and allows for an accurate clustering of the nodes using the GW optimal plan.
 
-In the second part, we optimize simultaneously the weights and the sructure of
+In the second part, we simultaneously optimize the weights and the structure of
 the template graph which allows us to perform graph compression and to recover
 other properties of the SBM.
 
@@ -38,7 +38,7 @@
 
 # %%
 # Graph generation
-# ---------------
+# ----------------
 
 rng = np.random.RandomState(42)
 
@@ -95,8 +95,8 @@ def plot_graph(x, C, color='C0', s=None):
 
 # %%
 # Optimizing GW w.r.t. the weights on a template structure
-# ------------------------------------------------
-# The adajacency matrix C1 is block diagonal with 3 blocks. We want to
+# --------------------------------------------------------
+# The adjacency matrix C1 is block diagonal with 3 blocks. We want to
 # optimize the weights of a simple template C0=eye(3) and see if we can
 # recover the proportion of classes from the SBM (up to a permutation).
 
@@ -155,7 +155,7 @@ def min_weight_gw(C1, C2, a2, nb_iter_max=100, lr=1e-2):
 
 # %%
 # Community clustering with uniform and estimated weights
-# --------------------------------------------
+# -------------------------------------------------------
 # The GW OT  plan can be used to perform a clustering of the nodes of a graph
 # when computing the GW with a simple template like C0 by labeling nodes in
 # the original graph using by the index of the noe in the template receiving
@@ -193,7 +193,7 @@ def min_weight_gw(C1, C2, a2, nb_iter_max=100, lr=1e-2):
 # classes
 
 
-def graph_compession_gw(nb_nodes, C2, a2, nb_iter_max=100, lr=1e-2):
+def graph_compression_gw(nb_nodes, C2, a2, nb_iter_max=100, lr=1e-2):
     """ solve min_a GW(C1,C2,a, a2) by gradient descent"""
 
     # use pyTorch for our data
@@ -237,8 +237,8 @@ def graph_compession_gw(nb_nodes, C2, a2, nb_iter_max=100, lr=1e-2):
 
 
 nb_nodes = 3
-a0_est2, C0_est2, loss_iter2 = graph_compession_gw(nb_nodes, C1, ot.unif(n),
-                                                   nb_iter_max=100, lr=5e-2)
+a0_est2, C0_est2, loss_iter2 = graph_compression_gw(nb_nodes, C1, ot.unif(n),
+                                                    nb_iter_max=100, lr=5e-2)
 
 pl.figure(4)
 pl.plot(loss_iter2)

diff --git a/examples/backends/plot_sliced_wass_grad_flow_pytorch.py b/examples/backends/plot_sliced_wass_grad_flow_pytorch.py
@@ -1,16 +1,16 @@
 r"""
-=================================
+============================================================
 Sliced Wasserstein barycenter and gradient flow with PyTorch
-=================================
+============================================================
 
-In this exemple we use the pytorch backend to optimize the sliced Wasserstein
+In this example we use the pytorch backend to optimize the sliced Wasserstein
 loss between two empirical distributions [31].
 
 In the first example one we perform a
 gradient flow on the support of a distribution that minimize the sliced
-Wassersein distance as poposed in [36].
+Wasserstein distance as proposed in [36].
 
-In the second exemple we optimize with a gradient descent the sliced
+In the second example we use gradient descent to optimize the sliced
 Wasserstein barycenter between two distributions as in [31].
 
 [31] Bonneel, Nicolas, et al. "Sliced and radon wasserstein barycenters of

diff --git a/examples/backends/plot_ssw_unif_torch.py b/examples/backends/plot_ssw_unif_torch.py
@@ -119,7 +119,7 @@ def plot_sphere(ax):
 
 # %%
 # Animate trajectories of generated samples along iteration
-# -------------------------------------------------------
+# ---------------------------------------------------------
 
 pl.figure(4, (8, 8))