diff --git a/chapter_introduction/index.md b/chapter_introduction/index.md
index 7bf012414..901aff149 100644
--- a/chapter_introduction/index.md
+++ b/chapter_introduction/index.md
@@ -176,7 +176,7 @@ $$l(y, y') = \sum_i (y_i - y_i')^2.$$
Note that the most likely class is not necessarily the one that you are going to use for your decision. Assume that you find this beautiful mushroom in your backyard as shown in :numref:`fig_death_cap`.
-![Death cap---do not eat!](../img/death_cap.jpg)
+![Death cap---do not eat!](../img/death-cap.jpg)
:width:`200px`
:label:`fig_death_cap`
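The decision-theoretic point here can be made concrete with the expected-loss formula $L(\mathrm{action}\mid x) = E_{y \sim p(y\mid x)}[\mathrm{loss}(\mathrm{action}, y)]$. The probability and loss values below are illustrative assumptions, not outputs of any real classifier:

```python
# Hypothetical classifier output: P(death cap | image).
p_death_cap = 0.2

# Made-up losses: eating a death cap is catastrophic, everything else is mild.
loss = {("eat", "death_cap"): 10_000.0, ("eat", "safe"): 0.0,
        ("discard", "death_cap"): 0.0, ("discard", "safe"): 1.0}

def expected_loss(action, p):
    """E_{y ~ p(y|x)}[loss(action, y)] for the two-outcome case."""
    return p * loss[(action, "death_cap")] + (1 - p) * loss[(action, "safe")]

# Even though "safe" is the more likely class, eating has a far higher
# expected loss than discarding the mushroom.
assert expected_loss("eat", p_death_cap) > expected_loss("discard", p_death_cap)
```

The point of the sketch is that the optimal action minimizes expected loss, not the probability of being wrong.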
@@ -221,7 +221,7 @@ $$L(\mathrm{action}| x) = E_{y \sim p(y| x)}[\mathrm{loss}(\mathrm{action},y)].$
Given such a model, for any given user we can retrieve the set of objects with the largest scores $y_{ij}$, which could then be recommended to the customer. Production systems are considerably more advanced and take detailed user activity and item characteristics into account when computing such scores. :numref:`fig_deeplearning_amazon` is an example of deep learning books recommended by Amazon based on personalization algorithms tuned to capture the author's preferences.
-![Deep learning books recommended by Amazon.](../img/deeplearning_amazon.jpg)
+![Deep learning books recommended by Amazon.](../img/deeplearning-amazon.jpg)
:label:`fig_deeplearning_amazon`
Despite their tremendous economic value, recommendation systems naively built on top of predictive models suffer some serious conceptual flaws. First, we only observe *censored feedback*: users preferentially rate movies that they feel strongly about. You might notice that items receive many five- and one-star ratings but conspicuously few three-star ratings. Moreover, current purchase habits are often a result of the recommendation algorithm currently in place, but learning algorithms do not always take this detail into account. Thus it is possible for feedback loops to form, where a recommender system preferentially pushes an item that is then taken to be better (due to greater purchases) and in turn is recommended even more frequently. Many of these problems about how to deal with censoring, incentives, and feedback loops are important open research questions.
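The retrieval step described above (taking the items with the largest predicted scores $y_{ij}$ for a given user) can be sketched in a few lines. The score matrix below is a hypothetical stand-in for a trained model's output:

```python
import numpy as np

# Hypothetical matrix of predicted scores y_ij: rows are users, columns are items.
scores = np.array([
    [0.9, 0.1, 0.7, 0.3],   # user 0
    [0.2, 0.8, 0.4, 0.6],   # user 1
])

def recommend(scores, user, k=2):
    """Return the indices of the k items with the largest predicted score."""
    return np.argsort(scores[user])[::-1][:k].tolist()

top_items = recommend(scores, user=0, k=2)  # items in descending-score order
```

Production systems replace the toy matrix with model outputs and use approximate top-$k$ search, but the retrieval logic is the same.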
diff --git a/chapter_introduction/index_origin.md b/chapter_introduction/index_origin.md
index 61e88b926..610d49770 100644
--- a/chapter_introduction/index_origin.md
+++ b/chapter_introduction/index_origin.md
@@ -682,7 +682,7 @@ the one that you are going to use for your decision.
Assume that you find this beautiful mushroom in your backyard
as shown in :numref:`fig_death_cap`.
-![Death cap---do not eat!](../img/death_cap.jpg)
+![Death cap---do not eat!](../img/death-cap.jpg)
:width:`200px`
:label:`fig_death_cap`
@@ -852,7 +852,7 @@ detailed user activity and item characteristics into account
when computing such scores. :numref:`fig_deeplearning_amazon` is an example
of deep learning books recommended by Amazon based on personalization algorithms tuned to capture the author's preferences.
-![Deep learning books recommended by Amazon.](../img/deeplearning_amazon.jpg)
+![Deep learning books recommended by Amazon.](../img/deeplearning-amazon.jpg)
:label:`fig_deeplearning_amazon`
Despite their tremendous economic value, recommendation systems
diff --git a/chapter_linear-networks/linear-regression.md b/chapter_linear-networks/linear-regression.md
index 282cc5fdd..b2120cc99 100644
--- a/chapter_linear-networks/linear-regression.md
+++ b/chapter_linear-networks/linear-regression.md
@@ -57,7 +57,7 @@ $$l^{(i)}(\mathbf{w}, b) = \frac{1}{2} \left(\hat{y}^{(i)} - y^{(i)}\right)^2.$$
The constant $\frac{1}{2}$ makes no real difference but proves notationally convenient: the coefficient cancels to 1 when we take the derivative of the loss. Since the training dataset is given to us, and thus out of our control, the empirical error is only a function of the model parameters. To make things more concrete, consider the example below, where we plot a regression problem for a one-dimensional case, as shown in :numref:`fig_fit_linreg`.
-![Fit data with a linear model.](../img/fit_linreg.svg)
+![Fit data with a linear model.](../img/fit-linreg.svg)
:label:`fig_fit_linreg`
Due to the quadratic term in the squared error, large differences between the estimate $\hat{y}^{(i)}$ and the observation $y^{(i)}$ contribute even larger amounts to the loss. To measure the quality of a model on the entire dataset, we average (or equivalently, sum) the losses over the $n$ examples of the training set.
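The per-example squared loss and its average over $n$ examples translate directly into code. The predictions and targets below are made up for illustration:

```python
import numpy as np

def squared_loss(y_hat, y):
    """Per-example loss l^(i) = 0.5 * (y_hat - y)^2."""
    return 0.5 * (y_hat - y) ** 2

y_hat = np.array([2.0, 3.5, 5.0])  # predictions
y     = np.array([2.5, 3.0, 5.0])  # observed targets

per_example = squared_loss(y_hat, y)  # [0.125, 0.125, 0.0]
mean_loss = per_example.mean()        # average loss over the n examples
```

Note how the third example, predicted exactly, contributes nothing, while the quadratic term would dominate the average if any single residual were large.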
diff --git a/chapter_linear-networks/linear-regression_origin.md b/chapter_linear-networks/linear-regression_origin.md
index 5e949aee2..7f360c220 100644
--- a/chapter_linear-networks/linear-regression_origin.md
+++ b/chapter_linear-networks/linear-regression_origin.md
@@ -80,7 +80,7 @@ limit the expressivity of our model.
Strictly speaking, :eqref:`eq_price-area` is an *affine transformation*
of input features,
which is characterized by
-a *linear transformation* of features via weighted sum, combined with
+a *linear transformation* of features via weighted sum, combined with
a *translation* via the added bias.
Given a dataset, our goal is to choose
@@ -177,7 +177,7 @@ To make things more concrete, consider the example below
where we plot a regression problem for a one-dimensional case
as shown in :numref:`fig_fit_linreg`.
-![Fit data with a linear model.](../img/fit_linreg.svg)
+![Fit data with a linear model.](../img/fit-linreg.svg)
:label:`fig_fit_linreg`
Note that large differences between
diff --git a/chapter_multilayer-perceptrons/kaggle-house-price.md b/chapter_multilayer-perceptrons/kaggle-house-price.md
index b8aa6e945..cbfc3f03f 100644
--- a/chapter_multilayer-perceptrons/kaggle-house-price.md
+++ b/chapter_multilayer-perceptrons/kaggle-house-price.md
@@ -84,7 +84,7 @@ def download_all(): #@save
> https://www.kaggle.com/c/house-prices-advanced-regression-techniques
-![The house price prediction competition page.](../img/house_pricing.png)
+![The house price prediction competition page.](../img/house-pricing.png)
:width:`400px`
:label:`fig_house_pricing`
@@ -448,7 +448,7 @@ train_and_pred(train_features, test_features, train_labels, test_data,
* Click the “Upload Submission File” button in the dashed box at the bottom of the page and select the prediction file you wish to upload.
* Click the “Make Submission” button at the bottom of the page to view your results.
-![Submitting data to Kaggle](../img/kaggle_submit2.png)
+![Submitting data to Kaggle](../img/kaggle-submit2.png)
:width:`400px`
:label:`fig_kaggle_submit2`
diff --git a/chapter_multilayer-perceptrons/kaggle-house-price_origin.md b/chapter_multilayer-perceptrons/kaggle-house-price_origin.md
index f109361e1..58276499d 100644
--- a/chapter_multilayer-perceptrons/kaggle-house-price_origin.md
+++ b/chapter_multilayer-perceptrons/kaggle-house-price_origin.md
@@ -142,7 +142,7 @@ The URL is right here:
> https://www.kaggle.com/c/house-prices-advanced-regression-techniques
-![The house price prediction competition page.](../img/house_pricing.png)
+![The house price prediction competition page.](../img/house-pricing.png)
:width:`400px`
:label:`fig_house_pricing`
@@ -313,7 +313,7 @@ in the same way that we previously transformed
multiclass labels into vectors (see :numref:`subsec_classification-problem`).
For instance, "MSZoning" assumes the values "RL" and "RM".
Dropping the "MSZoning" feature,
-two new indicator features
+two new indicator features
"MSZoning_RL" and "MSZoning_RM" are created with values being either 0 or 1.
According to one-hot encoding,
if the original value of "MSZoning" is "RL",
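The transformation just described matches what `pandas.get_dummies` does. The two-row frame below is a made-up illustration, not the competition data:

```python
import pandas as pd

# Toy frame standing in for the competition data.
df = pd.DataFrame({"MSZoning": ["RL", "RM"]})

# Replace "MSZoning" with 0/1 indicator columns, one per observed value.
encoded = pd.get_dummies(df, dtype=int)
# Columns are now "MSZoning_RL" and "MSZoning_RM".
```

A row whose original value was "RL" gets `MSZoning_RL = 1` and `MSZoning_RM = 0`, exactly as the text describes.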
@@ -664,7 +664,7 @@ The steps are quite simple:
* Click the “Upload Submission File” button in the dashed box at the bottom of the page and select the prediction file you wish to upload.
* Click the “Make Submission” button at the bottom of the page to view your results.
-![Submitting data to Kaggle](../img/kaggle_submit2.png)
+![Submitting data to Kaggle](../img/kaggle-submit2.png)
:width:`400px`
:label:`fig_kaggle_submit2`
diff --git a/chapter_multilayer-perceptrons/underfit-overfit.md b/chapter_multilayer-perceptrons/underfit-overfit.md
index 17a0bd07f..ac623a56b 100644
--- a/chapter_multilayer-perceptrons/underfit-overfit.md
+++ b/chapter_multilayer-perceptrons/underfit-overfit.md
@@ -93,7 +93,7 @@ $$\hat{y}= \sum_{i=0}^d x^i w_i$$
A higher-degree polynomial function is more complex than a lower-degree polynomial function, since the higher-degree polynomial has more parameters and the model function's selection range is wider. Fixing the training dataset, higher-degree polynomial functions should always achieve lower (at worst, equal) training error relative to lower-degree polynomials. In fact, whenever each data example has a distinct value of $x$, a polynomial function with degree equal to the number of data examples can fit the training set perfectly. We visualize the relationship between polynomial degree and underfitting vs. overfitting in :numref:`fig_capacity_vs_error`.
-![Influence of model complexity on underfitting and overfitting](../img/capacity_vs_error.svg)
+![Influence of model complexity on underfitting and overfitting](../img/capacity-vs-error.svg)
:label:`fig_capacity_vs_error`
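As a quick numerical check of the claim above, one can fit least-squares polynomials of increasing degree to a small synthetic dataset (the cubic-plus-noise target is an arbitrary choice) and confirm that training error never increases with degree:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=20)
y = x ** 3 - x + 0.1 * rng.normal(size=20)  # arbitrary noisy target

def train_error(degree):
    """Mean squared training error of a least-squares polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

errors = [train_error(d) for d in (1, 3, 10)]
# Training error is non-increasing in the degree, as the text states.
```

Of course, the lower *training* error of the degree-10 fit says nothing about its generalization error, which is exactly the gap :numref:`fig_capacity_vs_error` illustrates.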
### Dataset Size
diff --git a/chapter_multilayer-perceptrons/underfit-overfit_origin.md b/chapter_multilayer-perceptrons/underfit-overfit_origin.md
index 95cad05bf..d5b84c3c8 100644
--- a/chapter_multilayer-perceptrons/underfit-overfit_origin.md
+++ b/chapter_multilayer-perceptrons/underfit-overfit_origin.md
@@ -361,7 +361,7 @@ can fit the training set perfectly.
We visualize the relationship between polynomial degree
and underfitting vs. overfitting in :numref:`fig_capacity_vs_error`.
-![Influence of model complexity on underfitting and overfitting](../img/capacity_vs_error.svg)
+![Influence of model complexity on underfitting and overfitting](../img/capacity-vs-error.svg)
:label:`fig_capacity_vs_error`
### Dataset Size
diff --git a/chapter_preliminaries/calculus.md b/chapter_preliminaries/calculus.md
index 6f63ff384..38fbedb55 100644
--- a/chapter_preliminaries/calculus.md
+++ b/chapter_preliminaries/calculus.md
@@ -4,7 +4,7 @@
At least 2500 years ago, the ancient Greeks figured out how to compute the area of a polygon: split the polygon into triangles and sum their areas.
To find the area of curved shapes, such as a circle, the ancient Greeks inscribed polygons in such shapes. As shown in :numref:`fig_circle_area`, an inscribed polygon with more sides of equal length better approximates the circle. This process is also known as the *method of exhaustion*.
-![Find the area of a circle with the method of exhaustion.](../img/polygon_circle.svg)
+![Find the area of a circle with the method of exhaustion.](../img/polygon-circle.svg)
:label:`fig_circle_area`
In fact, the method of exhaustion is where *integral calculus* (which will be described in :numref:`sec_integral_calculus`) originates. More than 2000 years later, the other branch of calculus, *differential calculus*, was invented. Among the most critical applications of differential calculus, optimization problems consider how to do things the *best*. As discussed in :numref:`subsec_norms_and_objectives`, such problems are ubiquitous in deep learning.
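The limiting process can be checked numerically: the area of a regular $n$-gon inscribed in a unit circle is $\frac{n}{2}\sin\frac{2\pi}{n}$, which approaches $\pi$ from below as $n$ grows. A short sketch:

```python
import math

def inscribed_polygon_area(n):
    """Area of a regular n-gon inscribed in a circle of radius 1."""
    return n / 2 * math.sin(2 * math.pi / n)

# More sides give a better approximation of the circle's area, pi.
areas = [inscribed_polygon_area(n) for n in (6, 24, 96, 384)]
```

Archimedes used a 96-gon in essentially this way; with $n = 96$ the area already agrees with $\pi$ to about three decimal places.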
diff --git a/chapter_preliminaries/calculus_origin.md b/chapter_preliminaries/calculus_origin.md
index 51ff963e1..6848c7f0f 100644
--- a/chapter_preliminaries/calculus_origin.md
+++ b/chapter_preliminaries/calculus_origin.md
@@ -14,7 +14,7 @@ As shown in :numref:`fig_circle_area`,
an inscribed polygon with more sides of equal length better approximates
the circle. This process is also known as the *method of exhaustion*.
-![Find the area of a circle with the method of exhaustion.](../img/polygon_circle.svg)
+![Find the area of a circle with the method of exhaustion.](../img/polygon-circle.svg)
:label:`fig_circle_area`
In fact, the method of exhaustion is where *integral calculus* (will be described in :numref:`sec_integral_calculus`) originates from.
diff --git a/chapter_preliminaries/probability.md b/chapter_preliminaries/probability.md
index de831076a..6535308d5 100644
--- a/chapter_preliminaries/probability.md
+++ b/chapter_preliminaries/probability.md
@@ -8,7 +8,7 @@
We have already invoked probabilities in previous sections without articulating what precisely they are or giving a concrete example. Let us get more serious now by considering the first case: distinguishing cats and dogs based on photographs. This might sound simple, but it is actually a formidable challenge. To start with, the difficulty of the problem may depend on the resolution of the image.
-![Images of varying resolutions ($10 \times 10$, $20 \times 20$, $40 \times 40$, $80 \times 80$, and $160 \times 160$ pixels).](../img/cat_dog_pixels.png)
+![Images of varying resolutions ($10 \times 10$, $20 \times 20$, $40 \times 40$, $80 \times 80$, and $160 \times 160$ pixels).](../img/cat-dog-pixels.png)
:width:`300px`
:label:`fig_cat_dog`
diff --git a/chapter_preliminaries/probability_origin.md b/chapter_preliminaries/probability_origin.md
index 0e567e5d5..bdfb834ea 100644
--- a/chapter_preliminaries/probability_origin.md
+++ b/chapter_preliminaries/probability_origin.md
@@ -12,7 +12,7 @@ Entire courses, majors, theses, careers, and even departments, are devoted to pr
We have already invoked probabilities in previous sections without articulating what precisely they are or giving a concrete example. Let us get more serious now by considering the first case: distinguishing cats and dogs based on photographs. This might sound simple but it is actually a formidable challenge. To start with, the difficulty of the problem may depend on the resolution of the image.
-![Images of varying resolutions ($10 \times 10$, $20 \times 20$, $40 \times 40$, $80 \times 80$, and $160 \times 160$ pixels).](../img/cat_dog_pixels.png)
+![Images of varying resolutions ($10 \times 10$, $20 \times 20$, $40 \times 40$, $80 \times 80$, and $160 \times 160$ pixels).](../img/cat-dog-pixels.png)
:width:`300px`
:label:`fig_cat_dog`
diff --git a/img/attention_output.svg b/img/attention_output.svg
deleted file mode 100644
index c5731a4a9..000000000
--- a/img/attention_output.svg
+++ /dev/null
@@ -1,250 +0,0 @@
-
-
diff --git a/img/autumn_oak.jpg b/img/autumn_oak.jpg
deleted file mode 100644
index 747fa1e41..000000000
Binary files a/img/autumn_oak.jpg and /dev/null differ
diff --git a/img/capacity_vs_error.svg b/img/capacity_vs_error.svg
deleted file mode 100644
index 543070360..000000000
--- a/img/capacity_vs_error.svg
+++ /dev/null
@@ -1,330 +0,0 @@
-
-
diff --git a/img/cat_dog_pixels.png b/img/cat_dog_pixels.png
deleted file mode 100644
index 28a331148..000000000
Binary files a/img/cat_dog_pixels.png and /dev/null differ
diff --git a/img/death_cap.jpg b/img/death_cap.jpg
deleted file mode 100644
index 2448731b6..000000000
Binary files a/img/death_cap.jpg and /dev/null differ
diff --git a/img/deeplearning_amazon.jpg b/img/deeplearning_amazon.jpg
deleted file mode 100644
index 2eca86019..000000000
Binary files a/img/deeplearning_amazon.jpg and /dev/null differ
diff --git a/img/fit_linreg.svg b/img/fit_linreg.svg
deleted file mode 100644
index 79fd4a02e..000000000
--- a/img/fit_linreg.svg
+++ /dev/null
@@ -1,31 +0,0 @@
-
-
diff --git a/img/gru_1.svg b/img/gru_1.svg
deleted file mode 100644
index 15c6afd98..000000000
--- a/img/gru_1.svg
+++ /dev/null
@@ -1,399 +0,0 @@
-
-
diff --git a/img/gru_2.svg b/img/gru_2.svg
deleted file mode 100644
index cfa73b2c6..000000000
--- a/img/gru_2.svg
+++ /dev/null
@@ -1,559 +0,0 @@
-
-
diff --git a/img/gru_3.svg b/img/gru_3.svg
deleted file mode 100644
index 2c892b36e..000000000
--- a/img/gru_3.svg
+++ /dev/null
@@ -1,644 +0,0 @@
-
-
diff --git a/img/house_pricing.png b/img/house_pricing.png
deleted file mode 100644
index b14ae9dc3..000000000
Binary files a/img/house_pricing.png and /dev/null differ
diff --git a/img/kaggle_cifar10.png b/img/kaggle_cifar10.png
deleted file mode 100644
index 5d85b5cca..000000000
Binary files a/img/kaggle_cifar10.png and /dev/null differ
diff --git a/img/kaggle_submit2.png b/img/kaggle_submit2.png
deleted file mode 100644
index aaaa0ceb0..000000000
Binary files a/img/kaggle_submit2.png and /dev/null differ
diff --git a/img/lstm_0.svg b/img/lstm_0.svg
deleted file mode 100644
index 3e47b58a3..000000000
--- a/img/lstm_0.svg
+++ /dev/null
@@ -1,464 +0,0 @@
-
-
diff --git a/img/lstm_1.svg b/img/lstm_1.svg
deleted file mode 100644
index d448e0f51..000000000
--- a/img/lstm_1.svg
+++ /dev/null
@@ -1,517 +0,0 @@
-
-
diff --git a/img/lstm_2.svg b/img/lstm_2.svg
deleted file mode 100644
index c32e2271e..000000000
--- a/img/lstm_2.svg
+++ /dev/null
@@ -1,644 +0,0 @@
-
-
diff --git a/img/lstm_3.svg b/img/lstm_3.svg
deleted file mode 100644
index bf4094564..000000000
--- a/img/lstm_3.svg
+++ /dev/null
@@ -1,672 +0,0 @@
-
-
diff --git a/img/mutual_information.svg b/img/mutual_information.svg
deleted file mode 100644
index 79c9f5128..000000000
--- a/img/mutual_information.svg
+++ /dev/null
@@ -1,15 +0,0 @@
-
-
diff --git a/img/nli_attention.svg b/img/nli_attention.svg
deleted file mode 100644
index 536347de0..000000000
--- a/img/nli_attention.svg
+++ /dev/null
@@ -1,629 +0,0 @@
-
-
diff --git a/img/polygon_circle.svg b/img/polygon_circle.svg
deleted file mode 100644
index 2c88289b7..000000000
--- a/img/polygon_circle.svg
+++ /dev/null
@@ -1,13 +0,0 @@
-
-
diff --git a/img/positional_encoding.svg b/img/positional_encoding.svg
deleted file mode 100644
index 27d9155b3..000000000
--- a/img/positional_encoding.svg
+++ /dev/null
@@ -1,366 +0,0 @@
-
-
diff --git a/img/seq2seq_attention.svg b/img/seq2seq_attention.svg
deleted file mode 100644
index 2ef7a0bd8..000000000
--- a/img/seq2seq_attention.svg
+++ /dev/null
@@ -1,446 +0,0 @@
-
-
diff --git a/img/seq2seq_predict.svg b/img/seq2seq_predict.svg
deleted file mode 100644
index e8bae0d2f..000000000
--- a/img/seq2seq_predict.svg
+++ /dev/null
@@ -1,259 +0,0 @@
-
-
diff --git a/img/statistical_significance.svg b/img/statistical_significance.svg
deleted file mode 100644
index bf7f8285e..000000000
--- a/img/statistical_significance.svg
+++ /dev/null
@@ -1,434 +0,0 @@
-
-
diff --git a/img/trans_conv.svg b/img/trans_conv.svg
deleted file mode 100644
index e21cd4dc8..000000000
--- a/img/trans_conv.svg
+++ /dev/null
@@ -1,266 +0,0 @@
-
-
diff --git a/img/turing_processing_block.png b/img/turing_processing_block.png
deleted file mode 100644
index 276528e99..000000000
Binary files a/img/turing_processing_block.png and /dev/null differ
diff --git a/img/ubuntu_new.png b/img/ubuntu_new.png
deleted file mode 100644
index 13c6e0ef6..000000000
Binary files a/img/ubuntu_new.png and /dev/null differ