add Background: SGD in MPC-ML doc in Chinese #1052

Closed
wants to merge 3 commits into from
122 changes: 67 additions & 55 deletions docs/locales/zh_CN/LC_MESSAGES/development.po
@@ -1974,7 +1974,7 @@ msgstr ""

#: ../../development/policy_sgd_insight.rst:2
msgid "Background: SGD in MPC-ML"
msgstr ""
msgstr "背景:随机梯度下降法在MPC-ML"

#: ../../development/policy_sgd_insight.rst:4
msgid ""
@@ -1988,64 +1988,68 @@ msgid ""
"learning_rate), we can find that it seems safer to use small batch_size "
"and learning_rate, else we get some loss or very strong vibration of auc "
"within 10 epochs(we leave 10000 samples as test dataset randomly)."
msgstr ""
msgstr "随机梯度下降法(SGD)是一个著名的优化算法,它沿着负梯度方向来更新模型的参数。然而,这种方法要求使用者仔细地选择超参数。"
"网格搜索虽然可以解决此问题,但在实际应用中训练耗时过长,因此并不实用。"
"例如,在 credit card 数据集上使用随机梯度下降法训练线性回归模型时(批量大小为 4096,学习率为 0.1),"
"我们发现,若批量大小和学习率设置不当,在 10 个轮次内损失值会偏大,且 AUC 值波动剧烈。(随机选择 10000 个样本作为测试数据集)。"

#: ../../development/policy_sgd_insight.rst:12
msgid ""
"Unfortunately, when under MPC, even simple algorithm like SSLR, small "
"batch_size leads to huge training time under limited network resources. "
"Besides, even you have high bandwidth, small batch_size can not utilize "
"it!"
msgstr ""
msgstr "但是,但处于MPC模式下,即使简单的算法,例如SSLR,小的批量大小在网络资源有限的环境下也会导致训练时间的大幅增加。而且,即使你拥有较高的网络带宽,小的批量也无法充分利用带宽资源。"

#: ../../development/policy_sgd_insight.rst:16
msgid "How to improve SGD"
msgstr ""
msgstr "如何优化 SGD 方法"

#: ../../development/policy_sgd_insight.rst:17
msgid ""
"Indeed, after our elaborated experiments, we can find two drawbacks of "
"naive SGD:"
msgstr ""
msgstr "在我们的全面的实验下,我们发现朴素 SGD 方法的两个缺点"

#: ../../development/policy_sgd_insight.rst:19
msgid "slow update(because of small learning_rate) at the beginning."
msgstr ""
msgstr "起始阶段,参数更新缓慢是由于学习率较小造成的。"

#: ../../development/policy_sgd_insight.rst:21
msgid "vibration happens when near to optimal point."
msgstr ""
msgstr "当参数接近最优点时,模型的表现开始波动。"

#: ../../development/policy_sgd_insight.rst:23
msgid ""
"So, it's a straight strategy to use \"large\" learning_rate at the "
"beginning, and \"slow down\" as training goes on. But the ensuing "
"question is how to determine specific \"large\" and \"slow down\" for "
"different datasets."
msgstr ""
msgstr "所以,应该在起步阶段使用较大的学习率,随着训练的进行,逐渐减小学习率。问题是,对于不同的数据集,如何确定“较大的”和“逐渐减小”的具体情况。"

#: ../../development/policy_sgd_insight.rst:27
msgid "What's Policy sgd"
msgstr ""
msgstr "什么是Policy sgd"

#: ../../development/policy_sgd_insight.rst:28
msgid ""
"For SSLR, we provide an answer to the above question: policy-sgd(already "
"implemented on `SSRegression` in Secretflow when setting "
"`strategy=policy_sgd`)."
msgstr ""
msgstr "对于 SSLR 算法,我们已经在隐语 Secretflow 中实现了 policy-sgd 方法。(在 SSRegression 中设置 strategy=policy_sgd)"

#: ../../development/policy_sgd_insight.rst:33
msgid "The core of this optimizer consists of two parts:"
msgstr ""
msgstr "policy-sgd 优化算法的核心包含两部分"

#: ../../development/policy_sgd_insight.rst:35
#, python-brace-format
msgid ""
"1. Adaptive learning rate scaling mechanism: in first epoch, we force the"
" weight move unit-norm in gradient direction and record "
":math:`\\frac{1}{||grad||_2}` as scale factor for this batch."
msgstr ""
msgstr "自适应学习率缩放机制:在第一个轮次(epoch)中,我们强制权重在梯度方向上移动单位范数,并将 "
":math:`\\frac{1}{||grad||_2}` 记录为该批次的缩放因子。"

#: ../../development/policy_sgd_insight.rst:38
msgid ""
@@ -2054,11 +2058,12 @@ msgid ""
"based(use Taylor expansion to avoid invoking time-consuming op like "
"`exp`, `log`) and weight based, and choose the latter in our final "
"implementation."
msgstr ""
msgstr "学习率衰减与早停策略:我们采用步长衰减(step-decay)策略进行学习率衰减。"
"关于早停,我们比较了基于损失(利用泰勒展开避免指数、对数等耗时运算)和基于权重的两种策略,最终选择了后者。"

#: ../../development/policy_sgd_insight.rst:44
msgid "Experiments"
msgstr ""
msgstr "实验"

#: ../../development/policy_sgd_insight.rst:45
#, python-brace-format
@@ -2072,105 +2077,107 @@ msgid ""
" basic one-hot and min-max normalization like normal LR needs. We leave "
"out about :math:`\\frac{1}{3}` data for test data and evaluate auc on it."
" The baseline auc is computed when using sklearn for model training."
msgstr ""
msgstr "我们使用四个数据集进行实验,包含三个开源数据集(credit_card, Bank Marketing, Aps)和一个工业界数据集(wzdata)。"
"对于开源数据集,我们仅仅做一些基本的 one-hot 和 min-max 的归一化操作,以此来满足线性回归模型的需要。"
"我们留出三分之一的数据作为测试数据集,并在其上评估了 auc 指标作为基线。我们使用 sklearn 进行模型训练,并计算 auc。"

#: ../../development/policy_sgd_insight.rst:53
#: ../../development/policy_sgd_insight.rst:107
msgid "Dataset"
msgstr ""
msgstr "数据集"

#: ../../development/policy_sgd_insight.rst:53
msgid "Training shape"
msgstr ""
msgstr "训练集的维度"

#: ../../development/policy_sgd_insight.rst:53
msgid "baseline auc"
msgstr ""
msgstr "auc"

#: ../../development/policy_sgd_insight.rst:55
#: ../../development/policy_sgd_insight.rst:109
msgid "business dataset"
msgstr ""
msgstr "wzdata"

#: ../../development/policy_sgd_insight.rst:55
msgid "111618,23"
msgstr ""
msgstr "111618,23"

#: ../../development/policy_sgd_insight.rst:55
msgid "0.8175"
msgstr ""
msgstr "0.8175"

#: ../../development/policy_sgd_insight.rst:57
#: ../../development/policy_sgd_insight.rst:111
msgid "Bank Marketing"
msgstr ""
msgstr "Bank Marketing"

#: ../../development/policy_sgd_insight.rst:57
msgid "40787,48"
msgstr ""
msgstr "40787,48"

#: ../../development/policy_sgd_insight.rst:57
msgid "0.93"
msgstr ""
msgstr "0.93"

#: ../../development/policy_sgd_insight.rst:59
#: ../../development/policy_sgd_insight.rst:113
msgid "credit_card"
msgstr ""
msgstr "credit_card"

#: ../../development/policy_sgd_insight.rst:59
msgid "20000, 23"
msgstr ""
msgstr "20000, 23"

#: ../../development/policy_sgd_insight.rst:59
msgid "0.718"
msgstr ""
msgstr "0.718"

#: ../../development/policy_sgd_insight.rst:61
#: ../../development/policy_sgd_insight.rst:115
msgid "Aps"
msgstr ""
msgstr "Aps"

#: ../../development/policy_sgd_insight.rst:61
msgid "60000,170"
msgstr ""
msgstr "60000,170"

#: ../../development/policy_sgd_insight.rst:61
msgid "0.9666"
msgstr ""
msgstr "0.9666"

#: ../../development/policy_sgd_insight.rst:65
msgid "Precision"
msgstr ""
msgstr "精确度"

#: ../../development/policy_sgd_insight.rst:67
msgid ""
"We first test how precision of fixed point influence SSLR and test three "
"settings:"
msgstr ""
msgstr "我们首先测试定点精度如何影响 SSLR,并测试了三种设置:"

#: ../../development/policy_sgd_insight.rst:69
msgid "low precision: `FM64` + 18 fxp"
msgstr ""
msgstr "低精度:64 位定点数格式(FM64)加上 18 位小数部分(fxp)"

#: ../../development/policy_sgd_insight.rst:71
msgid "medium precision: `FM128` + 28 fxp"
msgstr ""
msgstr "中精度:128 位定点数格式(FM128)加上 28 位小数部分(fxp)"

#: ../../development/policy_sgd_insight.rst:73
msgid "high precision: `FM128` + 42 fxp"
msgstr ""
msgstr "高精度:128 位定点数格式(FM128)加上 42 位小数部分(fxp)"

#: ../../development/policy_sgd_insight.rst:81
msgid ""
"We can find that for both optimizer, precision has little influence on "
"final auc, so it's safe for user to choose low precision when training "
"LR."
msgstr ""
msgstr "我们可以发现,对于这两种优化器而言,精度对 auc 影响很小,因此用户在训练线性回归(LR)模型时选择低精度是安全的。"

#: ../../development/policy_sgd_insight.rst:85
msgid "Naive v.s Policy"
msgstr ""
msgstr "Naive sgd 与 Policy sgd 对比"

#: ../../development/policy_sgd_insight.rst:87
msgid ""
@@ -2184,15 +2191,18 @@ msgid ""
" for naive_sgd). But for policy_sgd, it's often harmless to use larger "
"batch_size(2048 for these experiments),and we set learning_rate decay a "
"half every 2 epochs."
msgstr ""
msgstr "我们对比了 naive-sgd (v1) 和 policy-sgd (v2) 的总运行时间。"
"对于 naive-sgd,我们遵循“稳妥策略”(常用于明文机器学习):采用较小的学习率(如 0.1)和较小的批量大小(如 1024,若使用 2048 则部分数据无法收敛)。"
"此外,naive-sgd 难以确定合适的默认早停值(设置不当可能导致 AUC 大幅下降)。为避免繁琐的网格搜索,naive-sgd 运行时不采用任何学习率衰减(推荐做法)。"
"但对于 policy-sgd,使用较大的批量大小通常无害(本实验中为 2048),且每 2 个轮次学习率减半。"

#: ../../development/policy_sgd_insight.rst:93
msgid ""
"As for other hyper-parameters, we set total running epochs to 20, "
"learning_rate to 0.1 and use low precision, CHEETAH protocol. And we test"
" in WAN, providing 20Mbps and 20ms RTT, which is a typical setting in "
"real world project."
msgstr ""
msgstr "至于其他超参数,我们将总运行轮次设置为 20,学习率设置为 0.1,并使用低精度、CHEETAH协议。我们在广域网中进行测试,提供 20Mbps 的带宽以及 20 毫秒的往返时间(RTT),这是现实世界项目中的一种典型设置。"

#: ../../development/policy_sgd_insight.rst:99
msgid ""
@@ -2202,71 +2212,73 @@ msgid ""
"means policy_sgd meets the stop criterion based on loss, similar for "
"purple line) after the model converges. Besides, checking the auc of "
"stopping time, it has very low gap(<0.01) between baseline."
msgstr ""
msgstr "首先,我们发现 naive-sgd (v1) 在四个数据集上均未能在 20 轮内满足早停条件(因此图中未标出)。"
"而 policy-sgd (v2) 在所有四个数据集上均能在模型收敛后实现“早停”(红色点划线表示满足基于损失的早停标准,紫色表示基于权重的早停标准)。"
"此外,早停时的 AUC 值与基线差距极小(<0.01)。"

#: ../../development/policy_sgd_insight.rst:103
msgid ""
"The following table shows the total running time of policy_sgd and "
"naive_sgd(based on weight early stop). Policy_sgd can reduce the time by "
"2-5 times compared to naive_sgd."
msgstr ""
msgstr "下方的表格展示了两种方法的运行时间,其中 policy-sgd 是基于 weight 进行早停判断的。 policy-sgd 相比 naive-sgd 在时间上缩短了 2-5 倍。"

#: ../../development/policy_sgd_insight.rst:107
msgid "naive_sgd(s)"
msgstr ""
msgstr "naive_sgd(s)"

#: ../../development/policy_sgd_insight.rst:107
msgid "policy_sgd(s)"
msgstr ""
msgstr "policy_sgd(s)"

#: ../../development/policy_sgd_insight.rst:107
msgid "naive/policy"
msgstr ""
msgstr "naive/policy"

#: ../../development/policy_sgd_insight.rst:109
msgid "~8000"
msgstr ""
msgstr "~8000"

#: ../../development/policy_sgd_insight.rst:109
#: ../../development/policy_sgd_insight.rst:113
msgid "~1600"
msgstr ""
msgstr "~1600"

#: ../../development/policy_sgd_insight.rst:109
msgid "5x"
msgstr ""
msgstr "5x"

#: ../../development/policy_sgd_insight.rst:111
msgid "~3000"
msgstr ""
msgstr "~3000"

#: ../../development/policy_sgd_insight.rst:111
msgid "~800"
msgstr ""
msgstr "~800"

#: ../../development/policy_sgd_insight.rst:111
msgid "3.75x"
msgstr ""
msgstr "3.75x"

#: ../../development/policy_sgd_insight.rst:113
msgid "~350"
msgstr ""
msgstr "~350"

#: ../../development/policy_sgd_insight.rst:113
msgid "4.57x"
msgstr ""
msgstr "4.57x"

#: ../../development/policy_sgd_insight.rst:115
msgid "~10000"
msgstr ""
msgstr "~10000"

#: ../../development/policy_sgd_insight.rst:115
msgid "~4200"
msgstr ""
msgstr "~4200"

#: ../../development/policy_sgd_insight.rst:115
msgid "2.38x"
msgstr ""
msgstr "2.38x"

#: ../../development/runtime.rst:2
msgid "SPU Runtime"