Update neural search readme and Add Paddle Serving Support #1558

w5688414 · 2022-01-05T15:28:51Z

PR types

New features and Bug fixes

PR changes

Docs

Description

Add Paddle Serving Support
Update readme

ZeyuChen

Please confirm whether the submitted code was automatically formatted with yapf.

ZeyuChen · 2022-01-05T15:31:20Z

applications/neural_search/recall/in_batch_negative/README.md

@@ -447,6 +451,58 @@ sh deploy.sh
 [0.959269642829895, 0.04725276678800583]
 ```

+### Paddle Serving Deployment
+
+First, convert PaddleInference to the Serving format:


The phrase "convert PaddleInference" is inaccurate.
It should be: convert the static graph model to the Serving format.

Fixed.

ZeyuChen · 2022-01-05T15:32:41Z

applications/neural_search/recall/in_batch_negative/config_nlp.yml

@@ -0,0 +1,32 @@
+#worker_num: maximum concurrency. When build_dag_each_worker=True, the framework creates worker_num processes, each of which builds a gRPC server and a DAG


Were these Chinese comments added by us, or do they ship with Paddle Serving?

The Chinese comments come from the Serving examples; I only changed a few of the parameters.

ZeyuChen · 2022-01-05T15:32:57Z

applications/neural_search/recall/in_batch_negative/config_nlp.yml

+rpc_port: 9998
+op:
+  bert:
+    #Concurrency: thread-level when is_thread_op=True, otherwise process-level


A comment needs a space after the #.

Fixed.

ZeyuChen · 2022-01-05T15:34:09Z

applications/neural_search/recall/in_batch_negative/web_service.py

+
+    def postprocess(self, input_dicts, fetch_dict, data_id, log_id):
+        new_dict = {}
+        new_dict["elementwise_div_1"] = str(fetch_dict["elementwise_div_1"].tolist())


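These odd variable-name manipulations really do make for a poor developer experience.

If we don't document this, developers have essentially no way to figure it out on their own.

Fixed. This has to be changed in the exported Serving files, or specified when exporting to the Serving format.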

ZeyuChen · 2022-01-05T15:36:29Z

applications/neural_search/recall/in_batch_negative/web_service.py

+        ): fn(samples)
+        input_ids, segment_ids = batchify_fn(examples)
+        feed_dict = {}
+        feed_dict['input_ids']=input_ids


Was your code really run through yapf automatically before you submitted it? The formatting is not correct.

Fixed; the code is now formatted.

tianxin1860

Left some comments

tianxin1860 · 2022-01-06T08:03:57Z

applications/neural_search/recall/in_batch_negative/README.md

 |—— deploy
    |—— python
        |—— predict.py # PaddleInference
        |—— deploy.sh # Paddle Inference deployment script
 |—— inference.py # extract embeddings with the dynamic graph
-
+|—— export_to_serving.py # convert the static graph to Serving format


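Let's discuss the overall directory structure again. Would it be semantically clearer to put the Serving-related code under the deploy directory?

Adjusted.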

tianxin1860 · 2022-01-06T08:04:44Z

applications/neural_search/recall/in_batch_negative/README.md

-
+|—— export_to_serving.py # convert the static graph to Serving format
+|—— rpc_client.py # Paddle Serving client
+|—— web_service.py # Paddle Serving server


Please make the spacing between Chinese and English characters consistent.

Adjusted.

tianxin1860 · 2022-01-06T08:06:29Z

applications/neural_search/recall/in_batch_negative/README.md

+    --params_filename "inference.get_pooled_embedding.pdiparams" \
+    --server_path "./serving_server" \
+    --client_path "./serving_client" \
+    --fetch_alias_names "output_embed"


Please add an explanation of each export_to_serving.py parameter here so that users can understand it more easily.

Fixed.

tianxin1860 · 2022-01-06T08:08:45Z

applications/neural_search/recall/in_batch_negative/README.md

+
+Start the client to call the Server.
+
+First, modify the samples to be predicted and put them into the feed dictionary:


What does "modify the samples to be predicted" mean? What was modified?

Fixed.

tianxin1860 · 2022-01-06T08:11:21Z

applications/neural_search/recall/in_batch_negative/export_to_serving.py

+parser.add_argument("--model_filename", type=str, required=True,
+                    default='inference.get_pooled_embedding.pdmodel', help="The path to model parameters to be loaded.")
+parser.add_argument("--params_filename", type=str, required=True,
+                    default='inference.get_pooled_embedding.pdiparams', help="The path to model parameters to be loaded.")


The help descriptions for these 2 parameters are wrong, aren't they?

Fixed.
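For reference, a minimal sketch of how the four flags shown in the README diff could be declared with distinct help strings. The help wording below is an assumption written for illustration, not the text that was merged, and the defaults are simply the values that appear in the diff. Note that combining `required=True` with a `default`, as the hunk above does, leaves the default unreachable, so this sketch drops `required`.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags quoted in this thread (help text is assumed)."""
    parser = argparse.ArgumentParser(
        description="Sketch of the export_to_serving.py arguments")
    # Model structure file of the exported static graph (*.pdmodel).
    parser.add_argument("--model_filename", type=str,
                        default="inference.get_pooled_embedding.pdmodel",
                        help="File name of the static graph model structure.")
    # Model weights file of the exported static graph (*.pdiparams).
    parser.add_argument("--params_filename", type=str,
                        default="inference.get_pooled_embedding.pdiparams",
                        help="File name of the static graph model parameters.")
    parser.add_argument("--server_path", type=str, default="./serving_server",
                        help="Output directory for the server-side files.")
    parser.add_argument("--client_path", type=str, default="./serving_client",
                        help="Output directory for the client-side files.")
    parser.add_argument("--fetch_alias_names", type=str, default="output_embed",
                        help="Alias for the output variable returned by the service.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Running the script with `--help` then prints a different description for each flag, which is what the review comment asks for.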

tianxin1860 · 2022-01-06T08:14:26Z

applications/neural_search/recall/in_batch_negative/rpc_client.py

+feed["0"] = "国有企业引入非国有资本对创新绩效的影响——基于制造业国有上市公司的经验证据"
+feed["1"] = "试论翻译过程中的文化差异与语言空缺翻译过程,文化差异,语言空缺,文化对比"
+print(feed)
+ret = client.predict(feed_dict=feed, fetch=["res"])


Why must the data sent by the client be a dictionary rather than a List[String]?

The code has been adjusted.
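The thread does not record why a dictionary is needed, but the later revision of rpc_client.py quoted further down builds it from a list with `feed[str(i)] = item`. A minimal sketch of that wrapping, with a matching helper to recover the list on the receiving side; both function names are hypothetical.

```python
from typing import Dict, List


def build_feed(texts: List[str]) -> Dict[str, str]:
    """Key each text by its position, rendered as a string: "0", "1", ..."""
    return {str(i): text for i, text in enumerate(texts)}


def read_feed(feed: Dict[str, str]) -> List[str]:
    """Recover the texts in their original order from the index keys."""
    return [feed[str(i)] for i in range(len(feed))]


if __name__ == "__main__":
    feed = build_feed(["first query", "second query"])
    print(feed)
    print(read_feed(feed))
```

With this shape the caller keeps working with a plain list, and the dictionary only exists at the request boundary.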

tianxin1860 · 2022-01-06T08:15:39Z

applications/neural_search/recall/in_batch_negative/web_service.py

+    return result
+
+
+class BertOp(Op):


The class name BertOp probably needs to be changed.

Fixed.

tianxin1860

Left some comments

tianxin1860 · 2022-01-11T03:35:51Z

applications/neural_search/recall/in_batch_negative/deploy/python/rpc_client.py

+    feed[str(i)] = item
+
+print(feed)
+ret = client.predict(feed_dict=feed, fetch=["res"])


Where is the name res defined?

Removed. I tested it, and it works without it.

tianxin1860 · 2022-01-11T03:36:58Z

applications/neural_search/recall/in_batch_negative/deploy/python/web_service.py

+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from paddle_serving_server.web_service import WebService, Op


Third-party imports should come after standard-library imports, separated by a blank line.

Fixed.

tianxin1860 · 2022-01-11T05:07:03Z

applications/neural_search/recall/in_batch_negative/deploy/python/web_service.py

+        self.tokenizer = ppnlp.transformers.ErnieTokenizer.from_pretrained(
+            'ernie-1.0')
+
+    def preprocess(self, input_dicts, data_id, log_id):


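data_id and log_id appear to be unused?

The argument of preprocess is input_dicts, the data from the preceding Channel. This variable (as one sample) is a dictionary whose keys are the names of the preceding OPs and whose values are the outputs of those OPs.

The arguments of process are fetch_dict_list, the input variable of the Paddle Serving Client prediction interface (a list of preprocess return values), and typical_logid. fetch_dict_list (as one batch) is a list whose elements are dictionaries with feed_name as the key and the corresponding ndarray data as the value. typical_logid is the logid passed through to PaddleServingService.

The arguments of postprocess are input_dicts and fetch_dict. input_dicts is the same as the argument of preprocess. fetch_dict (as one sample) is one sample from the batch returned by process (if process was not executed, it is the return value of preprocess).

The documentation is here:
https://github.com/PaddlePaddle/Serving/blob/c14a765892a5624111408147d8ec3799aa84ad49/doc/Python_Pipeline/Pipeline_Design_CN.md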
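The data flow described in that reply can be traced with a small stand-in. This is a sketch in plain Python, not the paddle_serving_server `Op` class: `SketchOp`, `run_once`, the toy featurization, and the fake model output are all assumptions made for illustration. Only the method names and argument lists follow the description above.

```python
class SketchOp:
    """Stand-in for a pipeline Op; it does not import or extend Paddle Serving."""

    def preprocess(self, input_dicts, data_id, log_id):
        # input_dicts maps each preceding Op's name to that Op's output.
        # Here we assume exactly one predecessor.
        input_dict = next(iter(input_dicts.values()))
        texts = [input_dict[str(i)] for i in range(len(input_dict))]
        # Toy featurization standing in for tokenization.
        return {"input_ids": [[ord(ch) for ch in text] for text in texts]}

    def process(self, feed_batch, typical_logid):
        # feed_batch is a list of preprocess results (one batch).
        # The "model" here just sums each row; it returns one dict per sample.
        return [
            {"output_embed": [float(sum(row)) for row in feed["input_ids"]]}
            for feed in feed_batch
        ]

    def postprocess(self, input_dicts, fetch_dict, data_id, log_id):
        # fetch_dict is one sample taken from the batch returned by process.
        return {"output_embed": str(fetch_dict["output_embed"])}


def run_once(op, input_dicts, data_id=0, log_id=0):
    """Chain the three stages for a single request."""
    feed = op.preprocess(input_dicts, data_id, log_id)
    fetch_batch = op.process([feed], typical_logid=log_id)
    return op.postprocess(input_dicts, fetch_batch[0], data_id, log_id)


if __name__ == "__main__":
    print(run_once(SketchOp(), {"read_op": {"0": "ab", "1": "c"}}))
```

In this sketch data_id and log_id are accepted but unused, matching the reviewer's observation: they are part of the method signature whether or not a given Op needs them.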

tianxin1860 · 2022-01-11T05:09:33Z

applications/neural_search/recall/in_batch_negative/deploy/python/web_service.py

+
+    def postprocess(self, input_dicts, fetch_dict, data_id, log_id):
+        new_dict = {}
+        new_dict["output_embed"] = str(fetch_dict["output_embed"].tolist())


Where is the variable name output_embed defined?

It is specified as an input to export_to_serving.py; you can also use the default. See the README for details.

tianxin1860 · 2022-01-11T05:14:27Z

applications/neural_search/recall/in_batch_negative/deploy/python/web_service.py

+            'ernie-1.0')
+
+    def preprocess(self, input_dicts, data_id, log_id):
+        from paddlenlp.data import Stack, Tuple, Pad


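Why is the import placed here?

Multi-process testing ran into problems. This code runs in a subprocess, some paddlenlp operations call Paddle, and Paddle has to be disabled in the subprocess. Placing the import here avoids the multi-process problem.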
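The pattern being described is a deferred import: the module is loaded when the method first runs, in whichever process calls it, instead of when the file is imported. A minimal sketch of that placement, using `json` as a stand-in for the heavier dependency; the class name is hypothetical and nothing here reproduces the Paddle-specific behavior mentioned above.

```python
class LazyImportOp:
    """Hypothetical Op-like class; only the placement of the import matters."""

    def preprocess(self, texts):
        # Deferred import: executed on the first call to preprocess, in the
        # process that makes the call, rather than at module load time.
        import json  # stand-in for a dependency such as paddlenlp.data

        return json.dumps(texts, ensure_ascii=False)


if __name__ == "__main__":
    print(LazyImportOp().preprocess(["query one", "query two"]))
```

Python caches modules after the first import, so repeating the statement on later calls costs only a dictionary lookup.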

tianxin1860 · 2022-01-11T05:16:05Z

applications/neural_search/recall/in_batch_negative/deploy/python/web_service.py

+
+
+class ErnieService(WebService):
+    def get_pipeline_response(self, read_op):


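Is this an interface function that paddle_serving requires a service class to provide?

WebService is the base class. It provides the preprocess interface, which converts the HTTP request received from the user into RPC input, and the postprocess interface, which post-processes the result returned by the RPC request. Subclasses of WebService may define member functions of any kind. WebService is started with the same API as an ordinary RPC service; you only need to override preprocess and postprocess to implement the processing before and after model prediction.

For details, see the documentation: https://github.com/PaddlePaddle/Serving/blob/c14a765892a5624111408147d8ec3799aa84ad49/doc/Serving_Design_CN.md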

tianxin1860

LGTM

tianxin1860

LGTM

w5688414 and others added 18 commits December 23, 2021 16:42

add recall inference similarity

c88bcf9

update examples

2c7fe5b

updatea readme

3624a3c

update dir name

ea1587a

fix conflicts

06af5c8

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP into develop

dece886

update neural search readme

7eeecd6

update milvus readme

68f2f02

update domain adaptive pretraining readme

7017194

Merge branch 'develop' into develop

9a2fc64

fix the mistakes

567c70c

Merge branch 'develop' of https://github.com/w5688414/PaddleNLP into develop

7fd2e43

Merge branch 'develop' into develop

413d3d1

update readme

195a4c9

Merge branch 'develop' of https://github.com/w5688414/PaddleNLP into develop

002432f

Merge branch 'PaddlePaddle:develop' into develop

386d02f

add recall Paddle Serving Support

a8d8572

update readme

f661c3b

ZeyuChen reviewed Jan 5, 2022

tianxin1860 self-requested a review January 5, 2022 16:30

update readme and format the code

703bdd7

tianxin1860 reviewed Jan 6, 2022

w5688414 added 5 commits January 6, 2022 19:45

reformat the files

e00d834

move the files

d9753cc

fix conflicts and update readme

9bf791e

fix the conflicts

67fba60

reformat the code

b306873

tianxin1860 reviewed Jan 11, 2022

w5688414 added 2 commits January 12, 2022 21:17

remove redundant code

32592e3

fix conflicts

e52545c

tianxin1860 previously approved these changes Jan 13, 2022

Merge branch 'develop' into develop

6a989a0

tianxin1860 dismissed their stale review via 6a989a0 January 13, 2022 06:02

tianxin1860 approved these changes Jan 13, 2022

tianxin1860 merged commit 6213573 into PaddlePaddle:develop Jan 13, 2022

w5688414 mentioned this pull request Jan 25, 2022

PaddleNLP 2.2.4 Release Note Candidate #1614

Closed
