Update neural search readme and Add Paddle Serving Support #1558

Merged
merged 27 commits into from
Jan 13, 2022


Contributor

@w5688414 w5688414 commented Jan 5, 2022

PR types

New features and Bug fixes

PR changes

Docs

Description

Add Paddle Serving Support
Update readme

Member

@ZeyuChen ZeyuChen left a comment


Please confirm whether the submitted code has been auto-formatted with yapf.

@@ -447,6 +451,58 @@ sh deploy.sh
```
[0.959269642829895, 0.04725276678800583]
```

### Paddle Serving Deployment

First, convert the Paddle Inference model to the Serving format:
Member


The wording "convert Paddle Inference" is inaccurate.
It should say: convert the static-graph model to the Serving format.

Contributor Author


Fixed.

@@ -0,0 +1,32 @@
#worker_num: maximum concurrency. When build_dag_each_worker=True, the framework creates worker_num processes, each of which builds its own gRPC server and DAG
Member


Did we add these Chinese comments ourselves, or do they ship with Paddle Serving?

Contributor Author


The Chinese comments come from the Paddle Serving examples; I only changed a few of the parameters.

rpc_port: 9998
op:
bert:
#Concurrency: thread-level when is_thread_op=True, otherwise process-level
Member


Comments need a space after the #.

Contributor Author


Fixed.
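The `concurrency` / `is_thread_op` comment in the config above can be illustrated with the standard library. This is a hypothetical stand-in, not Paddle Serving's implementation: `make_executor` is an illustrative helper that maps the two settings onto a `concurrent.futures` thread pool or process pool.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def make_executor(concurrency: int, is_thread_op: bool) -> Executor:
    """Hypothetical helper: choose thread- or process-level concurrency.

    Mirrors the config comment: when is_thread_op is True the op's
    `concurrency` workers are threads, otherwise they are processes.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be >= 1")
    if is_thread_op:
        return ThreadPoolExecutor(max_workers=concurrency)
    return ProcessPoolExecutor(max_workers=concurrency)


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    # Thread-level concurrency: two worker threads share one process.
    with make_executor(concurrency=2, is_thread_op=True) as pool:
        print(list(pool.map(square, [1, 2, 3])))  # [1, 4, 9]
```

Threads share one interpreter and its memory, which keeps startup cheap; processes isolate each worker, which is why imports that initialize heavy libraries matter more in the process-level case.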


def postprocess(self, input_dicts, fetch_dict, data_id, log_id):
new_dict = {}
new_dict["elementwise_div_1"] = str(fetch_dict["elementwise_div_1"].tolist())
Member


These odd variable-name manipulations make for a genuinely poor developer experience.

Member


Unless we document them, developers basically have no way to figure this out on their own.

Contributor Author


Fixed. The name has to be changed in the exported Serving files, or specified when exporting to the Serving format.
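For comparison, the renaming discussed in this thread could also be done in plain Python inside `postprocess`. This sketch assumes, as in the diff, that the raw fetch variable is `elementwise_div_1` and the desired alias is `output_embed`; `FETCH_ALIASES` and `rename_fetch` are hypothetical names. The fix actually adopted in the PR sets the alias at export time instead.

```python
from typing import Any, Dict, Mapping

# Hypothetical mapping from the auto-generated fetch variable name to a
# readable alias (the PR fixes the alias at export time instead).
FETCH_ALIASES: Mapping[str, str] = {"elementwise_div_1": "output_embed"}


def rename_fetch(fetch_dict: Mapping[str, Any],
                 aliases: Mapping[str, str] = FETCH_ALIASES) -> Dict[str, str]:
    """Return a response dict keyed by alias, with values serialized to str.

    Values may be NumPy arrays (converted with .tolist(), as in the diff)
    or plain Python lists; unknown names are kept unchanged.
    """
    new_dict = {}
    for raw_name, value in fetch_dict.items():
        if hasattr(value, "tolist"):
            value = value.tolist()
        new_dict[aliases.get(raw_name, raw_name)] = str(value)
    return new_dict


print(rename_fetch({"elementwise_div_1": [[0.1, 0.2]]}))
# {'output_embed': '[[0.1, 0.2]]'}
```

Keeping the alias in one place (either this mapping or the export step) avoids scattering graph-internal names such as `elementwise_div_1` through the service code.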

): fn(samples)
input_ids, segment_ids = batchify_fn(examples)
feed_dict = {}
feed_dict['input_ids']=input_ids
Member


Did you run your code through yapf before submitting? The formatting is not right.

Contributor Author


Fixed; it is now formatted.
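The `batchify_fn` call in the snippet above pads tokenized examples into a batch before they are placed in `feed_dict`. In the PR this is built from `paddlenlp.data` (`Tuple`, `Pad`), which returns NumPy arrays. The pure-Python `pad_batch` and `batchify` below are hypothetical stand-ins that only show the shape of the result, assuming each example is an `(input_ids, segment_ids)` pair and a pad id of 0.

```python
from typing import List, Sequence, Tuple


def pad_batch(seqs: Sequence[Sequence[int]], pad_val: int = 0) -> List[List[int]]:
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(s) for s in seqs)
    return [list(s) + [pad_val] * (max_len - len(s)) for s in seqs]


def batchify(examples: Sequence[Tuple[Sequence[int], Sequence[int]]],
             pad_id: int = 0,
             pad_type_id: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical stand-in for Tuple(Pad(...), Pad(...)) from paddlenlp.data."""
    input_ids = pad_batch([ex[0] for ex in examples], pad_id)
    segment_ids = pad_batch([ex[1] for ex in examples], pad_type_id)
    return input_ids, segment_ids


examples = [([1, 5, 2], [0, 0, 0]), ([1, 7, 8, 9, 2], [0, 0, 0, 0, 0])]
input_ids, segment_ids = batchify(examples)
# "input_ids" follows the diff; "segment_ids" is an assumed feed name.
feed_dict = {"input_ids": input_ids, "segment_ids": segment_ids}
print(feed_dict["input_ids"])  # [[1, 5, 2, 0, 0], [1, 7, 8, 9, 2]]
```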

@tianxin1860 tianxin1860 self-requested a review January 5, 2022 16:30

@tianxin1860 tianxin1860 left a comment


Left some comments.

|—— deploy
|—— python
|—— predict.py # Paddle Inference
|—— deploy.sh # Paddle Inference deployment script
|—— inference.py # Extract vectors with the dynamic graph

|—— export_to_serving.py # Convert the static graph to Serving format


Let's discuss the overall directory structure further. Would it be semantically clearer to put the serving-related code under the deploy directory?

Contributor Author


Adjusted.


|—— export_to_serving.py # Convert the static graph to Serving format
|—— rpc_client.py # Paddle Serving client
|—— web_service.py # Paddle Serving server


Please make the spacing between Chinese and English characters consistent.

Contributor Author


Adjusted.

--params_filename "inference.get_pooled_embedding.pdiparams" \
--server_path "./serving_server" \
--client_path "./serving_client" \
--fetch_alias_names "output_embed"


Please add a description of what each export_to_serving.py parameter means here; that will make it easier for users to understand.

Contributor Author


Fixed.


Start the client to call the Server.

First, modify the samples to be predicted and put them into the feed dictionary:


What does "modify the samples to be predicted" mean? What is being modified?

Contributor Author


Fixed.

Comment on lines 21 to 24
parser.add_argument("--model_filename", type=str, required=True,
default='inference.get_pooled_embedding.pdmodel', help="The path to model parameters to be loaded.")
parser.add_argument("--params_filename", type=str, required=True,
default='inference.get_pooled_embedding.pdiparams', help="The path to model parameters to be loaded.")


Aren't the help descriptions of these two parameters wrong?

Contributor Author


Fixed.
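A sketch of how the export_to_serving.py argument definitions could read once each help string describes what the flag actually is. Only the flag names and the example values come from this PR; the help texts are illustrative wording, and the defaults simply reuse the values shown in the README command. `required=True` is dropped for flags that have a default, because argparse never uses the default of a required argument.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Convert a static-graph inference model to Paddle Serving format.")
    # Flag names follow the PR; help texts and defaults are illustrative.
    parser.add_argument("--model_filename", type=str,
                        default="inference.get_pooled_embedding.pdmodel",
                        help="File name of the static-graph model structure (.pdmodel).")
    parser.add_argument("--params_filename", type=str,
                        default="inference.get_pooled_embedding.pdiparams",
                        help="File name of the model parameters (.pdiparams).")
    parser.add_argument("--server_path", type=str, default="./serving_server",
                        help="Output directory for the Serving server-side files.")
    parser.add_argument("--client_path", type=str, default="./serving_client",
                        help="Output directory for the Serving client-side files.")
    parser.add_argument("--fetch_alias_names", type=str, default="output_embed",
                        help="Alias for the model's output (fetch) variable.")
    return parser


args = build_parser().parse_args(["--server_path", "./my_server"])
print(args.server_path, args.model_filename)
```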

feed["0"] = "国有企业引入非国有资本对创新绩效的影响——基于制造业国有上市公司的经验证据"
feed["1"] = "试论翻译过程中的文化差异与语言空缺翻译过程,文化差异,语言空缺,文化对比"
print(feed)
ret = client.predict(feed_dict=feed, fetch=["res"])


Why must the client send its data as a dictionary rather than as a List[String]?

Contributor Author


The code has been adjusted.
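One way to let callers work with a plain `List[str]` while the client still sends a dictionary is to key each text by its index, as the revised client's `feed[str(i)] = item` line does. `texts_to_feed` and `feed_to_texts` are hypothetical helper names; the second one shows how the receiving side could restore the original order.

```python
from typing import Dict, List


def texts_to_feed(texts: List[str]) -> Dict[str, str]:
    """Key each text by its position: {"0": texts[0], "1": texts[1], ...}."""
    return {str(i): text for i, text in enumerate(texts)}


def feed_to_texts(feed: Dict[str, str]) -> List[str]:
    """Inverse of texts_to_feed; sort keys numerically so "10" follows "9"."""
    return [feed[key] for key in sorted(feed, key=int)]


feed = texts_to_feed(["first sentence", "second sentence"])
print(feed)  # {'0': 'first sentence', '1': 'second sentence'}
assert feed_to_texts(feed) == ["first sentence", "second sentence"]
```

Sorting with `key=int` matters once there are ten or more texts; a plain string sort would put `"10"` before `"2"`.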

return result


class BertOp(Op):


The class name BertOp should probably be changed.

Contributor Author


Changed.


@tianxin1860 tianxin1860 left a comment


Left some comments.

feed[str(i)] = item

print(feed)
ret = client.predict(feed_dict=feed, fetch=["res"])


Where is the name res determined?

Contributor Author


Removed. I tested it, and it works without it.

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_server.web_service import WebService, Op


Third-party imports should go below standard-library imports, separated by a blank line.

Contributor Author


Fixed.

self.tokenizer = ppnlp.transformers.ErnieTokenizer.from_pretrained(
'ernie-1.0')

def preprocess(self, input_dicts, data_id, log_id):


data_id and log_id appear to be unused?

Contributor Author


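The argument to preprocess is input_dicts, the data from the preceding Channel. This variable (a single sample) is a dictionary whose keys are the names of the preceding OPs and whose values are the corresponding OP outputs.

The argument to process is fetch_dict_list, the input variable of the Paddle Serving Client prediction interface (a list of the values returned by preprocess). This variable (a batch) is a list whose elements are dictionaries with feed_name as the key and the corresponding ndarray data as the value. typical_logid is the logid passed through to PaddleServingService.

postprocess takes input_dicts and fetch_dict. input_dicts is the same as the argument to preprocess; fetch_dict (a single sample) is one sample from the batch returned by process (if process is not executed, it is the return value of preprocess).

The documentation is here: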
https://github.com/PaddlePaddle/Serving/blob/c14a765892a5624111408147d8ec3799aa84ad49/doc/Python_Pipeline/Pipeline_Design_CN.md
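The data flow described above can be mimicked in plain Python to make the shape of each argument concrete. This is a hypothetical mock, not the Paddle Serving `Op` class: `MockEmbeddingOp`, the upstream op name `"read"`, the `"text"` key, and the fake `process` (which returns text lengths instead of running a model) are all assumptions, and the return values are simplified compared with the real hooks described in the linked document. The parameter names mirror those in the diff.

```python
from typing import Any, Dict, List


class MockEmbeddingOp:
    """Hypothetical mock of a pipeline op; return values are simplified."""

    def preprocess(self, input_dicts: Dict[str, Dict[str, Any]],
                   data_id: int, log_id: int) -> Dict[str, Any]:
        # input_dicts is one sample: {upstream_op_name: upstream_op_output}.
        upstream_output = next(iter(input_dicts.values()))
        return {"text": upstream_output["text"]}

    def process(self, batch: List[Dict[str, Any]],
                typical_logid: int) -> List[Dict[str, Any]]:
        # batch (fetch_dict_list in the text above): one dict per preprocess
        # call. A real op would run the model; we return text lengths.
        return [{"output_embed": [len(feed["text"])]} for feed in batch]

    def postprocess(self, input_dicts: Dict[str, Dict[str, Any]],
                    fetch_dict: Dict[str, Any],
                    data_id: int, log_id: int) -> Dict[str, str]:
        # fetch_dict is one sample taken from the batch returned by process.
        return {"output_embed": str(fetch_dict["output_embed"])}


op = MockEmbeddingOp()
samples = [{"read": {"text": "abc"}}, {"read": {"text": "hello"}}]
feeds = [op.preprocess(s, data_id=i, log_id=0) for i, s in enumerate(samples)]
fetches = op.process(feeds, typical_logid=0)
results = [op.postprocess(s, f, data_id=i, log_id=0)
           for i, (s, f) in enumerate(zip(samples, fetches))]
print(results)  # [{'output_embed': '[3]'}, {'output_embed': '[5]'}]
```

As in the diff, `data_id` and `log_id` are accepted but unused here; they exist so the framework can call every hook with a uniform signature.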


def postprocess(self, input_dicts, fetch_dict, data_id, log_id):
new_dict = {}
new_dict["output_embed"] = str(fetch_dict["output_embed"].tolist())


Where is the variable name output_embed determined?

Contributor Author


It is specified as an input to export_to_serving.py, or the default can be used; see the README for details.
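Because `postprocess` returns the embedding as `str(fetch_dict["output_embed"].tolist())`, the caller receives text rather than an array. A minimal sketch of decoding it back into floats, assuming the response maps the alias `output_embed` to that string; `decode_embedding` is a hypothetical helper, and `ast.literal_eval` is used because it parses Python literals without executing arbitrary code.

```python
import ast
from typing import List, Mapping


def decode_embedding(response: Mapping[str, str],
                     key: str = "output_embed") -> List[List[float]]:
    """Parse the str(list) payload produced by postprocess into nested floats."""
    value = ast.literal_eval(response[key])
    # Normalize a single vector to a batch of one.
    if value and not isinstance(value[0], list):
        value = [value]
    return [[float(x) for x in row] for row in value]


resp = {"output_embed": "[[0.25, -0.5], [1.0, 0.0]]"}
vectors = decode_embedding(resp)
print(len(vectors), len(vectors[0]))  # 2 2
```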

'ernie-1.0')

def preprocess(self, input_dicts, data_id, log_id):
from paddlenlp.data import Stack, Tuple, Pad


Why is the import placed here?

Contributor Author


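Multi-process testing ran into problems: this code runs inside a child process, and some paddlenlp operations call Paddle, which has to be disabled in child processes. Importing here avoids the multi-process issue.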



class ErnieService(WebService):
def get_pipeline_response(self, read_op):


Is this an interface function that paddle_serving requires the service class to provide?

Contributor Author


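WebService is the base class. It provides the preprocess interface, which converts HTTP requests received from users into RPC inputs, and the postprocess interface, which post-processes the results returned by RPC requests. Subclasses of WebService can define member functions of any kind. A WebService is started with the same API as an ordinary RPC service; you only need to override preprocess and postprocess to implement the pre-prediction and post-prediction processing.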

For details, please refer to the documentation: https://github.com/PaddlePaddle/Serving/blob/c14a765892a5624111408147d8ec3799aa84ad49/doc/Serving_Design_CN.md
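In the pipeline mode used by this PR, the `WebService` subclass overrides `get_pipeline_response(read_op)` to connect its op to the request-reading op. The plain-Python mock below only imitates that wiring, to show why upstream op names appear as keys in `input_dicts`; `MiniOp`, `MiniWebService`, `MiniErnieService`, `run_once`, and the op names are hypothetical and are not Paddle Serving APIs.

```python
from typing import Any, Callable, Dict, List, Optional


class MiniOp:
    """Hypothetical op node: a name, upstream ops, and a transform."""

    def __init__(self, name: str, input_ops: Optional[List["MiniOp"]] = None,
                 fn: Callable[[Dict[str, Any]], Any] = lambda d: d):
        self.name = name
        self.input_ops = input_ops or []
        self.fn = fn

    def run(self, request: Any) -> Any:
        if not self.input_ops:  # the read op just forwards the request
            return request
        # Like input_dicts: {upstream op name: upstream op output}.
        input_dicts = {op.name: op.run(request) for op in self.input_ops}
        return self.fn(input_dicts)


class MiniWebService:
    """Hypothetical base class; subclasses wire their ops to read_op."""

    def get_pipeline_response(self, read_op: MiniOp) -> MiniOp:
        raise NotImplementedError

    def run_once(self, request: Any) -> Any:
        read_op = MiniOp(name="read")
        return self.get_pipeline_response(read_op).run(request)


class MiniErnieService(MiniWebService):
    def get_pipeline_response(self, read_op: MiniOp) -> MiniOp:
        # The final op depends on read_op; here it reports which keys it got.
        return MiniOp(name="ernie", input_ops=[read_op],
                      fn=lambda input_dicts: list(input_dicts))


print(MiniErnieService().run_once({"0": "some text"}))  # ['read']
```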

tianxin1860 previously approved these changes Jan 13, 2022

@tianxin1860 tianxin1860 left a comment


LGTM


@tianxin1860 tianxin1860 left a comment


LGTM

@tianxin1860 tianxin1860 merged commit 6213573 into PaddlePaddle:develop Jan 13, 2022
3 participants