Add UIE-M #3192

Merged
merged 8 commits into from
Sep 5, 2022

Conversation

@linjieccc linjieccc commented Sep 5, 2022

PR types

New features

PR changes

Models

Description

  • Add uie-m-base and uie-m-large, which support mixed Chinese-English extraction. Example call:

    >>> from pprint import pprint
    >>> from paddlenlp import Taskflow
    
    >>> schema = ['Time', 'Player', 'Competition', 'Score']
    >>> ie = Taskflow('information_extraction', schema=schema, model="uie-m-base", schema_lang="en")
    >>> pprint(ie(["2月8日上午北京冬奥会自由式滑雪女子大跳台决赛中中国选手谷爱凌以188.25分获得金牌!", "Rafael Nadal wins French Open Final!"]))
    [{'Competition': [{'end': 23,
                      'probability': 0.9373889907291257,
                      'start': 6,
                      'text': '北京冬奥会自由式滑雪女子大跳台决赛'}],
      'Player': [{'end': 31,
                  'probability': 0.6981119555336441,
                  'start': 28,
                  'text': '谷爱凌'}],
      'Score': [{'end': 39,
                'probability': 0.9888507878270296,
                'start': 32,
                'text': '188.25分'}],
      'Time': [{'end': 6,
                'probability': 0.9784080036931151,
                'start': 0,
                'text': '2月8日上午'}]},
     {'Competition': [{'end': 35,
                      'probability': 0.9851549932171295,
                      'start': 18,
                      'text': 'French Open Final'}],
      'Player': [{'end': 12,
                  'probability': 0.9379371275888104,
                  'start': 0,
                  'text': 'Rafael Nadal'}]}]
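  • Model architecture:

    | Model | Architecture | Languages |
    | --- | --- | --- |
    | uie-m-large | 24-layers, 1024-hidden, 16-heads | Chinese, English |
    | uie-m-base | 12-layers, 768-hidden, 12-heads | Chinese, English |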
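  • Model metrics:

    | Model | Finance 0-shot | Finance 5-shot | Medical 0-shot | Medical 5-shot | Internet 0-shot | Internet 5-shot |
    | --- | --- | --- | --- | --- | --- | --- |
    | uie-base (12L768H) | 46.43 | 70.92 | 71.83 | 85.72 | 78.33 | 81.86 |
    | uie-medium (6L768H) | 41.11 | 64.53 | 65.40 | 75.72 | 78.32 | 79.68 |
    | uie-mini (6L384H) | 37.04 | 64.65 | 60.50 | 78.36 | 72.09 | 76.38 |
    | uie-micro (4L384H) | 37.53 | 62.11 | 57.04 | 75.92 | 66.00 | 70.22 |
    | uie-nano (4L312H) | 38.94 | 66.83 | 48.29 | 76.74 | 62.86 | 72.35 |
    | uie-m-large (24L1024H) | 49.35 | 74.55 | 70.50 | 92.66 | 78.49 | 83.02 |
    | uie-m-base (12L768H) | 38.46 | 74.31 | 63.37 | 87.32 | 76.27 | 80.13 |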
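
The sample output in the description reports each extracted span with `start`, `end`, and `text`. As a rough sanity check, the sketch below verifies that slicing the input with those offsets reproduces the reported text. It assumes the offsets are character indices with an exclusive `end`, which is consistent with the sample but not stated in this PR. `check_spans` is a hypothetical helper, not part of PaddleNLP, and the result dictionary is copied from the example rather than produced by running the model.

```python
def check_spans(text, result):
    """Return (label, reported_text, sliced_text) for every span whose
    offsets do not reproduce the reported text."""
    mismatches = []
    for label, spans in result.items():
        for span in spans:
            # Assumption: start/end are character offsets, end exclusive.
            sliced = text[span["start"]:span["end"]]
            if sliced != span["text"]:
                mismatches.append((label, span["text"], sliced))
    return mismatches


# Second input and its result, copied from the example output above.
text = "Rafael Nadal wins French Open Final!"
result = {
    "Competition": [{"start": 18, "end": 35, "text": "French Open Final",
                     "probability": 0.9851549932171295}],
    "Player": [{"start": 0, "end": 12, "text": "Rafael Nadal",
                "probability": 0.9379371275888104}],
}

# An empty list means every reported span matches its offsets.
print(check_spans(text, result))
```

The same check works on the Chinese sentence, since Python strings are indexed by code point rather than by byte.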

@linjieccc linjieccc self-assigned this Sep 5, 2022
@linjieccc linjieccc added ie Issues related to Information Extraction taskflow Taskflow labels Sep 5, 2022
@@ -886,6 +888,41 @@ from paddlenlp import Taskflow
[{'时间': [{'text': '2月8日上午', 'start': 0, 'end': 6, 'probability': 0.6513581678349247}], '选手': [{'text': '谷爱凌', 'start': 28, 'end': 31, 'probability': 0.9819330659468051}], '赛事名称': [{'text': '北京冬奥会自由式滑雪女子大跳台决赛', 'start': 6, 'end': 23, 'probability': 0.4908131110420939}]}]
```

- `uie-m-base` and `uie-m-large` support mixed Chinese-English extraction. Example call:
Collaborator

Let's confirm here that the multilingual version of UIE only supported Chinese and English during training.

Contributor Author

Yes, so far only Chinese and English data have been added in the UIE-M multi-task stage.

batch_size=1,
model='uie-base',
position_prob=0.5,
precision='fp32')
```

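* `schema`: defines the extraction targets for the task. Configure it by following the call examples for the different tasks in the out-of-the-box section.
* `schema_lang`: sets the language of the schema. Chinese and English schemas are constructed differently, so the schema language must be specified. This parameter only takes effect for the `uie-m-base` and `uie-m-large` models.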
Collaborator

This should indicate which values can be set.

Contributor Author

done, thx
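
To illustrate the `schema_lang` note discussed in this thread, here is a minimal sketch of choosing the value from the schema labels and attaching it only for the two models the note names. Both helpers are hypothetical and not part of PaddleNLP. The value `"zh"` is an assumption: this PR only shows `schema_lang="en"`. The sketch also handles only a flat list of label strings.

```python
def infer_schema_lang(schema):
    """Guess the schema language from a flat list of label strings.

    Returns "zh" if any label contains a CJK unified ideograph, else "en".
    The "zh" value is an assumption; only "en" appears in this PR.
    """
    for label in schema:
        if any("\u4e00" <= ch <= "\u9fff" for ch in label):
            return "zh"
    return "en"


def taskflow_kwargs(schema, model):
    """Build keyword arguments for Taskflow('information_extraction', ...)."""
    kwargs = {"schema": schema, "model": model}
    # Per the parameter note, schema_lang only takes effect for uie-m models.
    if model in ("uie-m-base", "uie-m-large"):
        kwargs["schema_lang"] = infer_schema_lang(schema)
    return kwargs


print(taskflow_kwargs(["Time", "Player", "Competition", "Score"], "uie-m-base"))
print(taskflow_kwargs(["时间", "选手", "赛事名称"], "uie-base"))
```

With PaddleNLP installed, the resulting dictionary could be unpacked as `Taskflow('information_extraction', **kwargs)`, matching the call shape used in the PR description.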

@wawltor wawltor left a comment

LGTM

@wawltor wawltor merged commit 6f953a9 into PaddlePaddle:develop Sep 5, 2022
@linjieccc linjieccc deleted the uie_m branch September 5, 2022 07:35