Add FasterTokenizer model in experiment #1220
2. add demo with FasterTokenizer usage in experiment
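Rename to_string_tensor -> to_tensor and move it into experimental.
Dynamic-to-static conversion needs to be built into the FasterModelForXXXX classes, and the upper-level dynamic-to-static interface should hide the STRINGS object from users.
For the dynamic-to-static export, I suggest exporting both probs and the post-argmax result, which makes the inference output more convenient to use.
Don't make users write their own softmax operator implementation.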
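As a rough illustration of the suggestion above, the following pure-Python sketch returns both the probabilities and the argmax predictions from raw logits, so the caller never has to write softmax. The function name `postprocess` is hypothetical and is not part of PaddleNLP; an exported model would do the equivalent inside the graph.

```python
import math


def postprocess(logits):
    """Return (probs, predictions) for a batch of logit rows.

    Hypothetical helper for illustration only; not a PaddleNLP API.
    """
    probs, predictions = [], []
    for row in logits:
        # Subtract the row maximum for numerical stability before exponentiating.
        row_max = max(row)
        exps = [math.exp(x - row_max) for x in row]
        total = sum(exps)
        row_probs = [e / total for e in exps]
        probs.append(row_probs)
        # Argmax: index of the largest probability (first index on ties).
        predictions.append(max(range(len(row_probs)), key=row_probs.__getitem__))
    return probs, predictions
```

Returning both values from one call mirrors the reviewer's point: the consumer can read the label directly and still inspect the probabilities when needed.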
rm -rf *

# same with the demo.cc
DEMO_NAME=demo
Don't call it DEMO; this is not a demo.
Changed to text_cls_infer.
@@ -0,0 +1,64 @@
Remove these inexplicable blank lines.
Don't name the file demo; rename it to infer.cc.
Or ernie_infer. We should also distinguish later between sentence classification and sequence labeling tasks.
Done, changed to text_cls_infer.
seq_cls_infer/token_cls_infer would probably be more consistent with the class names.
"办理入住手续,节省时间。"};

std::vector<float> probs;
Run(predictor.get(), &data, &probs);
The printed result needs to be shown.
The output should be a const reference; keep data as the input.
If this demo is only for classification, state that clearly and keep it separate from the sequence labeling one.
seq_cls_infer/token_cls_infer would probably be more consistent with the class names.
Done, changed to seq_cls_infer.
}

void Run(Predictor* predictor,
         std::vector<std::string>* input_data,
Inputs should be passed by const reference; only outputs should be pointers.
Changed to:
void Run(Predictor* predictor,
const std::vector<std::string>& input_data,
std::vector<float>* logits,
std::vector<int64_t>* predictions)

import paddle
import paddlenlp
from paddlenlp.experimental import FastSequenceClassificationModel
Use Faster: we use Faster as the technical codename throughout.
Done
paddlenlp/experimental/model.py
Outdated
return logits


class FastSequenceClassificationModel(object):
FasterModelForSequenceClassification
Done
paddlenlp/ops/strings.py
Outdated
import paddle.fluid.core as core

__all__ = ['to_string_tensor', 'to_vocab_tensor']
Move the whole module into paddlenlp.experimental.
Done
paddlenlp/experimental/model.py
Outdated
raise ValueError("Unknown name %s. Now %s surports %s" %
                 (pretrained_model_name_or_path, cls.__name__,
                  list(name_model.keys())))
Add a to_static interface on this class so that the STRINGS type is not exposed externally.
Done
paddlenlp/experimental/model.py
Outdated
else:
    raise ValueError("Unknown name %s. Now %s surports %s" %
                     (pretrained_model_name_or_path, cls.__name__,
                      list(name_model.keys())))
Add a to_static interface on this class so that the STRINGS type is not exposed externally.
Done
paddlenlp/ops/strings.py
Outdated
@@ -12,10 +12,13 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import paddle
Move this into paddlenlp/experimental/.
Done
2. rename to_vocab_tensor to to_vocab_buff 3. add to_static() api to FasterModel
2. remove some redundant code
2. support from_pretrained() given a local directory
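One way a from_pretrained() entry point could accept either a built-in name or a local directory is sketched below. This is only a guess at the shape of the logic: the registry `NAME_MODEL`, its single entry, and the function `resolve_pretrained` are hypothetical names, not taken from the PR. Only the error message format follows the diff quoted in this thread.

```python
import os

# Hypothetical registry of built-in model names; the PR's real table is not shown here.
NAME_MODEL = {"example-model": "<built-in weights for example-model>"}


def resolve_pretrained(name_or_path, class_name="FasterModel"):
    """Classify the argument as a built-in name or a local directory.

    Returns a (kind, location) tuple; raises ValueError for anything else.
    """
    if name_or_path in NAME_MODEL:
        return ("builtin", NAME_MODEL[name_or_path])
    if os.path.isdir(name_or_path):
        # A local directory is accepted as-is; loading its files is out of scope here.
        return ("local", os.path.abspath(name_or_path))
    raise ValueError("Unknown name %s. Now %s supports %s" %
                     (name_or_path, class_name, list(NAME_MODEL.keys())))
```

Checking the registry before the filesystem keeps built-in names stable even if a directory with the same name happens to exist in the working directory; the opposite order is equally defensible.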
set(CUDA_LIB "/usr/local/cuda/lib64/" CACHE STRING "CUDA Library")
else()
if(CUDA_LIB STREQUAL "")
set(CUDA_LIB "C:\\Program\ Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v8.0\\lib\\x64")
The hard-coded path in this command may not be correct; it needs to be tested and verified on Windows later.
losses.append(loss.numpy())
correct = metric.compute(logits, labels)
metric.update(correct)
accu = metric.accumulate()
Should the accumulate call here be outside the loop or inside it?
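To make the question concrete, here is a small sketch with a stand-in metric. `RunningAccuracy` is a hypothetical class that only imitates the update/accumulate pattern in the diff; it is not the metric used in the PR. Under the assumption that accumulate() merely reads running totals, calling it once after the loop yields the same final value as calling it on every batch.

```python
class RunningAccuracy:
    """Stand-in metric (hypothetical) that keeps running totals across batches."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, predictions, labels):
        # Add this batch's hits and sample count to the running totals.
        self.correct += sum(int(p == y) for p, y in zip(predictions, labels))
        self.total += len(labels)

    def accumulate(self):
        # Read-only: accuracy over everything seen so far.
        return self.correct / self.total if self.total else 0.0


def evaluate(batches):
    """Update the metric per batch and read the result once after the loop."""
    metric = RunningAccuracy()
    for predictions, labels in batches:
        metric.update(predictions, labels)
    return metric.accumulate()
```

If the per-batch value is not logged, moving the call outside the loop avoids redundant work without changing the reported accuracy.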

def create_dataloader(dataset, mode='train', batch_size=1):
Remove the unnecessary blank line.
text = '小说是文学的一种样式,一般描写人物故事,塑造多种多样的人物形象,但亦有例外。它是拥有不完整布局、发展及主题的文学作品。而对话是不是具有鲜明的个性,每个人物说的没有独特的语言风格,是衡量小说水准的一个重要标准。与其他文学样式相比,小说的容量较大,它可以细致的展现人物性格和命运,可以表现错综复杂的矛盾冲突,同时还可以描述人物所处的社会生活环境。小说一词,最早见于《庄子·外物》:“饰小说以干县令,其于大达亦远矣。”这里所说的小说,是指琐碎的言谈、小的道理,与现时所说的小说相差甚远。文学中,小说通常指长篇小说、中篇、短篇小说和诗的形式。小说是文学的一种样式,一般描写人物故事,塑造多种多样的人物形象,但亦有例外。它是拥有不完整布局、发展及主题的文学作品。而对话是不是具有鲜明的个性,每个人物说的没有独特的语言风格,是衡量小说水准的一个重要标准。与其他文学样式相比,小说的容量较大,它可以细致的展现人物性格和命运,可以表现错综复杂的矛盾冲突,同时还可以描述人物所处的社会生活环境。小说一词,最早见于《庄子·外物》:“饰小说以干县令,其于大达亦远矣。”这里所说的小说,是指琐碎的言谈、小的道理,与现时所说的小说相差甚远。文学中'
data = [text[:max_seq_length]] * 100

pp_tokenizer = FasterTokenizer(vocab, do_lower_case=False)
Should this interface be aligned with the API experience of XXXTokenizer.from_pretrained? And does it need to
LGTM
PR types
New features
PR changes
APIs
Description