add CEPF

hujunxianligong · hujunxianligong · commit 4baad03b718f · 2020-05-07T11:27:56.000+08:00
diff --git a/README.md b/README.md
@@ -3,6 +3,11 @@ WebCollector is an open-source web crawler framework based on Java. It provides
   some simple interfaces for crawling the Web; you can set up a
   multi-threaded web crawler in less than 5 minutes.
 
+
+In addition to being a general crawler framework, WebCollector integrates __CEPF__, a state-of-the-art web content extraction algorithm proposed by Wu et al.:
++ Wu GQ, Hu J, Li L, Xu ZH, Liu PC, Hu XG, Wu XD. Online Web news extraction via tag path feature fusion. Ruan Jian Xue Bao/Journal of Software, 2016,27(3):714-735 (in Chinese). http://www.jos.org.cn/1000-9825/4868.htm
+
+
 ## HomePage
 [https://github.com/CrawlScript/WebCollector](https://github.com/CrawlScript/WebCollector)
 
diff --git a/README.zh-cn.md b/README.zh-cn.md
@@ -4,6 +4,9 @@ WebCollector
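 ### Crawler Introduction
 WebCollector is a Java crawler framework (kernel) that requires no configuration and is easy to extend through secondary development. It provides a streamlined API, so a powerful crawler can be implemented with only a small amount of code.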
 
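+In addition to the crawler framework, WebCollector also integrates CEPF, an automatic web content extraction algorithm proposed by Wu Gongqing et al. and one of the most advanced algorithms currently available:
++ Wu GQ, Hu J, Li L, Xu ZH, Liu PC, Hu XG, Wu XD. Online Web news extraction via tag path feature fusion. Ruan Jian Xue Bao/Journal of Software, 2016,27(3):714-735 (in Chinese). http://www.jos.org.cn/1000-9825/4868.htm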
+
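 ### Crawler Kernel:
 WebCollector is committed to maintaining a stable, extensible crawler kernel that developers can flexibly build on. The kernel is highly extensible, and users can develop the crawlers they need on top of it. Jsoup is integrated into the source code for precise web page parsing.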