mediaProduct2017/reading

reading

A media product.

If multiple machines are used to run the crawler in parallel, each machine is responsible for a few websites (this is essentially a map operation: the crawler on each machine handles only the websites assigned to it). The crawled data is stored in a Hadoop database, and MapReduce is then used to run queries over it.
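The division of labor described above can be sketched as follows. This is a minimal, illustrative sketch in Python, not the project's actual code: the round-robin site assignment and the in-memory map/reduce helpers are assumptions standing in for the real crawler machines and the Hadoop/MapReduce pipeline.

```python
from collections import defaultdict


def assign_sites(sites, n_workers):
    """Partition websites across crawler machines (round-robin, so the
    assignment is deterministic); each machine crawls only its sites."""
    buckets = defaultdict(list)
    for i, site in enumerate(sites):
        buckets[i % n_workers].append(site)
    return dict(buckets)


def map_phase(records):
    """Map step of a toy MapReduce query: emit (site, 1) per article."""
    for site, _article in records:
        yield (site, 1)


def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each site."""
    counts = defaultdict(int)
    for site, n in pairs:
        counts[site] += n
    return dict(counts)


if __name__ == "__main__":
    sites = ["a-news.com", "b-news.com", "c-news.com"]
    print(assign_sites(sites, 2))          # which machine crawls which site
    crawled = [("a-news.com", "art1"), ("a-news.com", "art2"),
               ("b-news.com", "art3")]
    print(reduce_phase(map_phase(crawled)))  # articles per site
```

In a real deployment the `crawled` records would live in HBase/HDFS rather than memory, and the map and reduce functions would run as a Hadoop job; the structure of the query is the same.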

About

Download news data by a web crawler for reading
