flicck

Popular repositories

  1. CninfoDistributedSpider Public

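    A distributed crawler for listed-company announcements on Cninfo (巨潮资讯网), built on a distributed architecture using Scrapy and Kafka. It crawls all announcements for a specified list of listed companies within a specified time period and saves them as PDFs. Search engine functionality will be added later.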

    Python 20 5

  2. HadoopLink Public

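    HadoopLink is a job-chain framework built as a further layer of encapsulation over Hadoop and BaseMR. It spares you the tedious, repetitive code for creating Job, JobControl, and ControlledJob objects, and lets you conveniently declare dependencies between MapReduce tasks through annotations. HadoopLink supports running multiple task chains in parallel, and each task chain can be scheduled to run at set times.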

    Java 11

  3. flinkx Public

    Forked from DTStack/chunjun

    A distributed data synchronization tool based on Flink.

    Java 5 7

  4. MutipleContentExtractor Public

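    A general-purpose tool for extracting the main content of HTML pages. This is the standalone open-source version of a two-tier web page content extractor. The first tier precisely identifies the XPath of the main content and extracts it. If that fails, extraction falls back to a brute-force method based on line-block distribution. With both tiers as a safeguard, extraction accuracy and success rate are both high.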

    Java 4

  5. flink-simpleapi Public

    The author found some of Flink's APIs too hard to remember, so this toolkit gathers commonly used functions in one place for easy calling.

    Java 4 1

  6. DynamicDrools Public

    A demo of dynamically loading Drools rules.

    Java 4 2