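This project uses Spark to run statistical analysis on a large volume of user danmaku (bullet comment) data from Bilibili videos, and presents the results on a web page.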
This branch is for the spider that retrieves data from www.bilibili.com. If your job is to implement the spider or part of it, please put your code and a brief description of it here.
==DO NOT RUN THE FILE ./BigDataTraining/spyder/bilibili_spyder IF YOU DO NOT KNOW WHAT YOU ARE DOING!!!==
This package contains a spider that crawls the URLs pointing to the listing page of each video category.
These URLs are stored behind the navigation buttons.
The spider writes its results to a CSV file called nav_btn_item.csv.
CSV record format:
| Button title | URL |
|---|---|
| 首页 | www.bilibili.com/?type=1 |
The spider also writes the records to a Redis database.
The Redis key is: bilibili:class_urls
The video type code is appended to the end of each URL string. The type codes are defined in the file GlobalParams.py.
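The CSV output described above could be produced roughly as follows. This is only a minimal sketch: the function name `write_nav_items` is hypothetical, and it assumes that the file has no header row and is UTF-8 encoded, neither of which is stated in this README.

```python
import csv


def write_nav_items(rows, path="nav_btn_item.csv"):
    """Write (button title, URL) pairs to a CSV file, one record per line.

    Assumptions (not confirmed by the project): no header row, UTF-8 encoding.
    """
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for title, url in rows:
            writer.writerow([title, url])
```

For example, `write_nav_items([("首页", "www.bilibili.com/?type=1")])` would write the single record shown in the table above.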
Spider used to get barrage (danmaku) data.
Spider used to get video details.
Spark operator for data processing.
Front-end website and back-end service.
Project structure:
UrlUtils:
- Add video type code to URL
- Remove video type code from URL
- Get video type code from URL
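The three UrlUtils operations listed above might look like the sketch below. It is not the project's actual implementation: the function names are hypothetical, and it assumes the type code is carried in a `type` query parameter, as in the example record `www.bilibili.com/?type=1`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TYPE_PARAM = "type"  # assumed parameter name, taken from the example URL


def remove_type_code(url):
    """Return the URL without its type-code query parameter."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != TYPE_PARAM]
    return urlunsplit(parts._replace(query=urlencode(query)))


def add_type_code(url, code):
    """Append the type code to the end of the URL, replacing any existing one."""
    parts = urlsplit(remove_type_code(url))
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((TYPE_PARAM, str(code)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def get_type_code(url):
    """Return the type code as a string, or None if the URL has none."""
    for key, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        if key == TYPE_PARAM:
            return value
    return None
```

The code is returned as a string here because this README does not say how GlobalParams.py represents the type codes; convert it as needed.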
RedisUtils:
- Create a new Redis key and insert URLs into it
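The RedisUtils operation above could be sketched as follows. The function name `store_class_urls` is hypothetical, and storing the URLs as a Redis list is an assumption; this README does not say which Redis data type the key `bilibili:class_urls` uses. The `client` argument is expected to be a redis-py style client exposing `delete`, `rpush` and `llen`.

```python
def store_class_urls(client, urls, key="bilibili:class_urls"):
    """Recreate `key` as a Redis list holding `urls`; return the list length.

    Assumption: the key is a list. A set (SADD) would work equally well
    if ordering and duplicates do not matter.
    """
    # Drop any previous value so the key starts out fresh.
    client.delete(key)
    urls = list(urls)
    if urls:
        # RPUSH appends the values in order at the tail of the list.
        client.rpush(key, *urls)
    return client.llen(key)
```

With the redis-py package installed and a server running locally (both assumptions), a call would look like `store_class_urls(redis.Redis(), ["www.bilibili.com/?type=1"])`.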

