Commit 905d0c6

Add Scraper Script for Crawling Multiple Websites
Takes in a list of URLs from a text file and processes them.

Signed-off-by: Antony Oduor <aowino@gmail.com>
1 parent 23c2db5 commit 905d0c6

3 files changed: 552 additions, 0 deletions

README.md

Lines changed: 9 additions & 0 deletions
@@ -20,6 +20,9 @@ pip install tldextract
 pip install html5
 pip install pandas
 pip install tqdm
+pip install grequests
+pip install validators
+pip install tld
 ```
 
 ### keywords
@@ -55,3 +58,9 @@ their frequency and sorted.
 ```
 ./search.py -h
 ```
+
+
+#### Linux / MacOS / Windows Gotchas!
+
+
+https://stackoverflow.com/questions/19425857/env-python-r-no-such-file-or-directory
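
The linked Stack Overflow question concerns the `env: python\r: No such file or directory` error. It typically appears when a script whose first line is an `env`-based shebang has been saved with Windows (CRLF) line endings and is then run on Linux or macOS, so the interpreter name picks up a trailing `\r`. A minimal sketch of one way to normalise the line endings, assuming nothing about the repository's own tooling (the helper names `to_unix_line_endings` and `fix_script` are hypothetical):

```python
from pathlib import Path


def to_unix_line_endings(data: bytes) -> bytes:
    """Return data with Windows (CRLF) line endings converted to Unix (LF)."""
    return data.replace(b"\r\n", b"\n")


def fix_script(path: str) -> bool:
    """Rewrite the file at path with LF endings; return True if it changed."""
    script = Path(path)
    original = script.read_bytes()
    converted = to_unix_line_endings(original)
    if converted != original:
        script.write_bytes(converted)
        return True
    return False
```

For example, `fix_script("search.py")` would rewrite the script in place only if it contained CRLF endings. Working on bytes rather than text avoids any newline translation by Python itself.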

requirements.txt

Lines changed: 3 additions & 0 deletions
@@ -13,3 +13,6 @@ webencodings==0.5.1
 Whoosh==2.7.4
 pandas==0.21.1
 tqdm==4.19.5
+grequests==0.3.0
+validators==0.12.0
+tld==0.7.9
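
pip's requirements format pins an exact version with a double equals sign (`name==version`); a single `=` is not a valid specifier, and pip rejects such a line. A minimal stdlib-only sketch that flags lines of that kind before installing. The helper name `find_bad_pins` and the regular expression are illustrative assumptions, and they cover only simple `name==version` pins, not the full requirements syntax:

```python
import re

# Matches only simple exact pins such as "tld==0.7.9" (assumption: the file
# uses no extras, environment markers, or other version specifiers).
PIN_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[A-Za-z0-9.*+!_-]+$")


def find_bad_pins(lines):
    """Return (line_number, text) pairs for lines that are not name==version."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text or text.startswith("#"):
            continue  # skip blank lines and comments
        if not PIN_RE.match(text):
            bad.append((number, text))
    return bad
```

Passing an open file works because iterating a file yields its lines, for instance `find_bad_pins(open("requirements.txt"))`; an empty list means every non-comment line is an exact pin.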
