Commit aab79db

Add book scraping example
1 parent aa6821e commit aab79db

2 files changed: +36 -2 lines changed

README.md

Lines changed: 6 additions & 2 deletions
@@ -3115,7 +3115,7 @@ conn.close()
If you don't sanitize your inputs, a user could write SQL into their input and manipulate the database.

In the example below, if a user entered `' OR 1=1--`, the single quote would close the quoted string, `1=1` would always evaluate to true, and `--` would comment out any remaining characters.

```python
import sqlite3
conn = sqlite3.connect("users.db")
@@ -3140,4 +3140,8 @@ else:

conn.commit()
conn.close()
```
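
One common way to keep input like `' OR 1=1--` from being interpreted as SQL is to pass values through `?` placeholders instead of building the query string by hand. The sketch below is illustrative only: the in-memory database, the `users` table layout, and the helper names `find_user_unsafe` and `find_user_safe` are assumptions, not part of the README's example.

```python
import sqlite3


def find_user_unsafe(conn, username, password):
    # Vulnerable: user input is pasted directly into the SQL string.
    query = (
        "SELECT * FROM users "
        f"WHERE username = '{username}' AND password = '{password}'"
    )
    return conn.execute(query).fetchall()


def find_user_safe(conn, username, password):
    # Safer: "?" placeholders make sqlite3 bind the values as data, not SQL.
    query = "SELECT * FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchall()


# Hypothetical in-memory database with a single made-up user.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "s3cret"))
conn.commit()

malicious = "' OR 1=1--"
unsafe_rows = find_user_unsafe(conn, malicious, "anything")
safe_rows = find_user_safe(conn, malicious, "anything")
print(len(unsafe_rows), len(safe_rows))
conn.close()
```

With the string-built query the injected input matches every row, while the parameterized query treats the same input as an ordinary (non-matching) username.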

### Scraping to a Database

book_scraper/book_scraper.py

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
# https://books.toscrape.com/catalogue/category/books/history_32/index.html

import sqlite3
import requests
from bs4 import BeautifulSoup

def scrape_books(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    books = soup.find_all("article")
    all_books = []
    for book in books:
        book_data = (get_title(book), get_price(book), get_rating(book))
        all_books.append(book_data)
    print(all_books)

def get_title(book):
    return book.find("h3").find("a")["title"]

def get_price(book):
    price = book.select(".price_color")[0].get_text()
    return float(price.replace("£", "").replace("Â", ""))

def get_rating(book):
    ratings = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}
    paragraph = book.select(".star-rating")[0]
    word = paragraph.get_attribute_list("class")[-1]
    return ratings[word]

scrape_books("https://books.toscrape.com/catalogue/category/books/history_32/index.html")
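
The commit adds the `### Scraping to a Database` heading and imports `sqlite3`, but the script so far only prints the scraped tuples. A minimal sketch of how those `(title, price, rating)` tuples might be stored, assuming a hypothetical `books` table and a hypothetical `save_books` helper (neither appears in the commit), with made-up sample rows standing in for scraped data:

```python
import sqlite3


def save_books(conn, all_books):
    # Hypothetical schema: one row per (title, price, rating) tuple.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books "
        "(title TEXT, price REAL, rating INTEGER)"
    )
    # Placeholders keep scraped text from being interpreted as SQL.
    conn.executemany("INSERT INTO books VALUES (?, ?, ?)", all_books)
    conn.commit()


# Made-up rows in the same shape that scrape_books builds.
sample_books = [("Example Book A", 12.5, 3), ("Example Book B", 7.99, 5)]

conn = sqlite3.connect(":memory:")
save_books(conn, sample_books)
rows = conn.execute(
    "SELECT title, price, rating FROM books ORDER BY rating"
).fetchall()
print(rows)
conn.close()
```

To wire this into the script, `scrape_books` would need to return `all_books` rather than only printing it; that change is likewise an assumption about where the example is heading.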
