
Commit 689aef3

Day 45 - Web Scraping with BeautifulSoup
1 parent db52418 commit 689aef3

4 files changed: +149 -1 lines changed
README.md

Lines changed: 3 additions & 1 deletion
@@ -50,6 +50,7 @@
 - [Day 42](https://github.com/a092devs/100-days-of-python/tree/master/day042) - Intermediate HTML
 - [Day 43](https://github.com/a092devs/100-days-of-python/tree/master/day043) - Introduction to CSS
 - [Day 44](https://github.com/a092devs/100-days-of-python/tree/master/day044) - Intermediate CSS
+- [Day 45](https://github.com/a092devs/100-days-of-python/tree/master/day045) - Web Scraping with BeautifulSoup
 
 ## ⚙ Tools and Technologies Covered
 - Python 3
@@ -67,4 +68,5 @@
 - APIs
 - Authentication
 - HTML 5
-- CSS 3
+- CSS 3
+- Web Scraping

day045/README.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
## 100 Movies that You Must Watch

# Objective

Scrape the top 100 movies of all time from a website. Generate a text file called `movies.txt` that lists the movie titles in ascending order (starting from 1).
The result should look something like this:

```
1) The Godfather
2) The Empire Strikes Back
3) The Dark Knight
4) The Shawshank Redemption
... and so on
```
The central idea behind this project is to use BeautifulSoup to extract some data - like movie titles - from a website like Empire's (or from, say, Timeout or Stacker, which have curated similar lists).

### ⚠️ Important: Use the Internet Archive's URL

Since websites change very frequently, **use this link**
```
URL = "https://web.archive.org/web/20200518073855/https://www.empireonline.com/movies/features/best-movies-2/"
```
from the Internet Archive's Wayback Machine. That way your work will match the solution video.

(Do *not* use https://www.empireonline.com/movies/features/best-movies-2/ which I've used in the screen recording)
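
The archived link follows the Wayback Machine's snapshot pattern: a 14-digit `YYYYMMDDhhmmss` timestamp followed by the original address. A minimal sketch of assembling it, where `wayback_url` is a hypothetical helper and not part of the course code:

```python
def wayback_url(timestamp: str, original_url: str) -> str:
    """Build a Wayback Machine snapshot URL (hypothetical helper for illustration)."""
    # The timestamp is YYYYMMDDhhmmss; the original URL is appended unchanged.
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


# Snapshot timestamp and page address taken from the link given above.
URL = wayback_url(
    "20200518073855",
    "https://www.empireonline.com/movies/features/best-movies-2/",
)
print(URL)
```

Pinning the timestamp is what keeps the page markup stable, so the same selectors keep working even if the live site is redesigned.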

# Solution

You can find the code from my walkthrough and solution as a downloadable .zip file in the course resources for this lesson.

day045/main.py

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
import requests
from bs4 import BeautifulSoup

URL = "https://web.archive.org/web/20200518073855/https://www.empireonline.com/movies/features/best-movies-2/"

response = requests.get(URL)
response.raise_for_status()  # fail on an HTTP error rather than parse an error page

soup = BeautifulSoup(response.text, 'html.parser')
# Titles come back in page order (100 down to 1), so reverse them to start at 1.
all_movies = soup.find_all(name='h3', class_='title')
movie_titles = [movie.get_text() for movie in all_movies]
movies = movie_titles[::-1]
# The path is relative to the repository root, so run the script from there.
with open('./day045/movies.txt', mode='w', encoding="utf-8") as file:
    for movie in movies:
        file.write(f"{movie}\n")
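
To try the parsing step without a network request, the same `find_all` and reversal logic can be run against a small inline HTML string. This is a minimal sketch: `SAMPLE_HTML` and `extract_titles` are hypothetical names, and the markup assumes the titles sit in `<h3 class="title">` elements in descending order, as the script above implies.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the archived page: three titles, highest number first.
SAMPLE_HTML = """
<h3 class="title">3) The Dark Knight</h3>
<h3 class="title">2) The Empire Strikes Back</h3>
<h3 class="title">1) The Godfather</h3>
"""


def extract_titles(html: str) -> list[str]:
    """Return the text of every <h3 class="title">, reversed into ascending order."""
    soup = BeautifulSoup(html, "html.parser")
    headings = soup.find_all(name="h3", class_="title")
    titles = [heading.get_text(strip=True) for heading in headings]
    return titles[::-1]


titles = extract_titles(SAMPLE_HTML)
print("\n".join(titles))
```

Keeping the parsing in a function that takes a string makes it possible to check the selector and the ordering separately from the download.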

day045/movies.txt

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
1) The Godfather
2) The Empire Strikes Back
3) The Dark Knight
4) The Shawshank Redemption
5) Pulp Fiction
6) Goodfellas
7) Raiders Of The Lost Ark
8) Jaws
9) Star Wars
10) The Lord Of The Rings: The Fellowship Of The Ring
11) Back To The Future
12: The Godfather Part II
13) Blade Runner
14) Alien
15) Aliens
16) The Lord Of The Rings: The Return Of The King
17) Fight Club
18) Inception
19) Jurassic Park
20) Die Hard
21) 2001: A Space Odyssey
22) Apocalypse Now
23) The Lord Of The Rings: The Two Towers
24) The Matrix
25) Terminator 2: Judgment Day
26) Heat
27) The Good, The Bad And The Ugly
28) Casablanca
29) The Big Lebowski
30) Seven
31) Taxi Driver
32) The Usual Suspects
33) Schindler's List
34) Guardians Of The Galaxy
35) The Shining
36) The Departed
37) The Thing
38) Mad Max: Fury Road
39) Saving Private Ryan
40) 12 Angry Men
41) Eternal Sunshine Of The Spotless Mind
42) There Will Be Blood
43) One Flew Over The Cuckoo's Nest
44) Gladiator
45) Drive
46) Citizen Kane
47) Interstellar
48) The Silence Of The Lambs
49) Trainspotting
50) Lawrence Of Arabia
51) It's A Wonderful Life
52) Once Upon A Time In The West
53) Psycho
54) Vertigo
55) Pan's Labyrinth
56) Reservoir Dogs
57) Whiplash
58) Inglourious Basterds
59) E.T. – The Extra Terrestrial
60) American Beauty
61) Forrest Gump
62) La La Land
63) Donnie Darko
64) L.A. Confidential
65) Avengers Assemble
66) Return Of The Jedi
67) Memento
68) Ghostbusters
69) Singin' In The Rain
70) The Lion King
71) Hot Fuzz
72) Rear Window
73) Seven Samurai
74) Mulholland Dr.
75) Fargo
76) A Clockwork Orange
77) Toy Story
78) Oldboy
79) Captain America: Civil War
80) Spirited Away
81) The Social Network
82) Some Like It Hot
83) True Romance
84) Rocky
85) Léon
86) Indiana Jones And The Last Crusade
87) Predator
88) The Exorcist
89) Shaun Of The Dead
90) No Country For Old Men
91) The Prestige
92) The Terminator
93) The Princess Bride
94) Lost In Translation
95) Arrival
96) Good Will Hunting
97) Titanic
98) Amelie
99) Raging Bull
100) Stand By Me
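
Because `main.py` writes each scraped title as-is, the number prefixes in this file come from the page rather than from the script. A quick check can flag any line whose prefix does not match its position (line 12 above, for example, uses `12:` instead of `12)`). The sketch below is illustrative only; `find_misnumbered` is a hypothetical helper, and the sample list is a shortened stand-in for the real file.

```python
import re


def find_misnumbered(lines):
    """Return (position, line) pairs whose 'N)' prefix differs from the 1-based position."""
    problems = []
    for position, line in enumerate(lines, start=1):
        match = re.match(r"(\d+)\)", line)
        # Flag lines with no 'N)' prefix at all, or with the wrong number.
        if match is None or int(match.group(1)) != position:
            problems.append((position, line))
    return problems


# Shortened stand-in for movies.txt; the third entry mimics the '12:' style seen above.
sample = ["1) The Godfather", "2) The Empire Strikes Back", "12: The Godfather Part II"]
print(find_misnumbered(sample))
```

To run it on the real output, pass the file's lines instead, for instance `open('./day045/movies.txt', encoding='utf-8').read().splitlines()`.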
