4 files changed, +149 -1 lines changed

- [Day 42](https://github.com/a092devs/100-days-of-python/tree/master/day042) - Intermediate HTML
- [Day 43](https://github.com/a092devs/100-days-of-python/tree/master/day043) - Introduction to CSS
- [Day 44](https://github.com/a092devs/100-days-of-python/tree/master/day044) - Intermediate CSS
+ - [Day 45](https://github.com/a092devs/100-days-of-python/tree/master/day045) - Web Scraping with BeautifulSoup

## ⚙ Tools and Technologies Covered
- Python 3
...
- APIs
- Authentication
- HTML 5
- - CSS 3
+ - CSS 3
+ - Web Scraping

+ ## 100 Movies that You Must Watch
+
+ # Objective
+
+ Scrape the top 100 movies of all time from a website. Generate a text file called `movies.txt` that lists the movie titles in ascending order (starting from 1).
+ The result should look something like this:
+
+ ```
+ 1) The Godfather
+ 2) The Empire Strikes Back
+ 3) The Dark Knight
+ 4) The Shawshank Redemption
+ ... and so on
+ ```
+ The central idea behind this project is to use BeautifulSoup to extract some data, such as movie titles, from a website like Empire's (or from, say, Time Out or Stacker, which have curated similar lists).
+
+ ### ⚠️ Important: Use the Internet Archive's URL
+
+ Since websites change very frequently, **use this link**
+ ```
+ URL = "https://web.archive.org/web/20200518073855/https://www.empireonline.com/movies/features/best-movies-2/"
+ ```
+ from the Internet Archive's Wayback Machine. That way your work will match the solution video.
+
+ (Do *not* use https://www.empireonline.com/movies/features/best-movies-2/, which I've used in the screen recording.)
+
+ # Solution
+
+ You can find the code from my walkthrough and solution as a downloadable .zip file in the course resources for this lesson.
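The objective asks for ranks in ascending order, while the solution script reverses the scraped headings with `[::-1]`, which suggests the page lists the films from 100 down to 1. The sketch below is one possible way to do that step while also renumbering every line, so that a rank prefix copied inconsistently from the page (a colon instead of a closing parenthesis, say) cannot break the sequence. `to_ascending_lines` and `RANK_PREFIX` are hypothetical names, and the sketch assumes every scraped heading starts with a numeric rank.

```python
import re

# Hypothetical pattern: a leading rank such as "12) " or "12: ".
RANK_PREFIX = re.compile(r"^\s*\d+\s*[):]\s*")


def to_ascending_lines(scraped_titles: list[str]) -> list[str]:
    """Reverse descending headings and renumber them from 1.

    Assumes the headings arrive in descending rank order (100 first)
    and that each one begins with its rank.
    """
    ascending = scraped_titles[::-1]
    lines = []
    for rank, heading in enumerate(ascending, start=1):
        # Drop whatever rank prefix the page used, keep only the title.
        title = RANK_PREFIX.sub("", heading, count=1).strip()
        lines.append(f"{rank}) {title}")
    return lines


if __name__ == "__main__":
    sample = ["3) The Dark Knight", "2) The Empire Strikes Back", "1) The Godfather"]
    for line in to_ascending_lines(sample):
        print(line)
```

Renumbering with `enumerate` instead of trusting the scraped prefix guarantees that the rank always matches the line position. The trade-off is that a heading with no rank prefix whose title itself begins with digits and a colon (such as `2001: A Space Odyssey`) would lose part of its title, which is why the sketch assumes every heading carries a rank.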

+ import requests
+ from bs4 import BeautifulSoup
+
+ URL = "https://web.archive.org/web/20200518073855/https://www.empireonline.com/movies/features/best-movies-2/"
+
+ response = requests.get(URL)
+ response.raise_for_status()  # stop early on a 4xx/5xx reply
+ # If the server declares no charset, requests decodes text/html as ISO-8859-1,
+ # which garbles characters such as dashes; the archived page appears to be UTF-8.
+ response.encoding = "utf-8"
+ webpage_html = response.text
+
+ soup = BeautifulSoup(webpage_html, 'html.parser')
+
+ # Each title sits in an <h3 class="title"> heading that already includes its rank, e.g. "1) The Godfather".
+ all_movies = soup.find_all(name='h3', class_='title')
+ movie_titles = [movie.getText() for movie in all_movies]
+ # The page lists the films from 100 down to 1, so reverse them to get ascending order.
+ movies = movie_titles[::-1]
+
+ with open('./day045/movies.txt', mode='w', encoding="utf-8") as file:
+     for movie in movies:
+         file.write(f"{movie}\n")
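Why reading `response.text` can garble a title: when a `text/html` response declares no charset, requests falls back to ISO-8859-1, so each byte of a multi-byte UTF-8 character turns into its own Latin-1 character. The standard-library sketch below reproduces this for the dash in "E.T. – The Extra Terrestrial" (assumed here to be an en dash, U+2013) and shows that the damage can be undone by re-encoding as Latin-1 and decoding as UTF-8. The function names are illustrative only.

```python
def decode_as_latin1(text: str) -> str:
    """Simulate the bug: UTF-8 bytes wrongly decoded as ISO-8859-1 (Latin-1)."""
    return text.encode("utf-8").decode("latin-1")


def repair_mojibake(garbled: str) -> str:
    """Recover the original UTF-8 bytes from the Latin-1 text, then decode them properly."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    title = "E.T. \u2013 The Extra Terrestrial"  # \u2013 is an en dash (assumed)
    garbled = decode_as_latin1(title)
    # repr() shows the dash as "â" followed by two escaped control characters.
    print(repr(garbled))
    print(repair_mojibake(garbled))
```

In the scraper itself, the simpler fix is to decode correctly in the first place, for example by setting `response.encoding = "utf-8"` before reading `response.text`, or by passing the raw `response.content` bytes to BeautifulSoup so it can detect the encoding itself.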

+ 1) The Godfather
+ 2) The Empire Strikes Back
+ 3) The Dark Knight
+ 4) The Shawshank Redemption
+ 5) Pulp Fiction
+ 6) Goodfellas
+ 7) Raiders Of The Lost Ark
+ 8) Jaws
+ 9) Star Wars
+ 10) The Lord Of The Rings: The Fellowship Of The Ring
+ 11) Back To The Future
+ 12) The Godfather Part II
+ 13) Blade Runner
+ 14) Alien
+ 15) Aliens
+ 16) The Lord Of The Rings: The Return Of The King
+ 17) Fight Club
+ 18) Inception
+ 19) Jurassic Park
+ 20) Die Hard
+ 21) 2001: A Space Odyssey
+ 22) Apocalypse Now
+ 23) The Lord Of The Rings: The Two Towers
+ 24) The Matrix
+ 25) Terminator 2: Judgment Day
+ 26) Heat
+ 27) The Good, The Bad And The Ugly
+ 28) Casablanca
+ 29) The Big Lebowski
+ 30) Seven
+ 31) Taxi Driver
+ 32) The Usual Suspects
+ 33) Schindler's List
+ 34) Guardians Of The Galaxy
+ 35) The Shining
+ 36) The Departed
+ 37) The Thing
+ 38) Mad Max: Fury Road
+ 39) Saving Private Ryan
+ 40) 12 Angry Men
+ 41) Eternal Sunshine Of The Spotless Mind
+ 42) There Will Be Blood
+ 43) One Flew Over The Cuckoo's Nest
+ 44) Gladiator
+ 45) Drive
+ 46) Citizen Kane
+ 47) Interstellar
+ 48) The Silence Of The Lambs
+ 49) Trainspotting
+ 50) Lawrence Of Arabia
+ 51) It's A Wonderful Life
+ 52) Once Upon A Time In The West
+ 53) Psycho
+ 54) Vertigo
+ 55) Pan's Labyrinth
+ 56) Reservoir Dogs
+ 57) Whiplash
+ 58) Inglourious Basterds
+ 59) E.T. – The Extra Terrestrial
+ 60) American Beauty
+ 61) Forrest Gump
+ 62) La La Land
+ 63) Donnie Darko
+ 64) L.A. Confidential
+ 65) Avengers Assemble
+ 66) Return Of The Jedi
+ 67) Memento
+ 68) Ghostbusters
+ 69) Singin' In The Rain
+ 70) The Lion King
+ 71) Hot Fuzz
+ 72) Rear Window
+ 73) Seven Samurai
+ 74) Mulholland Dr.
+ 75) Fargo
+ 76) A Clockwork Orange
+ 77) Toy Story
+ 78) Oldboy
+ 79) Captain America: Civil War
+ 80) Spirited Away
+ 81) The Social Network
+ 82) Some Like It Hot
+ 83) True Romance
+ 84) Rocky
+ 85) Léon
+ 86) Indiana Jones And The Last Crusade
+ 87) Predator
+ 88) The Exorcist
+ 89) Shaun Of The Dead
+ 90) No Country For Old Men
+ 91) The Prestige
+ 92) The Terminator
+ 93) The Princess Bride
+ 94) Lost In Translation
+ 95) Arrival
+ 96) Good Will Hunting
+ 97) Titanic
+ 98) Amelie
+ 99) Raging Bull
+ 100) Stand By Me
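A quick way to confirm the generated file meets the objective is to check that every line reads `N) Title` with N running from 1 without gaps. The sketch below is a hypothetical checker that uses only the standard library; it reports malformed lines (for instance a colon instead of a parenthesis, copied from the source page) and ranks that do not match their position. The `day045/movies.txt` path mirrors the one the solution script writes to and is only read if it exists.

```python
import re
from pathlib import Path

# A well-formed line looks like "42) There Will Be Blood".
LINE_PATTERN = re.compile(r"^(\d+)\) (.+)$")


def find_numbering_problems(lines: list[str]) -> list[str]:
    """Describe every line that is malformed or whose rank is not its position."""
    problems = []
    for expected, line in enumerate(lines, start=1):
        match = LINE_PATTERN.match(line)
        if match is None:
            problems.append(f"line {expected}: unexpected format {line!r}")
        elif int(match.group(1)) != expected:
            problems.append(f"line {expected}: rank {match.group(1)}, expected {expected}")
    return problems


if __name__ == "__main__":
    # Same relative path as the solution script; adjust if run from elsewhere.
    path = Path("day045/movies.txt")
    if path.exists():
        lines = path.read_text(encoding="utf-8").splitlines()
        for problem in find_numbering_problems(lines):
            print(problem)
    else:
        print(f"{path} not found")
```

An empty report means the ranks run consecutively from 1; the sketch does not check that there are exactly 100 entries.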