A Study Session on Data Collection and Web Scraping
- Use Beautiful Soup to parse HTML.
- Use Chrome Developer Tools to identify HTML elements.
- Scrape websites using Beautiful Soup.
- Automate scraping using Splinter.
- Collect and organize scraped data in a Pandas DataFrame.
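The first and third objectives can be sketched together: hand Beautiful Soup some HTML, then pull out elements by tag and class. The HTML below is a made-up sample standing in for a fetched page; only the `BeautifulSoup`, `find`, and `find_all` calls are the actual library API.

```python
from bs4 import BeautifulSoup

# Sample HTML standing in for a downloaded page (hypothetical content).
html = """
<html>
  <body>
    <h1 class="title">Data Collection 101</h1>
    <ul>
      <li class="topic">HTML parsing</li>
      <li class="topic">Web scraping</li>
    </ul>
  </body>
</html>
"""

# Parse with Python's built-in parser; "lxml" or "html5lib" can be
# passed here instead once those packages are installed.
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching element.
title = soup.find("h1", class_="title").get_text()

# find_all() returns every matching element as a list.
topics = [li.get_text() for li in soup.find_all("li", class_="topic")]

print(title)   # Data Collection 101
print(topics)  # ['HTML parsing', 'Web scraping']
```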
0.1 Installations
The following tools and packages are required to complete the activities:
Chrome
Chrome Driver
Beautiful Soup
requests
splinter
html5lib
lxml
pandas
Install Chrome and ChromeDriver following the official ChromeDriver instructions for your platform, then install the Python packages with pip:
pip install requests
pip install beautifulsoup4
pip install "splinter[selenium4]"
pip install html5lib
pip install lxml
pip install pandas
0.2 Installation Check
Run the Installation Check to verify all installations, and install any missing packages as needed.
Activity Time: 0:20 | Elapsed Time: 0:20
1.2 Scraping A Webpage (10 min)
Starter : 1_2_Scraping_A_Webpage.ipynb
Solution : 1_2_Scraping_A_Webpage.ipynb
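Scraping a webpage combines two steps: download the HTML with `requests`, then parse it with Beautiful Soup. A sketch of that pattern, with the parsing demonstrated offline on a sample string (the function names `fetch_html` and `extract_title` are illustrative, not from the activity notebook):

```python
import requests
from bs4 import BeautifulSoup

def fetch_html(url):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text

def extract_title(html):
    """Return the text of the page's <title> tag, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else None

# Offline demonstration on sample HTML; for a live page you would call
# extract_title(fetch_html("https://example.com")) instead.
sample = "<html><head><title>Example Page</title></head></html>"
print(extract_title(sample))  # Example Page
```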
Activity Time: 0:25 | Elapsed Time: 0:45
2.1 Inspect using Chrome Dev Tools (5 min)
Site: Laptops Site
2.2 Webscraper (20 min)
Site: Laptops Site
Starter : 2_2_Webscraper.ipynb
Solution : 2_2_Webscraper.ipynb
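A webscraper for a product listing typically loops over repeated "card" elements and pulls the same fields from each. The markup below is hypothetical; the real laptops site will use its own class names, which the Dev Tools inspection in activity 2.1 helps you discover.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking a product listing page.
html = """
<div class="product"><span class="name">Aspire 3</span><span class="price">$399</span></div>
<div class="product"><span class="name">ThinkPad E14</span><span class="price">$749</span></div>
"""

soup = BeautifulSoup(html, "html.parser")

# One dict per product card, built from the repeated sub-elements.
laptops = []
for card in soup.find_all("div", class_="product"):
    laptops.append({
        "name": card.find("span", class_="name").get_text(),
        "price": card.find("span", class_="price").get_text(),
    })

print(laptops)
```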
Activity Time: 0:30 | Elapsed Time: 1:15
3.1 Stacking and Over Flowing (15 min)
Site: Stack Overflow
Starter : 3_1_Stacking_and_Over_Flowing.ipynb
Solution : 3_1_Stacking_and_Over_Flowing.ipynb
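Sites like Stack Overflow render content with JavaScript, which is where Splinter's browser automation comes in: drive a real browser, then scrape the rendered HTML. A sketch of the pattern, factored so the scraping step works with any object exposing Splinter's `visit()`/`html` interface (the helper `scrape_with_browser` is my own name, not from the notebook):

```python
def scrape_with_browser(browser, url):
    """Visit a URL with a Splinter-style browser and return the page HTML.

    `browser` is expected to expose Splinter's visit() method and html
    attribute, e.g. a browser created with Browser("chrome").
    """
    browser.visit(url)
    return browser.html

# In a live session (requires Chrome and ChromeDriver):
#   from splinter import Browser
#   with Browser("chrome") as browser:
#       html = scrape_with_browser(browser, "https://stackoverflow.com/questions")
```

The rendered HTML can then be handed to Beautiful Soup exactly as in the earlier activities.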
3.2 What's New (15 min)
Site: Global Voices
Starter : 3_2_Whats_New.ipynb
Solution : 3_2_Whats_New.ipynb
Activity Time: 0:25 | Elapsed Time: 1:40
4.1 Framing the Quotes (25 min)
Site : Quotes to Scrape
Starter : 4_1_Framing_the_Quotes.ipynb
Solution : 4_1_Framing_the_Quotes.ipynb
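"Framing" the quotes means landing the scraped records in a Pandas DataFrame, which covers the last learning objective. The markup below is modeled on the structure of Quotes to Scrape (`div.quote` containing `span.text` and `small.author`); verify the real class names with Chrome Dev Tools before relying on them.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Sample markup modeled on the quotes site's structure (hypothetical text).
html = """
<div class="quote">
  <span class="text">Simplicity is the ultimate sophistication.</span>
  <small class="author">Leonardo da Vinci</small>
</div>
<div class="quote">
  <span class="text">Stay hungry, stay foolish.</span>
  <small class="author">Steve Jobs</small>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# One dict per quote; a list of dicts converts directly to a DataFrame.
records = [
    {
        "quote": q.find("span", class_="text").get_text(),
        "author": q.find("small", class_="author").get_text(),
    }
    for q in soup.find_all("div", class_="quote")
]

df = pd.DataFrame(records)
print(df)
```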
Activity Time: 0:10 | Elapsed Time: 1:50