A Study Session on Data Collection and Web Scraping
- Use Beautiful Soup to parse HTML.
- Use Chrome Developer Tools to identify HTML elements.
- Scrape websites using Beautiful Soup.
- Automate scraping using Splinter.
- Collect and organize scraped data in a Pandas DataFrame.
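The first and third objectives can be sketched together: hand Beautiful Soup some HTML, then pull out elements by tag and class. The HTML below is a made-up sample standing in for a fetched page; only the `BeautifulSoup`, `find`, and `find_all` calls are the actual library API.

```python
from bs4 import BeautifulSoup

# Sample HTML standing in for a downloaded page (hypothetical content).
html = """
<html>
  <body>
    <h1 class="title">Data Collection 101</h1>
    <ul>
      <li class="topic">HTML parsing</li>
      <li class="topic">Web scraping</li>
    </ul>
  </body>
</html>
"""

# Parse with Python's built-in parser; "lxml" or "html5lib" can be
# passed here instead once those packages are installed.
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching element.
title = soup.find("h1", class_="title").get_text()

# find_all() returns every matching element as a list.
topics = [li.get_text() for li in soup.find_all("li", class_="topic")]

print(title)   # Data Collection 101
print(topics)  # ['HTML parsing', 'Web scraping']
```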
0.1 Installations
The following tools and packages are required to complete the activities:
Chrome
Chrome Driver
Beautiful Soup
requests
splinter
html5lib
lxml
pandas
Install Chrome and ChromeDriver following the official ChromeDriver instructions for your platform, then install the Python packages with pip:
pip install requests
pip install beautifulsoup4
pip install "splinter[selenium4]"
pip install html5lib
pip install lxml
pip install pandas
0.2 Installation Check
Run the Installation Check to verify all installations, and install any missing packages as needed.
Activity Time: 0:20 | Elapsed Time: 0:20
1.2 Scraping A Webpage (10 min)
Starter : 1_2_Scraping_A_Webpage.ipynb
Solution : 1_2_Scraping_A_Webpage.ipynb
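Scraping a webpage combines two steps: download the HTML with `requests`, then parse it with Beautiful Soup. A sketch of that pattern, with the parsing demonstrated offline on a sample string (the function names `fetch_html` and `extract_title` are illustrative, not from the activity notebook):

```python
import requests
from bs4 import BeautifulSoup

def fetch_html(url):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text

def extract_title(html):
    """Return the text of the page's <title> tag, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else None

# Offline demonstration on sample HTML; for a live page you would call
# extract_title(fetch_html("https://example.com")) instead.
sample = "<html><head><title>Example Page</title></head></html>"
print(extract_title(sample))  # Example Page
```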
Activity Time: 0:25 | Elapsed Time: 0:45
2.1 Inspect using Chrome Dev Tools (5 min)
Site: Laptops Site
2.2 Webscraper (20 min)
Site: Laptops Site
Starter : 2_2_Webscraper.ipynb
Solution : 2_2_Webscraper.ipynb
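A webscraper for a product listing typically loops over repeated "card" elements and pulls the same fields from each. The markup below is hypothetical; the real laptops site will use its own class names, which the Dev Tools inspection in activity 2.1 helps you discover.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking a product listing page.
html = """
<div class="product"><span class="name">Aspire 3</span><span class="price">$399</span></div>
<div class="product"><span class="name">ThinkPad E14</span><span class="price">$749</span></div>
"""

soup = BeautifulSoup(html, "html.parser")

# One dict per product card, built from the repeated sub-elements.
laptops = []
for card in soup.find_all("div", class_="product"):
    laptops.append({
        "name": card.find("span", class_="name").get_text(),
        "price": card.find("span", class_="price").get_text(),
    })

print(laptops)
```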
Activity Time: 0:30 | Elapsed Time: 1:15
3.1 Stacking and Over Flowing (15 min)
Site: Stack Overflow
Starter : 3_1_Stacking_and_Over_Flowing.ipynb
Solution : 3_1_Stacking_and_Over_Flowing.ipynb
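Sites like Stack Overflow render content with JavaScript, which is where Splinter's browser automation comes in: drive a real browser, then scrape the rendered HTML. A sketch of the pattern, factored so the scraping step works with any object exposing Splinter's `visit()`/`html` interface (the helper `scrape_with_browser` is my own name, not from the notebook):

```python
def scrape_with_browser(browser, url):
    """Visit a URL with a Splinter-style browser and return the page HTML.

    `browser` is expected to expose Splinter's visit() method and html
    attribute, e.g. a browser created with Browser("chrome").
    """
    browser.visit(url)
    return browser.html

# In a live session (requires Chrome and ChromeDriver):
#   from splinter import Browser
#   with Browser("chrome") as browser:
#       html = scrape_with_browser(browser, "https://stackoverflow.com/questions")
```

The rendered HTML can then be handed to Beautiful Soup exactly as in the earlier activities.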
3.2 What's New (15 min)
Site: Global Voices
Starter : 3_2_Whats_New.ipynb
Solution : 3_2_Whats_New.ipynb
Activity Time: 0:25 | Elapsed Time: 1:40
4.1 Framing the Quotes (25 min)
Site : Quotes to Scrape
Starter : 4_1_Framing_the_Quotes.ipynb
Solution : 4_1_Framing_the_Quotes.ipynb
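"Framing" the quotes means landing the scraped records in a Pandas DataFrame, which covers the last learning objective. The markup below is modeled on the structure of Quotes to Scrape (`div.quote` containing `span.text` and `small.author`); verify the real class names with Chrome Dev Tools before relying on them.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Sample markup modeled on the quotes site's structure (hypothetical text).
html = """
<div class="quote">
  <span class="text">Simplicity is the ultimate sophistication.</span>
  <small class="author">Leonardo da Vinci</small>
</div>
<div class="quote">
  <span class="text">Stay hungry, stay foolish.</span>
  <small class="author">Steve Jobs</small>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# One dict per quote; a list of dicts converts directly to a DataFrame.
records = [
    {
        "quote": q.find("span", class_="text").get_text(),
        "author": q.find("small", class_="author").get_text(),
    }
    for q in soup.find_all("div", class_="quote")
]

df = pd.DataFrame(records)
print(df)
```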
Activity Time: 0:10 | Elapsed Time: 1:50