Wikipedia Web Scraping Python Project

Description: This Python script uses the BeautifulSoup and Requests libraries to scrape the Wikipedia page listing the largest companies in the United States by revenue. The scraped table is converted into a structured DataFrame with the pandas library, and the final step exports this data to a CSV file.
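The flow described above (parse an HTML table, build a DataFrame, export to CSV) can be sketched as below. This is a minimal illustration, not the notebook's exact code: it parses an inline HTML fragment standing in for the Wikipedia table, since the live page's markup and column names may differ.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Small HTML fragment standing in for the Wikipedia "largest companies"
# table (the live page's exact markup and figures may differ).
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Rank</th><th>Name</th><th>Revenue (USD millions)</th></tr>
  <tr><td>1</td><td>Walmart</td><td>611,289</td></tr>
  <tr><td>2</td><td>Amazon</td><td>513,983</td></tr>
</table>
"""

def table_to_dataframe(html: str) -> pd.DataFrame:
    """Parse the first 'wikitable' in the given HTML into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    headers = [th.text.strip() for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr")[1:]:  # skip the header row
        cells = [td.text.strip() for td in tr.find_all("td")]
        if cells:
            rows.append(cells)
    return pd.DataFrame(rows, columns=headers)

df = table_to_dataframe(SAMPLE_HTML)
df.to_csv("companies.csv", index=False)

# Against the live page, the same parsing applies after fetching the HTML,
# e.g. with requests:
#   html = requests.get(
#       "https://en.wikipedia.org/wiki/"
#       "List_of_largest_companies_in_the_United_States_by_revenue"
#   ).text
```

Pulling the header cells separately from the body rows keeps the DataFrame columns labeled, so the exported CSV is self-describing.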

Usage:

  1. Clone the repository: git clone https://github.com/SaiSurajMatta/Wikipedia-Web-Scraping-Python-Project
  2. Install the required dependencies: pip install beautifulsoup4 requests pandas
  3. Open and run the notebook: Wikipedia_Web_Scraping_Project.ipynb

Requirements:

  • Python 3
  • BeautifulSoup
  • Requests
  • Pandas

How to Contribute:

  1. Fork the repository.
  2. Create a new branch: git checkout -b feature/new-feature.
  3. Make your changes and commit them: git commit -m 'Add new feature'.
  4. Push to the branch: git push origin feature/new-feature.
  5. Create a pull request.
