This project is a web scraper that extracts lawyer information from Justia's directory of family law attorneys in Chicago, Illinois. The scraper uses Selenium to navigate through multiple pages and collect details such as names, profile links, phone numbers, images, descriptions, and consultation availability.
- Automated Pagination: Scrapes all available pages.
- Data Extraction: Collects lawyer details (name, profile link, phone, website, etc.).
- CSV Export: Saves the scraped data into justia_lawyers_selenium.csv.
- Error Handling: Handles missing elements to avoid crashes.
- Headless Mode: Runs without opening a browser window.
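
The error-handling and CSV-export features above might look roughly like the following sketch. It is not taken from justia_scraper.py: the column names in `FIELDS` and the helpers `normalize_record` and `write_csv` are hypothetical, and it uses the standard-library `csv` module rather than pandas so that it runs without extra dependencies.

```python
import csv

# Hypothetical column names; the real script may use different ones.
FIELDS = ["name", "profile_link", "phone", "website", "description", "free_consultation"]


def normalize_record(raw):
    """Return a row with every expected column, using "" for missing values."""
    row = {}
    for field in FIELDS:
        value = raw.get(field)
        row[field] = str(value).strip() if value is not None else ""
    return row


def write_csv(records, path="justia_lawyers_selenium.csv"):
    """Write the records to a CSV file with a fixed header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        for record in records:
            writer.writerow(normalize_record(record))
```

Filling absent fields with an empty string keeps every row the same width, so a lawyer card without a phone number or website still produces a valid CSV line instead of raising an error.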
Clone the repository:
git clone https://github.com/yourusername/justia-scraper.git
cd justia-scraper
Ensure you have Python 3.x installed. Then, install the required packages:
pip install selenium webdriver-manager pandas
Run the scraper:
python justia_scraper.py
Project structure:
justia-scraper/
├── justia_scraper.py             # Main scraping script
├── justia_lawyers_selenium.csv   # Output file (generated after running the script)
├── README.md                     # Documentation
└── requirements.txt              # List of dependencies
- selenium: For browser automation
- webdriver-manager: Auto-downloads the correct ChromeDriver
- pandas: For saving scraped data in CSV format
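
As an illustration of how Selenium and webdriver-manager typically fit together, the sketch below starts a headless Chrome session. It is an assumption about the setup, not the contents of justia_scraper.py: `chrome_arguments` and `build_driver` are hypothetical names, and the window-size flag is an illustrative extra.

```python
def chrome_arguments(headless=True):
    """Return the Chrome command-line flags for the session."""
    args = ["--window-size=1920,1080"]  # illustrative; not required
    if headless:
        args.append("--headless")  # run without opening a browser window
    return args


def build_driver(headless=True):
    """Create a Chrome WebDriver; requires Google Chrome to be installed."""
    # Imported here so the flag helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    # webdriver-manager downloads a matching ChromeDriver and returns its path.
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)
```

A caller would typically wrap the session in `try`/`finally` and call `driver.quit()` at the end so that the browser process is always released.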
To install dependencies manually, run:
pip install -r requirements.txt
- This scraper runs in headless mode to improve efficiency.
- Ensure that Google Chrome is installed on your system.
- IP Blocking Warning: Running the scraper too frequently may lead to blocking. Consider using proxies if needed.
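
One low-effort way to reduce the blocking risk mentioned above is to pause between page loads. The sketch below is an assumption, not part of the existing script: `polite_delay` and its default bounds are hypothetical and would need tuning.

```python
import random
import time


def polite_delay(min_seconds=2.0, max_seconds=5.0, sleep=time.sleep):
    """Sleep for a random interval and return the number of seconds waited."""
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    seconds = random.uniform(min_seconds, max_seconds)
    sleep(seconds)
    return seconds
```

Calling `polite_delay()` after each page is scraped spaces out the requests. The `sleep` parameter exists only so the function can be exercised in tests without actually waiting.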
- Add proxy rotation to avoid detection.
- Improve error handling and logging.
- Support other lawyer categories or cities.
Developed by Yuri P.