A simple Python script that scrapes company contact details using the Google Custom Search API and web scraping.

Chetan-Goyal/company-contract-scraper

Company Contact Scraper 🔍📇

License: MIT Python 3.8+ Dependencies

An intelligent scraper that extracts company contact information using the official Google Custom Search API and validates what it finds, with built-in Indian phone number verification and a persistent scrape history.

Features ✨

  • Smart Deduplication - Auto-skips previously processed companies
  • 📁 Multi-Format Reports - CSV & JSON outputs with timestamps
  • 📞 Indian Number Validation - Strict TRAI-compliant phone verification
  • 🔒 Secure Operations - Proxy support & SSL verification
  • 📊 Priority Tagging - Customizable lead priority levels
  • Rate Limited - Google API-friendly pacing
  • 🚫 No Duplicates - Persistent scrape history
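
The deduplication and persistent-history features can be pictured with a small sketch. It assumes `scraped_history.json` holds a flat JSON list of company names; the real file layout in `scraper.py` may differ, and the helper names `load_history`, `save_history`, and `pending_companies` are hypothetical.

```python
import json
from pathlib import Path


def load_history(path):
    """Return the set of company names already recorded, or an empty set."""
    path = Path(path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def save_history(path, names):
    """Persist the names as a sorted JSON list (assumed file layout)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(sorted(names), indent=2), encoding="utf-8")


def pending_companies(companies, history, force=False):
    """Return companies still to scrape; force=True ignores the history."""
    seen = set() if force else {name.casefold() for name in history}
    todo = []
    for name in companies:
        key = name.strip().casefold()
        # Skip blanks, names in the history, and repeats within the input.
        if key and key not in seen:
            seen.add(key)
            todo.append(name.strip())
    return todo
```

Comparing names case-insensitively is a design choice of this sketch, so that "Acme" and "acme" count as the same company.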

Installation ⚙️

# Clone repository
git clone https://github.com/Chetan-Goyal/company-contract-scraper.git
cd company-contract-scraper

# Install dependencies
pip install -r requirements.txt

# Create environment file
cp .env.example .env

Configuration ⚙️

  1. Get Google API Credentials:

    • Create project at Google Cloud Console
    • Enable "Custom Search JSON API"
    • Create API key and Custom Search Engine (CX)
  2. Edit .env:

GOOGLE_API_KEY="your_api_key_here"
GOOGLE_CX="your_search_engine_id"
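
To show how these two values are typically used, here is a minimal sketch that builds a Custom Search JSON API request URL. It assumes the `.env` values have already been loaded into the process environment (for example by a dotenv loader); `build_search_url` and the `num=5` result count are illustrative choices, not taken from `scraper.py`.

```python
import os
from urllib.parse import urlencode

# Public endpoint of the Custom Search JSON API.
SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, env=None):
    """Build a search URL from GOOGLE_API_KEY and GOOGLE_CX."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY")
    cx = env.get("GOOGLE_CX")
    if not api_key or not cx:
        raise RuntimeError("GOOGLE_API_KEY and GOOGLE_CX must be set")
    # key, cx, q and num are documented query parameters of the API.
    params = {"key": api_key, "cx": cx, "q": query, "num": 5}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"
```

Fetching the returned URL with any HTTP client yields a JSON document whose `items` list holds the search results.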

Usage 🚀

# Basic usage
python scraper.py -i companies.txt

# Custom priority & notes
python scraper.py -i clients.txt --priority 75 --notes "Q4 Leads"

# Force re-scrape existing companies
python scraper.py -i list.txt --force

# Custom output directory
python scraper.py -i input.txt -o ./custom_reports

Command Options 📋

Options:
  -i, --input    Input file with company names (required)
  -o, --output   Output directory (default: reports)
  -p, --priority Default priority percentage (60-100)
  --force        Force re-scrape of existing companies
  --notes        Additional notes for all entries
  --remarks      Custom remarks column content
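
For readers who want to see how such options map onto code, the following is a sketch of an equivalent `argparse` setup. It is not copied from `scraper.py`: the default priority of 60 and the empty-string defaults for notes and remarks are assumptions.

```python
import argparse


def priority_percent(value):
    """Parse a priority and enforce the documented 60-100 range."""
    number = int(value)
    if not 60 <= number <= 100:
        raise argparse.ArgumentTypeError("priority must be between 60 and 100")
    return number


def build_parser():
    parser = argparse.ArgumentParser(prog="scraper.py")
    parser.add_argument("-i", "--input", required=True,
                        help="Input file with company names")
    parser.add_argument("-o", "--output", default="reports",
                        help="Output directory")
    # Default of 60 is an assumption made for this sketch.
    parser.add_argument("-p", "--priority", type=priority_percent, default=60,
                        help="Default priority percentage (60-100)")
    parser.add_argument("--force", action="store_true",
                        help="Force re-scrape of existing companies")
    parser.add_argument("--notes", default="",
                        help="Additional notes for all entries")
    parser.add_argument("--remarks", default="",
                        help="Custom remarks column content")
    return parser
```

Calling `build_parser().parse_args(["-i", "clients.txt", "--priority", "75"])` returns a namespace with `priority` set to the integer 75 and `output` left at `reports`.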

Report Structure 📊

Sample CSV Output:

Client Name,Position,Client Company,Contact Details,Email,Priority,Notes,Found in,Remarks
,,ORB Energy,"+91 80 4123 4567;080-41234567",contact@orb.com,60%,Q4 Leads,https://orb.com/contact,

File Organization:

reports/
├── 2023-10-05_14-30-22/
│   ├── contacts.csv
│   └── contacts.json
└── scraped_history.json
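
The layout above can be reproduced with a short sketch that writes one timestamped folder per run. The column names come from the sample CSV; the function name `write_reports` and the JSON shape (a list of row objects) are assumptions.

```python
import csv
import json
from datetime import datetime
from pathlib import Path

COLUMNS = ["Client Name", "Position", "Client Company", "Contact Details",
           "Email", "Priority", "Notes", "Found in", "Remarks"]


def write_reports(rows, output_dir="reports", now=None):
    """Write contacts.csv and contacts.json into a timestamped run folder."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H-%M-%S")
    run_dir = Path(output_dir) / stamp
    run_dir.mkdir(parents=True, exist_ok=True)

    with open(run_dir / "contacts.csv", "w", newline="", encoding="utf-8") as fh:
        # restval fills columns that a row leaves out with an empty string.
        writer = csv.DictWriter(fh, fieldnames=COLUMNS, restval="")
        writer.writeheader()
        writer.writerows(rows)

    with open(run_dir / "contacts.json", "w", encoding="utf-8") as fh:
        json.dump(rows, fh, indent=2, ensure_ascii=False)

    return run_dir
```

Each row is a dict keyed by the column names, for example `{"Client Company": "ORB Energy", "Priority": "60%"}`.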

Legal Considerations ⚖️

  • ✔️ Complies with Google API Terms of Service
  • ✔️ Respects website robots.txt directives
  • ✔️ Rate limited to 1 request/second
  • ✔️ Data validation for accuracy
  • ❌ Not for scraping protected/personal data
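
Two of the points above, the one-request-per-second pacing and the robots.txt check, can be sketched with the standard library alone. This is an illustration of the general technique, not the project's code; `RateLimiter` and `allowed_by_robots` are hypothetical names.

```python
import time
from urllib.robotparser import RobotFileParser


class RateLimiter:
    """Ensure consecutive calls to wait() are at least min_interval apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Check a URL against the text of an already downloaded robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A scraper would call `limiter.wait()` before each HTTP request and skip any URL for which `allowed_by_robots` returns `False`.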

FAQ ❓

Q: SSL certificate errors?
A: Run pip install --upgrade certifi and ensure system certificates are updated.

Q: No results found?
A: Check Google API quota and custom search engine configuration.

Q: How to reset history?
A: Delete reports/scraped_history.json

Q: Customize phone validation?
A: Modify validate_indian_phones() in scraper.py
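
As a starting point for customizing phone validation, here is a simplified sketch of one way to normalize Indian numbers. It is not the body of `validate_indian_phones()`, which is not shown here; the helper `normalize_indian_number` is hypothetical, and the rule it applies (a 10-digit national number after removing a `+91` or `0` prefix) is a deliberate simplification rather than a full numbering-plan check.

```python
import re


def normalize_indian_number(raw):
    """Return the number as +91XXXXXXXXXX, or None if it does not fit."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 12 and digits.startswith("91"):
        digits = digits[2:]  # drop the country code
    elif len(digits) == 11 and digits.startswith("0"):
        digits = digits[1:]  # drop the trunk prefix
    if len(digits) != 10 or digits[0] == "0":
        return None
    return "+91" + digits


def unique_numbers(candidates):
    """Normalize candidates and drop invalid or repeated numbers."""
    result = []
    for raw in candidates:
        number = normalize_indian_number(raw)
        if number and number not in result:
            result.append(number)
    return result
```

Under this rule, the two spellings in the sample CSV row, `+91 80 4123 4567` and `080-41234567`, normalize to the same value and collapse into one entry.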

License 📄

This project is licensed under the MIT License - see LICENSE file for details.

Contributing 🤝

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/AmazingFeature)
  3. Commit changes (git commit -m 'Add AmazingFeature')
  4. Push to branch (git push origin feature/AmazingFeature)
  5. Open Pull Request

Disclaimer: Use this tool responsibly. Always verify scraping legality for target websites and respect data privacy regulations.
