Company Contact Scraper 🔍📇

An intelligent scraper that extracts company contact information through the official Google Custom Search JSON API, validates Indian phone numbers, and keeps a persistent history of the companies it has already processed.

Features ✨

✅ Smart Deduplication - Auto-skips previously processed companies
📁 Multi-Format Reports - CSV & JSON outputs with timestamps
📞 Indian Number Validation - Strict TRAI-compliant phone verification
🔒 Secure Operations - Proxy support & SSL verification
📊 Priority Tagging - Customizable lead priority levels
⏳ Rate Limited - Google API-friendly pacing
🚫 No Duplicates - Persistent scrape history

Installation ⚙️

# Clone repository
git clone https://github.com/yourusername/company-scraper.git
cd company-scraper

# Install dependencies
pip install -r requirements.txt

# Create environment file
cp .env.example .env

Configuration ⚙️

Get Google API credentials:
- Create a project in the Google Cloud Console
- Enable the "Custom Search JSON API"
- Create an API key and a Custom Search Engine ID (CX)

Edit .env:

GOOGLE_API_KEY="your_api_key_here"
GOOGLE_CX="your_search_engine_id"
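As a rough illustration of how these two settings are typically used, the sketch below builds a request URL for the Custom Search JSON API. The helper name build_search_url is hypothetical and is not taken from this project; it assumes the variables from .env have already been loaded into the process environment.

```python
import os
from urllib.parse import urlencode

# Public endpoint of the Google Custom Search JSON API.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, env=None):
    """Return a Custom Search request URL for `query`.

    Hypothetical helper for illustration only. `env` defaults to
    os.environ, so the values from .env must already be loaded.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("GOOGLE_API_KEY", "GOOGLE_CX") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    params = {
        "key": env["GOOGLE_API_KEY"],  # API key from Google Cloud Console
        "cx": env["GOOGLE_CX"],        # Custom Search Engine ID
        "q": query,                    # search terms
    }
    return API_ENDPOINT + "?" + urlencode(params)
```

Failing early on a missing key or CX tends to give a clearer error than an HTTP 400 response from the API.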

Usage 🚀

# Basic usage
python scraper.py -i companies.txt

# Custom priority & notes
python scraper.py -i clients.txt --priority 75 --notes "Q4 Leads"

# Force re-scrape existing companies
python scraper.py -i list.txt --force

# Custom output directory
python scraper.py -i input.txt -o ./custom_reports

Command Options 📋

Options:
  -i, --input    Input file with company names (required)
  -o, --output   Output directory (default: reports)
  -p, --priority Default priority percentage (60-100)
  --force        Force re-scrape of existing companies
  --notes        Additional notes for all entries
  --remarks      Custom remarks column content
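For readers who want to see how such an interface could be declared, here is a minimal argparse sketch that mirrors the options listed above. It is not the project's actual parser: the default priority of 60 and the empty defaults for notes and remarks are assumptions, and priority_value is a hypothetical helper.

```python
import argparse


def priority_value(text):
    """Parse a priority percentage and enforce the documented 60-100 range."""
    value = int(text)
    if not 60 <= value <= 100:
        raise argparse.ArgumentTypeError("priority must be between 60 and 100")
    return value


def build_parser():
    parser = argparse.ArgumentParser(
        prog="scraper.py",
        description="Extract company contact information.",
    )
    parser.add_argument("-i", "--input", required=True,
                        help="Input file with company names")
    parser.add_argument("-o", "--output", default="reports",
                        help="Output directory")
    # Default of 60 is an assumption, not confirmed by the project.
    parser.add_argument("-p", "--priority", type=priority_value, default=60,
                        help="Default priority percentage (60-100)")
    parser.add_argument("--force", action="store_true",
                        help="Force re-scrape of existing companies")
    parser.add_argument("--notes", default="",
                        help="Additional notes for all entries")
    parser.add_argument("--remarks", default="",
                        help="Custom remarks column content")
    return parser
```

With this sketch, build_parser().parse_args(["-i", "clients.txt", "--priority", "75"]) yields a namespace whose priority is the integer 75, while an out-of-range value makes argparse exit with a usage error.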

Report Structure 📊

Sample CSV Output:

Client Name,Position,Client Company,Contact Details,Email,Priority,Notes,Found in,Remarks
,,ORB Energy,"+91 80 4123 4567;080-41234567",contact@orb.com,60%,Q4 Leads,https://orb.com/contact,

File Organization:

reports/
├── 2023-10-05_14-30-22/
│   ├── contacts.csv
│   └── contacts.json
└── scraped_history.json
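The layout above could be produced along the following lines. This is a sketch under stated assumptions rather than the project's code: write_reports is a hypothetical function, the column list is copied from the sample CSV header, and each row is assumed to be a dict keyed by those column names.

```python
import csv
import json
from datetime import datetime
from pathlib import Path

# Column order copied from the sample CSV header.
COLUMNS = [
    "Client Name", "Position", "Client Company", "Contact Details",
    "Email", "Priority", "Notes", "Found in", "Remarks",
]


def write_reports(rows, output_dir="reports", now=None):
    """Write contacts.csv and contacts.json into a timestamped folder.

    Hypothetical helper; `rows` is a list of dicts keyed by COLUMNS.
    Missing keys are written as empty CSV cells.
    """
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H-%M-%S")
    run_dir = Path(output_dir) / stamp
    run_dir.mkdir(parents=True, exist_ok=True)

    with open(run_dir / "contacts.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS, restval="")
        writer.writeheader()
        writer.writerows(rows)

    with open(run_dir / "contacts.json", "w", encoding="utf-8") as fh:
        json.dump(rows, fh, indent=2, ensure_ascii=False)

    return run_dir
```

Using hyphens instead of colons in the time portion keeps the folder name valid on Windows as well as on Unix-like systems.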

Legal Considerations ⚖️

✔️ Complies with Google API Terms of Service
✔️ Respects website robots.txt directives
✔️ Rate limited to 1 request/second
✔️ Data validation for accuracy
❌ Not for scraping protected/personal data
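One common way to hold a client to 1 request/second is to enforce a minimum interval between calls. The sketch below shows that idea; the RateLimiter class is hypothetical and may differ from how this project paces its requests.

```python
import time


class RateLimiter:
    """Block so that consecutive calls to wait() are at least
    `min_interval` seconds apart. Hypothetical helper for illustration."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._last = None      # time of the previous permitted call

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling limiter.wait() immediately before each API request keeps the pace at or below one request per second; time.monotonic is used so that system clock adjustments do not affect the interval.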

FAQ ❓

Q: SSL certificate errors?
A: Run pip install --upgrade certifi and ensure system certificates are updated.

Q: No results found?
A: Check Google API quota and custom search engine configuration.

Q: How to reset history?
A: Delete reports/scraped_history.json
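To make the role of the history file concrete, here is a minimal sketch of history-based skipping. The on-disk format of scraped_history.json is not documented here, so the sketch assumes a plain JSON list of company names; load_history, save_history, pending_companies, and normalize_name are all hypothetical names.

```python
import json
from pathlib import Path


def normalize_name(name):
    """Case-insensitive, whitespace-trimmed key for comparing names."""
    return name.strip().lower()


def load_history(path):
    """Return the set of already-processed names (empty if no file exists).

    Assumes the file holds a JSON list of strings; the real format may differ.
    """
    path = Path(path)
    if not path.exists():
        return set()
    with open(path, encoding="utf-8") as fh:
        return {normalize_name(name) for name in json.load(fh)}


def save_history(path, history):
    """Persist the history set as a sorted JSON list."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(sorted(history), fh, indent=2)


def pending_companies(companies, history, force=False):
    """Return companies still to be scraped; `force` disables skipping."""
    if force:
        return list(companies)
    return [c for c in companies if normalize_name(c) not in history]
```

Under this model, deleting the file has the same effect as running every input with --force once.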

Q: Customize phone validation?
A: Modify validate_indian_phones() in scraper.py
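As a starting point for customization, the sketch below shows one simple way to normalize Indian numbers. It is not the body of validate_indian_phones() and is far from a full numbering-plan check: it only assumes that a national number has 10 digits (STD code plus subscriber number, or a mobile number) and may be written with a +91, 0091, or trunk 0 prefix. Both function names are hypothetical.

```python
import re


def normalize_indian_phone(raw):
    """Return the number as '+91' plus 10 digits, or None if it does not fit.

    Simplified assumption: every valid national number has exactly
    10 digits and does not start with 0.
    """
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, '+', brackets
    if len(digits) == 12 and digits.startswith("91"):
        national = digits[2:]        # +91 country code
    elif len(digits) == 14 and digits.startswith("0091"):
        national = digits[4:]        # 00 international prefix
    elif len(digits) == 11 and digits.startswith("0"):
        national = digits[1:]        # trunk prefix 0
    elif len(digits) == 10:
        national = digits
    else:
        return None
    if national.startswith("0"):
        return None
    return "+91" + national


def validate_indian_phones_sketch(candidates):
    """Keep valid numbers in first-seen order, dropping duplicates."""
    seen = []
    for raw in candidates:
        number = normalize_indian_phone(raw)
        if number is not None and number not in seen:
            seen.append(number)
    return seen
```

Applied to the two values in the sample CSV row, "+91 80 4123 4567" and "080-41234567", this normalization yields the same number, so a stricter pipeline could collapse them into a single entry.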

License 📄

This project is licensed under the MIT License - see LICENSE file for details.

Contributing 🤝

1. Fork the repository
2. Create feature branch (git checkout -b feature/AmazingFeature)
3. Commit changes (git commit -m 'Add AmazingFeature')
4. Push to branch (git push origin feature/AmazingFeature)
5. Open Pull Request

Disclaimer: Use this tool responsibly. Always verify scraping legality for target websites and respect data privacy regulations.
