Production-ready data enrichment API with 9 AI-powered tools: web scraping, email intel, phone validation, company data, and more. SaaS-ready with OpenAPI docs.

SCAILE-it/g-mcp-tools-fast

g-mcp-tools-fast - Production-Ready Enrichment API

Enterprise-grade data intelligence API with 9 enrichment tools + bulk processing

🚀 Status: Production-Ready | SaaS-Ready | Fully Deployed
🔗 Live Endpoint: https://scaile--g-mcp-tools-fast-api.modal.run
📚 Interactive Docs: Swagger UI | ReDoc


🎯 Overview

A complete data enrichment API built on Modal.com, combining AI-powered web scraping with 8 specialized intelligence tools. Perfect for sales intelligence, market research, lead enrichment, and data validation.

Key Features

  • ✅ 9 Enrichment Tools - Web scraping, email intel, company data, phone validation, and more
  • ✅ Bulk Processing - Process 100s-1000s of records in parallel with auto-detection
  • ✅ Smart Auto-Detection - Automatically detect data types and apply appropriate tools
  • ✅ Multi-Tool Enrichment - Combine multiple tools on a single record
  • ✅ AI-Powered Extraction - Uses Gemini 2.5 Flash for intelligent data extraction
  • ✅ Production-Ready - Authentication, health checks, comprehensive error handling
  • ✅ Auto-Scaling - Serverless architecture handles traffic spikes automatically
  • ✅ 24-Hour Cache - Reduces costs and improves response times
  • ✅ OpenAPI Docs - Swagger/ReDoc for easy integration
  • ✅ Type-Safe - Pydantic models for all inputs/outputs


🚀 Bulk Processing & Power Features

NEW: Process multiple records in parallel with intelligent auto-detection!

Multi-Tool Enrichment (/enrich)

Enrich a single record with multiple tools at once:

curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/enrich \
  -H 'Content-Type: application/json' \
  -d '{
    "data": {
      "phone": "+14155552671",
      "email": "john@anthropic.com"
    },
    "tools": ["phone-validation", "email-intel", "email-pattern"]
  }'
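
The same request can be built from Python. This is a minimal sketch using only the standard library; `build_enrich_request` is a hypothetical helper name, not part of the API, and the optional `x-api-key` header follows the Authentication section further down. The request is constructed here but not sent.

```python
import json
import urllib.request

BASE_URL = "https://scaile--g-mcp-tools-fast-api.modal.run"


def build_enrich_request(data, tools, api_key=None):
    """Build (but do not send) a POST request for the /enrich endpoint."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when the deployment has authentication enabled.
        headers["x-api-key"] = api_key
    body = json.dumps({"data": data, "tools": tools}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/enrich", data=body, headers=headers, method="POST"
    )


request = build_enrich_request(
    {"phone": "+14155552671", "email": "john@anthropic.com"},
    ["phone-validation", "email-intel", "email-pattern"],
)
# To send it: urllib.request.urlopen(request, timeout=30)
```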

Auto-Detection (/enrich/auto)

Automatically detect data types and apply appropriate tools:

curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/enrich/auto \
  -H 'Content-Type: application/json' \
  -d '{
    "data": {
      "contact_phone": "+14155552671",
      "work_email": "john@anthropic.com",
      "company_domain": "anthropic.com"
    }
  }'

Response: the three fields are detected automatically and enriched with 5 tools (phone validation, email intel, email pattern, WHOIS, tech stack).
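
How detection could work is sketched below. This is an illustration, not the service's actual logic: the regular expressions, the `detect_type` and `plan_tools` names, and the grouping of tools per data type are assumptions. Only the tool identifiers come from this README.

```python
import re

# Deliberately loose patterns; a real detector would need stricter rules.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?[\d\s().-]{7,20}$")
DOMAIN_RE = re.compile(r"^(?:[a-z0-9-]+\.)+[a-z]{2,}$", re.IGNORECASE)

# Assumed mapping from detected type to tool identifiers.
TOOLS_BY_TYPE = {
    "email": ["email-intel", "email-pattern"],
    "phone": ["phone-validation"],
    "domain": ["whois", "tech-stack"],
}


def detect_type(value):
    """Classify a single field value, or return None if nothing matches."""
    if not isinstance(value, str):
        return None
    value = value.strip()
    if EMAIL_RE.match(value):
        return "email"
    if PHONE_RE.match(value):
        return "phone"
    if DOMAIN_RE.match(value):
        return "domain"
    return None


def plan_tools(record):
    """Map each recognised field name to the tools that would run on it."""
    plan = {}
    for field, value in record.items():
        kind = detect_type(value)
        if kind is not None:
            plan[field] = TOOLS_BY_TYPE[kind]
    return plan


plan = plan_tools({
    "contact_phone": "+14155552671",
    "work_email": "john@anthropic.com",
    "company_domain": "anthropic.com",
})
```

Under these assumptions the example record yields five tool runs in total, which matches the count reported above. Detection works on values rather than field names, so `contact_phone` and `work_email` need no special naming.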

Bulk Processing (/bulk)

Process multiple records in parallel with explicit tools:

curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/bulk \
  -H 'Content-Type: application/json' \
  -d '{
    "rows": [
      {"name": "Alice Johnson", "email": "alice@example.com"},
      {"name": "Bob Smith", "email": "bob@example.com"}
    ],
    "tools": ["email-intel", "email-pattern"]
  }'

Response:

{
  "success": true,
  "batch_id": "batch_1761503726531_7AzCBh1nHak",
  "status": "completed",
  "total_rows": 2,
  "successful": 2,
  "failed": 0,
  "processing_time_seconds": 1.24,
  "results": [ /* enriched rows */ ]
}

Bulk Auto-Processing (/bulk/auto)

Process multiple records with automatic tool detection:

curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/bulk/auto \
  -H 'Content-Type: application/json' \
  -d '{
    "rows": [
      {"name": "Alice", "email": "alice@example.com", "website": "example.com"},
      {"name": "Bob", "phone": "+14155551234"}
    ]
  }'

Smart Features:

  • ✅ Automatically detects emails, phones, domains, companies, GitHub usernames
  • ✅ Applies appropriate tools (email-intel, email-pattern, whois, tech-stack, etc.)
  • ✅ Processes rows in parallel using asyncio
  • ✅ Handles up to 10,000 rows per batch
  • ✅ Returns detailed success/error stats
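
Parallel row processing with asyncio can be sketched as follows. The function names, the concurrency limit of 50, and the `fake_enrich` stand-in are assumptions for illustration; the summary keys mirror the bulk response shown earlier, without the `batch_id` and `status` fields.

```python
import asyncio
import time


async def process_rows(rows, enrich_row, max_concurrency=50):
    """Enrich rows concurrently and summarise the outcome of the batch."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(row):
        async with semaphore:
            try:
                return {"success": True, "data": await enrich_row(row)}
            except Exception as exc:
                # A failing row is recorded instead of aborting the batch.
                return {"success": False, "error": str(exc)}

    started = time.perf_counter()
    # gather() returns results in the same order as the input rows.
    results = await asyncio.gather(*(run_one(row) for row in rows))
    successful = sum(1 for result in results if result["success"])
    return {
        "total_rows": len(rows),
        "successful": successful,
        "failed": len(rows) - successful,
        "processing_time_seconds": round(time.perf_counter() - started, 2),
        "results": results,
    }


async def fake_enrich(row):
    """Stand-in for a real tool call; rejects rows without an email."""
    await asyncio.sleep(0)
    if "email" not in row:
        raise ValueError("missing email")
    return {**row, "enriched": True}


summary = asyncio.run(
    process_rows([{"email": "alice@example.com"}, {"name": "Bob"}], fake_enrich)
)
```

The semaphore caps how many rows are in flight at once, which keeps a large batch from opening thousands of simultaneous connections.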

🛠️ Individual Enrichment Tools

1. Web Scraper (/scrape)

Extract structured data from any website using natural language prompts.

Capabilities:

  • AI-powered extraction with Gemini 2.5 Flash
  • Multi-page scraping with auto-discovery
  • Custom JSON schema support
  • Link extraction
  • 24-hour intelligent caching

Example:

curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/scrape \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://anthropic.com",
    "prompt": "Extract the company mission and product names",
    "max_pages": 1
  }'

Response:

{
  "success": true,
  "data": {
    "company_mission": "Build safe, beneficial AI...",
    "product_names": ["Claude", "Claude Code", "Opus", "Sonnet", "Haiku"]
  },
  "metadata": {
    "extraction_time": 10.31,
    "pages_scraped": 1,
    "model": "gemini-2.5-flash"
  }
}

2. Email Intel (/email-intel)

Check which platforms an email is registered on (holehe).

curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/email-intel \
  -H 'Content-Type: application/json' \
  -d '{"email": "user@example.com"}'

3. Email Finder (/email-finder)

Find email addresses associated with a domain (theHarvester).

curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/email-finder \
  -H 'Content-Type: application/json' \
  -d '{"domain": "anthropic.com", "limit": 10}'

4. Company Data (/company-data)

Get company registration and corporate information (OpenCorporates).

curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/company-data \
  -H 'Content-Type: application/json' \
  -d '{"companyName": "Anthropic", "domain": "anthropic.com"}'

5. Phone Validation (/phone-validation)

Validate phone numbers with carrier, location, and line type info.

curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/phone-validation \
  -H 'Content-Type: application/json' \
  -d '{"phoneNumber": "+14155552671", "defaultCountry": "US"}'

Response:

{
  "success": true,
  "data": {
    "valid": true,
    "formatted": {
      "e164": "+14155552671",
      "international": "+1 415-555-2671",
      "national": "(415) 555-2671"
    },
    "country": "San Francisco, CA",
    "carrier": "Unknown",
    "lineType": "FIXED_LINE_OR_MOBILE",
    "lineTypeCode": 2
  }
}

6. Tech Stack (/tech-stack)

Detect technologies and frameworks used by a website.

curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/tech-stack \
  -H 'Content-Type: application/json' \
  -d '{"domain": "anthropic.com"}'

Response:

{
  "success": true,
  "data": {
    "domain": "anthropic.com",
    "technologies": [
      {"name": "Next.js", "category": "Framework"},
      {"name": "cloudflare", "category": "Web Server"}
    ],
    "totalFound": 2
  }
}

7. Email Pattern (/email-pattern)

Generate common email patterns for a domain.

curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/email-pattern \
  -H 'Content-Type: application/json' \
  -d '{"domain": "anthropic.com", "firstName": "John", "lastName": "Doe"}'
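
Pattern generation of this kind can be sketched in a few lines. The set of patterns below is an assumption chosen for illustration; the patterns the endpoint actually returns may differ, and `email_patterns` is a hypothetical name.

```python
def email_patterns(domain, first_name, last_name):
    """Return candidate addresses built from common naming conventions."""
    first = first_name.strip().lower()
    last = last_name.strip().lower()
    if not first or not last:
        raise ValueError("first_name and last_name must be non-empty")
    local_parts = [
        f"{first}.{last}",     # john.doe
        f"{first}{last}",      # johndoe
        f"{first[0]}{last}",   # jdoe
        f"{first}",            # john
        f"{first}_{last}",     # john_doe
        f"{last}.{first}",     # doe.john
    ]
    return [f"{local}@{domain.lower()}" for local in local_parts]


candidates = email_patterns("anthropic.com", "John", "Doe")
```

These are guesses at plausible addresses, not verified mailboxes.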

8. WHOIS (/whois)

Look up domain registration information.

curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/whois \
  -H 'Content-Type: application/json' \
  -d '{"domain": "anthropic.com"}'

Response:

{
  "success": true,
  "data": {
    "domain": "anthropic.com",
    "registrar": "MarkMonitor, Inc.",
    "creationDate": "2001-10-02",
    "expirationDate": "2033-10-02",
    "nameServers": ["ISLA.NS.CLOUDFLARE.COM", "RANDY.NS.CLOUDFLARE.COM"]
  }
}

9. GitHub Intel (/github-intel)

Analyze GitHub user profiles and repositories.

curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/github-intel \
  -H 'Content-Type: application/json' \
  -d '{"username": "anthropics"}'

Response:

{
  "success": true,
  "data": {
    "username": "anthropics",
    "name": "Anthropic",
    "location": "United States of America",
    "publicRepos": 54,
    "followers": 14565,
    "languages": {
      "Python": 6,
      "TypeScript": 3,
      "JavaScript": 1
    }
  }
}

🔐 Authentication

The API supports optional API key authentication via the x-api-key header.

Enable Authentication

  1. Create Modal secret:
modal secret create modal-api-key MODAL_API_KEY=your-secret-key-here
  2. Redeploy the API:
./DEPLOY_G_MCP_TOOLS.sh
  3. Include API key in requests:
curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/scrape \
  -H 'Content-Type: application/json' \
  -H 'x-api-key: your-secret-key-here' \
  -d '{"url": "https://example.com", "prompt": "Extract data"}'

Note: If MODAL_API_KEY is not set, the API is publicly accessible (useful for development).
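
The optional check described in the note can be sketched as below. This is an assumption about the behavior, not the deployed code: `is_authorized` is a hypothetical name, header keys are assumed to be lower-cased already, and the constant-time comparison is a choice made for this sketch.

```python
import hmac
import os


def is_authorized(headers, env=os.environ):
    """Return True if the request may proceed under the optional-key rule."""
    expected = env.get("MODAL_API_KEY")
    if not expected:
        # No key configured: the API is publicly accessible.
        return True
    provided = headers.get("x-api-key", "")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(provided.encode(), expected.encode())


open_access = is_authorized({}, env={})
accepted = is_authorized({"x-api-key": "s3cret"}, env={"MODAL_API_KEY": "s3cret"})
rejected = is_authorized({"x-api-key": "wrong"}, env={"MODAL_API_KEY": "s3cret"})
```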


🚀 Deployment

Prerequisites

  1. Install Modal CLI:
pip install modal
  2. Authenticate:
modal setup
  3. Create Gemini API secret:
modal secret create gemini-secret GOOGLE_GENERATIVE_AI_API_KEY=your-gemini-key

Deploy

chmod +x DEPLOY_G_MCP_TOOLS.sh
./DEPLOY_G_MCP_TOOLS.sh

Or manually:

modal deploy g-mcp-tools-complete.py

🏥 Health Check

Monitor API status:

curl https://scaile--g-mcp-tools-fast-api.modal.run/health

Response:

{
  "status": "healthy",
  "service": "g-mcp-tools-fast",
  "version": "1.0.0",
  "tools": 9,
  "timestamp": "2025-10-26T17:30:00.000000Z"
}

📊 Response Format

All endpoints follow a consistent response format:

Success Response

{
  "success": true,
  "data": { ... },
  "metadata": {
    "source": "tool-name",
    "timestamp": "2025-10-26T17:30:00.000000Z"
  }
}

Error Response

{
  "success": false,
  "error": "Error message",
  "metadata": {
    "source": "tool-name",
    "timestamp": "2025-10-26T17:30:00.000000Z"
  }
}
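
Because every endpoint shares this envelope, a client can unwrap responses in one place. A minimal sketch, assuming the payload has already been parsed from JSON; `EnrichmentError` and `unwrap` are hypothetical names.

```python
class EnrichmentError(Exception):
    """Raised when a response envelope reports success: false."""


def unwrap(payload):
    """Return the data of a successful response or raise with the error text."""
    if payload.get("success"):
        return payload.get("data")
    source = payload.get("metadata", {}).get("source", "unknown")
    message = payload.get("error", "unknown error")
    raise EnrichmentError(f"{source}: {message}")


data = unwrap({
    "success": True,
    "data": {"domain": "anthropic.com"},
    "metadata": {"source": "whois", "timestamp": "2025-10-26T17:30:00.000000Z"},
})
```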

💰 Cost Optimization

The API includes several cost-saving features:

  1. 24-Hour Cache - Repeated requests return cached results
  2. Timeouts - Prevent runaway processes (30s default, 120s max)
  3. Container Idle Timeout - Containers shut down after 120s of inactivity
  4. Efficient Resource Usage - Only runs when needed
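
A 24-hour cache can be as simple as a dictionary with expiry times. The sketch below is an in-memory illustration only; the `TTLCache` class and its injectable clock are assumptions, not the storage the service uses.

```python
import time


class TTLCache:
    """Minimal in-memory cache whose entries expire after a fixed lifetime."""

    def __init__(self, ttl_seconds=24 * 60 * 60, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries = {}

    def get(self, key):
        """Return the cached value, or None on a miss or an expired entry."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return value

    def set(self, key, value):
        """Store a value together with its expiry time."""
        self._entries[key] = (self._clock() + self._ttl, value)


cache = TTLCache()
cache.set(("whois", "anthropic.com"), {"registrar": "MarkMonitor, Inc."})
hit = cache.get(("whois", "anthropic.com"))
```

Keying on the tool name plus its input keeps results from different tools for the same domain separate.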

Estimated costs (Modal pricing):

  • Web scraping: ~$0.001 per request
  • Other tools: ~$0.0001 per request
  • Cache hits: $0 (served from memory)
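
These per-request figures can be turned into a rough batch estimate. The helper below is a back-of-the-envelope sketch; `estimate_cost` is a hypothetical name and the 40% cache hit rate is an illustrative assumption, not a measured value.

```python
def estimate_cost(requests, cost_per_request, cache_hit_rate=0.0):
    """Estimate spend in dollars, treating cache hits as free."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    billable_requests = requests * (1.0 - cache_hit_rate)
    return billable_requests * cost_per_request


# 10,000 scrape requests at ~$0.001 each, with 40% served from cache.
scrape_cost = estimate_cost(10_000, 0.001, cache_hit_rate=0.4)
# 10,000 requests to the other tools at ~$0.0001 each, no caching.
other_cost = estimate_cost(10_000, 0.0001)
```

Under those assumptions the scraping batch comes to about $6 and the other tools to about $1.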

🧪 Testing

Run All Tests

# Test all 9 endpoints
./test-all-endpoints.sh

Individual Endpoint Tests

# Email pattern
curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/email-pattern \
  -H 'Content-Type: application/json' \
  -d '{"domain": "anthropic.com"}'

# Phone validation
curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/phone-validation \
  -H 'Content-Type: application/json' \
  -d '{"phoneNumber": "+14155552671"}'

# GitHub intel
curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/github-intel \
  -H 'Content-Type: application/json' \
  -d '{"username": "anthropics"}'

📈 SaaS Readiness Checklist

  • Health Check Endpoint - /health for monitoring
  • API Authentication - Optional x-api-key header
  • OpenAPI Documentation - Swagger UI + ReDoc
  • Error Handling - Comprehensive error responses
  • Input Validation - Pydantic models
  • Rate Limiting - Handled by Modal platform
  • Monitoring - Modal dashboard + logs
  • Auto-Scaling - Serverless architecture
  • Cost Optimization - Caching + timeouts
  • Type Safety - Pydantic models for all inputs/outputs

Ready to Sell As:

  • ✅ B2B SaaS API
  • ✅ Data Enrichment Service
  • ✅ Lead Intelligence Platform
  • ✅ Market Research Tool


🔧 Monitoring & Logs

View Logs

modal app logs g-mcp-tools-fast --follow

Check App Status

modal app list | grep g-mcp-tools

View Secrets

modal secret list

πŸ—οΈ Architecture

Client Request
    ↓
FastAPI (Modal ASGI)
    ↓
Authentication Check (optional)
    ↓
Input Validation (Pydantic)
    ↓
Cache Check (24h TTL)
    ↓ (cache miss)
Tool Execution
    ├→ Web Scraper (crawl4ai + Gemini)
    ├→ Email Intel (holehe)
    ├→ Email Finder (theHarvester)
    ├→ Company Data (OpenCorporates API)
    ├→ Phone Validation (libphonenumber)
    ├→ Tech Stack (custom detection)
    ├→ Email Pattern (pattern generation)
    ├→ WHOIS (python-whois)
    └→ GitHub Intel (GitHub API)
    ↓
Cache Result
    ↓
JSON Response
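
The order of stages in the diagram can be sketched as a single dispatch function. This is a simplified illustration under several assumptions: a plain dictionary check stands in for Pydantic validation, a dict without expiry stands in for the cache, the error messages are invented, and the timestamp field is omitted. `handle_request` and `fake_whois` are hypothetical names.

```python
import json


def handle_request(tool_name, payload, *, tools, cache, authorized=True):
    """Run one request through auth, validation, cache, and tool execution."""
    if not authorized:
        return {"success": False, "error": "Invalid or missing API key"}
    tool = tools.get(tool_name)
    if tool is None or not isinstance(payload, dict):
        return {"success": False, "error": "Invalid request"}

    metadata = {"source": tool_name}
    # Serialise with sorted keys so equal payloads map to the same cache key.
    key = (tool_name, json.dumps(payload, sort_keys=True))
    if key in cache:
        return {"success": True, "data": cache[key], "metadata": metadata}

    try:
        data = tool(payload)
    except Exception as exc:
        return {"success": False, "error": str(exc), "metadata": metadata}
    cache[key] = data  # only successful results are cached
    return {"success": True, "data": data, "metadata": metadata}


calls = []


def fake_whois(payload):
    """Stand-in tool that records each invocation."""
    calls.append(payload)
    return {"domain": payload["domain"]}


tools = {"whois": fake_whois}
cache = {}
first = handle_request("whois", {"domain": "example.com"}, tools=tools, cache=cache)
second = handle_request("whois", {"domain": "example.com"}, tools=tools, cache=cache)
```

The second call is answered from the cache, so the tool runs only once for the two identical requests.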

📝 License

See parent repository for license information.


🤝 Support


🎯 Use Cases

Sales Intelligence

  • Enrich lead data with company info
  • Find contact emails and phone numbers
  • Validate contact information

Market Research

  • Scrape competitor websites
  • Analyze tech stacks
  • Track company changes via WHOIS

Developer Intelligence

  • Analyze GitHub profiles
  • Detect technologies used
  • Research developer ecosystems

Data Validation

  • Validate phone numbers
  • Verify email patterns
  • Check domain registrations

Built with: Modal.com | FastAPI | Gemini 2.5 Flash | crawl4ai
