MCP Web Scrape is designed with security and responsible web scraping practices at its core. We prioritize:
- Robots.txt compliance - respecting website policies
- Rate limiting - preventing server overload
- No paywall bypass - ethical content access only
- Safe content handling - preventing XSS and injection attacks
- Privacy protection - no sensitive data logging
Robots.txt compliance in practice:
- Always checks robots.txt before scraping
- Respects disallow rules for the configured user agent
- Honors crawl delays specified in robots.txt
- Fails safely if robots.txt cannot be accessed
```javascript
// Example: Automatic robots.txt validation
const isAllowed = await checkRobotsTxt(url, 'mcp-web-scrape');
if (!isAllowed) {
  throw new McpError(ErrorCode.InvalidRequest, 'Robots.txt disallows access');
}
```

Rate limiting:
- Per-domain rate limits (default: 1 request/second)
- Configurable delays between requests
- Exponential backoff on rate limit errors
- Respect for server response headers such as Retry-After (see the sketch below)
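A minimal sketch of what this can look like in practice follows; the `fetchWithRateLimit` helper, its defaults, and its retry policy are illustrative assumptions rather than mcp-web-scrape's actual API.

```typescript
// Illustrative sketch only: per-domain throttling with exponential backoff.
// The helper name and defaults are hypothetical, not part of mcp-web-scrape's API.
const lastRequestAt = new Map<string, number>();
const MIN_DELAY_MS = 1000; // default: 1 request per second per domain

async function fetchWithRateLimit(url: string, maxRetries = 3): Promise<Response> {
  const domain = new URL(url).hostname;

  // Wait out the per-domain delay before sending the request.
  const elapsed = Date.now() - (lastRequestAt.get(domain) ?? 0);
  if (elapsed < MIN_DELAY_MS) {
    await new Promise((resolve) => setTimeout(resolve, MIN_DELAY_MS - elapsed));
  }

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    lastRequestAt.set(domain, Date.now());
    const response = await fetch(url);
    if (response.status !== 429) return response;

    // Honor Retry-After when the server provides it; otherwise back off exponentially.
    const retryAfter = Number(response.headers.get('retry-after'));
    const delayMs = retryAfter > 0 ? retryAfter * 1000 : 2 ** attempt * 1000;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`Rate limited by ${domain} after ${maxRetries} retries`);
}
```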
Safe content handling:
- HTML sanitization during Markdown conversion
- URL validation to prevent SSRF attacks (see the sketch below)
- Content-Type checking before processing
- Size limits to prevent memory exhaustion
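For illustration, SSRF protection can validate the protocol and reject private address ranges before any request is issued. The `validateUrl` helper below is a simplified, hypothetical sketch (literal IPv4 checks only), not the project's actual implementation.

```typescript
import { isIP } from 'node:net';

// Hypothetical sketch of pre-request URL validation (simplified: checks literal
// IPv4 ranges only; a complete check would also resolve DNS and cover IPv6).
const PRIVATE_IPV4 = /^(127\.|10\.|192\.168\.|172\.(1[6-9]|2\d|3[01])\.|169\.254\.|0\.)/;

function validateUrl(raw: string): URL {
  const url = new URL(raw); // throws on malformed input

  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(`Blocked protocol: ${url.protocol}`);
  }
  if (url.hostname === 'localhost' ||
      (isIP(url.hostname) === 4 && PRIVATE_IPV4.test(url.hostname))) {
    throw new Error(`Blocked private address: ${url.hostname}`);
  }
  return url;
}
```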
Privacy protection:
- No sensitive data logging (URLs may be logged for debugging)
- Local caching only - no external data transmission
- Configurable cache retention periods
- Cache encryption for sensitive content (optional; see the sketch below)
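As a rough sketch of what optional cache encryption could look like, the helpers below use AES-256-GCM from Node's built-in crypto module; the function names and key handling are illustrative assumptions, not the shipped implementation.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

// Illustrative only: encrypt/decrypt a cached page with AES-256-GCM.
// `key` must be 32 random bytes, loaded from local configuration and never logged.
interface EncryptedEntry { iv: Buffer; data: Buffer; tag: Buffer }

function encryptCacheEntry(plaintext: string, key: Buffer): EncryptedEntry {
  const iv = randomBytes(12); // unique nonce per entry
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const data = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return { iv, data, tag: cipher.getAuthTag() };
}

function decryptCacheEntry(entry: EncryptedEntry, key: Buffer): string {
  const decipher = createDecipheriv('aes-256-gcm', key, entry.iv);
  decipher.setAuthTag(entry.tag);
  return Buffer.concat([decipher.update(entry.data), decipher.final()]).toString('utf8');
}
```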
We provide security updates for the following versions:
| Version | Supported |
|---|---|
| 1.x.x | ✅ Active support |
| 0.9.x | |
| < 0.9 | ❌ No longer supported |
Please do NOT report security vulnerabilities through public GitHub issues.
Instead, report it through one of these private channels:
- Email: Send details to `security@mcp-web-scrape.dev` (if available)
- GitHub Security Advisories: Use the private vulnerability reporting feature
- Encrypted communication: PGP key available on request
Please include the following in your report:
- Vulnerability description - what is the security issue?
- Impact assessment - what could an attacker achieve?
- Reproduction steps - how to demonstrate the vulnerability
- Affected versions - which releases are impacted
- Suggested fix - if you have ideas for remediation
- Disclosure timeline - your preferred timeline for public disclosure
Example report format:

```markdown
Subject: [SECURITY] Vulnerability in MCP Web Scrape v1.2.3

## Summary
Brief description of the vulnerability

## Impact
- Confidentiality: [High/Medium/Low]
- Integrity: [High/Medium/Low]
- Availability: [High/Medium/Low]

## Reproduction
1. Step one
2. Step two
3. Observe vulnerability

## Affected Versions
- Version X.Y.Z through A.B.C

## Suggested Mitigation
Your ideas for fixing the issue

## Disclosure Timeline
Preferred timeline for coordinated disclosure
```
When you report a vulnerability, you can expect:
- Acknowledgment: Within 24 hours of report
- Initial assessment: Within 72 hours
- Regular updates: Every 7 days until resolution
- Coordinated disclosure: Work with reporter on timeline
A typical disclosure timeline:
- Day 0: Vulnerability reported
- Day 1: Acknowledgment sent
- Day 3: Initial assessment and severity rating
- Day 7: Fix development begins (for confirmed issues)
- Day 14-30: Patch release (depending on severity)
- Day 30-90: Public disclosure (coordinated with reporter)
Severity classification:
- Critical: Remote code execution, data exfiltration
- High: Privilege escalation, authentication bypass
- Medium: Information disclosure, DoS attacks
- Low: Minor information leaks, configuration issues
We recognize security researchers who help improve MCP Web Scrape:
Be the first to help secure MCP Web Scrape!
Example security configuration:

```json
{
  "security": {
    "respectRobotsTxt": true,
    "userAgent": "mcp-web-scrape/1.0 (+https://github.com/mukul975/mcp-web-scrape)",
    "rateLimitPerDomain": 1000,
    "maxContentSize": "10MB",
    "allowedProtocols": ["http", "https"],
    "blockPrivateIPs": true,
    "sanitizeContent": true
  }
}
```

When running the HTTP server, we recommend these headers:
```javascript
// Express.js example
app.use((req, res, next) => {
  res.setHeader('X-Content-Type-Options', 'nosniff');
  res.setHeader('X-Frame-Options', 'DENY');
  res.setHeader('X-XSS-Protection', '1; mode=block');
  res.setHeader('Strict-Transport-Security', 'max-age=31536000');
  next();
});
```

The following are generally not considered security vulnerabilities:
- Rate limiting bypasses using multiple IPs/proxies
- Robots.txt violations when explicitly configured to ignore
- Content extraction from public, non-paywalled content
- Performance issues that don't lead to DoS
- Social engineering attacks against users
- Physical access to systems running MCP Web Scrape
Security tooling used for this project:
- Static analysis: ESLint security rules
- Dependency scanning: npm audit, Snyk
- Runtime protection: Helmet.js for HTTP servers (see the example below)
- Monitoring: Application security monitoring
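For instance, runtime header protection on an Express-based HTTP server is typically added with Helmet, which applies most of the headers shown earlier (plus several others) by default:

```typescript
import express from 'express';
import helmet from 'helmet';

const app = express();

// helmet() sets a collection of security headers by default, including
// X-Content-Type-Options, X-Frame-Options, and Strict-Transport-Security.
app.use(helmet());
```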
Security contacts:
- Security Email: `security@mcp-web-scrape.dev` (if available)
- GitHub Security: Private vulnerability reporting
- Maintainer: @mukul975 (Mahipal)
Last Updated: January 2024
Note: This security policy is subject to change. Please check back regularly for updates.