WebScraper - Next.js Edition

A modern, full-featured web scraper built with Next.js and Supabase. Crawl directory listings, classify files, and search through media collections with a beautiful UI.

🚀 Quick Deploy

✨ Features

🔍 Search Interface: Advanced search with file type filtering
📖 Browse Page: Paginated file viewer (10/20/50 per page)
📥 Download Manager: Bulk downloads in TXT, wget, and aria2 formats
⚙️ Admin Panel: URL management and crawler controls
🤖 Smart Crawler: Enhanced Node.js crawler with JSDOM + regex fallback
🎨 Modern UI: Beautiful, responsive design with Tailwind CSS
☁️ Cloud Database: Powered by Supabase PostgreSQL

📊 File Type Support

Automatically classifies and searches:

🎥 Video: mp4, mkv, avi, mov, mpg, mpeg, wmv, m4v
🎵 Audio: mp3, wav, ogg, wma, aif, mid, midi, mpa, wpl
🗜️ Compressed: zip, rar, 7z, tar.gz, deb, pkg, arj
💿 Disk Images: iso, dmg, bin, toast, vcd
🖼️ Images: jpg, png, gif, bmp, svg, ico, tif
📄 Documents: pdf, txt, doc, docx, rtf, wpd, odt
⚙️ Executables: exe, apk, bat, com, jar, py, wsf
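
The classification above can be sketched as a lookup by file extension. This is a minimal illustration, not the project's actual classifier: the names `FileCategory`, `EXTENSIONS`, and `classifyFile` are hypothetical, and only the extension lists are taken from this README.

```typescript
// Extension lists are copied from the list above; all identifiers are
// illustrative assumptions, not names taken from the project source.
type FileCategory =
  | "video" | "audio" | "compressed" | "disk_image"
  | "image" | "document" | "executable" | "other";

const EXTENSIONS: Record<Exclude<FileCategory, "other">, string[]> = {
  video: ["mp4", "mkv", "avi", "mov", "mpg", "mpeg", "wmv", "m4v"],
  audio: ["mp3", "wav", "ogg", "wma", "aif", "mid", "midi", "mpa", "wpl"],
  compressed: ["zip", "rar", "7z", "tar.gz", "deb", "pkg", "arj"],
  disk_image: ["iso", "dmg", "bin", "toast", "vcd"],
  image: ["jpg", "png", "gif", "bmp", "svg", "ico", "tif"],
  document: ["pdf", "txt", "doc", "docx", "rtf", "wpd", "odt"],
  executable: ["exe", "apk", "bat", "com", "jar", "py", "wsf"],
};

function classifyFile(urlOrName: string): FileCategory {
  // Drop any query string or fragment, then keep only the last path segment.
  const [path = ""] = urlOrName.split(/[?#]/);
  const name = (path.split("/").pop() ?? "").toLowerCase();
  for (const [category, exts] of Object.entries(EXTENSIONS)) {
    // Suffix matching handles multi-part extensions such as "tar.gz".
    if (exts.some((ext) => name.endsWith("." + ext))) {
      return category as FileCategory;
    }
  }
  return "other";
}
```

Matching on the suffix rather than on the text after the last dot keeps `tar.gz` working. A bare host name ending in `.com` would also match the executables list, so a real crawler would probably classify only links that point at files.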

🛠️ Setup Instructions

1. Clone & Deploy

git clone https://github.com/hybridx/WebScraper.git
cd WebScraper
npm install

2. Set Up Supabase Database

Create a Supabase project: Go to supabase.com and create a new project
Get credentials (from Settings → API):
- Project URL: https://[project-id].supabase.co
- Anon Key: eyJ...

3. Configure Environment Variables

In your deployment platform (Vercel/Netlify/etc.), add:

SUPABASE_URL=https://your-project-id.supabase.co
SUPABASE_ANON_KEY=your_anon_key_here
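
A quick startup check can catch a missing or mistyped variable before the first database call fails. The sketch below is an assumption about how such a check might look. The helper names are hypothetical, and the URL pattern only covers hosted projects of the form shown above.

```typescript
// Variable names come from the step above; the helpers are hypothetical.
const REQUIRED_VARS = ["SUPABASE_URL", "SUPABASE_ANON_KEY"] as const;

type Env = Record<string, string | undefined>;

function missingEnvVars(env: Env): string[] {
  // Treat unset and blank values the same way.
  return REQUIRED_VARS.filter((name) => !env[name]?.trim());
}

function looksLikeSupabaseUrl(value: string): boolean {
  // Matches the https://[project-id].supabase.co pattern for hosted projects.
  return /^https:\/\/[a-z0-9-]+\.supabase\.co\/?$/i.test(value);
}
```

In a Node.js process this would typically be called as `missingEnvVars(process.env)`, logging or throwing when the returned list is not empty.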

4. Initialize Database

Go to your Supabase project → SQL Editor
Copy and paste the contents of supabase-schema.sql
Click Run to create all tables and indexes

5. Deploy & Test

The application deploys automatically when you push to the connected Git repository. Check the admin panel at /admin for system status.

🔧 Development

# Install dependencies
npm install

# Set up environment variables
cp .env.example .env.local
# Edit .env.local with your Supabase credentials

# Run development server
npm run dev

# Open http://localhost:3000

📱 Usage

Admin Panel (`/admin`)

Password: admin123 by default (set in the API routes; change it before deploying)
Add URLs: Submit directory listing URLs to crawl
Monitor Status: Real-time system health and database status
Manage URLs: View, delete, and track crawled URLs

Search (`/`)

Smart Search: Find files by name or URL
Type Filtering: Filter by media type (video, audio, etc.)
Real-time Results: Instant search with pagination
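
The search behaviour described above (match on name or URL, optionally narrowed by type) can be illustrated with an in-memory filter. The real application presumably runs this as a database query. The `FileRecord` shape and `searchFiles` function are assumptions made for this sketch.

```typescript
// Hypothetical record shape; the project's actual columns may differ.
interface FileRecord {
  name: string;
  url: string;
  type: string; // e.g. "video", "audio"
}

function searchFiles(files: FileRecord[], query: string, type?: string): FileRecord[] {
  const needle = query.trim().toLowerCase();
  return files.filter((file) => {
    // Apply the type filter first, if one was given.
    if (type && file.type !== type) return false;
    // An empty query matches everything that passed the type filter.
    if (needle === "") return true;
    // Case-insensitive substring match on either the name or the URL.
    return (
      file.name.toLowerCase().includes(needle) ||
      file.url.toLowerCase().includes(needle)
    );
  });
}
```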

Browse (`/browse`)

Paginated View: 10, 20, or 50 files per page
Bulk Operations: Select multiple files for download
Filter by Type: Click stats cards to filter results
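
Pagination with the page sizes listed above comes down to turning a page number into a row range. The helper below is a sketch with hypothetical names. It returns inclusive bounds, which is the form the supabase-js `range(from, to)` call expects if the project queries Supabase that way.

```typescript
// Page sizes come from the list above; function names are illustrative.
const PAGE_SIZES = [10, 20, 50] as const;
type PageSize = (typeof PAGE_SIZES)[number];

interface PageRange {
  from: number; // zero-based index of the first row
  to: number;   // zero-based index of the last row (inclusive)
}

function pageRange(page: number, size: PageSize): PageRange {
  // Pages are 1-based; clamp anything smaller (or not a number) to page 1.
  const safePage = Math.max(1, Math.floor(page) || 1);
  const from = (safePage - 1) * size;
  return { from, to: from + size - 1 };
}

function totalPages(totalRows: number, size: PageSize): number {
  // Always report at least one page so an empty result still renders.
  return Math.max(1, Math.ceil(totalRows / size));
}
```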

Download (`/download`)

Multiple Formats: TXT, wget script, or aria2 download files
Bulk Downloads: Generate download files for all or filtered results
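
The three export formats could be generated from a list of URLs roughly as follows. This is a sketch rather than the project's implementation: `buildDownloadFile` and `shellQuote` are hypothetical names, and the choice of `wget -c` and the aria2 `continue=true` option are assumptions.

```typescript
type DownloadFormat = "txt" | "wget" | "aria2";

// Single-quote a value for POSIX shells, escaping embedded single quotes.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

function buildDownloadFile(urls: string[], format: DownloadFormat): string {
  switch (format) {
    case "txt":
      // Plain list: one URL per line.
      return urls.join("\n") + "\n";
    case "wget":
      // Shell script; -c resumes partially downloaded files.
      return ["#!/bin/sh", ...urls.map((u) => "wget -c " + shellQuote(u))].join("\n") + "\n";
    case "aria2":
      // Input file for `aria2c -i <file>`: one URI per line, with
      // per-download options on indented lines below it.
      return urls.map((u) => u + "\n  continue=true").join("\n") + "\n";
  }
}
```

Quoting each URL matters for the wget script because directory listings often contain file names with spaces, ampersands, or quotes.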

🚀 Auto-Deployment Setup

Vercel (Recommended)

Connect Repository: Import your GitHub repository to Vercel
Configure Environment Variables: Add SUPABASE_URL and SUPABASE_ANON_KEY
Auto-Deploy: Every push to master/main triggers a deployment automatically

Manual Deploy Commands

# Deploy to production
npx vercel --prod

# View deployment logs
npx vercel logs

# Check environment variables
npx vercel env ls

🏗️ Architecture

Frontend: Next.js 14 with TypeScript
Styling: Tailwind CSS with responsive design
Database: Supabase PostgreSQL with Row Level Security
Crawler: Node.js with JSDOM and regex fallback
Deployment: Vercel with git-based auto-deployment
File Classification: 27+ supported file types
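
To illustrate the regex fallback mentioned for the crawler, the sketch below pulls `href` values out of a directory-listing page without a DOM parser. It is an assumption about how such a fallback might work, not the project's code. `extractLinks` is a hypothetical name, and the skip rules target the sort links and parent-directory entries that typical server-generated listings contain.

```typescript
function extractLinks(html: string, baseUrl: string): string[] {
  const links = new Set<string>();
  // Matches href="..." or href='...' inside anchor tags.
  const hrefPattern = /<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1/gi;
  let match: RegExpExecArray | null;
  while ((match = hrefPattern.exec(html)) !== null) {
    const href = match[2];
    // Skip sort links ("?C=N;O=D"), fragments, and parent-directory entries.
    if (!href || href.startsWith("?") || href.startsWith("#") || href.startsWith("../")) {
      continue;
    }
    try {
      // Resolve relative links against the listing URL.
      const resolved = new URL(href, baseUrl);
      if (resolved.protocol === "http:" || resolved.protocol === "https:") {
        links.add(resolved.href);
      }
    } catch {
      // Ignore hrefs that cannot be resolved against the base URL.
    }
  }
  // The Set removes duplicates while preserving first-seen order.
  return Array.from(links);
}
```

A regex like this misses unquoted attributes and can be confused by markup inside comments or scripts, which is presumably why it serves only as a fallback behind JSDOM.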

🔐 Security

Environment Variables: All sensitive data is supplied through environment variables
Password Protection: Admin panel protected with configurable password
Database Security: Supabase RLS policies (customizable)
Input Validation: URL validation and SQL injection prevention
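
URL validation for submitted crawl targets might look like the sketch below. The function name, return shape, and the rule of accepting only http and https are assumptions for illustration. SQL injection is a separate concern that is normally handled by letting the database client pass values as parameters rather than building SQL strings.

```typescript
type UrlCheck =
  | { ok: true; url: string }
  | { ok: false; reason: string };

function validateCrawlUrl(input: string): UrlCheck {
  let parsed: URL;
  try {
    // The URL constructor throws on anything that is not an absolute URL.
    parsed = new URL(input.trim());
  } catch {
    return { ok: false, reason: "not a valid absolute URL" };
  }
  // Reject schemes such as javascript:, file:, or ftp:.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { ok: false, reason: "only http and https are allowed" };
  }
  // Fragments never reach the server, so drop them before storing.
  parsed.hash = "";
  return { ok: true, url: parsed.href };
}
```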

🐛 Troubleshooting

Common Issues

500 Errors: Check the environment variables using the health status in the admin panel
Database Connection: Ensure Supabase URL and key are correct
Missing Tables: Run the SQL schema in Supabase SQL Editor
Crawler Not Working: Check admin panel for detailed error messages

Health Check

Visit /api/health to see detailed system status including:

Environment variable configuration
Database connection status
Deployment information

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Test thoroughly
Submit a pull request

📄 License

This project is open source and available under the MIT License.

🙏 Acknowledgments

Built with Next.js
Database by Supabase
UI components with Tailwind CSS
Icons by Lucide React

Ready to scrape the web? 🕸️ Deploy now and start discovering media files!

