A modern, full-featured web scraper built with Next.js and Supabase. Crawl directory listings, classify files, and search through media collections with a beautiful UI.
- Search Interface: Advanced search with file type filtering
- Browse Page: Paginated file viewer (10/20/50 per page)
- Download Manager: Bulk downloads in TXT, wget, and aria2 formats
- Admin Panel: URL management and crawler controls
- Smart Crawler: Enhanced Node.js crawler with JSDOM + regex fallback
- Modern UI: Beautiful, responsive design with Tailwind CSS
- Cloud Database: Powered by Supabase PostgreSQL
Files are automatically classified and searchable by the following types (see the classification sketch after this list):
- Video: mp4, mkv, avi, mov, mpg, mpeg, wmv, m4v
- Audio: mp3, wav, ogg, wma, aif, mid, midi, mpa, wpl
- Compressed: zip, rar, 7z, tar.gz, deb, pkg, arj
- Disk Images: iso, dmg, bin, toast, vcd
- Images: jpg, png, gif, bmp, svg, ico, tif
- Documents: pdf, txt, doc, docx, rtf, wpd, odt
- Executables: exe, apk, bat, com, jar, py, wsf
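As a rough illustration of how classification by extension can work, here is a hedged TypeScript sketch; the category names and the `classifyFile` helper are assumptions, not the project's actual code:

```typescript
// Hypothetical sketch: map a file extension to one of the categories above.
const FILE_CATEGORIES: Record<string, string[]> = {
  video: ["mp4", "mkv", "avi", "mov", "mpg", "mpeg", "wmv", "m4v"],
  audio: ["mp3", "wav", "ogg", "wma", "aif", "mid", "midi", "mpa", "wpl"],
  compressed: ["zip", "rar", "7z", "gz", "deb", "pkg", "arj"],
  disk: ["iso", "dmg", "bin", "toast", "vcd"],
  image: ["jpg", "png", "gif", "bmp", "svg", "ico", "tif"],
  document: ["pdf", "txt", "doc", "docx", "rtf", "wpd", "odt"],
  executable: ["exe", "apk", "bat", "com", "jar", "py", "wsf"],
};

function classifyFile(fileName: string): string {
  // Take the last extension; "archive.tar.gz" resolves to "gz".
  const ext = fileName.toLowerCase().split(".").pop() ?? "";
  for (const [category, extensions] of Object.entries(FILE_CATEGORIES)) {
    if (extensions.includes(ext)) return category;
  }
  return "other";
}

// classifyFile("movie.mkv") -> "video"; classifyFile("notes.pdf") -> "document"
```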
```bash
git clone https://github.com/hybridx/WebScraper.git
cd WebScraper
npm install
```
- Create Supabase Project: Go to supabase.com
- Get Credentials (from Settings → API):
  - Project URL: `https://[project-id].supabase.co`
  - Anon Key: `eyJ...`
In your deployment platform (Vercel/Netlify/etc.), add:
```env
SUPABASE_URL=https://your-project-id.supabase.co
SUPABASE_ANON_KEY=your_anon_key_here
```
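These two variables are what the app uses to create its Supabase client. A minimal sketch using the official `@supabase/supabase-js` package; the module location and error handling are assumptions:

```typescript
// Hypothetical lib/supabase.ts: create a shared Supabase client from the
// environment variables configured above.
import { createClient } from "@supabase/supabase-js";

const supabaseUrl = process.env.SUPABASE_URL;
const supabaseAnonKey = process.env.SUPABASE_ANON_KEY;

if (!supabaseUrl || !supabaseAnonKey) {
  throw new Error("Missing SUPABASE_URL or SUPABASE_ANON_KEY environment variable");
}

export const supabase = createClient(supabaseUrl, supabaseAnonKey);
```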
- Go to your Supabase project → SQL Editor
- Copy and paste the contents of `supabase-schema.sql`
- Click Run to create all tables and indexes
The application will auto-deploy when you push to git. Check the admin panel at `/admin` for system status.
```bash
# Install dependencies
npm install

# Set up environment variables
cp .env.example .env.local
# Edit .env.local with your Supabase credentials

# Run development server
npm run dev

# Open http://localhost:3000
```
- Password: `admin123` (change it in the API routes; see the sketch after this list)
- Add URLs: Submit directory listing URLs to crawl
- Monitor Status: Real-time system health and database status
- Manage URLs: View, delete, and track crawled URLs
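As a rough illustration of the kind of check the API routes perform, here is a hedged sketch of a Next.js route handler that rejects requests without the admin password; the route shape, request body, and `ADMIN_PASSWORD` fallback are assumptions, not the project's actual code:

```typescript
// Hypothetical app/api/admin/urls/route.ts: gate an admin action on a password.
import { NextResponse } from "next/server";

const ADMIN_PASSWORD = process.env.ADMIN_PASSWORD ?? "admin123";

export async function POST(request: Request) {
  const { password, url } = await request.json();

  if (password !== ADMIN_PASSWORD) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }

  // ...queue `url` for crawling here...
  return NextResponse.json({ ok: true });
}
```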
- Smart Search: Find files by name or URL
- Type Filtering: Filter by media type (video, audio, etc.)
- Real-time Results: Instant search with pagination
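One plausible way to implement the search above with the Supabase client is an `ilike` match on name/URL plus an optional type filter and `range()` pagination. The table and column names (`files`, `name`, `url`, `file_type`) are assumptions; the real ones are defined in `supabase-schema.sql`:

```typescript
// Hypothetical search helper: name/URL search, optional type filter, pagination.
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_ANON_KEY!);

async function searchFiles(query: string, fileType?: string, page = 1, perPage = 20) {
  const from = (page - 1) * perPage;

  let request = supabase
    .from("files")
    .select("*", { count: "exact" })
    .or(`name.ilike.%${query}%,url.ilike.%${query}%`);

  if (fileType) {
    request = request.eq("file_type", fileType);
  }

  const { data, count, error } = await request.range(from, from + perPage - 1);
  if (error) throw error;
  return { files: data, total: count };
}
```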
- Paginated View: 10, 20, or 50 files per page
- Bulk Operations: Select multiple files for download
- Filter by Type: Click stats cards to filter results
- Multiple Formats: TXT, wget script, or aria2 download files
- Bulk Downloads: Generate download files for all or filtered results
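A minimal sketch of how the three download formats could be generated from a list of selected file URLs; the helper name and exact output are assumptions:

```typescript
// Hypothetical generator for the TXT, wget, and aria2 download formats.
function buildDownloadFile(urls: string[], format: "txt" | "wget" | "aria2"): string {
  switch (format) {
    case "txt":
      // Plain URL list, one per line
      return urls.join("\n");
    case "wget":
      // Shell script with one resumable wget call per file
      return ["#!/bin/bash", ...urls.map((u) => `wget -c "${u}"`)].join("\n");
    case "aria2":
      // URL list consumable via: aria2c -i downloads.aria2
      return urls.join("\n");
  }
}
```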
- Connect Repository: Import your GitHub repository to Vercel
- Configure Environment Variables: Add `SUPABASE_URL` and `SUPABASE_ANON_KEY`
- Auto-Deploy: Every git push to `master`/`main` automatically deploys
```bash
# Deploy to production
npx vercel --prod

# View deployment logs
npx vercel logs

# Check environment variables
npx vercel env ls
```
- Frontend: Next.js 14 with TypeScript
- Styling: Tailwind CSS with responsive design
- Database: Supabase PostgreSQL with Row Level Security
- Crawler: Node.js with JSDOM and a regex fallback (sketched below)
- Deployment: Vercel with git-based auto-deployment
- File Classification: 27+ supported file types
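To illustrate the "JSDOM with regex fallback" approach: parse the directory listing with JSDOM and, if parsing fails or finds nothing, scan the raw HTML with a regex. This is a hedged sketch; the function names are illustrative, not the project's actual crawler code:

```typescript
// Hypothetical link extractor with a regex fallback for pages JSDOM can't parse.
import { JSDOM } from "jsdom";

async function extractLinks(pageUrl: string): Promise<string[]> {
  const html = await (await fetch(pageUrl)).text();

  try {
    const dom = new JSDOM(html);
    const anchors = Array.from(dom.window.document.querySelectorAll("a[href]"));
    const links = anchors.map((a) => new URL(a.getAttribute("href")!, pageUrl).href);
    if (links.length > 0) return links;
  } catch {
    // Fall through to the regex fallback below
  }

  // Regex fallback: pull href attributes straight out of the raw HTML
  const matches = html.match(/href="([^"]+)"/g) ?? [];
  return matches.map((m) => new URL(m.slice(6, -1), pageUrl).href);
}
```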
- Environment Variables: All sensitive data via environment variables
- Password Protection: Admin panel protected with configurable password
- Database Security: Supabase RLS policies (customizable)
- Input Validation: URL validation and SQL injection prevention
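The URL validation mentioned above can be as simple as accepting only well-formed http(s) URLs before queueing them for the crawler; a small sketch (the helper name is an assumption):

```typescript
// Hypothetical validator: accept only parseable http(s) URLs.
function isValidCrawlUrl(input: string): boolean {
  try {
    const url = new URL(input);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    return false;
  }
}
```

SQL injection is mitigated mainly by going through the Supabase query builder rather than concatenating SQL strings by hand.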
- 500 Errors: Check environment variables in admin panel health status
- Database Connection: Ensure Supabase URL and key are correct
- Missing Tables: Run the SQL schema in Supabase SQL Editor
- Crawler Not Working: Check admin panel for detailed error messages
Visit `/api/health` to see detailed system status, including:
- Environment variable configuration
- Database connection status
- Deployment information
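A hedged sketch of what such a health route could return; the exact fields, the `files` table name, and the Vercel-specific `VERCEL_ENV` variable are assumptions about this project's implementation:

```typescript
// Hypothetical app/api/health/route.ts: report env, database, and deployment status.
import { NextResponse } from "next/server";
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_ANON_KEY!);

export async function GET() {
  const envOk = Boolean(process.env.SUPABASE_URL && process.env.SUPABASE_ANON_KEY);

  // A cheap query to confirm the database is reachable
  const { error } = await supabase.from("files").select("id").limit(1);

  return NextResponse.json({
    environment: envOk ? "configured" : "missing variables",
    database: error ? `error: ${error.message}` : "connected",
    deployment: process.env.VERCEL_ENV ?? "local",
  });
}
```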
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is open source and available under the MIT License.
- Built with Next.js
- Database by Supabase
- UI components with Tailwind CSS
- Icons by Lucide React
Ready to scrape the web? Deploy now and start discovering media files!