A Python-based tool for scraping tweets from your Twitter timeline and storing them in Supabase with duplicate detection.
Before using this tool, please be aware of the following security considerations:
- **Supabase & API Key Security**:
  - Never commit your `.env` file to version control
  - Keep your Supabase keys secure and rotate them if compromised
  - Monitor your database for any unauthorized access
- **Chrome Profile Security**:
  - The tool uses your Chrome profile to access Twitter
  - This means it has access to your Twitter session
  - Use a separate Chrome profile for scraping to limit access to your main account
  - Never share your Chrome user data directory
- **Twitter Terms of Service**:
  - This tool is for educational purposes only
  - Ensure you comply with Twitter's Terms of Service
  - Do not use this tool for:
    - Mass scraping
    - Automated posting
    - Data collection for commercial purposes without permission
    - Any activity that could harm Twitter's services
- **Best Practices**:
  - Use a dedicated Twitter account for testing
  - Regularly clear your Chrome profile data
  - Monitor your Twitter account for any suspicious activity
  - Keep your ChromeDriver and Chrome browser updated
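Since the whole setup hinges on secrets staying in `.env`, it can help to fail fast when they are missing. The `require_env` helper below is a hypothetical sketch (not part of the project's code): it checks that the Supabase variables are set before anything else runs, so a missing key surfaces as a clear error rather than a confusing Selenium or database failure later.

```python
import os

REQUIRED_VARS = ["SUPABASE_URL", "SUPABASE_ANON_KEY"]

def require_env(names=None):
    """Return the required settings as a dict, raising early if any are
    missing so a bad .env is caught before the browser ever launches."""
    names = names or REQUIRED_VARS
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {n: os.environ[n] for n in names}
```

Calling `require_env()` at startup (after loading `.env`, e.g. with python-dotenv) turns a misconfigured environment into a one-line, actionable error.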
- ✅ Scrape tweets from your Twitter home timeline
- ✅ Extract tweet ID, content, and author
- ✅ Save to Supabase cloud database
- ✅ Automatic duplicate detection and prevention
- ✅ FastAPI REST API for programmatic access
- ✅ Can run repeatedly without duplicates
- ✅ Cloud-based data storage with Supabase dashboard
- Python 3.8 or higher
- Supabase account (free tier available)
- Google Chrome browser
- ChromeDriver (matching your Chrome version)
```bash
git clone https://github.com/VinVirtual/Twitter-Timeline-Scrapper-FastAPI.git
cd Twitter-Timeline-Scrapper-FastAPI
```

Create and activate a virtual environment:

```bash
python -m venv venv

# On Windows
venv\Scripts\activate

# On Unix or macOS
source venv/bin/activate
```

Install dependencies:

```bash
pip install -r requirements.txt
```

- Create Supabase account: Go to https://supabase.com and sign up
- Create new project: Click "New Project" and wait for setup
- Run database schema:
  - Go to SQL Editor in your Supabase dashboard
  - Copy the contents of `supabase-schema.sql`
  - Paste and run in SQL Editor
- Get credentials:
  - Go to Settings > API
  - Copy the `URL` and `anon public` key
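A common setup mistake is pasting the wrong value (e.g. the service role key, or a URL with a trailing path). As a rough sanity check, the hypothetical helper below verifies the *shape* of the two values: a project URL like `https://<ref>.supabase.co` and a JWT-style anon key (three base64url segments separated by dots). It is a heuristic sketch, not a real authentication check.

```python
import re

def looks_like_supabase_config(url: str, anon_key: str) -> bool:
    """Rough shape check for the values copied from Settings > API:
    an https project URL and a three-segment JWT-style anon key."""
    url_ok = re.fullmatch(r"https://[a-z0-9]+\.supabase\.co", url) is not None
    key_ok = re.fullmatch(
        r"[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+", anon_key
    ) is not None
    return url_ok and key_ok
```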
```bash
cp env.example .env
```

Edit `.env` with your values:

```bash
# Supabase
SUPABASE_URL=https://xxxxxxxxxxxxx.supabase.co
SUPABASE_ANON_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...

# Chrome (update paths for your system)
CHROME_USER_DATA_DIR=/Users/YourUsername/Library/Application Support/Google/Chrome
CHROME_PROFILE_NAME=Default
CHROME_BINARY_LOCATION=/Applications/Google Chrome.app/Contents/MacOS/Google Chrome
CHROMEDRIVER_PATH=/usr/local/bin/chromedriver
```

macOS:
- User Data: `~/Library/Application Support/Google/Chrome`
- Binary: `/Applications/Google Chrome.app/Contents/MacOS/Google Chrome`
- ChromeDriver: Install with `brew install chromedriver`
Windows:
- User Data: `C:\Users\YourUsername\AppData\Local\Google\Chrome\User Data`
- Binary: `C:\Program Files\Google\Chrome\Application\chrome.exe`
- ChromeDriver: Download from https://chromedriver.chromium.org/
Linux:
- User Data: `~/.config/google-chrome`
- Binary: `/usr/bin/google-chrome`
- ChromeDriver: Install with `sudo apt install chromium-chromedriver`
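Because the ChromeDriver location varies by platform, a small resolution helper can make the configuration more forgiving. This is a hypothetical sketch (the project itself reads `CHROMEDRIVER_PATH` directly): prefer the configured path if it actually exists on disk, otherwise fall back to whatever `chromedriver` is on `PATH`.

```python
import os
import shutil
from typing import Optional

def find_chromedriver(configured_path: Optional[str] = None) -> Optional[str]:
    """Return a usable chromedriver path: the configured one
    (e.g. from CHROMEDRIVER_PATH) if it exists on disk, otherwise
    whatever `chromedriver` resolves to on PATH; None if neither."""
    if configured_path and os.path.isfile(configured_path):
        return configured_path
    return shutil.which("chromedriver")
```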
```bash
# Scrape 10 tweets (default)
python run_scraper.py

# Scrape custom number of tweets
python run_scraper.py 20
```

```python
from app.scraper import TwitterScraperSupabase

# Create scraper
scraper = TwitterScraperSupabase()

# Scrape 10 tweets from your timeline
result = scraper.scrape_and_save(max_tweets=10)
print(f"Saved: {result['saved']}")
print(f"Skipped (duplicates): {result['skipped_duplicates']}")

# Get all tweets from database
tweets = scraper.get_all_tweets()

# Get statistics
stats = scraper.get_stats()
print(f"Total tweets: {stats['total_tweets']}")
```

Start the server:

```bash
uvicorn app.main:app --reload --port 8000
```

API Endpoints:
| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | API documentation |
| POST | `/scrape?max_tweets=10` | Scrape tweets from timeline |
| GET | `/tweets?limit=100` | Get all tweets |
| GET | `/tweets/recent?limit=10` | Get recent tweets |
| GET | `/tweets/author/{username}` | Get tweets by author |
| GET | `/stats` | Database statistics |
| DELETE | `/tweets/{tweet_id}` | Delete specific tweet |
| GET | `/health` | Health check |
Example API calls:
```bash
# Scrape 10 tweets
curl -X POST "http://localhost:8000/scrape?max_tweets=10"

# Get all tweets
curl "http://localhost:8000/tweets"

# Get stats
curl "http://localhost:8000/stats"

# Get tweets by author
curl "http://localhost:8000/tweets/author/elonmusk"
```

The scraper creates a `timeline_tweets` table in Supabase:
```sql
CREATE TABLE timeline_tweets (
    id BIGSERIAL PRIMARY KEY,
    tweet_id VARCHAR(50) UNIQUE NOT NULL,
    content TEXT NOT NULL,
    author VARCHAR(50),
    tweet_url TEXT,
    scraped_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```

- Opens Twitter: Uses Selenium with your Chrome profile
- Scrolls timeline: Collects specified number of tweets
- Extracts data: Gets tweet ID from the `/status/` URL, content, and author
- Checks duplicates: Queries Supabase for existing `tweet_id`
- Saves to database: Inserts new tweets, skips duplicates
- Returns results: Summary of saved vs skipped tweets
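The ID-extraction step above can be sketched as a small function. This is an illustrative version, not the project's actual implementation: every tweet permalink contains `/status/<numeric id>`, so a regex is enough to recover the ID used for duplicate detection.

```python
import re
from typing import Optional

def extract_tweet_id(status_url: str) -> Optional[str]:
    """Pull the numeric ID out of a tweet permalink, e.g.
    https://twitter.com/naval/status/1234567891 -> "1234567891".
    Returns None for URLs that are not tweet permalinks."""
    match = re.search(r"/status/(\d+)", status_url)
    return match.group(1) if match else None
```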
```
🐦 Starting tweet collection (max: 10)...
📄 Found 20 tweet elements on page
📥 Collected tweet 1/10: 1234567890 by @elonmusk
📥 Collected tweet 2/10: 1234567891 by @naval
✅ Collected 10 tweets
✅ Saved new tweet: 1234567890 by @elonmusk
⏭️ Skipping duplicate tweet: 1234567892
✅ Saved new tweet: 1234567893 by @sama
==================================================
📊 SCRAPING SUMMARY:
   Total Collected: 10
   ✅ Saved: 8
   ⏭️ Skipped (duplicates): 2
==================================================
📊 DATABASE STATS:
   Total tweets in database: 45
   Unique authors: 12
```
- Go to your Supabase project dashboard
- Click "Table Editor" in the sidebar
- Select "timeline_tweets" table
- View, filter, and export your scraped tweets
In Supabase SQL Editor:
```sql
-- Get all tweets
SELECT * FROM timeline_tweets ORDER BY scraped_at DESC;

-- Count tweets by author
SELECT author, COUNT(*) AS count
FROM timeline_tweets
GROUP BY author
ORDER BY count DESC;

-- Get today's tweets
SELECT * FROM timeline_tweets
WHERE scraped_at >= CURRENT_DATE;

-- Use built-in views
SELECT * FROM recent_timeline_tweets;
SELECT * FROM tweets_by_author;
```

```bash
# Edit crontab
crontab -e

# Run every 10 minutes
*/10 * * * * cd /path/to/project && python -c "from app.scraper import TwitterScraperSupabase; TwitterScraperSupabase().scrape_and_save(max_tweets=10)"
```

- Open Task Scheduler
- Create Basic Task
- Set trigger (e.g., every 10 minutes)
- Action: Start a program
- Program: `python`
- Arguments: `-c "from app.scraper import TwitterScraperSupabase; TwitterScraperSupabase().scrape_and_save(max_tweets=10)"`
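If you would rather not depend on cron or Task Scheduler, a plain Python loop works on any platform. The `run_periodically` helper below is a hypothetical sketch (not part of the project): it calls a job function at a fixed interval, with an optional run limit so it can be tested.

```python
import time
from typing import Callable, Optional

def run_periodically(job: Callable[[], None],
                     interval_seconds: int = 600,
                     max_runs: Optional[int] = None) -> int:
    """Run `job` every interval_seconds. `max_runs` bounds the loop
    (useful for testing); pass None to run until interrupted.
    Returns the number of completed runs."""
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        time.sleep(interval_seconds)
    return runs
```

With the scraper from this project, usage would look like `run_periodically(lambda: TwitterScraperSupabase().scrape_and_save(max_tweets=10))`. Note the Chrome session must stay available for the whole lifetime of the loop.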
```
Twitter-Timeline-Scraper-FastAPI/
├── app/
│   ├── __init__.py
│   ├── scraper.py          # Main scraper with Supabase
│   └── main.py             # FastAPI server
├── run_scraper.py          # Simple run script
├── supabase-schema.sql     # Database schema
├── env.example             # Environment template
├── requirements.txt        # Python dependencies
├── .gitignore              # Git ignore rules
└── README.md               # This file
```
- Check your `SUPABASE_URL` and `SUPABASE_ANON_KEY` in `.env`
- Verify you ran the SQL schema in Supabase
- Check your internet connection
- Make sure Chrome is installed
- Verify ChromeDriver version matches Chrome version
- Check Chrome paths in `.env` are correct
- Ensure you're logged into Twitter in that Chrome profile
- Make sure you're logged into Twitter in the specified Chrome profile
- Check that `CHROME_PROFILE_NAME` is correct
- Try opening Chrome manually with that profile first
- This is expected behavior! The scraper skips duplicates
- Check the logs to see which tweets were skipped
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
This tool is for educational purposes only. Please ensure you comply with Twitter's Terms of Service and API usage guidelines when using this scraper. The authors are not responsible for any misuse of this tool or any consequences resulting from such misuse.
- Selenium - Web scraping automation
- FastAPI - Modern web framework
- Supabase - Cloud PostgreSQL database
- Python - Programming language