🕸️ Universal Python Article Scraper

A flexible Python web scraper that extracts articles from Joomla, WordPress, Drupal, and JavaScript-heavy websites without requiring manual structure configuration.

📂 Contents

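scraper.py: Entry point that tries the static scraper first and falls back to the JavaScript scraper
universal_scraper.py: Scraper for static HTML pages
js_scraper.py: Scraper with JavaScript support using Selenium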

🛠️ Installation

Clone the repository:

git clone https://github.com/nickpsal/python_scrapper.git
cd python_scrapper

Create a virtual environment and install dependencies:

python3 -m venv venv
source venv/bin/activate   # Linux/macOS
.\venv\Scripts\activate    # Windows

pip install requests beautifulsoup4 newspaper3k selenium webdriver-manager lxml

⚙️ Usage

Run scraper.py with a category URL as its argument:

python scraper.py <category_url>

For example:

python scraper.py https://www.cosmopolitan.com/style-beauty/fashion

The script tries the Static Scraper first. If that finds no articles, it automatically switches to the Dynamic JS Scraper.
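
The static-then-dynamic fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in scraper.py: the function name `scrape_with_fallback`, the `Article` shape, and the two stub scrapers are hypothetical and only stand in for whatever universal_scraper.py and js_scraper.py actually expose.

```python
from typing import Callable, Dict, List

# Assumed shape of one result: title, URL and main content as strings.
Article = Dict[str, str]
Scraper = Callable[[str], List[Article]]


def scrape_with_fallback(url: str, static_scraper: Scraper,
                         dynamic_scraper: Scraper) -> List[Article]:
    """Try the static scraper first; use the dynamic one only if it finds nothing."""
    articles = static_scraper(url)
    if articles:
        return articles
    # No articles found in the raw HTML, so fall back to the JS-capable scraper.
    return dynamic_scraper(url)


if __name__ == "__main__":
    # Hypothetical stand-ins for the two real scrapers.
    def fake_static(url: str) -> List[Article]:
        return []  # pretend the page needs JavaScript

    def fake_dynamic(url: str) -> List[Article]:
        return [{"title": "Example", "url": url, "content": "..."}]

    for article in scrape_with_fallback("https://example.com/news",
                                        fake_static, fake_dynamic):
        print(article["title"], article["url"])
```

Passing the scrapers in as arguments keeps the fallback rule separate from how each scraper fetches pages, which makes the rule easy to test without network access.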

📑 Static Scraper (universal_scraper.py)

Used for basic HTML pages without heavy JavaScript.

📌 Outputs each article with:

Title
URL
Main content

🧠 Dynamic JS Scraper (js_scraper.py)

Uses Selenium for sites that load content via JavaScript.
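
As a rough sketch of what rendering a page with Selenium can look like, the example below loads a URL in headless Chrome and then collects the `href` values from the rendered HTML. It is an assumption about the general approach, not a copy of js_scraper.py: the names `render_page`, `extract_links`, and `LinkCollector`, the choice of Chrome, and the use of the standard-library HTML parser are all illustrative.

```python
from html.parser import HTMLParser
from typing import List


def render_page(url: str) -> str:
    """Load a page in headless Chrome and return the HTML after scripts have run."""
    # Imported here so the link parser below can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    options.add_argument("--headless=new")  # run without opening a browser window
    driver = webdriver.Chrome(
        service=Service(ChromeDriverManager().install()), options=options
    )
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html: str) -> List[str]:
    """Return all non-empty hrefs found in the given HTML, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


if __name__ == "__main__":
    html = render_page("https://example.com")
    print(extract_links(html))
```

Pages that load content late may also need an explicit wait before reading `page_source`; that is left out here to keep the sketch short.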

🔍 How It Works

🔗 Detects potential articles by URL patterns (e.g., a year segment such as /2024/, or a hyphenated slug)
📰 Tries to extract content using newspaper3k
🔁 Falls back to BeautifulSoup if necessary
💾 Displays title and full article text
💤 Adds a 1-second delay to avoid overwhelming servers
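
The URL-pattern step in the list above could look something like the sketch below. The exact rules the scrapers apply are not documented here, so the regular expression, the `min_hyphens` threshold, the same-domain filter, and the function names `looks_like_article` and `filter_article_links` are all assumptions made for illustration.

```python
import re
from typing import Iterable, List
from urllib.parse import urljoin, urlparse

# A four-digit year used as its own path segment, e.g. /2024/.
YEAR_SEGMENT = re.compile(r"/(?:19|20)\d{2}/")


def looks_like_article(url: str, min_hyphens: int = 2) -> bool:
    """Heuristic: a year segment in the path, or a hyphenated final slug."""
    path = urlparse(url).path
    if YEAR_SEGMENT.search(path):
        return True
    slug = path.rstrip("/").rsplit("/", 1)[-1]
    # Assumed threshold: article slugs tend to contain several hyphens.
    return slug.count("-") >= min_hyphens


def filter_article_links(base_url: str, hrefs: Iterable[str]) -> List[str]:
    """Resolve hrefs against base_url and keep unique same-site article candidates."""
    base_host = urlparse(base_url).netloc
    seen = set()
    result = []
    for href in hrefs:
        absolute = urljoin(base_url, href)
        if urlparse(absolute).netloc != base_host:
            continue  # skip links that leave the site
        if absolute in seen or not looks_like_article(absolute):
            continue
        seen.add(absolute)
        result.append(absolute)
    return result


if __name__ == "__main__":
    links = ["/2024/05/summer-trends", "/about", "https://other.example/2024/x/"]
    print(filter_article_links("https://example.com/fashion/", links))
```

A heuristic like this will produce both false positives and misses, which is one reason a content-extraction step with a fallback is still needed afterwards.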

✅ Supported Websites

Tested with:

Joomla, WordPress, Drupal (static or SEO-friendly URLs)
GlamourMagazine (JS-rendered content)
Any site with accessible <a href="..."> article links

📦 Requirements

requests
beautifulsoup4
newspaper3k
selenium
webdriver-manager
lxml
