A Scrapy-based web crawler for extracting detailed information about Pokémon from pokemondb.net. This project scrapes structured data including forms, types, stats, and more, and stores it in a local MongoDB database.
- Crawls Pokémon data starting from a single Pokémon page
- Extracts structured data:
  - Name, Form, Types, Species
  - Pokédex Index (National and Regional)
  - Height, Weight, Abilities
  - Training and Breeding Information
  - Base Stats and Total Stats
- Stores data in MongoDB (pokemon_db.pokedex)
- Uses .env configuration for secure DB connection
- Modular codebase with clean separation (spider, pipeline, DB client)

- Python 3.12+
- Scrapy
- MongoDB
- pymongo
- BeautifulSoup4
- python-dotenv
pokemon_scraper/
├── spiders/
│ └── pokemon_spider.py # Scrapes Pokémon data
├── items.py # Defines the data structure
├── pipelines.py # Handles MongoDB insertion
├── connection.py # MongoDB client using .env
├── middlewares.py # Default Scrapy middlewares
├── settings.py # Scrapy configuration
.env # MongoDB connection settings (not committed)
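
As a rough illustration of what items.py might define, the sketch below models one entry as a standard-library dataclass (Scrapy accepts dataclass objects as items). The class name `PokemonEntry` is hypothetical, and the field names are taken from the sample document in this README; the project's actual item definition may differ.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class PokemonEntry:
    """Hypothetical item shape; the real items.py may differ."""

    name: str
    index: int                          # National Pokédex number
    form: Optional[str] = None          # None for the default form
    types: list[str] = field(default_factory=list)
    species: str = ""
    height: str = ""                    # kept as scraped text, e.g. "0.7 m"
    weight: str = ""                    # kept as scraped text, e.g. "6.9 kg"
    abilities: list[str] = field(default_factory=list)
    local_index: dict[str, int] = field(default_factory=dict)
    training: dict[str, str] = field(default_factory=dict)
    breeding: dict[str, str] = field(default_factory=dict)
    base_stats: dict[str, int] = field(default_factory=dict)


entry = PokemonEntry(name="Bulbasaur", index=1, types=["Grass", "Poison"])
document = asdict(entry)  # plain dict that a pipeline could store
```

Keeping height and weight as text mirrors the sample output; converting them to numbers would be a separate design decision.
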
git clone https://github.com/your-username/Pokemon-WebCrawler.git
cd Pokemon-WebCrawler
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

Create a .env file in the project root:
MONGO_URI=mongodb://localhost:27017/
MONGO_DB_NAME=pokemon_db

Ensure MongoDB is running locally on localhost:27017.
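
For orientation, here is a minimal sketch of how a module such as connection.py could read these two variables. The function names `mongo_settings` and `get_collection` are hypothetical, and the fallback defaults simply repeat the values shown above; the project's real client code may be organized differently.

```python
import os


def mongo_settings(env=None):
    """Return (uri, db_name), falling back to the local defaults."""
    env = os.environ if env is None else env
    uri = env.get("MONGO_URI", "mongodb://localhost:27017/")
    db_name = env.get("MONGO_DB_NAME", "pokemon_db")
    return uri, db_name


def get_collection(name="pokedex"):
    """Return a collection handle; requires python-dotenv and pymongo."""
    # Imported lazily so mongo_settings() works without these packages.
    from dotenv import load_dotenv
    from pymongo import MongoClient

    load_dotenv()  # loads entries from a .env file into os.environ
    uri, db_name = mongo_settings()
    return MongoClient(uri)[db_name][name]
```

Calling `get_collection()` only creates the handle; reads and writes on it still need a reachable MongoDB server.
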
scrapy crawl pokemon

- Total Pages Crawled: 1,025
- Total Pokémon Entries Extracted: 1,215 (including alternate forms)
{
  "name": "Bulbasaur",
  "form": null,
  "index": 1,
  "types": ["Grass", "Poison"],
  "species": "Seed Pokémon",
  "height": "0.7 m",
  "weight": "6.9 kg",
  "abilities": ["Overgrow", "Chlorophyll"],
  "local_index": {
    "Kanto": 1
  },
  "training": {
    "EV yield": "1 Special Attack",
    "Base EXP": "64"
  },
  "breeding": {
    "Egg group": "Monster, Grass"
  },
  "base_stats": {
    "HP": 45,
    "Attack": 49,
    "Defense": 49,
    "Total": 318
  }
}

- The crawler currently starts from a manually specified Pokémon page (e.g., https://pokemondb.net/pokedex/wo-chien).
- Crawling the full Pokédex using /pokedex/all is not yet implemented.
- All Pokémon forms are stored separately; no deduplication is performed.

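If repeated runs ever produce duplicate documents, one possible approach (not something the project currently does) is to upsert on a natural key instead of inserting. The sketch below assumes that name plus form identifies an entry uniquely; `entry_key` and `upsert_entry` are hypothetical helpers, and `collection` is expected to be a pymongo collection, whose `update_one` method accepts `upsert=True`.

```python
def entry_key(item):
    """Build the lookup filter: name plus form (an assumed uniqueness rule)."""
    return {"name": item["name"], "form": item.get("form")}


def upsert_entry(collection, item):
    """Insert the entry, or update the stored document that has the same key."""
    return collection.update_one(
        entry_key(item),
        {"$set": dict(item)},
        upsert=True,
    )
```

With this in place, a pipeline could call `upsert_entry(collection, item)` for each scraped item, so re-crawling a page would update the existing document rather than add a second copy.
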
Arabind Meher