
kurtnettle/bubt-faculty-scraper


🎓 BUBT Faculty Scraper

Part of Project "Harukaze🍃"

BUBT Support Chat

License: GPL-3.0 Node.js PNPM TypeScript ESBuild Inquirer Chalk Cheerio Commander Winston XO Rimraf

Overview

Effortlessly extract structured faculty information from the Bangladesh University of Business and Technology (BUBT) website.


Faculty JSON structure

{
  "department": "",
  "name": "",
  "fcode": "",
  "designation": "",
  "room": "",
  "building": "",
  "telephone": {
    "personal": [],
    "office": [],
    "other": []
  },
  "email": {
    "personal": [],
    "office": [],
    "other": []
  },
  "status": "",
  "profileUrl": ""
}
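For TypeScript consumers, the structure above can be mirrored as a type plus a runtime check for parsed JSON. This is a sketch only: the names `Faculty`, `ContactGroups`, `isContactGroups`, and `isFaculty` are hypothetical and not taken from the project's source. The contact categories are typed as optional because the example record below leaves out categories that have no entries.

```typescript
// Hypothetical types mirroring the Faculty JSON structure (not the project's own code).

/** Contact entries grouped by category; categories may be absent. */
interface ContactGroups {
  personal?: string[];
  office?: string[];
  other?: string[];
}

interface Faculty {
  department: string;
  name: string;
  fcode: string;
  designation: string;
  room: string;
  building: string;
  telephone: ContactGroups;
  email: ContactGroups;
  status: string;
  profileUrl: string;
}

const STRING_FIELDS = [
  "department", "name", "fcode", "designation",
  "room", "building", "status", "profileUrl",
] as const;

/** True if every category that is present is an array of strings. */
function isContactGroups(value: unknown): value is ContactGroups {
  if (typeof value !== "object" || value === null) return false;
  const groups = value as Record<string, unknown>;
  return (["personal", "office", "other"] as const).every(
    (key) =>
      groups[key] === undefined ||
      (Array.isArray(groups[key]) &&
        (groups[key] as unknown[]).every((item) => typeof item === "string")),
  );
}

/** Runtime check that a parsed JSON value has the Faculty shape. */
function isFaculty(value: unknown): value is Faculty {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    STRING_FIELDS.every((key) => typeof record[key] === "string") &&
    isContactGroups(record.telephone) &&
    isContactGroups(record.email)
  );
}
```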

Example Faculty Data

{
  "department": "Computer Science and Engineering",
  "name": "Md. Masudul Islam",
  "fcode": "MDI",
  "designation": "Assistant Professor",
  "room": "421",
  "building": "2",
  "telephone": {
    "office": [
      "016xxxxx"
    ]
  },
  "email": {
    "personal": [
      "masudulislam11@gmail.com"
    ]
  },
  "status": "active",
  "profileUrl": "https://cse.bubt.edu.bd/facultydetails/29/"
}

Note

Phone numbers are intentionally displayed with masked digits in the example.
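Categories without entries appear to be omitted from a record (the example has no `telephone.personal` or `email.office`), so code reading these files should treat a missing category as an empty list. A minimal sketch, using a hypothetical helper `flattenContacts`:

```typescript
// Sketch: flatten categorized contacts, treating missing categories as empty.
type Category = "personal" | "office" | "other";
type ContactGroups = Partial<Record<Category, string[]>>;

interface FacultyContacts {
  name: string;
  telephone: ContactGroups;
  email: ContactGroups;
}

const CATEGORIES: Category[] = ["personal", "office", "other"];

/** Returns all values in personal, office, other order. */
function flattenContacts(groups: ContactGroups): string[] {
  return CATEGORIES.flatMap((category) => groups[category] ?? []);
}

// Trimmed-down copy of the example record above.
const example: FacultyContacts = JSON.parse(`{
  "name": "Md. Masudul Islam",
  "telephone": { "office": ["016xxxxx"] },
  "email": { "personal": ["masudulislam11@gmail.com"] }
}`);

console.log(flattenContacts(example.email)); // [ 'masudulislam11@gmail.com' ]
console.log(flattenContacts(example.telephone)); // [ '016xxxxx' ]
```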

🌟 Features

Structured Data Extraction:

  • Retrieves faculty details such as names, positions, departments, faculty codes, and categorized contact information.

    Note
    While more information is available on the university website, the primary goal was to extract each faculty member's contact details. In the future, I may consider adding more data fields.

Ethical Scraping:

  • Built-in rate limiting and request throttling to ensure ethical scraping and avoid overloading the website.
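The exact throttling strategy and delay the scraper uses are not documented here. As an illustration of the general technique only, the following sketch runs requests one after another and enforces a minimum gap between the starts of consecutive requests; `runThrottled` and any interval value are assumptions, not the tool's settings.

```typescript
// Sketch of sequential request throttling (hypothetical helper, not the scraper's code).

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Runs tasks one at a time, waiting so that consecutive task starts are
 * at least `minIntervalMs` apart. Results keep the order of `tasks`.
 */
async function runThrottled<T>(
  tasks: Array<() => Promise<T>>,
  minIntervalMs: number,
): Promise<T[]> {
  const results: T[] = [];
  let lastStart = -Infinity;
  for (const task of tasks) {
    // Wait out whatever remains of the interval since the previous start.
    const wait = lastStart + minIntervalMs - Date.now();
    if (wait > 0) await sleep(wait);
    lastStart = Date.now();
    results.push(await task());
  }
  return results;
}
```

With the built-in `fetch` in Node.js 22, each task could be `() => fetch(url).then((res) => res.text())`, and `runThrottled(tasks, 1000)` would then start at most one request per second.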

Flexible Export Options:

  • Easily export extracted data in JSON format, ready for further processing, visualization, or integration into databases or applications.
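A minimal sketch of a JSON export and read-back step using only Node.js built-ins. The helpers `exportJson` and `readJson` and the directory and file names passed to them are hypothetical; the tool's actual output code and file layout may differ.

```typescript
import { mkdir, readFile, writeFile } from "node:fs/promises";
import { join } from "node:path";

/** Writes records as pretty-printed JSON and returns the file path. */
async function exportJson(
  records: unknown[],
  outputDir: string,
  fileName: string,
): Promise<string> {
  // Create the directory tree if it does not exist yet.
  await mkdir(outputDir, { recursive: true });
  const filePath = join(outputDir, fileName);
  // Two-space indentation keeps the file readable and diff-friendly.
  await writeFile(filePath, JSON.stringify(records, null, 2) + "\n", "utf8");
  return filePath;
}

/** Reads an exported file back, e.g. before a database import. */
async function readJson<T>(filePath: string): Promise<T> {
  return JSON.parse(await readFile(filePath, "utf8")) as T;
}
```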

⚡ Installation

Before you begin, ensure you have the following installed:

  • Node.js (v22.12.0 or later)
  • pnpm (v10.2.1 or later; recommended for faster dependency management, though npm or yarn also work)
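To check the Node.js requirement from a script instead of running `node --version` by hand, comparing against `process.versions.node` is enough. The helper `compareVersions` is a hypothetical sketch that handles plain `major.minor.patch` strings only (no pre-release tags).

```typescript
// Minimum version from the prerequisites above.
const MIN_NODE_VERSION = "22.12.0";

/** Compares "major.minor.patch" strings (leading "v" allowed); returns <0, 0, or >0. */
function compareVersions(a: string, b: string): number {
  const parse = (v: string) =>
    v.replace(/^v/, "").split(".").map((part) => Number.parseInt(part, 10));
  const [pa, pb] = [parse(a), parse(b)];
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// process.versions.node has no leading "v", e.g. "22.12.0".
if (compareVersions(process.versions.node, MIN_NODE_VERSION) < 0) {
  console.warn(`Node.js ${MIN_NODE_VERSION}+ is required; found ${process.versions.node}.`);
}
```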

Option 1: Run from Source (Recommended for Developers)

  1. Clone the repository

    git clone https://github.com/kurtnettle/bubt-faculty-scraper.git
  2. Navigate to the Project Directory

    cd bubt-faculty-scraper/js
  3. Install Dependencies

    Install the dependencies using pnpm (recommended):

    pnpm install

    Alternatively, use npm or yarn:

    npm install # or yarn install
  4. Set Up Configuration

    Rename config.example.json to config.json and update the required options.

  5. Run the Faculty Scraper

    pnpm run dev

Option 2: Use Precompiled Release (For End Users)

  1. Download the latest bubt-faculty-scraper.js from the Releases page.

  2. Create a new config.json file in the same directory as bubt-faculty-scraper.js.

  3. Copy the contents of config.example.json from the repository into config.json, then update the required options.

  4. Run it with Node.js:

    node bubt-faculty-scraper.js
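Both options rely on a config.json placed next to the script. Its option names come from config.example.json and are not listed here, so the following hypothetical `loadConfig` helper leaves them untyped; it only illustrates one way to fail with a helpful message when the file is missing or malformed.

```typescript
import { readFile } from "node:fs/promises";

/** Hypothetical helper: reads config.json and returns it as a plain object. */
async function loadConfig(filePath = "config.json"): Promise<Record<string, unknown>> {
  let raw: string;
  try {
    raw = await readFile(filePath, "utf8");
  } catch (err) {
    // Only translate "file not found"; rethrow permission and other errors.
    if ((err as { code?: string }).code === "ENOENT") {
      throw new Error(`${filePath} not found. Copy config.example.json to ${filePath} and edit it.`);
    }
    throw err;
  }
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error(`${filePath} must contain a JSON object.`);
  }
  return parsed as Record<string, unknown>;
}
```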

Usage

To run the scraper, type the following in your terminal:

node bubt-faculty-scraper.js

Commands

Command    Description
extract    Export processed faculty data
dump       Download raw faculty webpage content

Common Options

These options apply to both commands:

Option                  Description
-D, --list-depts        Display available departments
-d, --dept-alias        Specify a department by its alias
-a, --all-dept          Select all departments
-S, --list-snapshots    Show available snapshot dates for a department

Command-Specific Options

  • extract
    Option              Description                   Default
    -s, --snapshot      Snapshot date (YYYY-MM-DD)    Latest available
    -o, --output-dir    Custom output directory       Department snapshot directory
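The documented `--snapshot` default (latest available) can be pictured as follows. This sketch assumes snapshot dates are plain `YYYY-MM-DD` strings, which sort chronologically as text; `isValidSnapshotDate` and `pickSnapshot` are hypothetical names, and the dates in the usage line are made up.

```typescript
const DATE_PATTERN = /^(\d{4})-(\d{2})-(\d{2})$/;

/** True if `value` is a real calendar date written as YYYY-MM-DD. */
function isValidSnapshotDate(value: string): boolean {
  const match = DATE_PATTERN.exec(value);
  if (!match) return false;
  const [, y, m, d] = match.map(Number);
  const date = new Date(Date.UTC(y, m - 1, d));
  // Reject dates that overflow into the next month, such as 2024-02-30.
  return (
    date.getUTCFullYear() === y &&
    date.getUTCMonth() === m - 1 &&
    date.getUTCDate() === d
  );
}

/** Returns the requested snapshot, or the most recent one if none is given. */
function pickSnapshot(available: string[], requested?: string): string {
  if (requested !== undefined) {
    if (!isValidSnapshotDate(requested)) {
      throw new Error(`Invalid snapshot date "${requested}"; expected YYYY-MM-DD.`);
    }
    if (!available.includes(requested)) {
      throw new Error(`No snapshot found for ${requested}.`);
    }
    return requested;
  }
  // YYYY-MM-DD strings sort chronologically, so the last one is the latest.
  const valid = available.filter(isValidSnapshotDate).sort();
  if (valid.length === 0) throw new Error("No snapshots available.");
  return valid[valid.length - 1];
}

console.log(pickSnapshot(["2025-01-15", "2025-03-02", "2024-11-30"])); // 2025-03-02
```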

🤝 Contributing

Contributions are welcome! If you'd like to improve the tool or fix bugs, feel free to submit a pull request. Please ensure your changes align with the project's coding standards and include appropriate tests.

📜 License

This project is licensed under the GPLv3 License. See the LICENSE file for full details.

By contributing to this project, you agree that your contributions will be licensed under the GPLv3 License as well.