Part of Project "Harukaze🍃"
Effortlessly extract structured faculty information from the BUBT University website.
Faculty JSON structure
{
"department": "",
"name": "",
"fcode": "",
"designation": "",
"room": "",
"building": "",
"telephone": {
"personal": [],
"office": [],
"other": []
},
"email": {
"personal": [],
"office": [],
"other": []
},
"status": "",
"profileUrl": ""
}
Example Faculty Data
{
"department": "Computer Science and Engineering",
"name": "Md. Masudul Islam",
"fcode": "MDI",
"designation": "Assistant Professor",
"room": "421",
"building": "2",
"telephone": {
"office": [
"016xxxxx"
]
},
"email": {
"personal": [
"masudulislam11@gmail.com"
]
},
"status": "active",
"profileUrl": "https://cse.bubt.edu.bd/facultydetails/29/"
}
Note
Phone numbers are intentionally displayed with masked digits in the example.
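The example record above omits some contact categories that the full structure defines (for instance, `telephone.personal` and `email.office`). As a minimal sketch of how a consumer of the exported data might fill those gaps, the helper below returns a copy of a record with every field present. The function names `normalizeContacts` and `normalizeFaculty` are hypothetical and are not part of the scraper itself.

```javascript
// Hypothetical helpers: illustrative only, not taken from the project's source.
const CONTACT_CATEGORIES = ["personal", "office", "other"];

// Ensure a contact object has all three categories, each as an array.
function normalizeContacts(contacts) {
  const source = contacts ?? {};
  const result = {};
  for (const category of CONTACT_CATEGORIES) {
    const value = source[category];
    result[category] = Array.isArray(value) ? [...value] : [];
  }
  return result;
}

// Return a copy of `record` with every field of the faculty structure present.
function normalizeFaculty(record) {
  const text = (value) => (typeof value === "string" ? value : "");
  return {
    department: text(record.department),
    name: text(record.name),
    fcode: text(record.fcode),
    designation: text(record.designation),
    room: text(record.room),
    building: text(record.building),
    telephone: normalizeContacts(record.telephone),
    email: normalizeContacts(record.email),
    status: text(record.status),
    profileUrl: text(record.profileUrl),
  };
}
```

With this in place, code that reads the data can iterate over `record.telephone.personal` without first checking whether the key exists.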
Structured Data Extraction:
- Retrieves faculty details such as names, positions, departments, faculty codes, and categorized contact information.
[!NOTE]
While more information is available on the university website, the primary goal was to extract the contact details of each faculty member. In the future, I may consider adding more data fields.
Ethical Scraping:
- Built-in rate limiting and request throttling to ensure ethical scraping and avoid overloading the website.
Flexible Export Options:
- Easily export extracted data in JSON format, ready for further processing, visualization, or integration into databases or applications.
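
As an illustration of the request throttling mentioned under Ethical Scraping, here is a minimal sketch of one common approach: enforcing a minimum interval between consecutive requests. It is not the scraper's actual implementation. The names `createThrottle`, `reserveSlot`, and `fetchPolitely`, as well as the one-second default interval, are assumptions made for this example.

```javascript
// Hypothetical helper: not taken from the project's source code.
// Returns a function that reports how long to wait (in ms) before the next
// request so that consecutive requests start at least `minIntervalMs` apart.
function createThrottle(minIntervalMs, now = Date.now) {
  let nextAllowedAt = 0; // earliest timestamp at which the next request may start

  return function reserveSlot() {
    const current = now();
    const startAt = Math.max(current, nextAllowedAt);
    nextAllowedAt = startAt + minIntervalMs;
    return startAt - current; // delay the caller should sleep for
  };
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fetch pages one at a time, pausing between requests.
async function fetchPolitely(urls, minIntervalMs = 1000) {
  const reserveSlot = createThrottle(minIntervalMs);
  const pages = [];
  for (const url of urls) {
    await sleep(reserveSlot());
    const response = await fetch(url); // global fetch is available in Node.js 18+
    pages.push(await response.text());
  }
  return pages;
}
```

Passing the clock in as a parameter keeps the scheduling logic separate from real time, so the interval calculation can be checked without waiting or making network calls.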
Before you begin, ensure you have the following installed:
- Node.js (v22.12.0 or later)
- pnpm (v10.2.1 or later, highly recommended for faster dependency management)
- Clone the repository
  git clone https://github.com/kurtnettle/bubt-faculty-scraper.git
- Navigate to the Project Directory
  cd bubt-faculty-scraper/js
- Install Dependencies
  Install the dependencies using pnpm (recommended):
  pnpm install
  Alternatively, using npm or yarn:
  npm install # or yarn install
- Set Up Configuration
  Rename config.example.json to config.json and update the required options.
- Run the Faculty Scraper
  pnpm run dev
- Download the latest bubt-faculty-scraper.js from the Releases page.
- Create a new config.json file in the directory where you downloaded bubt-faculty-scraper.js.
- Copy the contents of config.example.json from the repo into config.json, then update the required options.
- Run with node
  node bubt-faculty-scraper.js
To run, simply type in your terminal
node bubt-faculty-scraper.js
Command | Description |
---|---|
extract | Export processed faculty data |
dump | Download raw faculty webpage content |
These options apply to both commands:
Option | Description |
---|---|
-D, --list-depts | Display available departments |
-d, --dept-alias | Specify department by alias |
-a, --all-dept | Select all departments |
-S, --list-snapshots | Show available snapshot dates for a department |
extract
Option | Description | Default |
---|---|---|
-s, --snapshot | Snapshot date (YYYY-MM-DD) | Latest available |
-o, --output-dir | Custom output directory | Department snapshot dir |
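
The snapshot option takes a YYYY-MM-DD date and falls back to the latest available snapshot. The sketch below shows one way that behavior could be expressed. It is an illustration only and does not reflect how the tool resolves snapshots internally. `isValidSnapshotDate` and `resolveSnapshot` are hypothetical names, and the list of available dates is assumed to be an array of strings.

```javascript
// Hypothetical helpers: they illustrate the documented default, not the tool's code.
const SNAPSHOT_PATTERN = /^\d{4}-\d{2}-\d{2}$/;

// Check that `value` is a real calendar date written as YYYY-MM-DD.
function isValidSnapshotDate(value) {
  if (!SNAPSHOT_PATTERN.test(value)) return false;
  // Reject impossible dates such as 2025-02-30 by round-tripping through Date.
  const parsed = new Date(`${value}T00:00:00Z`);
  return !Number.isNaN(parsed.getTime()) && parsed.toISOString().slice(0, 10) === value;
}

// Pick the requested snapshot, or the latest one when none is requested.
function resolveSnapshot(availableDates, requested) {
  if (availableDates.length === 0) {
    throw new Error("No snapshots available");
  }
  if (requested === undefined) {
    // YYYY-MM-DD strings sort chronologically when compared as plain strings.
    return [...availableDates].sort().at(-1);
  }
  if (!isValidSnapshotDate(requested)) {
    throw new Error(`Invalid snapshot date: ${requested}`);
  }
  if (!availableDates.includes(requested)) {
    throw new Error(`No snapshot found for ${requested}`);
  }
  return requested;
}
```

Because the format is zero-padded and ordered from year to day, a plain string sort is enough to find the most recent date.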
Contributions are welcome! If you'd like to improve the tool or fix bugs, feel free to submit a pull request. Please ensure your changes align with the project's coding standards and include appropriate tests.
This project is licensed under the GPLv3 License. See the LICENSE file for full details.
By contributing to this project, you agree that your contributions will be licensed under the GPLv3 License as well.