This project provides a FastAPI-based backend for scraping, storing, and serving student data from the VIT VTOP portal. It is designed to automate the retrieval of student information such as profile, attendance, marks, timetable, and grade history, and make it accessible via RESTful API endpoints.
- Automated Scraping: Uses headless HTTP requests to log in and scrape student data from VTOP.
- Session Management: Handles sessions and CSRF tokens securely for each user.
- Data Storage: Persists all scraped data in a local SQLite database using SQLAlchemy ORM.
- REST API: Exposes endpoints for login, scraping, and fetching student data (profile, marks, attendance, timetable, etc.).
- Periodic Cleanup: Cleans up expired sessions automatically.
- Modular Codebase: Organized into routers, utilities, and scraping modules for maintainability.
Below are visual diagrams representing the workflow and endpoints of the project. They illustrate the API flow and the required parameters.
.
├── main.py # FastAPI app entry point
├── database.py # SQLAlchemy DB setup and session management
├── models.py # SQLAlchemy ORM models
├── requirements.txt # Python dependencies
├── routers/ # FastAPI routers (student, llm)
├── streamlit_app.py # Streamlit UI for interacting with the application
├── utils/ # Utility modules (session, scraping, validation)
│ └── scrape/ # HTML scraping logic for each VTOP page
└── json_structure # Response models for the different endpoints
- Session Creation:
  - A session is created for each student using their registration number.
  - The session and its CSRF token are managed in memory for secure requests (a sketch of this store follows the list).
- Login Flow:
  - The API simulates the VTOP login process, including captcha handling and CSRF token management (sketched below).
  - On successful login, a session is established for subsequent scraping.
- Scraping:
  - The backend scrapes the various VTOP pages (profile, attendance, marks, timetable, grade history, CGPA details, credit info, etc.) using BeautifulSoup (sketched below).
  - Each type of data has a dedicated scraping module under `utils/scrape/`.
- Data Storage:
  - Scraped data is stored in the `students` table in SQLite, with columns for each data type (profile, marks, etc.); a sketch of the model follows the list.
- API Endpoints:
  - Endpoints are provided for session creation, login, scraping, and data retrieval (a minimal route sketch follows the list).
  - Data can be fetched per student and per semester.
- Session Cleanup:
  - Expired sessions are cleaned up every 10 minutes to free resources (see the session store sketch below).
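The pieces above can be illustrated with short, hedged sketches. First, the in-memory session store and its periodic cleanup might look roughly like this; the `SESSIONS` dict, the field names, and the 30-minute TTL are assumptions for illustration, not the project's actual code:

```python
import time

import requests

# Hypothetical in-memory store: reg_no -> session state (illustrative only).
SESSIONS: dict[str, dict] = {}
SESSION_TTL_SECONDS = 30 * 60  # assumed time-to-live for an idle session

def create_session(reg_no: str) -> dict:
    """Create (or replace) the HTTP session tracked for a student."""
    state = {
        "http": requests.Session(),  # carries VTOP cookies between requests
        "csrf_token": None,          # filled in once the login page is fetched
        "created_at": time.time(),
    }
    SESSIONS[reg_no] = state
    return state

def cleanup_expired_sessions() -> None:
    """Drop idle sessions; meant to be scheduled every 10 minutes."""
    now = time.time()
    for reg_no in list(SESSIONS):
        if now - SESSIONS[reg_no]["created_at"] > SESSION_TTL_SECONDS:
            SESSIONS[reg_no]["http"].close()
            del SESSIONS[reg_no]
```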
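The login flow could be simulated along these lines. The VTOP URLs, form field names, and captcha endpoint below are all assumptions; the real portal's markup and parameters may differ:

```python
import requests
from bs4 import BeautifulSoup

VTOP_BASE = "https://vtop.vit.ac.in/vtop"  # assumed base URL

def prepare_login(http: requests.Session) -> tuple[str, bytes]:
    """Fetch the login page, pull out the CSRF token and the captcha image."""
    page = http.get(f"{VTOP_BASE}/login")  # hypothetical path
    soup = BeautifulSoup(page.text, "html.parser")
    csrf = soup.find("input", {"name": "_csrf"})["value"]  # assumed field name
    captcha = http.get(f"{VTOP_BASE}/captcha").content  # assumed captcha endpoint
    return csrf, captcha

def login(http: requests.Session, csrf: str, reg_no: str,
          password: str, captcha_text: str) -> bool:
    """Post the credentials with the CSRF token; True if the portal accepts them."""
    resp = http.post(
        f"{VTOP_BASE}/doLogin",  # hypothetical path
        data={
            "_csrf": csrf,
            "username": reg_no,
            "password": password,
            "captchaStr": captcha_text,  # field names are assumptions
        },
    )
    return resp.ok and "Invalid" not in resp.text
```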
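Each module under `utils/scrape/` parses one VTOP page with BeautifulSoup. As a rough illustration, an attendance scraper might walk an HTML table like this (the table layout and column order are assumed):

```python
from bs4 import BeautifulSoup

def scrape_attendance(html: str) -> list[dict]:
    """Parse a (hypothetical) attendance table into course records."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    table = soup.find("table")  # assumes a single attendance table on the page
    if table is None:
        return records
    for row in table.find_all("tr")[1:]:  # skip the header row
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) >= 3:  # assumed column order: code, title, percentage
            records.append({
                "course_code": cells[0],
                "course_title": cells[1],
                "attendance_percent": cells[2],
            })
    return records
```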
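Data storage could map onto a single `students` table with one text/JSON column per data type. The column names below are illustrative and may not match the real `models.py`:

```python
from sqlalchemy import Column, String, Text, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Student(Base):
    """Illustrative model: one row per student, one text column per data type."""
    __tablename__ = "students"

    reg_no = Column(String, primary_key=True)
    profile = Column(Text)        # JSON blob of profile data (columns assumed)
    attendance = Column(Text)     # JSON keyed by semester id
    marks = Column(Text)
    timetable = Column(Text)
    grade_history = Column(Text)

engine = create_engine("sqlite:///students.db")  # assumed database file name
Base.metadata.create_all(engine)
SessionLocal = sessionmaker(bind=engine)
```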
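Finally, the read endpoints are thin FastAPI routes over that table. A minimal, assumed version of the marks route (the `load_student` helper is hypothetical):

```python
import json
from typing import Optional

from fastapi import APIRouter, HTTPException

router = APIRouter(prefix="/llm")

def load_student(reg_no: str):
    """Hypothetical lookup in the students table; returns None when absent."""
    ...  # e.g. a SQLAlchemy query against the Student model sketched above

@router.get("/marks")
def get_marks(reg_no: str, sem_id: Optional[str] = None):
    """Return marks for one semester, or for all semesters when sem_id is omitted."""
    student = load_student(reg_no)
    if student is None:
        raise HTTPException(status_code=404, detail="Student not found")
    marks = json.loads(student.marks or "{}")  # assumes JSON keyed by semester id
    return marks.get(sem_id, {}) if sem_id else marks
```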
- `POST /student/create_session?reg_no=22BCE1519`: Creates a session for the given registration number.
- `POST /student/prepare_login`: Prepares for login and returns the captcha image.
- `POST /student/login`: Logs in with registration number, password, and captcha.
- `GET /student/start-scraping?reg_no=22BCE1519`: Scrapes all student data and stores it in the database.
- `GET /student/logout?reg_no=22BCE1519`: Logs out and deletes all data for the student.
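A client would typically call these endpoints in order: create a session, fetch the captcha, log in, then trigger scraping. A sketch using `requests`; the base URL and the login parameter names are assumptions:

```python
import requests

BASE = "http://localhost:8000"  # assumed address of the FastAPI server
REG_NO = "22BCE1519"

# 1. Create a session for the student.
requests.post(f"{BASE}/student/create_session", params={"reg_no": REG_NO})

# 2. Prepare the login and get the captcha image to show to the user.
captcha_resp = requests.post(f"{BASE}/student/prepare_login", params={"reg_no": REG_NO})

# 3. Log in with the credentials and the solved captcha (parameter names assumed).
requests.post(
    f"{BASE}/student/login",
    params={"reg_no": REG_NO, "password": "your-password", "captcha": "typed-captcha"},
)

# 4. Scrape everything for this student and persist it to SQLite.
requests.get(f"{BASE}/student/start-scraping", params={"reg_no": REG_NO})
```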
- `GET /llm/profile?reg_no=22BCE1519`: Returns the student's profile.
- `GET /llm/semesters?reg_no=22BCE1519`: Returns all available semesters for the student. Note: the `sem_id` values required by other endpoints can be found in this endpoint's response.
- `GET /llm/marks?reg_no=22BCE1519&sem_id=CH20242505`: Returns marks for a specific semester. If you omit the `sem_id` parameter, marks for all semesters will be returned.
- `GET /llm/attendance?reg_no=22BCE1519&sem_id=CH20242505`: Returns attendance for a specific semester. If you omit the `sem_id` parameter, attendance for all semesters will be returned.
- `GET /llm/timetable?reg_no=22BCE1519&sem_id=CH20242505`: Returns the timetable for a specific semester. If you omit the `sem_id` parameter, timetables for all semesters will be returned.
- `GET /llm/grade_history?reg_no=22BCE1519`: Returns the grade history.

Tip: The `sem_id` parameter for the marks, attendance, and timetable endpoints can be obtained from the `/llm/semesters` endpoint. If you do not provide a `sem_id`, the API will return data for all semesters.
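Once scraping has finished, the read endpoints can be queried directly. For example (base URL assumed):

```python
import requests

BASE = "http://localhost:8000"  # assumed address of the FastAPI server
REG_NO = "22BCE1519"

# The semester ids used by the other endpoints come from /llm/semesters.
semesters = requests.get(f"{BASE}/llm/semesters", params={"reg_no": REG_NO}).json()

# Marks for one semester (sem_id taken from the semesters response).
marks = requests.get(
    f"{BASE}/llm/marks",
    params={"reg_no": REG_NO, "sem_id": "CH20242505"},
).json()

# Omitting sem_id returns attendance for all semesters.
attendance = requests.get(f"{BASE}/llm/attendance", params={"reg_no": REG_NO}).json()
```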
To set up and run the Streamlit application:
- Create a Virtual Environment:
  - `python -m venv venv`
  - `source venv/bin/activate` # On Windows use `venv\Scripts\activate`
- Install Dependencies:
  - `pip install -r requirements.txt`
- Run the Application:
  - `streamlit run streamlit_app.py`

The application will be accessible at `http://localhost:8501`.
- Fork the repository.
- Create your feature branch (`git checkout -b feature/fooBar`).
- Commit your changes.
- Push to the branch (`git push origin feature/fooBar`).
- Create a new Pull Request.
- FastAPI
- SQLAlchemy
- BeautifulSoup
- VIT VTOP (for the data source)