Skip to content

TelRich/mongodb-eduhub-project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

26 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

EduHub - MongoDB E-Learning Platform Database

Data Engineering - AltSchool (Tinyuka) 2024 Second Semester Project Exam

A comprehensive MongoDB database system for an online e-learning platform, demonstrating advanced NoSQL database concepts, operations, and optimization techniques.

πŸ“‹ Project Overview

EduHub is a complete database backend for an online learning platform that supports:

  • User management (students and instructors)
  • Course creation and management
  • Student enrollment and progress tracking
  • Assignment submission and grading system
  • Analytics and reporting capabilities
  • Advanced search and filtering functionality

πŸ› οΈ Technical Requirements

  • MongoDB: v8.0 or higher (MongoDB Atlas Cloud or Local)
  • Python: 3.8 or higher
  • Environment: MongoDB Compass (GUI) and MongoDB Shell (CLI)
  • Libraries:
    • pymongo - MongoDB Python driver
    • pandas - Data manipulation and analysis
    • datetime - Date and time operations
    • json - Data serialization

🌐 Database Configuration

This project uses MongoDB Atlas (cloud database) for production deployment with fallback to local MongoDB for development.

Connection Configuration

# Cloud MongoDB Atlas (Primary)
CONNECTION_STRING = "mongodb+srv://username:password@cluster.mongodb.net/"

# Local MongoDB (Development)
CONNECTION_STRING = "mongodb://localhost:27017/"

πŸ—‚οΈ Database Schema

Collections Structure

Users Collection

Stores information about students and instructors:

{
  "userId": "ST_001",
  "email": "student@example.com",
  "firstName": "John",
  "lastName": "Doe",
  "role": "student", // "student" or "instructor"
  "dateJoined": "2024-01-15T10:00:00Z",
  "profile": {
    "bio": "Passionate learner",
    "avatar": "avatar_url",
    "skills": ["Python", "JavaScript"]
  },
  "isActive": true
}

Courses Collection

Contains course information and metadata:

{
  "courseId": "CO_001",
  "title": "Introduction to Python",
  "description": "Learn Python programming fundamentals",
  "instructorId": "IN_001",
  "category": "Programming",
  "level": "beginner", // "beginner", "intermediate", "advanced"
  "duration": 40,
  "price": 99.99,
  "tags": ["python", "programming", "basics"],
  "createdAt": "2024-01-10T09:00:00Z",
  "updatedAt": "2024-01-15T14:30:00Z",
  "isPublished": true,
  "rating": 4.5
}

Enrollments Collection

Tracks student course enrollments:

{
  "enrollmentId": "EN_001",
  "studentId": "ST_001",
  "courseId": "CO_001",
  "enrollmentDate": "2024-02-01T10:00:00Z",
  "status": "active", // "active", "completed", "dropped"
  "progress": 65,
  "completionDate": null
}

Lessons Collection

Stores individual lessons within courses:

{
  "lessonId": "LE_001",
  "courseId": "CO_001",
  "title": "Variables and Data Types",
  "content": "In this lesson, we'll explore...",
  "duration": 30,
  "order": 1,
  "videoUrl": "https://example.com/video1",
  "materials": ["slides.pdf", "code_examples.py"],
  "createdAt": "2024-01-12T11:00:00Z"
}

Assignments Collection

Contains course assignments:

{
  "assignmentId": "AS_001",
  "courseId": "CO_001",
  "title": "Python Basics Quiz",
  "description": "Test your understanding of Python basics",
  "dueDate": "2024-02-15T23:59:59Z",
  "maxPoints": 100,
  "createdAt": "2024-01-20T09:00:00Z",
  "instructions": "Complete all questions and submit your code"
}

Submissions Collection

Tracks student assignment submissions:

{
  "submissionId": "SU_001",
  "assignmentId": "AS_001",
  "studentId": "ST_001",
  "submissionDate": "2024-02-14T18:30:00Z",
  "content": "My solution to the assignment...",
  "attachments": ["solution.py"],
  "grade": 85,
  "feedback": "Great work! Consider optimizing your loops.",
  "gradedDate": "2024-02-16T10:00:00Z"
}

πŸš€ Setup Instructions

Prerequisites

  1. MongoDB Atlas Account (Recommended):

    • Sign up at MongoDB Atlas
    • Create a free cluster
    • Get your connection string
  2. Alternative - Local MongoDB Installation:

    # On macOS using Homebrew
    brew tap mongodb/brew
    brew install mongodb-community@8.0
    
    # On Ubuntu/Debian
    sudo apt-get install mongodb-org
    
    # On Windows
    # Download from https://www.mongodb.com/try/download/community
  3. Install Python Dependencies:

    pip install pymongo pandas jupyter notebook

Database Setup

  1. Configure Connection:

    • Create a config.json file in the notebooks/ directory:
    {
      "cloud_db_url": "mongodb+srv://username:password@cluster.mongodb.net/"
    }
  2. Clone and Navigate to Project:

    git clone <repository-url>
    cd mongodb-eduhub-project
  3. Initialize Database:

    python src/eduhub_queries.py
  4. Run Jupyter Notebook:

    jupyter notebook notebooks/eduhub_mongodb_project.ipynb

πŸŽ† Key Project Features

πŸ”„ No External Dependencies for Data Generation

  • Realistic Educational Data: Removed faker dependency in favor of curated educational content
  • Professional Profiles: Real instructor backgrounds and student interests
  • Industry-Standard Courses: Actual course titles and descriptions from tech industry
  • Academic Authenticity: Proper assignment structures and grading patterns

🌐 MongoDB Atlas Cloud Integration

  • Cloud-First Architecture: Designed for MongoDB Atlas deployment
  • Scalable Infrastructure: Ready for production workloads
  • Connection Flexibility: Supports both cloud and local development
  • Performance Optimized: Cloud-optimized indexing strategies

1. Comprehensive CRUD Operations

  • Create: Add users, courses, enrollments, lessons, assignments
  • Read: Advanced queries with filtering, sorting, and aggregation
  • Update: Profile updates, grade assignments, course modifications
  • Delete: Soft delete users, remove enrollments and lessons

2. Advanced Aggregation Pipelines

  • Course Enrollment Statistics: Enrollment counts and ratings by category
  • Student Performance Analysis: Average grades and completion rates
  • Instructor Analytics: Revenue and student metrics per instructor
  • Trend Analysis: Monthly enrollment trends and engagement metrics

3. Performance Optimization

  • Indexing Strategy: Optimized indexes for common query patterns
  • Query Optimization: Efficient text search and date range queries
  • Performance Monitoring: Query analysis and execution time tracking

4. Data Validation and Integrity

  • Schema Validation: JSON schema validation for all collections
  • Unique Constraints: Prevent duplicate enrollments and users
  • Error Handling: Comprehensive error handling for edge cases

πŸ” Query Examples

Basic Queries

# Find all active students
active_students = db.find_all_active_students()

# Get courses in a specific category
programming_courses = db.get_courses_by_category("Programming")

# Search courses by title
search_results = db.search_courses_by_title("Python")

Advanced Aggregations

# Get course enrollment statistics
stats = db.get_course_enrollment_statistics()

# Analyze student performance
performance = db.get_student_performance_analysis()

# Generate instructor analytics
analytics = db.get_instructor_analytics()

Complex Queries

# Find courses in price range
courses = db.find_courses_by_price_range(50, 200)

# Get recent users (last 6 months)
recent_users = db.get_recent_users(6)

# Find assignments due next week
upcoming = db.get_assignments_due_next_week()

πŸ“ˆ Performance Analysis

Indexing Strategy

  • Primary Indexes: Unique constraints on all identifier fields
  • Secondary Indexes: Optimized for common query patterns
  • Compound Indexes: Multi-field indexes for complex queries
  • Text Indexes: Full-text search on course titles and descriptions

Query Execution Times

  • Basic lookups: ~2ms with proper indexing
  • Complex aggregations: ~50ms for 1000+ documents
  • Text searches: ~8ms with text indexes
  • Date range queries: ~12ms with date indexes

πŸ“ Project Structure

mongodb-eduhub-project/
β”œβ”€β”€ README.md                          # Project documentation
β”œβ”€β”€ .gitignore                         # Git ignore file
β”œβ”€β”€ notebooks/
β”‚   └── eduhub_mongodb_project.ipynb   # Interactive Jupyter notebook
β”œβ”€β”€ src/
β”‚   └── eduhub_queries.py              # Main Python implementation
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ sample_data.json               # Exported sample data
β”‚   └── schema_validation.json         # Schema validation rules
└── docs/
    β”œβ”€β”€ performance_analysis.md        # Performance optimization docs
    └── presentation.pptx              # Project presentation

🎯 Learning Objectives Achieved

βœ… Database Design: Proper schema design with relationships and constraints

βœ… CRUD Operations: Complete implementation of Create, Read, Update, Delete operations

βœ… Advanced Queries: Complex filtering, sorting, and aggregation pipelines

βœ… Performance Optimization: Strategic indexing and query optimization

βœ… Data Validation: Schema validation and error handling

βœ… Analytics: Business intelligence queries and reporting

πŸ”§ Usage Examples

Initialize Database

from src.eduhub_queries import EduHubDatabase

# Initialize with MongoDB Atlas
CONNECTION_STRING = "mongodb+srv://username:password@cluster.mongodb.net/"
db = EduHubDatabase(CONNECTION_STRING)

# Populate with sample data
db.populate_sample_data()

Create Operations

# Add a new student
student_id = db.add_new_student(
    email="john.doe@email.com",
    first_name="John",
    last_name="Doe",
    bio="Aspiring data scientist"
)

# Create a new course
course_id = db.create_new_course(
    title="Advanced Python Programming",
    description="Deep dive into Python advanced concepts",
    instructor_id="IN_001",
    category="Programming",
    level="advanced",
    duration=60,
    price=199.99
)

Analytics Operations

# Get comprehensive analytics
analytics = db.get_advanced_analytics()
print(f"Monthly trends: {analytics['monthly_trends']}")
print(f"Popular categories: {analytics['popular_categories']}")

πŸ§ͺ Testing and Validation

Data Validation Tests

# Test email validation
valid_email = db.validate_email_format("test@example.com")  # Returns True
invalid_email = db.validate_email_format("invalid-email")   # Returns False

# Test user creation with validation
user_data = {
    "userId": "ST_100",
    "email": "test@example.com",
    "firstName": "Test",
    "lastName": "User",
    "role": "student"
}
result = db.validate_and_insert_user(user_data)

Performance Testing

# Analyze query performance
performance = db.analyze_query_performance(
    "courses", 
    {"category": "Programming"}
)
print(f"Execution time: {performance['executionStats']['executionTimeMillis']}ms")

# Optimize slow queries
db.optimize_slow_queries()

πŸ“Š Sample Data Overview

The database is populated with realistic educational data:

  • 20 Users: 15 students and 5 instructors with professional profiles
  • 8 Courses: Comprehensive courses across different tech categories
  • 25 Lessons: Educational content with proper learning progression
  • 10 Assignments: Real-world project assignments
  • 15 Enrollments: Student-course relationships with progress tracking
  • 10+ Submissions: Assignment submissions with realistic grades and feedback

Realistic Data Features

  • Professional Instructors: Former Google, Facebook, Stanford backgrounds
  • Comprehensive Courses: Python, Data Science, AI/ML, Web Development, Cloud Computing
  • Educational Content: Proper lesson structures with video URLs and materials
  • Industry Projects: Real-world assignment descriptions and requirements
  • Academic Grading: Realistic grade distribution and professional feedback

Users Collection

users collection

Courses Collection

course collection

Enrollments Collection

enrollments collection

Lessons Collection

lessons collection

Assignments Collection

assignments collection

Submissions Collection

submissions collection

πŸ” Key Queries and Operations

Business Intelligence Queries

  1. Revenue Analysis: Calculate total revenue per instructor
  2. Engagement Metrics: Track student completion rates
  3. Trend Analysis: Monthly enrollment and completion trends
  4. Performance Metrics: Average grades and top performers

Administrative Operations

  1. User Management: Registration, profile updates, soft deletion
  2. Course Management: Creation, publishing, content organization
  3. Enrollment Tracking: Student progress and completion status
  4. Assignment Grading: Submission handling and feedback

🚨 Error Handling

The system implements comprehensive error handling:

  • Duplicate Key Errors: Prevent duplicate users and enrollments
  • Validation Errors: Ensure data integrity and format compliance
  • Missing Data: Handle incomplete or invalid data gracefully
  • Connection Errors: Robust database connection management

🎨 Data Visualization

The Jupyter notebook includes data visualization examples:

  • Enrollment trends over time
  • Course popularity by category
  • Student performance distributions
  • Instructor analytics dashboards

πŸ”§ Maintenance and Monitoring

Available Monitoring Tools

# Get collection statistics
stats = db.get_collection_statistics()

# Export data for backup
db.export_sample_data("backup_data.json")

# Analyze database performance
info = db.get_database_info()

🀝 Contributing

This project is part of the AltSchool Data Engineering curriculum. For academic purposes:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ‘¨β€πŸ’» Author

Goodrich Okoro

πŸ† Project Status

βœ… Database Design and Setup - Complete

βœ… Data Population - Complete

βœ… CRUD Operations - Complete

βœ… Advanced Queries - Complete

βœ… Performance Optimization - Complete

βœ… Documentation - Complete


This project demonstrates comprehensive understanding of MongoDB database concepts, operations, and best practices for modern web applications.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published