This project implements a cloud-native, event-driven backend system that processes computationally intensive tasks asynchronously. The system is designed for scalability, fault tolerance, and high availability using core AWS services.
- API Service (Spring Boot 3.x, Java 17):
  - Exposes REST endpoints for submitting tasks (POST /tasks) and retrieving task status (GET /tasks/{taskId}).
  - Immediately returns HTTP 202 (Accepted) upon task submission, so the client is not blocked while the task is processed in the background.
- Worker Service:
- Languages & Frameworks: Java 17, Spring Boot 3.x, Jackson, Lombok, Apache Log4j2
- Cloud Platform: Amazon Web Services (AWS)
- Core AWS Services:
- Amazon SQS (Simple Queue Service)
- Amazon DynamoDB
- Amazon S3 (Simple Storage Service)
- Amazon CloudWatch (logging and monitoring)
- AWS IAM (access control)
- Build Tools: Maven
- Version Control: Git, GitHub
- ✅ Asynchronous Task Submission & Execution
- ✅ Decoupled API and Worker Microservices
- ✅ Persistent Task State Management (PENDING, RUNNING, COMPLETED, FAILED)
- ✅ Durable Object Storage for Processed Files
- ✅ Robust Error Handling with Automatic Retry Logic
- ✅ CloudWatch Logging for Operational Monitoring
API Service (Task Submission):
- Input: Accepts the task request submitted by the client through POST /tasks.
- Graceful Queueing: Persists initial task status to DynamoDB before sending to SQS. If SQS send fails, the task state is still recorded, preventing lost tasks (though it would require manual re-queueing or a reconciliation process).
- Immediate Client Feedback: Returns 202 Accepted status immediately, even if background processing might eventually fail, ensuring the client isn't blocked.
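The submission flow described above can be sketched as follows. This is a minimal illustration, not the project's actual code: `TaskStore`, `TaskQueue`, `SubmitResult`, and `submit` are hypothetical names, and the in-memory map and list in `main` merely stand in for DynamoDB and SQS so the ordering (persist first, enqueue second, answer 202 either way) can be run without AWS credentials.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Sketch of the submit path: persist PENDING, then enqueue, then answer 202. */
class TaskSubmissionSketch {

    enum TaskStatus { PENDING, RUNNING, COMPLETED, FAILED }

    /** Hypothetical stand-in for the DynamoDB task table. */
    interface TaskStore {
        void save(String taskId, TaskStatus status);
    }

    /** Hypothetical stand-in for the SQS queue. */
    interface TaskQueue {
        void send(String taskId, String payload);
    }

    /** What a controller could hand back to the client (hypothetical shape). */
    record SubmitResult(int httpStatus, String taskId, boolean enqueued) {}

    static SubmitResult submit(String payload, TaskStore store, TaskQueue queue) {
        String taskId = UUID.randomUUID().toString();

        // 1. Record the task first so it cannot be lost if the queue is down.
        store.save(taskId, TaskStatus.PENDING);

        boolean enqueued = true;
        try {
            // 2. Hand the work to the queue.
            queue.send(taskId, payload);
        } catch (RuntimeException e) {
            // The PENDING record survives; a reconciliation job or a manual
            // re-queue would have to pick it up later.
            enqueued = false;
        }

        // 3. Answer 202 Accepted in both cases so the client is never blocked.
        return new SubmitResult(202, taskId, enqueued);
    }

    public static void main(String[] args) {
        Map<String, TaskStatus> table = new HashMap<>();
        List<String> queued = new ArrayList<>();

        // Healthy queue: the task is stored and enqueued.
        SubmitResult ok = submit("demo-payload", table::put, (id, body) -> queued.add(id));
        System.out.println(ok + " stored as " + table.get(ok.taskId()));

        // Failing queue: the task is still stored as PENDING.
        TaskQueue broken = (id, body) -> { throw new IllegalStateException("queue unavailable"); };
        SubmitResult degraded = submit("demo-payload", table::put, broken);
        System.out.println(degraded + " stored as " + table.get(degraded.taskId()));
    }
}
```

Returning the `enqueued` flag is one possible way to let the caller log or flag tasks that need reconciliation; the source does not say how the project tracks them.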
Worker Service (Task Processing):
- Stage-Specific Error Handling: Employs granular try-catch blocks within the processMessage method to specifically handle failures at each stage (e.g., JsonProcessingException for message deserialization, IOException for image download/upload, S3Exception for S3 issues, DynamoDbException for DB updates).
- Task Status Update to FAILED: Upon any processing error, the task's status in DynamoDB is promptly updated to FAILED, along with a failureReason, providing transparent error visibility to the client.
- SQS Message Retries (via Visibility Timeout): Messages are deliberately NOT deleted from SQS on processing failure. This allows the SQS Visibility Timeout to expire, making the message visible again for other workers to retry, improving resilience against transient issues.
- Detailed Logging: logger.error calls throughout the worker pass the caught exception (the e parameter), so full stack traces reach CloudWatch Logs and give precise diagnostic information for troubleshooting.
- Idempotency: Worker logic is designed to tolerate reprocessing the same message multiple times (e.g., by checking the task status in DynamoDB before major work, and by letting repeated S3 uploads overwrite the earlier object).
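The worker behaviour listed above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the project's processMessage implementation: `WorkerSketch`, `TaskRecord`, and `Processor` are hypothetical names, a plain map stands in for the DynamoDB table, and only standard-library exceptions are caught, where the real worker would also handle the Jackson and AWS SDK exception types named in the list. The boolean return value is an assumed convention: true means the SQS message may be deleted, false means it is left in place so that it reappears once the visibility timeout expires.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Sketch of an idempotent worker step that leaves failed messages for retry. */
class WorkerSketch {

    enum TaskStatus { PENDING, RUNNING, COMPLETED, FAILED }

    /** Hypothetical task record; in the project this lives in DynamoDB. */
    static final class TaskRecord {
        TaskStatus status = TaskStatus.PENDING;
        String failureReason;
    }

    /** Hypothetical stand-in for the download / process / upload stages. */
    interface Processor {
        void process(String taskId) throws IOException;
    }

    /** Returns true if the message should be deleted, false to let it be redelivered. */
    static boolean processMessage(String taskId, Map<String, TaskRecord> table, Processor processor) {
        // Assumption: the API service has already stored the record; the
        // default here only keeps the sketch runnable.
        TaskRecord task = table.computeIfAbsent(taskId, id -> new TaskRecord());

        // Idempotency check: a duplicate delivery of finished work is acknowledged
        // without repeating it.
        if (task.status == TaskStatus.COMPLETED) {
            return true;
        }

        task.status = TaskStatus.RUNNING;
        try {
            processor.process(taskId);
            task.status = TaskStatus.COMPLETED;
            task.failureReason = null;
            return true;
        } catch (IOException e) {
            markFailed(task, "I/O failure: " + e.getMessage());
        } catch (RuntimeException e) {
            markFailed(task, "Unexpected error: " + e.getMessage());
        }
        // Not acknowledged: the message becomes visible again after the timeout.
        return false;
    }

    private static void markFailed(TaskRecord task, String reason) {
        task.status = TaskStatus.FAILED;
        task.failureReason = reason;
    }

    public static void main(String[] args) {
        Map<String, TaskRecord> table = new HashMap<>();
        table.put("task-1", new TaskRecord());

        // First delivery fails, so the message would stay on the queue.
        boolean delete = processMessage("task-1", table,
                id -> { throw new IOException("download timed out"); });
        System.out.println("delete=" + delete + " status=" + table.get("task-1").status);

        // Redelivery succeeds and the message can be deleted.
        delete = processMessage("task-1", table, id -> { });
        System.out.println("delete=" + delete + " status=" + table.get("task-1").status);
    }
}
```

In this sketch a redelivered message simply overwrites an earlier FAILED status with RUNNING and then COMPLETED, so the stored status always reflects the most recent attempt.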
This project demonstrates hands-on experience in building scalable, event-driven, cloud-native backend systems using AWS managed services and microservice principles.
