This repository provides two complementary solutions for PDF accessibility:
- PDF-to-PDF Remediation: Processes PDFs and maintains the PDF format while improving accessibility.
- PDF-to-HTML Remediation: Converts PDFs to accessible HTML format.

Both solutions leverage AWS services and generative AI to improve content accessibility according to WCAG 2.1 Level AA standards.

| Index | Description | 
|---|---|
| Architecture Overview | High-level overview illustrating component interactions | 
| Automated One Click Deployment | How to deploy the project | 
| Testing Your PDF Accessibility Solution | User guide for the working solution | 
| PDF-to-PDF Remediation Solution | PDF format preservation solution details | 
| PDF-to-HTML Remediation Solution | HTML conversion solution details | 
| Monitoring | System monitoring and observability | 
| Troubleshooting | Common issues and solutions | 
| Contributing | How to contribute to the project | 

The following architecture diagram illustrates the AWS components used to deliver the solution.

We provide a unified deployment script that lets you deploy either or both solutions with a single command. You choose your preferred solution during deployment.

Common Requirements:
- AWS account with appropriate permissions to create and manage AWS resources
  - See IAM Permissions Guide for detailed permission requirements
- AWS CloudShell access (AWS CLI is pre-installed and configured automatically)
  - Sign in to the AWS Management Console
  - In the top navigation bar, click the CloudShell icon (terminal symbol) next to the search bar
  - Wait for CloudShell to initialize (this may take a few moments on first use)
- Enable the AWS Bedrock Nova-Pro model in your AWS account (for PDF-to-PDF remediation) and the AWS Bedrock Nova-Lite model (for PDF-to-HTML remediation)
  - Request access to Amazon Bedrock through the AWS console if not already enabled
  - Navigate to the AWS Bedrock console
  - Click "Model access" in the left navigation pane
  - Click "Manage model access"
  - Find Nova-Pro/Nova-Lite in the list and select the checkbox
  - Click "Save changes" and wait for access to be granted

Solution-Specific Requirements:
- PDF-to-PDF: Adobe API access
  - An enterprise-level contract or a trial account (for testing) for Adobe's API is required.
  - Use the Adobe PDF Services API to obtain API credentials.
- PDF-to-HTML: AWS Bedrock Data Automation service access
  - Ensure you have access to create a Bedrock Data Automation project (usually present by default)

Step 1: Open AWS CloudShell and Clone the Repository

    git clone https://github.com/ASUCICREPO/PDF_Accessibility.git
    cd PDF_Accessibility

Step 2: Run the Unified Deployment Script

    chmod +x deploy.sh
    ./deploy.sh

Step 3: Follow the Interactive Prompts

The script will guide you through:
- Solution Selection: Choose between PDF-to-PDF or PDF-to-HTML remediation
- Solution-Specific Setup:
  - PDF-to-PDF: Enter Adobe API credentials (stored securely in AWS Secrets Manager)
  - PDF-to-HTML: Automatic creation of Bedrock Data Automation project
- Automated Deployment: Real-time monitoring of the deployment progress
- Optional UI Deployment: After successful deployment of your chosen solution(s), you can optionally deploy a user interface as well

Step 4: Test Your Deployment
After successful deployment, the script provides specific testing instructions for your chosen solution.

Testing PDF-to-PDF Remediation
- Navigate to Your S3 Bucket
  - In the AWS S3 Console, find the bucket starting with `pdfaccessibility-`
  - This bucket was automatically created during deployment
- Create the Input Folder
  - Create a folder named `pdf/` in the root of the bucket
  - This is where you'll upload PDFs for processing
- Upload Your PDF Files
  - Upload any PDF file(s) to the `pdf/` folder
  - Bulk Processing: You can upload multiple PDFs to the folder for batch remediation
  - The process automatically triggers when files are uploaded
- Monitor Processing
  - Temporary Files: A `temp/` folder will be created containing intermediate processing files
  - Final Results: A `result/` folder will be created with your accessibility-compliant PDF files
  - Use the CloudWatch dashboard to monitor processing progress
- Download Results
  - Navigate to the `result/` folder to access your remediated PDFs
  - Files keep their original names, with a "COMPLIANT" prefix added once accessibility improvements have been applied
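
The console steps above can also be run from CloudShell with the AWS CLI. The following is a minimal sketch, not part of the deployment script: the function names are hypothetical, and the bucket name you pass in is a placeholder for the real bucket, which starts with `pdfaccessibility-` and carries a deployment-specific suffix.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, and the bucket argument is a
# placeholder for your real "pdfaccessibility-..." bucket.

# Build the S3 URI under the pdf/ input prefix for a local PDF path.
input_uri() {
  printf 's3://%s/pdf/%s\n' "$1" "$(basename "$2")"
}

# Upload one PDF (args: bucket, local path); the upload triggers processing.
upload_pdf() {
  aws s3 cp "$2" "$(input_uri "$1" "$2")"
}

# Copy everything under result/ into a local directory (args: bucket, dir).
fetch_results() {
  aws s3 sync "s3://$1/result/" "$2"
}
```

For example, `upload_pdf pdfaccessibility-EXAMPLE ./report.pdf` uploads a file, and `fetch_results pdfaccessibility-EXAMPLE ./remediated` pulls down whatever has appeared in `result/` so far. Copying to the `pdf/` prefix creates it implicitly, so the separate folder-creation step is not needed on this path.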
 
Testing PDF-to-HTML Remediation
- Navigate to Your S3 Bucket
  - In the AWS S3 Console, find the bucket starting with `pdf2html-bucket-`
  - This bucket was automatically created during deployment
- Upload Your PDF Files
  - Navigate to the `uploads/` folder (created automatically during deployment)
  - Bulk Processing: You can upload multiple PDFs to the folder for batch remediation
  - The process automatically triggers when files are uploaded
- Monitor Processing
  - Two folders will be created automatically:
    - `output/`: Contains temporary processing data and intermediate files
    - `remediated/`: Contains the final remediated results
- Access Your Results
  - Navigate to the `remediated/` folder
  - Download the zip file named `final_{your-filename}.zip`
- Explore the Remediated Content
  - The downloaded zip file contains:
    - `remediated.html`: Final accessibility-compliant HTML version
    - `result.html`: Original HTML conversion (before remediation)
    - `images/` folder: Extracted images with generated alt text
    - `remediation_report.html`: Detailed report of accessibility improvements made
    - `usage_data.json`: Processing metrics and usage statistics
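
To fetch and sanity-check a result archive from the command line, something like the sketch below may help. The function names are hypothetical. It assumes `unzip` is available and that the five entries listed above sit at the top level of the directory you point `check_bundle` at; if the archive unpacks into a subfolder, pass that subfolder instead.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of the project.

# Download one archive from remediated/ and unpack it.
# Args: bucket, zip file name, target directory.
fetch_bundle() {
  aws s3 cp "s3://$1/remediated/$2" "$3/$2" && unzip -o -q "$3/$2" -d "$3"
}

# Report which of the documented entries are missing from a directory.
# Returns 0 when all five are present, 1 otherwise.
check_bundle() {
  cb_status=0
  for cb_entry in remediated.html result.html remediation_report.html usage_data.json images; do
    if [ ! -e "$1/$cb_entry" ]; then
      echo "missing: $cb_entry"
      cb_status=1
    fi
  done
  return "$cb_status"
}
```

Running `check_bundle ./out` after `fetch_bundle` prints nothing when the bundle is complete, which makes it easy to use in a loop over a batch of results.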
 
Redeployment

After initial deployment, you can redeploy using the created CodeBuild project:

    aws codebuild start-build --project-name YOUR-PROJECT-NAME --source-version main

Or simply re-run the deployment script and choose the solution you want to redeploy.

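
If you redeploy through CodeBuild, you may want to capture the build ID and check on its status afterwards. This is a rough sketch with hypothetical function names; the project name is whatever the deployment created in your account (`aws codebuild list-projects` lists the candidates).

```bash
#!/usr/bin/env bash
# Sketch only: helper names are illustrative; pass your own project name.

# Start a build from the main branch and print only the new build's ID.
start_redeploy() {
  aws codebuild start-build --project-name "$1" --source-version main \
    --query 'build.id' --output text
}

# Print the current status of a build (for example IN_PROGRESS or SUCCEEDED).
build_status() {
  aws codebuild batch-get-builds --ids "$1" \
    --query 'builds[0].buildStatus' --output text
}
```

A typical use would be `id="$(start_redeploy YOUR-PROJECT-NAME)"` followed by `build_status "$id"` every minute or so until the status leaves `IN_PROGRESS`.
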
The PDF-to-PDF solution processes PDFs while maintaining the original PDF format. It uses AWS CDK to build infrastructure that splits PDFs into chunks, processes them via AWS Step Functions, and merges the results using ECS tasks. Its main components are:
- S3 Bucket: Stores input and processed PDFs
 - Lambda Functions: PDF splitting, merging, and accessibility checking
 - Step Functions: Orchestrates the processing workflow
 - ECS Fargate: Runs containerized processing tasks
 - CloudWatch Dashboard: Monitors progress and performance
 
For detailed manual deployment instructions, see our Manual Deployment Guide.

The PDF-to-HTML solution converts PDF documents to accessible HTML format while preserving layout and visual appearance. It leverages AWS Bedrock Data Automation for PDF parsing and uses a serverless Lambda architecture. Its main components are:
- S3 Bucket: Stores input PDFs and remediated HTML files
 - Lambda Function: Processes PDFs using containerized accessibility utility
 - ECR Repository: Hosts the Docker image for Lambda
 - Bedrock Data Automation: Provides PDF parsing and extraction capabilities
 
PDF-to-PDF Monitoring
- CloudWatch Dashboard: Automatically created during deployment
- Step Functions Console: Monitor workflow executions
- ECS Console: Track container task status

PDF-to-HTML Monitoring
- Lambda Logs: `/aws/lambda/Pdf2HtmlPipeline`
- S3 Events: Monitor file processing status
- CloudWatch Metrics: Track function performance
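
The Lambda log group listed above can be followed from a terminal with `aws logs tail` (AWS CLI v2, which CloudShell provides). A small sketch follows, with hypothetical function names. It takes the log group name from the list above as given; if your deployment names the function differently, adjust `LOG_GROUP`.

```bash
#!/usr/bin/env bash
# Sketch only: LOG_GROUP is copied from the monitoring list and may differ
# in your deployment.
LOG_GROUP="/aws/lambda/Pdf2HtmlPipeline"

# Stream new log events; optional arg is how far back to start (default 1h).
tail_pdf2html_logs() {
  aws logs tail "$LOG_GROUP" --since "${1:-1h}" --follow
}

# Show only recent events containing the term ERROR, without following.
recent_errors() {
  aws logs tail "$LOG_GROUP" --since "${1:-1h}" --filter-pattern "ERROR"
}
```

`recent_errors 30m` is a quick first check when an uploaded PDF never shows up in `remediated/`.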
 
AWS Credentials
- Ensure AWS CLI is configured with appropriate permissions
 - Verify access to required AWS services (S3, Lambda, ECS, Bedrock)
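
One way to run these checks quickly is a small preflight function that makes one read-only call per service. This is an illustrative sketch with hypothetical function names, and it has a clear limit: a successful list call only shows that your credentials can reach the service, not that they hold every permission the deployment needs.

```bash
#!/usr/bin/env bash
# Sketch only: a passing check means the service answered a read-only call,
# not that all deployment permissions are in place.

# Run a command quietly and report ok/FAILED under the given label.
check() {
  chk_label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "ok: $chk_label"
  else
    echo "FAILED: $chk_label"
  fi
}

# One read-only call per service named in the list above.
preflight() {
  check "credentials" aws sts get-caller-identity
  check "S3"          aws s3 ls
  check "Lambda"      aws lambda list-functions --max-items 1
  check "ECS"         aws ecs list-clusters --max-items 1
  check "Bedrock"     aws bedrock list-foundation-models --by-provider amazon
}
```

Calling `preflight` prints one line per service, so a single `FAILED` entry points at the service to investigate first.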
 
Service Limits
- Check AWS service quotas if deployment fails
 - Request additional Elastic IPs if needed: EC2 Service Quotas
 
Build Failures
- Check CodeBuild console for detailed error messages
 - Verify all prerequisites are met
 - Ensure Docker is available for PDF-to-HTML deployments
 
PDF-to-PDF Issues
- Verify Adobe API credentials are correct and active
 - Check CloudWatch logs for Lambda functions and ECS tasks
 - Ensure NOVA_PRO Bedrock model access is granted
 
PDF-to-HTML Issues
- Verify Bedrock Data Automation permissions
 - Check Lambda function logs in CloudWatch
 - Ensure Docker image was pushed to ECR successfully
 
- Check build logs in CodeBuild console
 - Review CloudWatch logs for runtime issues
 - Verify all prerequisites are met
 - For deployment issues, refer to: CDK GitHub Issue
 - For additional troubleshooting: Troubleshooting Guide
 - Contact support: ai-cic@amazon.com
 
Contributions to this project are welcome. Please fork the repository and submit a pull request with your changes.

The PDF-to-HTML remediation functionality in this project is adapted from AWS Labs' Content Accessibility Utility on AWS. This version includes updates and enhancements tailored for integration within the PDF Accessibility backend.

For questions, issues, or support:
- Email: ai-cic@amazon.com
 - Issues: GitHub Issues
 
Built by Arizona State University's AI Cloud Innovation Center (AI CIC)
Powered by AWS
