In this project, we will build a practical intelligent document processing system that automatically extracts data from forms, invoices, receipts, contracts, and similar documents. We will use Python, Amazon Textract, and Amazon Bedrock.
- Can reduce manual data entry by roughly 90%.
- Speeds up billing, auditing, and legal workflows.
- Minimizes human error and provides auditable logs.
- AWS Account with IAM User
- Amazon Textract - A machine learning service that automatically extracts text, handwriting, layout elements, and data from scanned documents.
- Amazon Bedrock - A fully managed service that offers a choice of high-performing foundation models and tools to deploy and operate agents.
- AWS CLI installed & configured
- Python 3.8+ installed
- Install Streamlit and dependencies
- Upload file (or scanned image).
- Use Boto3 to call the Amazon Textract synchronous API.
- Textract analyzes the document and returns the detected text as blocks.
- Draw bounding boxes over the detected lines and display them alongside the raw image.
- Use Amazon Bedrock (Anthropic Claude v2) to build a RAG-style Q&A system that answers questions using the extracted text as context.
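The Textract steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `streamlitapp.py`: the helper names (`detect_text`, `extract_lines`, `to_pixel_box`, `draw_line_boxes`) are hypothetical, and it assumes the synchronous `DetectDocumentText` API and Pillow for drawing. Textract reports each bounding box as ratios of the page size, so the boxes have to be scaled to pixels before they can be drawn.

```python
from typing import Dict, List, Tuple


def detect_text(image_bytes: bytes, region: str = "us-east-1") -> Dict:
    """Call the synchronous Textract DetectDocumentText API on raw image bytes."""
    import boto3  # imported lazily so the helpers below work without AWS access

    client = boto3.client("textract", region_name=region)
    return client.detect_document_text(Document={"Bytes": image_bytes})


def extract_lines(response: Dict) -> List[Dict]:
    """Keep only LINE blocks (the response also contains PAGE and WORD blocks)."""
    return [b for b in response.get("Blocks", []) if b.get("BlockType") == "LINE"]


def to_pixel_box(bbox: Dict[str, float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a Textract BoundingBox (ratios of page size) to pixel corners."""
    left = int(bbox["Left"] * width)
    top = int(bbox["Top"] * height)
    right = int((bbox["Left"] + bbox["Width"]) * width)
    bottom = int((bbox["Top"] + bbox["Height"]) * height)
    return left, top, right, bottom


def draw_line_boxes(image, lines: List[Dict]):
    """Outline each detected line on a PIL image (modifies the image in place)."""
    from PIL import ImageDraw  # assumption: Pillow is among the dependencies

    draw = ImageDraw.Draw(image)
    width, height = image.size
    for line in lines:
        box = to_pixel_box(line["Geometry"]["BoundingBox"], width, height)
        draw.rectangle(box, outline="red", width=2)
    return image
```

Joining the `Text` field of the extracted lines with newlines gives a plain-text version of the document that can later be passed to the model as context.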
python3 -m venv idps-venv
source idps-venv/bin/activate
pip install -r requirements.txt
- Install the AWS CLI (or verify the installed version)
aws --version
- Create an IAM user on AWS, create access keys, and download the CSV file.
- Configure the IAM user's credentials in your terminal
aws configure
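`aws configure` stores the access keys in an INI-style file at `~/.aws/credentials`. As a quick sanity check before running the app, a small script along these lines can list the profiles that actually have an access key. The function name `configured_profiles` is hypothetical, and the sketch only inspects the file; it does not verify that the keys are valid.

```python
import configparser
from pathlib import Path
from typing import List


def configured_profiles(path: Path = Path.home() / ".aws" / "credentials") -> List[str]:
    """Return the profile names in the credentials file that define an access key."""
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file simply yields no sections
    return [
        name
        for name in parser.sections()
        if parser.has_option(name, "aws_access_key_id")
    ]
```

Calling `configured_profiles()` with no arguments reads the default location; an empty list suggests that `aws configure` has not been run for this user.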
To add Q&A capabilities to the app, we will use Amazon Bedrock.
- In the AWS Management Console, navigate to the Amazon Bedrock console
- Click on "Model access" under "Configure and learn" in the lower left sidebar
- Click on "Enable all models" or "Enable specific models".
- Provide some details like company name, website, industry, etc.
- Review your selections and submit them.
- Once approved, you can proceed to the application.
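Once model access is granted, a question can be sent to Claude v2 together with the extracted text. The sketch below is one possible shape for that call, not the app's actual implementation: `build_prompt`, `build_request_body`, and `ask` are hypothetical helpers, the prompt wording is an assumption, and it assumes the `anthropic.claude-v2` model ID with the text-completion request format (`prompt`, `max_tokens_to_sample`).

```python
import json
from typing import Any

MODEL_ID = "anthropic.claude-v2"


def build_prompt(context: str, question: str) -> str:
    """Wrap the document text and the question in Claude's Human/Assistant format."""
    return (
        "\n\nHuman: Answer the question using only the document text below.\n\n"
        f"<document>\n{context}\n</document>\n\n"
        f"Question: {question}"
        "\n\nAssistant:"
    )


def build_request_body(context: str, question: str, max_tokens: int = 300) -> str:
    """Serialize the request body expected by the Claude v2 text-completion API."""
    return json.dumps(
        {
            "prompt": build_prompt(context, question),
            "max_tokens_to_sample": max_tokens,
            "temperature": 0.0,  # deterministic answers suit data extraction
        }
    )


def ask(client: Any, context: str, question: str) -> str:
    """Send the question to Bedrock; `client` is boto3.client("bedrock-runtime")."""
    response = client.invoke_model(
        modelId=MODEL_ID,
        body=build_request_body(context, question),
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a stream containing JSON with a "completion" field.
    return json.loads(response["body"].read())["completion"].strip()
```

Passing the client in as an argument keeps the helper easy to test with a stub and lets the caller decide which region and credentials to use.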
export AWS_REGION=us-east-1
streamlit run streamlitapp.py
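Because the app is started with `AWS_REGION` exported, it presumably reads that variable when creating its AWS clients. How `streamlitapp.py` does this is not shown here; the sketch below is one way to do it, with `resolve_region` as a hypothetical helper and `AWS_DEFAULT_REGION` added as an assumed fallback.

```python
import os


def resolve_region(default: str = "us-east-1") -> str:
    """Pick the AWS region from the environment, falling back to a default."""
    return (
        os.environ.get("AWS_REGION")
        or os.environ.get("AWS_DEFAULT_REGION")
        or default
    )
```

The result can then be passed explicitly as `region_name` when creating the Textract and Bedrock clients, so both services are called in the same region.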
- Upload an invoice/receipt
- Scroll down and ask a question like:
  - What is the invoice number?
  - What is the total amount?
  - Who is the recipient?
  - What is the address of the recipient? Be brief and concise
  - How much in total was charged for consulting? Be brief
  - What are the terms of the invoice? Be brief and concise
  - How is the balance at closing going to be transferred? Be brief and concise
  - Who is Frank Winfield?
  - What did Frank Winfield invent?