Data Extractor

Extract key-value pairs, tables, and paragraphs from your scanned PDF, JPG, and PNG documents and download them as CSV files.

Tech stack

Technology	Used for
Flask	Backend
React + Tailwind + DaisyUI	Frontend
Azure Form Recognizer	Extracting data from documents
Azure Blob Storage	Storing uploaded documents

Usage (as a webapp)

Run npm i in the frontend folder, followed by npm run build
Run pip install -r requirements.txt in the root folder
Create a .env file with the four values below, obtained as follows:

Create an Azure Form Recognizer service and copy the Endpoint and KEY 1 values from Keys and Endpoint. These become ENDPOINT and KEY respectively. Next, create an Azure storage account and create a container in it. Go to Shared access tokens, click Generate SAS token and URL, and copy the Blob SAS URL. The part to the left of ? goes in BLOB_ENDPOINT, and the rest goes in BLOB_QUERY (the sample value below keeps the leading ?).

ENDPOINT = "https://xyz.cognitiveservices.azure.com"
KEY = "12345something"
BLOB_ENDPOINT = "https://xyz.blob.core.windows.net/containerName/"
BLOB_QUERY = "?xyz=xyz&xyz=xyz..."

Run with py main.py
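
The split of the Blob SAS URL described in the .env step can be sketched in a few lines of Python. This is only an illustration, not code from this repository: the helper names split_sas_url and blob_url are hypothetical, the trailing slash and leading ? simply mirror the sample values above, and joining the two parts around a file name is an assumption about how the app uses them.

```python
from urllib.parse import quote


def split_sas_url(sas_url: str) -> tuple[str, str]:
    """Split a Blob SAS URL into (BLOB_ENDPOINT, BLOB_QUERY) strings."""
    endpoint, sep, query = sas_url.partition("?")
    if not sep:
        raise ValueError("expected a '?' in the Blob SAS URL")
    # Assumption: mirror the sample .env, where BLOB_ENDPOINT ends with '/'
    # and BLOB_QUERY keeps the leading '?'.
    if not endpoint.endswith("/"):
        endpoint += "/"
    return endpoint, sep + query


def blob_url(blob_endpoint: str, blob_name: str, blob_query: str) -> str:
    """Assumed composition: endpoint + file name + SAS query."""
    return blob_endpoint + quote(blob_name) + blob_query


if __name__ == "__main__":
    endpoint, query = split_sas_url(
        "https://xyz.blob.core.windows.net/containerName?xyz=xyz&abc=abc"
    )
    print(f'BLOB_ENDPOINT = "{endpoint}"')
    print(f'BLOB_QUERY = "{query}"')
```

Running it with your own Blob SAS URL prints two lines that can be pasted into the .env file.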

Usage (as a script)

Run py extract.py -i "input/file/path.pdf" -o "output/file/path.csv"
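
To process a whole folder rather than a single file, the command above can be generated once per document. The sketch below is an illustration under stated assumptions: build_commands is a hypothetical helper that is not part of this repository, it relies only on the -i and -o flags shown above, and it assumes one CSV per input file named after that file.

```python
from pathlib import Path

# File types the README lists as supported.
SUPPORTED = {".pdf", ".jpg", ".png"}


def build_commands(input_dir, output_dir, launcher="py"):
    """Return one extract.py command (as an argv list) per supported file."""
    commands = []
    for path in sorted(Path(input_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            # Assumption: write each result next to the others as <name>.csv.
            out = Path(output_dir) / (path.stem + ".csv")
            commands.append(
                [launcher, "extract.py", "-i", str(path), "-o", str(out)]
            )
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the root folder. The launcher argument defaults to py to match the commands in this README; on systems without the py launcher, python or python3 may be needed instead.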
