This project is a Cloudflare serverless worker with an AI service binding deployed for processing either audio, image or text submitted by users into semantic useful information, especially in the context for submitting expenses.
For users that prefer a UI, a simple NextJS web app is created to demonstrate how to upload files and utilizes API routes to make a POST request to the Cloudflare worker endpoint.
From Cloudflare AI Models:
- Image-to-text: @cf/unum/uform-gen2-qwen-500m
- Automatic Speech Recognition: @cf/openai/whisper
- Text Generation: @hf/thebloke/mistral-7b-instruct-v0.1-awq
The worker is deployed at https://cf-journal.senchatea.workers.dev.
Text input
curl --location 'https://cf-journal.senchatea.workers.dev?type=text' \
--header 'Content-Type: text/plain' \
--data '2 weeks ago, I went to eat a buffet at Swensens Unlimited at the T2 airport, it'\''s really nice but it costs like $36 per person after GST, and there were 2 of us.'
Audio input
curl --location 'https://cf-journal.senchatea.workers.dev?type=audio' \
--header 'Content-Type: application/octet-stream' \
--header 'Authorization: Bearer pYrzMvsyURxsCUeaQDsa3lSO_tBDQEuiPB3iLEQt' \
--data '@postman-cloud:///1eef4b5b-a64c-4c10-ad4b-5f22c528880b'
Image input
curl --location 'https://cf-journal.senchatea.workers.dev?type=image' \
--header 'Content-Type: application/octet-stream' \
--header 'Authorization: Bearer pYrzMvsyURxsCUeaQDsa3lSO_tBDQEuiPB3iLEQt' \
--data '@postman-cloud:///1eef4b68-e5d7-49e0-a28d-f60b3bd7d70e'
These are some resources from Cloudflare that I read up to work on this project.