DocumentAI Expense Bot

A DocumentAI processor service that parses documents and outputs them as JSON.

Prerequisites

  • You should have a gcloud user credentials file, generated after authenticating with gcloud auth login; it should be located at ~/.config/gcloud/application_default_credentials.json

  • You should have either MongoDB installed or Docker if you want to use the Mongo driver to store documents; otherwise you can use the in-memory database, which can be selected via environment variables

  • A .env file with the required values; see the example in the .env.example file

Usage

make install # installs the dependencies defined in go.mod

make mongo # creates a mongodb docker container (obviously needs docker to be installed)

make env # creates a .env file based on .env.example

make test # you've guessed it, tests *_test.go files

make run # runs the project

Env example:

DB_TYPE=mongo
STORE_TYPE=local
PROCESSOR_TYPE=google
STORE_DIRECTORY_NAME=fileStorage
SERVER_PORT=:8080
MONGO_URI=mongodb://myuser:mypass@localhost:27017
MONGO_DB_NAME=expenses_main
MONGO_COLLECTION=receipts
GCP_LOCATION=
GCP_PROJECT_ID=
GCP_PROCESSOR_ID=

Exposed endpoints:

POST /documents - uploads a document (body -> key: "receipt", value: file)
GET /documents/<UUID> - gets existing document data
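
For example, a receipt could be uploaded from Go roughly like this. This is a sketch only: it assumes the server from the env example is listening on localhost:8080, the sample file name is made up, and only the "receipt" form field comes from the endpoint description above.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
)

func main() {
	f, err := os.Open("receipt.jpg") // hypothetical sample file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Build a multipart body with the file under the "receipt" key.
	var buf bytes.Buffer
	w := multipart.NewWriter(&buf)
	part, err := w.CreateFormFile("receipt", "receipt.jpg")
	if err != nil {
		panic(err)
	}
	if _, err := io.Copy(part, f); err != nil {
		panic(err)
	}
	w.Close()

	resp, err := http.Post("http://localhost:8080/documents", w.FormDataContentType(), &buf)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // response shape depends on the processor
}
```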

How can this be improved?

To make this project ready to be deployed in a production environment, it needs some adjustments and, obviously, more test coverage. In a production environment we would usually have defined storage such as S3, NFS, or something along those lines, perhaps even both. It would make sense to implement this logic using a pub/sub mechanism: documents are uploaded to the service, which publishes an event to NATS; a second service listening for upload events then replicates the data to the different storages and triggers the scanning service, so that the JSON is stored alongside the original documents.
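
As a rough sketch of that flow, assuming the github.com/nats-io/nats.go client (the subject name, payload fields, and helper function are hypothetical, not existing code):

```go
package events

import (
	"encoding/json"

	"github.com/nats-io/nats.go"
)

// DocumentUploaded is a hypothetical event payload describing a stored document.
type DocumentUploaded struct {
	DocumentID string `json:"document_id"`
	StorePath  string `json:"store_path"`
}

// PublishUploaded emits an event after a document is stored, so downstream
// services (replication to S3/NFS, the scanning service) can react to it.
func PublishUploaded(nc *nats.Conn, ev DocumentUploaded) error {
	payload, err := json.Marshal(ev)
	if err != nil {
		return err
	}
	return nc.Publish("documents.uploaded", payload)
}
```

The upload service would call this after persisting the file, and each consumer would subscribe to the same subject independently.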

It would also be nice to have Prometheus metrics for those microservices after they are split, so that we can keep track of key metrics and use tooling to visualize and monitor them.
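
A minimal sketch of what that could look like with the standard Prometheus client and echo; the metric name and the /metrics route are assumptions, not part of the current project:

```go
package main

import (
	"github.com/labstack/echo/v4"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// documentsProcessed counts successfully parsed documents (hypothetical metric).
var documentsProcessed = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "expensebot_documents_processed_total",
	Help: "Number of documents successfully processed.",
})

func main() {
	prometheus.MustRegister(documentsProcessed)

	e := echo.New()
	// Expose metrics for scraping; the document handler would call
	// documentsProcessed.Inc() after a successful parse.
	e.GET("/metrics", echo.WrapHandler(promhttp.Handler()))
	e.Logger.Fatal(e.Start(":8080"))
}
```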

Perhaps it is also a good idea to write different transport layers once we are talking about microservices: we could have gRPC communication between some internal microservices, and it makes sense to have both HTTP and gRPC in place so that the service can be used by all services that implement those transport layers.
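
One way to keep both transports thin is to put the core logic behind an interface and make each transport a small adapter around it. The sketch below is illustrative only: the interface, handler, and the mentioned generated gRPC types are hypothetical names, not code from this repo.

```go
package transport

import (
	"context"
	"net/http"

	"github.com/labstack/echo/v4"
)

// Processor is the transport-agnostic core; both transports depend only on it.
type Processor interface {
	Process(ctx context.Context, receipt []byte) ([]byte, error)
}

// NewUploadHandler is the HTTP adapter: a thin echo handler delegating to the core.
func NewUploadHandler(p Processor) echo.HandlerFunc {
	return func(c echo.Context) error {
		var body []byte // ... read the multipart "receipt" field into body ...
		out, err := p.Process(c.Request().Context(), body)
		if err != nil {
			return echo.NewHTTPError(http.StatusInternalServerError, err.Error())
		}
		return c.JSONBlob(http.StatusOK, out)
	}
}

// A gRPC adapter would be another thin wrapper: a server struct embedding the
// generated Unimplemented*Server type and calling the same Processor, so
// internal services can talk gRPC while external clients keep using HTTP.
```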

Notes

For the sake of this task I went with a simple flat file structure, which I think is fine, but that is purely subjective. I've used interfaces almost everywhere so that we could have multiple implementations of the logic with different drivers, configs, etc.
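
As a simplified illustration of that interface-driven approach (the names and the driver switch here are illustrative, not the actual code in the repo), a persistence interface with a map-backed implementation might look like this, with a Mongo-backed implementation satisfying the same interface and selected via DB_TYPE:

```go
package storage

import (
	"context"
	"errors"
	"sync"
)

type Document struct {
	ID   string
	Data []byte
}

// Repository abstracts the persistence driver so the Mongo-backed store and
// the in-memory store are interchangeable behind the same interface.
type Repository interface {
	Save(ctx context.Context, d Document) error
	Get(ctx context.Context, id string) (Document, error)
}

// MemoryRepository is the map-backed store used when no MongoDB is available.
type MemoryRepository struct {
	mu   sync.RWMutex
	docs map[string]Document
}

func NewMemoryRepository() *MemoryRepository {
	return &MemoryRepository{docs: make(map[string]Document)}
}

func (m *MemoryRepository) Save(_ context.Context, d Document) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.docs[d.ID] = d
	return nil
}

func (m *MemoryRepository) Get(_ context.Context, id string) (Document, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	d, ok := m.docs[id]
	if !ok {
		return Document{}, errors.New("document not found")
	}
	return d, nil
}
```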

Regarding the unit tests, I did not spend too much time getting to 100% coverage because this task is for demonstration purposes.

Logging is another thing to point out: currently I am using the logger provided by echo, but it would be better to use slog or uber-go/zap for structured logging.
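
With the standard library's log/slog, for example, structured JSON logging is only a few lines; the field names below are illustrative:

```go
package main

import (
	"log/slog"
	"os"
)

func main() {
	// Emit JSON-structured log records to stdout and make it the default logger.
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	slog.SetDefault(logger)

	slog.Info("document processed",
		"document_id", "123e4567-e89b-12d3-a456-426614174000",
		"processor", "google",
		"duration_ms", 842,
	)
}
```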

Author: Davit.K
