This app demonstrates an interface orchestrated by an autonomous agent, nicknamed the "Interface-omniscient" Agent.
The idea is to hand all of the interface's levers to an autonomous chatbot that can help you navigate a website:
- It can open pages and click buttons because it knows which elements are on the page
- It can give you information about the current page
- It can track your actions in order to make informed decisions
In this demonstration you can order food autonomously, by voice, without paying attention to a deliberately confusing, crowded, messy simulated interface; the chat assistant can also answer you autonomously by voice.
Audio is not great, and it had to be compressed many times, sorry about it :)
LLM-Delivery-app-demo.1.1.mp4
- Create an OpenAI API key
- Create a Pinecone API key and an index named `auto-food-order`
- Find `.env.example` and write your API keys in it
- Rename `.env.example` to `.env`
- Create a Pinecone index
- Create a database following the tutorial in backend/src/data/README.md
- Open config.yaml and choose the model you would like to use (works better with gpt-4 or gpt-4-32k)
- Feel free to tweak the parameters
- Install docker on your machine
- Run `docker compose up --build` and have fun
The main data are restaurant names and descriptions, plus food/beverage item names and descriptions
- All the data was generated by GPT-4
- All the images of restaurants and foods were generated by DALL-E 3 using the image_generator.ipynb pipeline
- Databases
  - SQL: SQLite (via SQLAlchemy) was used to create the database
  - Vector DB: the free version of Pinecone stores the restaurants for restaurant search
Tutorial available in backend/src/data/README.md
All possible user actions were mapped as functions (some of them dummies) in the backend, which allows the chatbot to perform them for the user (see prompts/README.md and services/README.md)
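As a rough illustration of what mapping a user action as a function can look like, the sketch below describes one action in the JSON-schema style used by OpenAI function calling and checks a set of arguments against it. The `place_order` name, its parameters, and the `missingArguments` helper are hypothetical and are not taken from this repository; see prompts/README.md for the real definitions.

```javascript
// Hypothetical function definition in the style of OpenAI function calling.
// The name and parameters are illustrative assumptions, not the repo's own.
const placeOrderDefinition = {
  name: "place_order",
  description: "Place the order for the items currently in the cart",
  parameters: {
    type: "object",
    properties: {
      restaurantName: { type: "string", description: "Restaurant to order from" },
      confirm: { type: "boolean", description: "Whether the user confirmed the order" },
    },
    required: ["restaurantName"],
  },
};

// Return the names of required parameters that are absent from args.
function missingArguments(definition, args) {
  const required = definition.parameters.required || [];
  return required.filter((key) => !Object.prototype.hasOwnProperty.call(args, key));
}

// An empty result means every required argument was supplied.
console.log(missingArguments(placeOrderDefinition, { confirm: true }));
console.log(missingArguments(placeOrderDefinition, { restaurantName: "Pasta Palace" }));
```

A check like this is one way to reject an incomplete call before any interface action runs; whether the real backend validates arguments this way is not stated here.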
One particular function call is the retrieval of restaurants
LlamaIndex and Pinecone orchestrate the search, while OpenAI Embeddings encode the restaurant names and descriptions for semantic search
The chatbot can autonomously trigger this search when asked by the user
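To make the idea of semantic search concrete, here is a self-contained sketch that ranks restaurants by cosine similarity between a query vector and stored vectors. It does not use LlamaIndex, Pinecone, or OpenAI Embeddings: the three-dimensional vectors are hand-written stand-ins for real embeddings, the restaurant names are made up, and `cosineSimilarity` and `searchRestaurants` are hypothetical names.

```javascript
// Toy stand-in for a vector index: hand-written 3-d vectors replace real embeddings.
const toyRestaurants = [
  { name: "Pasta Palace", vector: [0.9, 0.1, 0.0] },
  { name: "Sushi Corner", vector: [0.1, 0.9, 0.1] },
  { name: "Burger Barn", vector: [0.0, 0.2, 0.9] },
];

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every restaurant against the query and keep the topK best matches.
function searchRestaurants(queryVector, topK) {
  return toyRestaurants
    .map((r) => ({ name: r.name, score: cosineSimilarity(queryVector, r.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// A query vector that lies closest to the first restaurant's vector.
console.log(searchRestaurants([0.8, 0.2, 0.1], 2).map((r) => r.name));
```

In the real app the vectors would come from an embedding model and the nearest-neighbour lookup would be delegated to the vector database rather than computed in a loop.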
Services are the main engines: both OpenAI-based ones and custom functions that the frontend uses to perform actions
Find more information in backend/src/services/README.md
Handlers are classes that orchestrate data before it is passed to a service/engine and sent to the frontend
Find more information in backend/src/handlers/README.md
The endpoint functions receive data from the frontend, use Services and Handlers to process it and generate an output, and then send that output back
Find more information in backend/src/endpoints/README.md
The frontend code contains function call handlers (e.g. handlePlaceOrder) that are responsible for carrying out actions inside the interface
When the user sends a message and the chat completion model decides it is time to call a function, it returns a function-flagged output with the name of the function to be called
This output is routed by the handleFunctionCall function (also see generateAnswer), which calls a function handler in the interface to perform the action
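The routing described above might be sketched as follows. This assumes the model reply carries a `function_call` object with a `name` and a JSON-encoded `arguments` string, as in the OpenAI chat completions function-calling format. The `routeFunctionCall` function, the `functionHandlers` map, and the `place_order` / `open_restaurant` names are hypothetical placeholders; the real `handleFunctionCall` in the frontend may work differently.

```javascript
// Hypothetical handlers standing in for the interface's real ones.
const functionHandlers = new Map([
  ["place_order", (args) => `Order placed at ${args.restaurantName}`],
  ["open_restaurant", (args) => `Opened page for ${args.restaurantName}`],
]);

// Route a model reply: run a handler if a function was requested,
// otherwise return the plain text answer.
function routeFunctionCall(reply) {
  if (!reply.function_call) {
    return reply.content;
  }
  const { name, arguments: rawArgs } = reply.function_call;
  const handler = functionHandlers.get(name);
  if (!handler) {
    return `Unknown function: ${name}`;
  }
  // The arguments arrive as a JSON string and must be parsed first.
  return handler(JSON.parse(rawArgs || "{}"));
}

console.log(
  routeFunctionCall({
    function_call: { name: "place_order", arguments: '{"restaurantName":"Pasta Palace"}' },
  })
);
```

Keeping the handlers in a lookup table means a new interface action only needs a new entry, and an unrecognised function name fails softly instead of throwing.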
Most frontend actions are logged in natural language:
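```javascript
// Append a natural-language description of the action, with a timestamp, to the action log
registerAction: function(msg) {
  this.actions.push("@action:" + msg + " at " + this.getCurrentTime())
  console.log(this.actions)
},
```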
The latest actions are made available to the chatbot on demand through function calling
The chatbot can then decide on its own when to call this function to review the user's or its own latest actions and make more informed decisions
Try asking the chatbot "What did I just do?" after interacting with the interface
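A minimal sketch of how the latest actions could be exposed to the chatbot, assuming a log with the same entry format that `registerAction` produces. The `createActionLog` factory, the `getLatestActions` name, and the default of five entries are assumptions for illustration; a fixed clock is injected so the entries are predictable.

```javascript
// Hypothetical action log mirroring the "@action:<msg> at <time>" entry format.
function createActionLog(getCurrentTime) {
  const actions = [];
  return {
    // Record one action as a natural-language line with a timestamp.
    registerAction(msg) {
      actions.push("@action:" + msg + " at " + getCurrentTime());
    },
    // Return up to `limit` of the most recent entries, oldest first.
    getLatestActions(limit = 5) {
      return limit > 0 ? actions.slice(-limit) : [];
    },
  };
}

const demoLog = createActionLog(() => "12:00:00");
demoLog.registerAction("User opened restaurant page");
demoLog.registerAction("User added item to cart");
demoLog.registerAction("User placed order");
console.log(demoLog.getLatestActions(2));
```

A function like `getLatestActions` is what the chatbot would call, through function calling, to answer a question such as "What did I just do?".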