Botiverse is a Python package that bridges the gap between developers, regardless of their machine learning expertise, and building chatbots. It offers a diverse set of modern chatbot architectures that can be trained through a high-level interface, with optional fine-grained control for advanced use cases.
We strongly recommend referring to the documentation, which also comes with a humble user guide.
For standard use, use Python 3.9+ and run
pip install botiverse
This installs botiverse without the dependencies needed for the voice bot and its preprocessors. To include those as well, install
pip install botiverse[voice]
and make sure FFmpeg is also installed on your machine, as needed by the unavoidable dependency PyAudio.
If you run into problems, you are welcome to submit an issue.
Import the chatbot you need from botiverse.bots. All bots have a similar interface consisting of read, train and infer methods.
from botiverse.bots import BasicBot
# make a chatbot instance
bot = BasicBot(machine='nn', repr='tf-idf')
# read the data
bot.read_data('dataset.json')
# train the chatbot
bot.train()
# infer
bot.infer("Hello there!")
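The shared interface described above can be pictured as a small structural type. The following is only a sketch under stated assumptions: the `ChatBot` protocol, the `EchoBot` stand-in and the `run_pipeline` helper are hypothetical names introduced for illustration and are not part of Botiverse. Actual bots may take extra arguments (for example, training epochs), so treat the signatures as a simplification.

```python
from typing import Protocol


class ChatBot(Protocol):
    """Hypothetical structural type for the read/train/infer interface."""

    def read_data(self, path: str) -> None: ...
    def train(self) -> None: ...
    def infer(self, prompt: str) -> str: ...


class EchoBot:
    """Stand-in bot used only for illustration; not a Botiverse class."""

    def __init__(self) -> None:
        self.path = None
        self.trained = False

    def read_data(self, path: str) -> None:
        # A real bot would load and parse the dataset here.
        self.path = path

    def train(self) -> None:
        # A real bot would fit its model here.
        self.trained = True

    def infer(self, prompt: str) -> str:
        if not self.trained:
            raise RuntimeError("call train() before infer()")
        return f"echo: {prompt}"


def run_pipeline(bot: ChatBot, data_path: str, prompt: str) -> str:
    """Run the three steps in order on any object exposing the interface."""
    bot.read_data(data_path)
    bot.train()
    return bot.infer(prompt)


reply = run_pipeline(EchoBot(), "dataset.json", "Hello there!")
print(reply)
```

Because the helper only relies on the three method names, any object that exposes them in this simplified form could be passed in.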
Botiverse offers 7 main chatbot architectures that cover a wide variety of use cases:
Chatbot | Description | Example Use Case |
---|---|---|
Basic Bot | A light-weight intent-based chatbot based on classical or deep classification models | Answer frequently asked questions on a website while remaining insensitive to wording |
Whiz Bot | A multi-lingual intent-based chatbot based on deep sequential models | Similar to the Basic Bot, but suited to cases with more data or where better performance or multilingual support is needed in return for more computation |
Task Bot | A task-oriented chatbot based on encoder transformer models | A chatbot that can collect all the needed information to perform a task such as booking a flight or a hotel |
Basic Task Bot | A basic light-weight version of the task bot purely based on Regex and grammars | When insufficient data exists for the deep version and developers are willing to design a general grammar for the task |
Converse Bot | A conversational chatbot based on language modeling with transformers | A chatbot that converses similarly to human agents; e.g., like a narrow version of ChatGPT as customer service |
Voice Bot | A voice bot that simulates a call state machine based on deep speech and embedding models | A voice bot that collects important information from callers before transferring them to a real agent |
Theorizer | Based on deep classification and language models | Converts textual data into question-answer pairs suitable for later training |
- All chatbot architectures that Botiverse supports (i.e., in botiverse.bots) are composed of a representer that puts the input text or audio in the right representation and a model that is responsible for the chatbot's output.
- All representers (top row) and models (bottom row) with a non-white frame were implemented from scratch for some definition of that.
- Beyond being a chatbot package, most representers and models can also be used independently and share the same API. For instance, you can import your favorite model or representer from botiverse.models or botiverse.preprocessors respectively and use it for any ordinary machine learning task.
- It follows that some chatbot architectures also allow using a custom representer or model as long as it satisfies the relevant unified interface (as in the docs).
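To make the representer-plus-model composition concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: `BagOfWords`, `OverlapClassifier`, `ComposedBot` and the `transform`/`fit`/`predict` method names are hypothetical and do not reflect the unified interface documented by Botiverse, which should be consulted for the real method names.

```python
from collections import Counter


class BagOfWords:
    """Toy representer: maps text to word counts (hypothetical)."""

    def transform(self, text):
        return Counter(text.lower().split())


class OverlapClassifier:
    """Toy model: returns the label of the stored example sharing the most words."""

    def fit(self, X, y):
        self.examples = list(zip(X, y))
        return self

    def predict(self, x):
        # Counter & Counter keeps the minimum count of each shared word.
        best = max(self.examples, key=lambda ex: sum((ex[0] & x).values()))
        return best[1]


class ComposedBot:
    """A bot is a representer followed by a model."""

    def __init__(self, representer, model):
        self.representer = representer
        self.model = model

    def train(self, sentences, labels):
        X = [self.representer.transform(s) for s in sentences]
        self.model.fit(X, labels)

    def infer(self, text):
        return self.model.predict(self.representer.transform(text))


demo_bot = ComposedBot(BagOfWords(), OverlapClassifier())
demo_bot.train(
    ["hello there", "good morning", "what are the fees", "how much is tuition"],
    ["greeting", "greeting", "fees", "fees"],
)
print(demo_bot.infer("hello again"))
print(demo_bot.infer("tuition fees"))
```

Swapping either component for another object with the same methods leaves `ComposedBot` unchanged, which is the idea behind allowing custom representers and models.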
Now let's learn more about each chatbot available in botiverse.bots.
bot = BasicBot(machine='nn', repr='tf-idf')
bot.read_data('dataset.json')
bot.train()
bot.infer("Hello there!")
Please check the documentation, which also includes the user guide.
The following is the result (in its best form) from training the Basic Bot on a small synthetic dataset.json, as found in the examples, to answer FAQs for a university website.
You can simulate a similar demo offline using the notebook in the Examples folder or online on Google Colab.
Google Colab won't have a server to run the chat GUI; the options are to use a humble version by setting server=False or to provide an ngrok authentication token in the auth_token argument.
You will have to manually drop the dataset from the Examples folder into the data section in Colab.
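For readers unfamiliar with the TF-IDF representation selected by repr='tf-idf', the following sketch shows the general idea in plain Python. It is not Botiverse's implementation: the helper names are hypothetical, the smoothing formula is one common choice, and questions are matched by cosine similarity here rather than by a trained classifier such as the one selected with machine='nn'.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency for every word in the corpus."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc.lower().split()))
    return {word: math.log((1 + n) / (1 + count)) + 1 for word, count in df.items()}


def tfidf(text, idf):
    """Sparse TF-IDF vector; words unseen during fitting are ignored."""
    counts = Counter(w for w in text.lower().split() if w in idf)
    return {w: c * idf[w] for w, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


questions = [
    "when does registration open",
    "where is the library",
    "when is the library open",
]
idf = fit_idf(questions)
vectors = [tfidf(q, idf) for q in questions]

# Rare words such as "where" weigh more than common ones such as "the".
query = tfidf("library location where", idf)
best = max(range(len(questions)), key=lambda i: cosine(query, vectors[i]))
print(questions[best])
```

Because word order is ignored and rare words dominate, differently worded questions with the same key terms land close together, which is what lets an intent-based bot remain insensitive to wording.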
bot = WhizBot(repr='BERT')
bot.read_data('./dataset_ar.json')
bot.train(epochs=10, batch_size=32)
bot.infer("ما هي الدورات المتاحة؟")
Please check the documentation, which also includes the user guide.
The following is the result (in its best form) from training the Whiz Bot on a small synthetic dataset.json, as found in the examples, to answer FAQs for a university website in Arabic.
You can simulate a similar demo offline using the notebook in the Examples folder or online on Google Colab.
Note that the performance of both the basic bot and whiz bot largely scales with the quality and size of the dataset; the one we use here is a small synthetic version generated by LLMs and could be greatly improved if given time.
bot = BasicTaskBot(domains_slots, templates, domains_pattern, slots_pattern)
bot.infer("I want to book a flight")
Please check the documentation, which also includes the user guide.
The following is the result from building a simple Basic Task Bot to perform simple flight booking tasks.
You can simulate a similar demo offline using the notebook in the Examples folder or online on Google Colab.
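As a rough illustration of regex-driven slot filling for a flight booking task, consider the sketch below. The patterns, prompts and helper functions are all hypothetical; they are not the domains_slots, templates, domains_pattern or slots_pattern structures that the Basic Task Bot expects, whose format is described in the documentation.

```python
import re

# Hypothetical slot patterns for a flight-booking domain.
SLOT_PATTERNS = {
    "origin": re.compile(r"\bfrom\s+([A-Z][a-z]+)"),
    "destination": re.compile(r"\bto\s+([A-Z][a-z]+)"),
    "date": re.compile(r"\bon\s+(\d{4}-\d{2}-\d{2})"),
}

# Hypothetical prompts used to request each missing slot.
PROMPTS = {
    "origin": "Where are you flying from?",
    "destination": "Where would you like to go?",
    "date": "On which date would you like to travel?",
}


def update_slots(state, utterance):
    """Fill every slot whose pattern matches the utterance."""
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            state[slot] = match.group(1)
    return state


def next_response(state):
    """Ask for the first missing slot, or confirm once all are filled."""
    for slot in SLOT_PATTERNS:
        if slot not in state:
            return PROMPTS[slot]
    return "Booking a flight from {origin} to {destination} on {date}.".format(**state)


state = {}
update_slots(state, "I want to book a flight from Cairo to Paris")
first_reply = next_response(state)
update_slots(state, "I would like to leave on 2024-05-01")
second_reply = next_response(state)
print(first_reply)
print(second_reply)
```

The dialogue state is just a dictionary of filled slots, so no training data is needed; the cost is that the patterns must be designed by hand for each task.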
bot = TaskBot(domains, slot_list, start, templates)
bot.read_data(train_path, dev_path, test_path)
bot.train()
bot.infer("I want to eat in a restaurant")
Please check the documentation, which also includes the user guide.
The following is the result from training the Task Bot on the sim-R dataset, which includes many possible tasks.
You can simulate a similar demo offline using the notebook in the Examples folder or online on Google Colab.
bot = ConverseBot()
bot.read_data("conversations.json")
bot.train(epochs=1, batch_size=1)
bot.save_model("conversebot.pt")
bot.infer("What is Wikipedia?")
Please check the documentation, which also includes the user guide.
The following is the result from the Converse Bot before training on the Amazon customer service conversations dataset and after it was pretrained on an assistance corpus. You can find the post-training results in the examples (training takes time).
You can simulate a similar demo offline using the notebook in the Examples folder or online on Google Colab.
bot = VoiceBot('call.json')
bot.simulate_call()
Please check the documentation, which also includes the user guide. An independent submodule of the voice bot is a speech classifier, which may learn from zero-shot data (synthetic generation). If you are interested in it, check its documentation, which also includes a user guide.
The following is the result from building a Voice Bot on a hand-crafted call state machine, as found in the Examples. The voice bot requires no training data.
You can only simulate a similar demo offline using the notebook in the Examples folder. This applies to both the voice bot and the speech classifier.
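The notion of a call state machine can be sketched in a few lines, with typed text standing in for the caller's speech. This is only an illustration: the `CALL` dictionary below is a hypothetical schema and may not resemble the actual structure of call.json, and the keyword matching is a simplification of the speech and embedding models mentioned above.

```python
# Hypothetical call description; the real call.json schema may differ.
CALL = {
    "start": {
        "say": "Welcome. Is your request about billing or support?",
        "next": {"billing": "billing", "support": "support"},
    },
    "billing": {"say": "Please state your account number.", "next": {}},
    "support": {"say": "Please describe the problem.", "next": {}},
}


def step(state, caller_text):
    """Follow the first option the caller mentions; otherwise stay in place."""
    for keyword, target in CALL[state]["next"].items():
        if keyword in caller_text.lower():
            return target
    return state


def simulate(caller_turns, state="start"):
    """Walk the state machine and collect what the bot says along the way."""
    transcript = [CALL[state]["say"]]
    for text in caller_turns:
        if not CALL[state]["next"]:  # terminal state: nothing left to ask
            break
        state = step(state, text)
        # The prompt is repeated when no option matched.
        transcript.append(CALL[state]["say"])
    return state, transcript


final_state, transcript = simulate(["um, hello?", "I have a billing question"])
for line in transcript:
    print(line)
```

Since the transitions are written by hand, such a machine needs no training data, which matches the note above.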
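import json
# generate is provided by the Theorizer; see the documentation for its import
context = "Some very long text"
# generate question-answer pairs from the context, then pretty-print them
QAs = generate(context)
print(json.dumps(QAs, indent=4))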
Please check the documentation, which also includes the user guide.
No demo is available yet for the Theorizer; you may check the example in the Examples folder.
Most models and preprocessors can be used independently in any task; please consult the relevant section of the documentation and the Examples folder.
Essam | Yousef Atef | Muhammad Saad | Kariiem Taha |
---|---|---|---|
Basic Bot and Voice Bot & Relevant Models | Basic and Deep Task Bot & Relevant Models | Whiz and Converse Bot & Relevant Models | Theorizer & Relevant Models |
Sincere thanks to Abdelrahman Jamal for helping test the package on Windows.