Bus: "Universal Protocol" design #23

Open
sheosi opened this issue Sep 9, 2021 · 3 comments
Labels
question Further information is requested


@sheosi
Collaborator

sheosi commented Sep 9, 2021

Seeing as the community is moving towards microservices (for several reasons: easier dependency management, easy concurrency, sandboxing...), I think that designing a common protocol has benefits that could revolutionize our community, mostly:

  • Compatibility of projects: Despite being summarized so easily, this is a big feature. It has three aspects:
    • Core: If our core components were compatible, users could choose the ones that suit them best. This in turn could encourage people to build new components that meet really high standards, or extremely specialized ones (like components for running on small devices).
    • Clients: Having compatible clients means that it doesn't matter whether one is using project A or project B, and it even allows for architectural differences across them (which can make sense for different use cases, e.g. a distributed core would be great for personalization, but a single-process core might make sense for end users looking for something lighter and easier to manage).
    • Skills: This is crucial. Having a common set of skills would make our community better as a whole: someone could write a skill once and have it work with every project. There's even the possibility of a common shared store! This would make it easier to build up a set of good-quality skills.
  • Better protocol: Even if nothing ends up being shared, the protocol is the lifeblood of projects such as voice assistants; a well-thought-out, researched and simple protocol will make our projects more robust.

Now that we know the "why?", the "what?" should be discussed, in other words, which features should be included. Here are some, but consider this an open list:

  • Clients with different capabilities: That is, clients that can do different things. Some might be CLI-only with no voice, while others might be able to show pictures or run small interactions/programs for the user; heck, even integration with the OS could be possible. This would make "hey computer, close all my windows" feasible even if the client and the core/server are not on the same computer. Also, clients and skills should be able to define their own capabilities, which would make it really easy for clients to expose custom behaviour for skills to use. This effectively means breaking the whole protocol into smaller ones, which is pretty good for reasoning and documentation.
  • Versioning: Technology is never still; it keeps evolving, and what seems a good idea initially might turn out badly in the future. Take voice: even if we are using the best encoder right now, someone might come up with something incredible in a couple of years, and switching to it might bring better voice clarity and lower network usage. Versioning also allows new features to be added when needed.
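As a rough illustration of the capabilities point, a client could announce what it can do when it connects, and the core could then route a skill's output only to clients that declared the matching capability. This is only a sketch under assumptions: the handshake shape, the capability names (`text`, `voice`, `display`, `os.windows`) and the `ClientHello`/`Core` classes are hypothetical and not part of any existing project.

```python
from dataclasses import dataclass, field


@dataclass
class ClientHello:
    """Hypothetical handshake message a client sends when it connects."""
    client_id: str
    capabilities: set = field(default_factory=set)


class Core:
    """Minimal core that remembers which client declared which capabilities."""

    def __init__(self) -> None:
        self.clients: dict = {}

    def register(self, hello: ClientHello) -> None:
        self.clients[hello.client_id] = set(hello.capabilities)

    def clients_with(self, capability: str) -> list:
        # Only clients that declared the capability receive messages for it.
        return sorted(cid for cid, caps in self.clients.items() if capability in caps)


core = Core()
core.register(ClientHello("terminal", {"text"}))
core.register(ClientHello("kitchen-speaker", {"text", "voice"}))
core.register(ClientHello("desktop", {"text", "voice", "display", "os.windows"}))

print(core.clients_with("voice"))       # ['desktop', 'kitchen-speaker']
print(core.clients_with("os.windows"))  # ['desktop']
```

Because a capability is just a name, a client or skill could define its own (say, a hypothetical `x-myclient.notifications`) without the core needing to understand it, which is what lets the protocol split into small, separately documented sub-protocols.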

Of course there might be more out there; anyone with a suggestion, feel free to comment.

Finally, there's the "how?", which is decisive, as it will determine the robustness and performance of our solution. Some questions about how to build this:

  • Messaging Protocol: We need something to connect all the components, but should it be based on TCP or on UDP? Which protocol?
  • Data interchange format: The protocol might send our messages, but we still need a structure to hold them: JSON, MsgPack, FlexBuffers...?
  • Sandboxing: Some of the current implementations treat every node of the voice assistant as equal, even skills. This might open the door to badly-behaved skills; how should this be addressed?
  • Other features: how would versioning work? How to implement other features?
@sheosi added the question (Further information is requested) label on Sep 9, 2021
@sheosi
Collaborator Author

sheosi commented Sep 9, 2021

Some of my own thoughts about this:

  • Messaging Protocol: After some investigation, the ones I'd propose are MQTT and CoAP; the former is TCP-based while the latter is UDP-based. Research suggests that CoAP performs better under bad network conditions (congestion or poor reception), especially when we want to make sure our messages arrive (by default a CoAP message is not guaranteed to arrive, which makes the communication lighter). It also has both a Publish-Subscribe mechanism (like MQTT) and a Request-Response one (like HTTP): the first would be useful for having the core and clients interact, while the second would be good for sandboxing skills. Finally, its Publish-Subscribe mechanism has no need for a server, unlike MQTT. The problem is that, CoAP being UDP-based, I fear cheap home routers might not handle those packets properly and might degrade worse when bad conditions arise. Also, MQTT is everywhere, whereas CoAP might be harder to find or even missing on some "exotic" platforms/languages.
  • Data interchange format: We have one big question here: should the protocol be schema-based or not? For the time being, and since I want capabilities (which means we don't know everything about a message right away), I've gone the route of schema-less formats. Among those there are two major ones: MsgPack and FlexBuffers (a schema-less version of FlatBuffers). I'd rather avoid JSON: even though some implementations make it fast, they rely on SIMD and other magic not available on MCUs (yes, I'd like an MCU to be able to implement this protocol), those implementations are very hard to write, and even then sending binary data (like voice) is a problem: either you encode it as base64 (making it roughly 33% bigger) or you extend JSON in a non-conforming way.
  • Sandboxing: Yes, please. I wouldn't want a random skill I'm trying out to interact with other skills, try to scam me, or get into other shady situations. The best way to solve this would be a Request-Response model in which each skill node knows nothing about the rest and is unable to interact directly with the clients.
  • Versioning: A lot of debate goes in here. Maybe some versioning system could help us know what is compatible and what is not; on the other hand, this might make the protocol overly complicated, and just having a suffix like "voicev1" and "voicev2" would suffice for marking incompatible versions, while additions can be detected just by checking whether they exist or not.
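The base64 penalty mentioned in the data-format point is easy to check: base64 turns every 3 input bytes into 4 output characters, so the payload grows by about a third. The snippet below assumes one second of 16 kHz, 16-bit mono PCM as an example audio chunk and compares it with a hypothetical binary framing (a 4-byte length prefix followed by the raw bytes), which is roughly the kind of overhead a binary format such as MsgPack or FlexBuffers adds for a byte blob.

```python
import base64
import struct

# Assumed example: 1 s of 16 kHz, 16-bit mono PCM audio (all zeros here).
SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2
audio = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)  # 32_000 bytes

# JSON-style: binary data has to be base64-encoded into a string.
as_base64 = base64.b64encode(audio)

# Binary-style (hypothetical framing): 4-byte big-endian length + raw bytes.
as_binary = struct.pack(">I", len(audio)) + audio

overhead_b64 = len(as_base64) / len(audio) - 1
overhead_bin = len(as_binary) / len(audio) - 1

print(len(audio), len(as_base64), len(as_binary))
print(f"base64 overhead: {overhead_b64:.1%}, binary overhead: {overhead_bin:.3%}")
```

The binary framing adds a fixed few bytes regardless of payload size, while base64 adds a third of the payload every time, which matters most for the largest messages (audio).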
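A minimal sketch of the suffix scheme from the versioning point, assuming capability names of the form `<name>v<N>` (e.g. `voicev1`, `voicev2`) where a new suffix marks an incompatible change: each side lists what it supports and both pick the highest version they share. The naming pattern and the `parse`/`negotiate` helpers are hypothetical illustrations, not an agreed format.

```python
from __future__ import annotations

import re

# Hypothetical naming convention: "<capability>v<major version>", e.g. "voicev2".
_VERSIONED = re.compile(r"^(?P<name>[a-z_.]+)v(?P<version>\d+)$")


def parse(tag: str) -> tuple[str, int]:
    """Split 'voicev2' into ('voice', 2)."""
    m = _VERSIONED.match(tag)
    if m is None:
        raise ValueError(f"not a versioned capability: {tag!r}")
    return m["name"], int(m["version"])


def negotiate(ours: list[str], theirs: list[str], name: str) -> str | None:
    """Return the highest version of `name` both sides support, or None."""
    mine = {v for n, v in map(parse, ours) if n == name}
    other = {v for n, v in map(parse, theirs) if n == name}
    common = mine & other
    return f"{name}v{max(common)}" if common else None


core_supports = ["voicev1", "voicev2", "displayv1"]
client_supports = ["voicev1", "voicev2", "voicev3"]

print(negotiate(core_supports, client_supports, "voice"))    # voicev2
print(negotiate(core_supports, client_supports, "display"))  # None
```

Non-breaking additions would not need a new suffix under this scheme: a receiver could simply check whether an optional field is present in the message.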
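The Request-Response sandboxing idea, where a skill sees only the request it is handed, could look roughly like the sketch below. The skill is just a function from request to response; it never gets a handle to the bus, other skills or clients, so its answer goes back through the core, which decides where to deliver it. `SkillRequest`, `SkillResponse`, `SkillHost` and `weather_skill` are hypothetical names, and real isolation would additionally need a separate process or an OS-level sandbox, which this sketch does not provide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class SkillRequest:
    intent: str
    slots: dict


@dataclass(frozen=True)
class SkillResponse:
    text: str


# A skill is only a function: it receives a request and returns a response.
# It is never given a reference to the bus, other skills, or clients.
Skill = Callable[[SkillRequest], SkillResponse]


def weather_skill(req: SkillRequest) -> SkillResponse:
    city = req.slots.get("city", "your area")
    return SkillResponse(text=f"It is sunny in {city}.")


class SkillHost:
    """Core-side component that owns the skills and routes their answers."""

    def __init__(self) -> None:
        self.skills: Dict[str, Skill] = {}

    def install(self, intent: str, skill: Skill) -> None:
        self.skills[intent] = skill

    def dispatch(self, client_id: str, req: SkillRequest) -> Tuple[str, str]:
        # The core, not the skill, decides which client receives the answer.
        resp = self.skills[req.intent](req)
        return client_id, resp.text


host = SkillHost()
host.install("weather", weather_skill)
print(host.dispatch("kitchen-speaker", SkillRequest("weather", {"city": "Madrid"})))
```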

@sheosi
Collaborator Author

sheosi commented Sep 12, 2021

One point I forgot to mention, which seems rather important, is mobile systems and messaging protocols. Talking to people about using MQTT (though the same applies to any protocol) as the messaging protocol on mobile systems (iOS and Android), I get the idea that communication between processes in the voice assistant chain can be lagged or even broken, because these systems restrict their processes' CPU time as much as possible for the sake of battery life. This means that, for something to work reliably there, the platform's IPC must be used (Snips mentions this too), and each platform has its own IPC mechanism. So whatever we make must at least be able to work on the selected messaging protocol, on XPC, and on whatever is chosen for Android (there are several; I can't decide which one should be used).
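One way to keep the protocol independent of the transport, so the same messages could travel over MQTT on desktops, XPC on Apple platforms and whatever IPC is chosen on Android, is to have every component talk to a small transport interface and plug in one backend per platform. The `Transport` interface, `InMemoryTransport` and `run_pipeline` below are hypothetical; the in-memory backend only stands in for a real MQTT, XPC or Android binding, none of which is shown here.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Protocol

Handler = Callable[[bytes], None]


class Transport(Protocol):
    """What the assistant needs from any IPC mechanism."""

    def publish(self, topic: str, payload: bytes) -> None: ...

    def subscribe(self, topic: str, handler: Handler) -> None: ...


class InMemoryTransport:
    """Single-process stand-in for an MQTT, XPC or Android IPC backend."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def publish(self, topic: str, payload: bytes) -> None:
        for handler in self._handlers[topic]:
            handler(payload)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)


def run_pipeline(transport: Transport) -> List[bytes]:
    # Components only see the Transport interface, never the concrete backend.
    received: List[bytes] = []
    transport.subscribe("asr/text", received.append)
    transport.publish("asr/text", b"turn on the lights")
    return received


print(run_pipeline(InMemoryTransport()))  # [b'turn on the lights']
```

With this split, the message format and topics stay the same on every platform, and only the backend class changes per operating system.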

@secretsauceai changed the title "Universal Protocol" design to Bus: "Universal Protocol" design on Oct 4, 2021
@secretsauceai
Owner

The only way to determine what is best is an old-fashioned benchmark.

  • What parameters and tests can we pick to measure performance?
  • What device would be used to measure on?
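As a possible starting point for the parameters question, two numbers that matter for a voice pipeline are round-trip latency (reported as percentiles rather than a mean, since occasional stalls are what users notice) and bytes on the wire per message. The harness below is a hypothetical sketch: `round_trip` is a placeholder that only encodes and decodes a JSON message in-process; in a real benchmark it would send the message over the candidate protocol (MQTT, CoAP, ...) and wait for the echo, on whichever target device is chosen.

```python
import json
import statistics
import time


def round_trip(message: dict) -> int:
    """Placeholder for send-and-wait-for-echo; returns bytes sent."""
    payload = json.dumps(message).encode()
    json.loads(payload)  # stands in for the echo coming back
    return len(payload)


def benchmark(message: dict, iterations: int = 1000) -> dict:
    latencies_ms = []
    size = 0
    for _ in range(iterations):
        start = time.perf_counter()
        size = round_trip(message)
        latencies_ms.append((time.perf_counter() - start) * 1000)
    # quantiles(n=100) returns the 99 cut points for the 1st..99th percentiles.
    pct = statistics.quantiles(latencies_ms, n=100)
    return {"bytes": size, "p50_ms": pct[49], "p95_ms": pct[94], "p99_ms": pct[98]}


result = benchmark({"intent": "weather", "slots": {"city": "Madrid"}})
print(result)
```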
