
allowing nested calls after recognition #12

@ezavesky

Description

I would like to chain additional processing steps after recognition has completed. This would allow other cool things to run on top of the speech output alone: sentiment analysis, topic understanding, speaker detection, etc.

Here's a rough sketch of the concept...

  • Each module could have its own "server" file that launches a new service, so modules don't complicate the existing single-server architecture
  • Each would communicate over web calls (REST) to keep processes decoupled; in the future, we could expand this to something more rigorous like a message queue
  • Each can communicate via stored JSON/metadata or audio files written to disk
  • Each can "register" itself with the main speech server as a secondary process on start-up. For example, the "speaker detection" module would (a) launch its own service, (b) register with the primary server, and (c) accept REST calls and reply with JSON/text as required
  • Just one example, but each module could leverage other OSS like uis-rnn or pyannote-audio (both taken from this great repo of examples)
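To make the register-then-dispatch idea concrete, here's a minimal sketch of the primary-server side. All names here (`ModuleRegistry`, the payload fields, the localhost URL) are hypothetical illustrations of the protocol, not existing code; `call_fn` stands in for a real REST call such as `requests.post`:

```python
import json

class ModuleRegistry:
    """Primary-server side: tracks secondary processing modules."""
    def __init__(self):
        self.modules = {}

    def register(self, payload):
        # payload mirrors what a module would POST on start-up, e.g.
        # {"name": "speaker-detection", "url": "http://localhost:5001/process"}
        info = json.loads(payload)
        self.modules[info["name"]] = info["url"]
        return {"status": "registered", "name": info["name"]}

    def dispatch(self, transcript, call_fn):
        # After recognition completes, forward the transcript JSON to each
        # registered module via a REST call (call_fn stands in for e.g.
        # requests.post) and collect their JSON replies.
        results = {}
        for name, url in self.modules.items():
            results[name] = call_fn(url, {"transcript": transcript})
        return results

# Example: a fake "speaker detection" endpoint standing in for a real service
def fake_speaker_module(url, body):
    return {"speakers": ["spk0"], "chars_seen": len(body["transcript"])}

registry = ModuleRegistry()
registry.register(json.dumps({"name": "speaker-detection",
                              "url": "http://localhost:5001/process"}))
out = registry.dispatch("hello world transcript", fake_speaker_module)
print(out["speaker-detection"]["speakers"])
```

The nice property of this shape is that the primary server only ever sees `(name, url)` pairs and opaque JSON replies, so new modules can be added without touching the recognition code.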

Seeking opinions at this point, with more details to be fleshed out later. Of course, eventually we may convert this suite into a package (e.g. satisfying #2), but that's not paramount right now.
