Skip to content

Shubhra/ajs 31 refactor vad with streams #390

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 42 commits into from
May 19, 2025

Conversation

Shubhrakanti
Copy link
Contributor

@Shubhrakanti Shubhrakanti commented May 14, 2025

Refactor the VAD component to use WHATWG streams.

We use two Identity TransformStreams to represent input and output.

Some parts of the VAD component are deprecated because of this refactor. Will remove once we do away with VoicePipelineAgent. Keeping it around for now because we're still developing.

Copy link

changeset-bot bot commented May 14, 2025

⚠️ No Changeset found

Latest commit: c4a4fa1

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@Shubhrakanti
Copy link
Contributor Author

Shubhrakanti commented May 15, 2025

I tried to get this to work by implementing a VadSrouce (similar to how we did Audio/VideoStream in node-sdk) but was struggling to get it working.

You can see my progress on this commit. dd14362

@@ -80,46 +84,73 @@ export abstract class VAD extends (EventEmitter as new () => TypedEmitter<VADCal

export abstract class VADStream implements AsyncIterableIterator<VADEvent> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should implement AsyncIterableIterator. I think it would be better to have a method

get stream(): ReadableStream<VADEvent> {
  return this.output.readable
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you elaborate on why (out of curiousity)?

Copy link
Contributor Author

@Shubhrakanti Shubhrakanti May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this isn't a huge issue for the VAD, but the other components (LLM/TTS/STT) expect the output to be a ReadableStream for their default implementations (see this comment). This way the default usage of the components can just be

static default = {
     async [stt/llm/tts]Node(inputStream: ReadableStream<...>): Promise<ReadableStream<...>> {
          ...
          stream = [stt/llm/tts].stream()
          stream.updateInputStream(inputStream)
          return stream.stream()
     }
}

Reference python implementation

I just want to keep all the [STT/LLM/TTS/VAD]Stream classes similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although in this case it makes more sense for VADStream to be called VADStreamManager or something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that makes sense to me!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do this in a separate PR. There are some tricky things with the legacy implementation not worth wrestling right now since we will delete that anyway. https://linear.app/livekit/issue/AJS-48/remove-iterator-from-vadsream

Comment on lines 50 to 52
// eslint-disable-next-line @typescript-eslint/no-unused-vars
onVADInferenceDone(ev: VADEvent): void {
this.logger.info('VAD inference done', ev);
// this.logger.info('VAD inference done', ev);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ignore this. Forgot to pick this up form another PR's changes. Will resolve with merge conflict

Comment on lines 39 to 43
this.vadStreamProcessor = this.vadTask().catch((err) => {
this.logger.error('Error in VAD task', err);
// raise the error
throw err;
this.logger.error(`Error in VAD task: ${err}`);
});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the proper way to handle errors from these async functions we run in node? I'll do some digging on this. Ideally we want to raise them rather than swallow them.

@Shubhrakanti Shubhrakanti requested a review from lukasIO May 15, 2025 05:34
@@ -37,7 +37,9 @@ export class AudioRecognition {

start() {
this.vadStreamProcessor = this.vadTask().catch((err) => {
this.logger.error('Error in VAD task', err);
// raise the error
throw err;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will always throw as an unhandled exception, which we'd want to avoid.

You're trying to handle an error in an async task from within a non-async method. I think catch is the best you can do there

Base automatically changed from shubhra/design-doc to dev-1.0 May 15, 2025 18:58
@Shubhrakanti Shubhrakanti requested a review from lukasIO May 15, 2025 20:42
@Shubhrakanti
Copy link
Contributor Author

Shubhrakanti commented May 16, 2025

Refactor the streaming logic into a clean interface. The streaming logic is similar to what we need for LLM And TTS and STT. https://linear.app/livekit/issue/AJS-43/stream-wrapper-that-takes-multiple-sources-and-combine-them

@Shubhrakanti Shubhrakanti merged commit bcfc134 into dev-1.0 May 19, 2025
5 checks passed
@Shubhrakanti Shubhrakanti deleted the shubhra/ajs-31-refactor-vad-with-streams branch May 19, 2025 21:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants