Shubhra/ajs 32 add support for realtime model #530
constructor(
  options: {
    model?: string;
    voice?: string;
    temperature?: number;
    toolChoice?: llm.ToolChoice;
    baseURL?: string;
    inputAudioTranscription?: api_proto.InputAudioTranscription | null;
    // TODO(shubhra): add inputAudioNoiseReduction
    turnDetection?: api_proto.TurnDetectionType | null;
    speed?: number;
    // TODO(shubhra): add openai tracing options
    azureDeployment?: string;
    apiKey?: string;
    entraToken?: string;
    apiVersion?: string;
    maxSessionDuration?: number;
    // TODO(shubhra): add connOptions
  } = {},
) {
This just adds a default value (`= {}`) for `options`.
  for (const nf of this.bstream.write(f.data.buffer)) {
    this.sendEvent({
      type: 'input_audio_buffer.append',
-     audio: Buffer.from(nf.data).toString('base64'),
+     audio: Buffer.from(nf.data.buffer).toString('base64'),
Really nasty bug: passing the typed array itself to `Buffer.from` copies it element by element and keeps only the low byte of each sample, so the audio has to be encoded from the underlying buffer instead.
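The difference between the two `Buffer.from` calls can be reproduced in isolation. The following is a minimal sketch, assuming the frame data is an `Int16Array` of PCM samples and a little-endian platform; the `samples` values and the `decodePcm16` helper are hypothetical and not part of the PR.

```typescript
// Hypothetical samples standing in for one frame of 16-bit PCM audio.
const samples = new Int16Array([1000, -2, 32767]);

// Passing the typed array copies element by element and keeps only
// the low byte of each sample, so half of the audio data is lost.
const truncated = Buffer.from(samples);

// Passing the underlying ArrayBuffer (with the view's offset and length)
// keeps both bytes of every sample.
const raw = Buffer.from(samples.buffer, samples.byteOffset, samples.byteLength);

// Decode base64 PCM16 back into samples, assuming little-endian byte order.
function decodePcm16(b64: string): number[] {
  const bytes = Buffer.from(b64, 'base64');
  const out: number[] = [];
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    out.push(bytes.readInt16LE(i));
  }
  return out;
}

const roundTrip = decodePcm16(raw.toString('base64'));
console.log(truncated.length, raw.length, roundTrip);
```

Note that `Buffer.from(view.buffer)` without an offset and length encodes the whole underlying `ArrayBuffer`. That is only equivalent when the view covers the entire buffer, which is why the sketch passes `byteOffset` and `byteLength` explicitly.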
textStream: new ReadableStream<string>({
  async start(controller) {
    for await (const chunk of itemGeneration.textChannel) {
      controller.enqueue(chunk);
    }
  },
  cancel() {
    itemGeneration.textChannel.close();
  },
}),
audioStream: new ReadableStream<AudioFrame>({
  async start(controller) {
    for await (const chunk of itemGeneration.audioChannel) {
      controller.enqueue(chunk);
    }
  },
  cancel() {
    itemGeneration.audioChannel.close();
  },
}),
We might need to reconsider how we do this in the future, but it's fine for now.
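One direction such a reconsideration could take is a pull-based adapter that closes the stream once the source is exhausted. This is only a sketch under stated assumptions: `streamFromAsyncIterable`, `demoChunks`, and `collect` are hypothetical names, and the PR's channels are stood in for by a plain async generator.

```typescript
// Hypothetical helper: wrap any async iterable in a ReadableStream.
// Chunks are pulled on demand, and the stream is closed at the end.
function streamFromAsyncIterable<T>(
  source: AsyncIterable<T>,
  onCancel?: () => void,
): ReadableStream<T> {
  const iterator = source[Symbol.asyncIterator]();
  return new ReadableStream<T>({
    async pull(controller) {
      const result = await iterator.next();
      if (result.done) {
        controller.close();
        return;
      }
      controller.enqueue(result.value);
    },
    async cancel() {
      // Let the caller close its channel, then stop the iterator.
      onCancel?.();
      await iterator.return?.();
    },
  });
}

// Stand-in for a text channel.
async function* demoChunks(): AsyncGenerator<string> {
  yield 'hel';
  yield 'lo';
}

// Read a stream to completion and return all chunks.
async function collect<T>(stream: ReadableStream<T>): Promise<T[]> {
  const reader = stream.getReader();
  const chunks: T[] = [];
  while (true) {
    const result = await reader.read();
    if (result.done) break;
    chunks.push(result.value);
  }
  return chunks;
}

const collected = collect(streamFromAsyncIterable(demoChunks()));
```

Using `pull` instead of `start` means chunks are only requested when the consumer has room for them, so the stream's internal queue does not grow without bound when the consumer is slow.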
@@ -416,6 +416,7 @@ export interface InputAudioBufferSpeechStoppedEvent extends BaseServerEvent {

export interface ConversationItemCreatedEvent extends BaseServerEvent {
  type: 'conversation.item.created';
  previous_item_id: string;
We should try to pull these types from the OpenAI Agents SDK.
agents/src/voice/agent_activity.ts
Outdated
  }

  updateAudioInput(audioStream: ReadableStream<AudioFrame>): void {
    this.audioRecognition?.setInputAudioStream(audioStream);
    // TODO(shubhra): might need to tee the streams here.
Add a ticket
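For the ticket, teeing refers to the standard `ReadableStream.tee()` method, which splits one stream into two branches that each receive every chunk. The sketch below is an illustration only: the numeric chunks stand in for audio frames, and the two consumer names are hypothetical.

```typescript
// Stand-in source: numbers instead of audio frames.
const source = new ReadableStream<number>({
  start(controller) {
    controller.enqueue(1);
    controller.enqueue(2);
    controller.close();
  },
});

// tee() locks the source and returns two independent branches.
const [forRecognition, forRealtime] = source.tee();

// Read one branch to completion.
async function readAll<T>(stream: ReadableStream<T>): Promise<T[]> {
  const reader = stream.getReader();
  const chunks: T[] = [];
  while (true) {
    const result = await reader.read();
    if (result.done) break;
    chunks.push(result.value);
  }
  return chunks;
}

const teeResult = Promise.all([readAll(forRecognition), readAll(forRealtime)]);
```

One design consideration: if one branch is read more slowly than the other, the chunks it has not yet consumed are buffered in memory, so both consumers should keep reading.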
this.realtimeSession.on(
  'input_audio_transcription_completed',
  this.onInputAudioTranscriptionCompleted,
this.realtimeSession.on('generation_created', (ev) => this.onGenerationCreated(ev));
Arrow functions are used to preserve the `this` context.
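The effect can be shown with Node's `EventEmitter`, which calls a listener with `this` set to the emitter. This is a minimal sketch; the `Activity` class and the event payloads are hypothetical stand-ins for the classes in the PR.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical stand-in for the class that registers the handlers.
class Activity {
  handled: string[] = [];

  onGenerationCreated(ev: string): void {
    // Relies on `this` being the Activity instance.
    this.handled.push(ev);
  }
}

const activity = new Activity();

// Arrow function: the method is called on `activity`, so `this` is correct.
const session = new EventEmitter();
session.on('generation_created', (ev: string) => activity.onGenerationCreated(ev));
session.emit('generation_created', 'gen-1');

// Unbound method reference: `this` is the emitter, which has no `handled`
// array, so the handler throws when the event fires.
const unboundSession = new EventEmitter();
unboundSession.on('generation_created', activity.onGenerationCreated);
let unboundFailed = false;
try {
  unboundSession.emit('generation_created', 'gen-2');
} catch {
  unboundFailed = true;
}
```

Calling `.bind(this)` on the method, or declaring the handler as an arrow-function class property, would have the same effect as the inline arrow function.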
private async _mainTaskImpl(signal: AbortSignal): Promise<void> {
  const reader = this.deferredInputStream.stream.getReader();
  while (true) {
    const { done, value } = await reader.read();
    if (done || signal.aborted) {
      break;
    }
    this.pushAudio(value);
  }
}
Again, we might need to reconsider this before launch, but it's fine for now.
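One thing a later revision might address is that the loop only checks `signal.aborted` after a read resolves, so an abort has no effect while `read()` is still pending. The sketch below shows one possible shape, assuming it is acceptable to cancel the stream on abort; `pumpUntilAborted` and its parameters are hypothetical names.

```typescript
// Hypothetical helper: forward chunks to `onChunk` until the stream ends
// or the signal aborts, then release the reader lock.
async function pumpUntilAborted<T>(
  stream: ReadableStream<T>,
  signal: AbortSignal,
  onChunk: (chunk: T) => void,
): Promise<void> {
  const reader = stream.getReader();
  // Cancelling the reader resolves any pending read() with done: true.
  const onAbort = () => {
    void reader.cancel().catch(() => {});
  };
  if (signal.aborted) {
    onAbort();
  } else {
    signal.addEventListener('abort', onAbort, { once: true });
  }
  try {
    while (true) {
      const result = await reader.read();
      if (result.done || signal.aborted) break;
      onChunk(result.value);
    }
  } finally {
    signal.removeEventListener('abort', onAbort);
    reader.releaseLock();
  }
}

// Usage: a stream that never closes, aborted after the first chunk.
const abortController = new AbortController();
const seen: number[] = [];
const endless = new ReadableStream<number>({
  start(controller) {
    controller.enqueue(1);
  },
});
const pumped = pumpUntilAborted(endless, abortController.signal, (n) => {
  seen.push(n);
  abortController.abort();
});
```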
export function removeInstructions(chatCtx: ChatContext) {
  // loop in case there are items with the same id (shouldn't happen!)
  while (true) {
    const idx = chatCtx.indexById(INSTRUCTIONS_MESSAGE_ID);
    if (idx !== undefined) {
      chatCtx.items.splice(idx, 1);
    } else {
      break;
    }
  }
}
Copied directly from the Python implementation.
Adds support for the realtime model. Voice only; function calling is not yet enabled. There is still a bug with interruptions: the audio immediately after an interruption gets cut off.