Shubhra/ajs 32 add support for realtime model #530

Merged
Shubhrakanti merged 13 commits into dev-1.0 from shubhra/ajs-32-add-support-for-realtime-model
Jul 15, 2025

Conversation

Shubhrakanti (Contributor) commented Jul 11, 2025

Adds support for the realtime model. Voice only: function calling is not yet enabled. There is still a bug with interrupts: the audio immediately after an interrupt gets cut off.

changeset-bot bot commented Jul 11, 2025

⚠️ No Changeset found

Latest commit: c57eb66

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Shubhrakanti changed the base branch from main to dev-1.0 on July 11, 2025, 20:11
Comment on lines +127 to +146
constructor(
  options: {
    model?: string;
    voice?: string;
    temperature?: number;
    toolChoice?: llm.ToolChoice;
    baseURL?: string;
    inputAudioTranscription?: api_proto.InputAudioTranscription | null;
    // TODO(shubhra): add inputAudioNoiseReduction
    turnDetection?: api_proto.TurnDetectionType | null;
    speed?: number;
    // TODO(shubhra): add openai tracing options
    azureDeployment?: string;
    apiKey?: string;
    entraToken?: string;
    apiVersion?: string;
    maxSessionDuration?: number;
    // TODO(shubhra): add connOptions
  } = {},
) {
Contributor Author

This just adds a default value (= {}) for options.
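
A minimal sketch of what the `= {}` default buys, using a cut-down options type. `SketchOptions`, `SketchModel`, and the fallback values are hypothetical names invented for this illustration; they are not the PR's actual defaults.

```typescript
// Hypothetical, cut-down options type; the real constructor accepts more fields.
interface SketchOptions {
  model?: string;
  voice?: string;
  temperature?: number;
}

class SketchModel {
  readonly model: string;
  readonly voice: string;
  readonly temperature: number;

  // `= {}` lets callers omit the argument entirely. Without it,
  // `new SketchModel()` would not type-check, and reading `options.model`
  // on an undefined argument would throw at runtime.
  constructor(options: SketchOptions = {}) {
    // Placeholder fallbacks, chosen only for this example.
    this.model = options.model ?? 'example-model';
    this.voice = options.voice ?? 'example-voice';
    this.temperature = options.temperature ?? 0.8;
  }
}

const withDefaults = new SketchModel();
const customized = new SketchModel({ voice: 'other-voice' });
console.log(withDefaults.model, customized.voice);
```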

Comment on lines +502 to +505
for (const nf of this.bstream.write(f.data.buffer)) {
  this.sendEvent({
    type: 'input_audio_buffer.append',
-   audio: Buffer.from(nf.data).toString('base64'),
+   audio: Buffer.from(nf.data.buffer).toString('base64'),
Contributor Author

really nasty bug
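
To see why this one-line change matters, here is a small sketch assuming the frame data is an `Int16Array` of PCM samples (an assumption; the excerpt does not show the type of `nf.data`). In Node.js, `Buffer.from(typedArray)` treats any typed array other than `Uint8Array` as a list of numbers and keeps only the low byte of each element, while `Buffer.from(arrayBuffer)` wraps the raw bytes.

```typescript
import { Buffer } from 'node:buffer';

// Four 16-bit PCM samples, standing in for an audio frame's data.
const samples = new Int16Array([1, 256, -1, 1000]);

// One byte per element: each sample is truncated to its low 8 bits.
// The cast only satisfies stricter typings; an Int16Array is passed at runtime.
const truncated = Buffer.from(samples as unknown as Uint8Array);

// Wraps the underlying bytes, two per sample, without truncation.
const raw = Buffer.from(samples.buffer);

// Passing the view's offset and length as well is safer when the typed
// array is a window into a larger ArrayBuffer.
const rawWindow = Buffer.from(samples.buffer, samples.byteOffset, samples.byteLength);

console.log(truncated.length, raw.length, rawWindow.length);
console.log(raw.toString('base64'));
```

With the truncating form, the base64 payload carries half as many bytes as the audio actually contains, so the receiver would decode it as corrupted audio.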

Comment on lines +886 to +905
textStream: new ReadableStream<string>({
  async start(controller) {
    for await (const chunk of itemGeneration.textChannel) {
      controller.enqueue(chunk);
    }
  },
  cancel() {
    itemGeneration.textChannel.close();
  },
}),
audioStream: new ReadableStream<AudioFrame>({
  async start(controller) {
    for await (const chunk of itemGeneration.audioChannel) {
      controller.enqueue(chunk);
    }
  },
  cancel() {
    itemGeneration.audioChannel.close();
  },
}),
Contributor Author

We might need to reconsider how we do this in the future, but it's fine for now.
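
For comparison, the same adaptation from an async iterable to a `ReadableStream` can be written with `pull()` instead of draining the source inside `start()`. This is only a sketch of one alternative, not the PR's code: `toReadableStream`, `demoChunks`, and `collect` are hypothetical helpers, and the channels in the PR may not behave like the plain async generator used here.

```typescript
import { ReadableStream } from 'node:stream/web';

// Hypothetical helper: adapts any async iterable to a ReadableStream.
function toReadableStream<T>(source: AsyncIterable<T>): ReadableStream<T> {
  const iterator = source[Symbol.asyncIterator]();
  return new ReadableStream<T>({
    // Called whenever the stream's internal queue has room, so the source
    // is only advanced as fast as the consumer reads.
    async pull(controller) {
      const result = await iterator.next();
      if (result.done) {
        controller.close(); // signal end-of-stream to readers
      } else {
        controller.enqueue(result.value);
      }
    },
    // Called when the consumer cancels; lets the source clean up.
    async cancel(reason) {
      await iterator.return?.(reason);
    },
  });
}

async function* demoChunks(): AsyncGenerator<string> {
  yield 'hel';
  yield 'lo';
}

async function collect(stream: ReadableStream<string>): Promise<string> {
  const reader = stream.getReader();
  let text = '';
  while (true) {
    const result = await reader.read();
    if (result.done) break;
    text += result.value;
  }
  return text;
}

collect(toReadableStream(demoChunks())).then((text) => console.log(text));
```

The trade-off is that `pull()` respects backpressure and closes the stream when the source ends, at the cost of one extra await per chunk.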

@@ -416,6 +416,7 @@ export interface InputAudioBufferSpeechStoppedEvent extends BaseServerEvent {

export interface ConversationItemCreatedEvent extends BaseServerEvent {
  type: 'conversation.item.created';
  previous_item_id: string;
Contributor Author

We should try to pull these types from the OpenAI Agents SDK.

}

updateAudioInput(audioStream: ReadableStream<AudioFrame>): void {
  this.audioRecognition?.setInputAudioStream(audioStream);
  // TODO(shubhra): might need to tee the streams here.
Contributor Author

Add a ticket
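
As background for the TODO, this is roughly what teeing a stream looks like with the standard `ReadableStream.tee()` method. The sketch uses numbers in place of audio frames, and `drain` is a hypothetical helper; how the PR would actually split the audio input is not shown in the excerpt.

```typescript
import { ReadableStream } from 'node:stream/web';

// Stand-in for an audio stream; numbers play the role of audio frames.
const source = new ReadableStream<number>({
  start(controller) {
    for (const frame of [1, 2, 3]) controller.enqueue(frame);
    controller.close();
  },
});

// tee() locks `source` and returns two branches that each see every chunk.
const [branchA, branchB] = source.tee();

// Hypothetical helper that reads a branch to completion.
async function drain(stream: ReadableStream<number>): Promise<number[]> {
  const frames: number[] = [];
  const reader = stream.getReader();
  while (true) {
    const result = await reader.read();
    if (result.done) return frames;
    frames.push(result.value);
  }
}

const drained = Promise.all([drain(branchA), drain(branchB)]);
drained.then(([a, b]) => console.log(a, b));
```

Two things to keep in mind with `tee()`: both branches receive the same chunk objects rather than copies, and chunks are buffered for whichever branch reads more slowly.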

this.realtimeSession.on(
'input_audio_transcription_completed',
this.onInputAudioTranscriptionCompleted,
this.realtimeSession.on('generation_created', (ev) => this.onGenerationCreated(ev));
Contributor Author

Arrow functions are used to preserve the `this` context.
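
A short illustration of the problem the arrow function avoids, using Node's `EventEmitter` as a stand-in for the realtime session (an assumption; the session's emitter type is not shown in the excerpt). `SketchActivity` is a hypothetical class made up for this example.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical stand-in for the class that owns the handler.
class SketchActivity {
  readonly seen: string[] = [];

  onGenerationCreated(ev: string): void {
    // Relies on `this` being the SketchActivity instance.
    this.seen.push(ev);
  }
}

const session = new EventEmitter();
const activity = new SketchActivity();

// Passing the method itself: EventEmitter invokes listeners with `this`
// set to the emitter, so `this.seen` is undefined and the handler throws.
session.on('generation_created', activity.onGenerationCreated);
let unboundThrew = false;
try {
  session.emit('generation_created', 'first');
} catch {
  unboundThrew = true;
}
session.removeAllListeners('generation_created');

// Wrapping the call in an arrow function keeps `this` bound to `activity`.
session.on('generation_created', (ev: string) => activity.onGenerationCreated(ev));
session.emit('generation_created', 'second');

console.log(unboundThrew, activity.seen);
```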

Comment on lines +126 to +135
private async _mainTaskImpl(signal: AbortSignal): Promise<void> {
  const reader = this.deferredInputStream.stream.getReader();
  while (true) {
    const { done, value } = await reader.read();
    if (done || signal.aborted) {
      break;
    }
    this.pushAudio(value);
  }
}
Contributor Author

Again, we might need to reconsider this before launch, but it's fine for now.
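
One thing worth noting about the loop above is that `signal.aborted` is only checked after a `read()` resolves, so an abort is not observed while the loop is waiting on an idle stream. The following is a sketch of one possible variation that cancels the reader on abort and always releases the lock. `pumpUntilAborted` is a hypothetical helper, not something from the PR.

```typescript
import { ReadableStream } from 'node:stream/web';

// Hypothetical helper: forwards chunks until the stream ends or the signal aborts.
async function pumpUntilAborted<T>(
  stream: ReadableStream<T>,
  signal: AbortSignal,
  onChunk: (chunk: T) => void,
): Promise<void> {
  const reader = stream.getReader();
  // Cancelling the reader resolves any pending read() with done: true.
  const onAbort = () => {
    void reader.cancel().catch(() => {});
  };
  if (signal.aborted) {
    onAbort();
  } else {
    signal.addEventListener('abort', onAbort, { once: true });
  }
  try {
    while (true) {
      const result = await reader.read();
      if (result.done || signal.aborted) break;
      onChunk(result.value);
    }
  } finally {
    signal.removeEventListener('abort', onAbort);
    reader.releaseLock();
  }
}

// Demo: a stream that never closes on its own.
const received: number[] = [];
const neverEnding = new ReadableStream<number>({
  start(controller) {
    controller.enqueue(1);
    controller.enqueue(2);
  },
});
const abortController = new AbortController();
const pumping = pumpUntilAborted(neverEnding, abortController.signal, (n) => {
  received.push(n);
});
setTimeout(() => abortController.abort(), 10);
pumping.then(() => console.log(received));
```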

Shubhrakanti requested a review from toubatbrian on July 15, 2025, 17:46
Comment on lines +699 to +709
export function removeInstructions(chatCtx: ChatContext) {
  // loop in case there are items with the same id (shouldn't happen!)
  while (true) {
    const idx = chatCtx.indexById(INSTRUCTIONS_MESSAGE_ID);
    if (idx !== undefined) {
      chatCtx.items.splice(idx, 1);
    } else {
      break;
    }
  }
}
Contributor Author

Copied directly from the Python implementation.
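
The loop can be exercised in isolation with minimal stand-ins. Everything named `Sketch...` below, plus the `INSTRUCTIONS_ID` value, is hypothetical; the only behavior taken from the excerpt is that `indexById` returns `undefined` when nothing matches, which is why the check is `!== undefined` rather than a truthiness test (index 0 is a valid hit).

```typescript
// Hypothetical minimal stand-ins; the real ChatContext has a richer API.
interface SketchItem {
  id: string;
  text: string;
}

const INSTRUCTIONS_ID = 'instructions-sketch'; // placeholder value

class SketchChatContext {
  constructor(public items: SketchItem[]) {}

  // Returns undefined (not -1) when no item matches.
  indexById(id: string): number | undefined {
    const idx = this.items.findIndex((item) => item.id === id);
    return idx === -1 ? undefined : idx;
  }
}

// Same logic as the excerpt, written with the condition in the loop header.
function removeInstructionsSketch(chatCtx: SketchChatContext): void {
  let idx = chatCtx.indexById(INSTRUCTIONS_ID);
  while (idx !== undefined) {
    chatCtx.items.splice(idx, 1);
    idx = chatCtx.indexById(INSTRUCTIONS_ID);
  }
}

const ctx = new SketchChatContext([
  { id: INSTRUCTIONS_ID, text: 'system prompt' },
  { id: 'msg-1', text: 'hello' },
  { id: INSTRUCTIONS_ID, text: 'duplicate prompt' },
]);
removeInstructionsSketch(ctx);
console.log(ctx.items.map((item) => item.id));
```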

Shubhrakanti merged commit c2f547c into dev-1.0 on Jul 15, 2025
8 checks passed
Shubhrakanti deleted the shubhra/ajs-32-add-support-for-realtime-model branch on July 15, 2025, 17:51