
Support Buffers, Blobs, or Streams inside experimental_streamData, not just JSON keys. #695

Open
ChristopherTrimboli opened this issue Oct 28, 2023 · 0 comments
Labels: ai/ui, enhancement (New feature or request)

Comments


ChristopherTrimboli commented Oct 28, 2023

Feature Description

I'm not sure about the technical constraints (maybe this is impossible), but I'll show the use case that would be heavily improved and the bottleneck I ran into.

I'm using PlayHT AI audio and want to attach the audio data alongside the text. Latency is important, so I want to do everything at once, inside the stream.

The major line in question is:

      data.append({
        voiceData: Buffer.from(await resp.arrayBuffer()).toString("base64"),
      });

You can see how I'm hacking the audio into a Buffer and then a base64 string, and decoding it back to audio on the client side, because data only supports JSON values.
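
Here's roughly what the client-side decode looks like (a minimal sketch; I'm assuming the data array returned by the ai package's useChat hook carries the payload, and the component and route names are illustrative):

"use client";
import { useChat } from "ai/react";

export function VoiceChat() {
  const { messages, data } = useChat({ api: "/api/chat" });

  // Decode the latest base64 payload back into playable MP3 bytes.
  const playLatestVoice = () => {
    const last = data?.[data.length - 1] as { voiceData?: string } | undefined;
    if (!last?.voiceData) return;
    const bytes = Uint8Array.from(atob(last.voiceData), (c) => c.charCodeAt(0));
    const url = URL.createObjectURL(new Blob([bytes], { type: "audio/mpeg" }));
    new Audio(url).play();
  };

  return (
    <div>
      {messages.map((m) => (
        <p key={m.id}>{m.content}</p>
      ))}
      <button onClick={playLatestVoice}>Play voice</button>
    </div>
  );
}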

Some may say to use blob storage. I tried writing to Vercel Blob and passing a URL instead (see the sketch below), but I found base64 was still faster.
Ideally there would be no conversions at all; being able to send a Blob or Buffer directly in data would be very cool!
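
For reference, the Vercel Blob version I tried looked roughly like this (a sketch using put from @vercel/blob; the file name is illustrative):

import { put } from "@vercel/blob";

// Upload the MP3 once, then stream only its URL through data.
const { url } = await put("voice.mp3", await resp.arrayBuffer(), {
  access: "public",
});
data.append({ voiceUrl: url }); // client fetches the audio from this URL

And what I'm asking for would skip conversions entirely, something like this (hypothetical API, not supported today):

// Hypothetical: what this issue is requesting. data.append currently
// only accepts JSON values, so neither of these lines works today.
data.append(Buffer.from(await resp.arrayBuffer())); // raw bytes, no base64
data.append(resp.body); // or even pipe the PlayHT ReadableStream straight through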

Here is an example of my API:

import OpenAI from "openai";
import {
  OpenAIStream,
  StreamingTextResponse,
  experimental_StreamData,
} from "ai";
// Local project modules (paths illustrative): Prisma client and PlayHT voice list.
import { prisma } from "@/lib/prisma";
import { voices } from "@/lib/voices";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function POST(req: Request) {
  // Extract the messages and persona name from the body of the request
  const { messages, personaName } = await req.json();

  // Request the OpenAI API for the response based on the prompt
  const aiResponse = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    stream: true,
    messages: messages,
  });

  const data = new experimental_StreamData();

  const persona = await prisma.persona.findFirst({
    where: { name: personaName },
  });

  const stream = OpenAIStream(aiResponse, {
    onFinal: async (completion) => {
      const voicesFiltered = voices.filter(
        (v) =>
          v.voice_engine === "PlayHT2.0" &&
          v.gender === persona?.gender &&
          v.accent === persona?.accent
      );

      const resp = await fetch("https://api.play.ht/api/v2/tts/stream", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          AUTHORIZATION: `${process.env.PLAYHT_SECRET_KEY}`,
          "X-USER-ID": process.env.PLAYHT_USER_ID!,
          accept: "audio/mpeg",
        },
        body: JSON.stringify({
          text: completion,
          voice:
            persona?.voiceId ??
            voicesFiltered[Math.floor(Math.random() * voicesFiltered.length)]
              .id,
          output_format: "mp3",
          voice_engine: "PlayHT2.0-turbo",
        }),
      }).catch((err) => console.log("fetch error:", err));

      if (!resp) return;

      // hack here to get around JSON-only values
      data.append({
        voiceData: Buffer.from(await resp.arrayBuffer()).toString("base64"),
      });

      // IMPORTANT! you must close StreamData manually or the response will never finish.
      data.close();
    },
    // IMPORTANT! until this is stable, you must explicitly opt in to supporting streamData.
    experimental_streamData: true,
  });

  // Respond with the stream
  return new StreamingTextResponse(stream, {}, data);
}

Use Case

For streaming voice audio alongside text AI responses. There are probably many other Buffer use cases people have as well: images, webcam streams, etc.

Additional context

No response

lgrammel added the enhancement (New feature or request) and ai/ui labels on Sep 19, 2024