Update voice-settings.mdx
J-ElevenLabs authored Sep 8, 2023
1 parent f395212 commit 3aa11f3
Showing 1 changed file with 7 additions and 11 deletions.
18 changes: 7 additions & 11 deletions speech-synthesis/voice-settings.mdx
@@ -4,29 +4,25 @@ description: "A guide on using stability, similarity sliders for tailored voice
---


Our users have found different workflows that suit them. The one you'll see most often is setting stability around 50 and similarity near 80, with minimal changes thereafter. Of course, this all depends on the original voice and the style of performance you're aiming for.
Our users have found different workflows that work for them. The one you'll see most often is setting stability around 50 and similarity near 80, with minimal changes thereafter. Of course, this all depends on the original voice and the style of performance you're aiming for.

The AI is non-deterministic, which means that each time you press generate, you will get slightly different performance, even with the exact same settings.
It's important to note that the AI is non-deterministic; setting the sliders to specific values won't guarantee the same results every time. Instead, the sliders function more as a range, determining how wide the randomization can be between each generation. Setting stability low means a wider range of randomization, often resulting in a more emotive performance, but this is also highly dependent on the voice itself.

Hovering over the `!` icon next to the sliders will provide additional information.

For a more lively and dramatic performance, it is recommended to set the stability slider lower and generate a few times until you find a performance you like.

On the other hand, if you want a more serious performance, even bordering on monotone on very high values, it is recommended to set the stability slider higher. And since it's more consistent and stable, you usually don't need to do as many generations to get what you are looking for. Experiment to find what works best for you!

Some users have taken it a step further with the API, making the sliders dynamic based on text length.

It's important to note that the AI is non-deterministic; setting the sliders to a specific values won't guarantee the same results every time. Instead, the sliders function more as a range, determining how wide the randomization can be between each generation. Setting stability low means a wider range of randomization, often resulting in a more emotive performance, but this is also highly dependent on the voice itself.

Hovering over `!` icon next to the sliders will provide additional information.


## Stability

The stability slider determines how stable the voice is and the randomness of each new generation. Lowering this slider introduces a broader emotional range for the character - this, as mentioned before, is also influenced heavily by the original voice. Setting the slider too low may result in odd performances that are overly random and cause the character to speak too quickly. On the other hand, setting it too high can lead to a monotonous voice with limited emotion.
The stability slider determines how stable the voice is and the randomness between each generation. Lowering this slider introduces a broader emotional range for the voice. As mentioned before, this is also influenced heavily by the original voice. Setting the slider too low may result in odd performances that are overly random and cause the character to speak too quickly. On the other hand, setting it too high can lead to a monotonous voice with limited emotion.


## Similarity

The similarity slider dictates how closely the AI should adhere to the original voice when attempting to replicate it. If the original audio is of poor quality and the similarity slider is set too high, the AI may reproduce artefacts or background noise when trying to mimic the voice if those were present in the original recording.
The similarity slider dictates how closely the AI should adhere to the original voice when attempting to replicate it. If the original audio is of poor quality and the similarity slider is set too high, the AI may reproduce artifacts or background noise when trying to mimic the voice if those were present in the original recording.


## Style Exaggeration
@@ -36,6 +32,6 @@ With the introduction of the newer models, we also added a style exaggeration se
In general, we recommend keeping this setting at 0 at all times.


## Speaker boost
## Speaker Boost

This is another setting that was introduced in the new models. The setting itself is quite self-explanatory – it boosts the similarity to the original speaker. However, using this setting requires a slightly higher computational load, which in turn increases latency. The differences introduced by this setting are generally rather subtle.
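
For readers who apply these settings through the API rather than the sliders, the following is a minimal sketch of how the four settings described above might be converted into a request-style settings object. It assumes the API expects values between 0.0 and 1.0 instead of slider percentages, and the key names (`stability`, `similarity_boost`, `style`, `use_speaker_boost`) are assumptions that should be checked against the current API reference. `SliderSettings` and `to_voice_settings` are hypothetical helpers, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliderSettings:
    """Slider positions as percentages (0-100), mirroring the guide's wording."""

    stability: float = 50.0           # guide: "around 50"
    similarity: float = 80.0          # guide: "near 80"
    style_exaggeration: float = 0.0   # guide: keep at 0 in general
    speaker_boost: bool = True        # assumption: the guide states no default


def to_voice_settings(sliders: SliderSettings) -> dict:
    """Convert slider percentages to a 0.0-1.0 settings dict.

    The key names are assumed, not confirmed by the guide.
    """
    for name in ("stability", "similarity", "style_exaggeration"):
        value = getattr(sliders, name)
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return {
        "stability": sliders.stability / 100.0,
        "similarity_boost": sliders.similarity / 100.0,
        "style": sliders.style_exaggeration / 100.0,
        "use_speaker_boost": sliders.speaker_boost,
    }


if __name__ == "__main__":
    # Defaults follow the commonly used starting point from the guide.
    print(to_voice_settings(SliderSettings()))
    # A lower stability for a livelier, less predictable read.
    print(to_voice_settings(SliderSettings(stability=30.0)))
```

Keeping the conversion in one small function makes it easy to vary stability per request while leaving the other values at the guide's suggested starting point.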
