Remove redundant steps & improve README.md #26
Conversation
…ed "whisper-playground," integrated configuration options for the transcription device and compute type, improved clarity of README instructions, and streamlined package selection by removing redundancies.
If you want minimal latency, use the real-time mode. If you don't mind growing latency and prioritize accuracy, use the sequential mode.

## Troubleshooting

- If you're unable to connect from the client to the server, use an ngrok tunnel to expose port 8000.
Why would this happen?
Honestly, no idea. It worked fine on macOS but didn't work on Windows. If I had to take a guess, I'd say the connection is blocked.
> Honestly, no idea. It worked fine on macOS but didn't work on Windows. If I had to take a guess, I'd say the connection is blocked.

After changing http://0.0.0.0:8000/ to http://localhost:8000/ in App.js, it runs.
@Epresin Good find! Appreciate it :)
Seems to work just fine on macOS as well, so it's probably the safer bet :)
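The 0.0.0.0-vs-localhost distinction discussed above can be illustrated with a short, self-contained sketch (plain Python sockets, not this project's actual server): binding to 0.0.0.0 makes a server listen on every local interface, but 0.0.0.0 is not a routable destination address, and some platforms reject client connections to it, so clients should target a concrete address like localhost.

```python
import socket
import threading

# Binding to "0.0.0.0" tells the server to listen on *all* local
# interfaces, but "0.0.0.0" is not a routable destination address,
# and some platforms reject client connections to it. Clients should
# therefore connect to a concrete address such as 127.0.0.1/localhost.

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("0.0.0.0", 0))   # listen everywhere, OS-assigned port
server.listen(1)
port = server.getsockname()[1]

def echo_once():
    conn, _ = server.accept()
    conn.sendall(conn.recv(16))  # echo the first message back
    conn.close()

threading.Thread(target=echo_once, daemon=True).start()

# Connect via 127.0.0.1, not "0.0.0.0"
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"ping")
response = client.recv(16)
client.close()
server.close()
print(response)  # b'ping'
```

This is also why changing the client URL in App.js to http://localhost:8000/ fixes the connection while the server can keep binding to 0.0.0.0.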
1. On macOS, there's a clash between av files preventing transcription (it works well on Google Colab with Python 3.8).
2. In the sequential mode, there may be uncontrolled speaker swapping, which can be fixed by using pyannote's building blocks and handling speakers manually.
3. In real-time mode, audio data not meeting the transcription timeout won't be transcribed.
4. Speechless batches will cause errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a link to the issue you've opened
```diff
@@ -1,6 +1,9 @@
 from diart import PipelineConfig
 from enum import Enum

+TRANSCRIPTION_DEVICE = "cuda"  # use 'cpu' if it doesn't work
+COMPUTE_TYPE = "int8_float16"  # use float32 with cpu
```
Maybe I'm missing something, but the comment seems off: it suggests float32, yet the default is int8_float16.
If one is using "cpu" as their transcription device, it should be float32. float16 wouldn't make that much of a difference even if supported.
int8_float16 works well with cuda.
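The rule described above (int8_float16 on CUDA, float32 on CPU) can be made explicit with a small hypothetical helper; this is a sketch for illustration, not code from this PR:

```python
def pick_compute_type(device: str) -> str:
    """Derive a compute type from the transcription device.

    int8_float16 quantization relies on GPU float16 support, so it
    only pays off on CUDA; on CPU, float32 is the safe choice.
    """
    return "int8_float16" if device == "cuda" else "float32"

TRANSCRIPTION_DEVICE = "cuda"  # switch to "cpu" if CUDA is unavailable
COMPUTE_TYPE = pick_compute_type(TRANSCRIPTION_DEVICE)
print(COMPUTE_TYPE)  # int8_float16
```

Deriving the compute type from the device this way would keep the two settings from drifting out of sync.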
@ethanzrd I approved and merged, but please reply to my comments or just go ahead and fix them if relevant.
The installation script also installs portaudio as part of the Conda environment to avoid having the user install it on their device, eliminating the first step.
The README file addresses troubleshooting and initial configuration.