Remove redundant steps & improve README.md #26

Merged: 6 commits into saharmor:main, Aug 10, 2023

Conversation

ethanzrd
Collaborator

The installation script also installs portaudio as part of the Conda environment to avoid having the user install it on their device, eliminating the first step.
The README file addresses troubleshooting and initial configuration.

ethanzrd and others added 6 commits August 10, 2023 22:46
…ed "whisper-playground," integrated configuration options for the transcription device and compute type, improved clarity of README instructions, and streamlined package selection by removing redundancies.
If you want minimal latency, use the real-time mode. If you don't mind growing latency and prioritize accuracy, use the sequential mode.
## Troubleshooting

- If you're unable to connect from the client to the server, use an ngrok tunnel to expose port 8000.
Owner

Why would this happen?

Collaborator Author

Honestly, no idea. Worked fine on macOS, but didn't work on Windows. If I had to take a guess, I'd say the connection is blocked.

> Honestly, no idea. Worked fine on macOS, but didn't work on Windows. If I had to take a guess, I'd say the connection is blocked.

After changing http://0.0.0.0:8000/ to http://localhost:8000/ in App.js, it is able to run.

Collaborator Author

@Epresin Good find! Appreciate it :)
Seems to work just fine on macOS as well, so it's probably the safer bet :)
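
For context on the fix discussed above: `0.0.0.0` is a wildcard bind address ("listen on all interfaces"), and some platforms, Windows among them, refuse to connect to it as a destination, which may explain the behavior reported here. Below is a minimal Python sketch of deriving a client-facing URL from a server bind address. The helper name `client_url` and the `WILDCARD_HOSTS` set are hypothetical and not part of this repository; the actual change was an edit to the URL in App.js.

```python
# Hypothetical helper for illustration only; not part of the repository.
# Wildcard addresses are valid for binding a server, but not as a destination.
WILDCARD_HOSTS = {"0.0.0.0", "::", ""}


def client_url(bind_host: str, port: int, scheme: str = "http") -> str:
    """Return a URL a client can connect to for a server bound to bind_host."""
    # A server bound to a wildcard address is reachable locally via localhost.
    host = "localhost" if bind_host in WILDCARD_HOSTS else bind_host
    # IPv6 literals must be wrapped in brackets inside a URL.
    if ":" in host:
        host = f"[{host}]"
    return f"{scheme}://{host}:{port}/"


print(client_url("0.0.0.0", 8000))  # http://localhost:8000/
```

The server can keep listening on `0.0.0.0`; only the address the client dials needs to change.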

1. On macOS, there's a clash between av files preventing transcription (works well on Google Colab with Python 3.8).
2. In the sequential mode, there may be uncontrolled speaker swapping, which can be fixed by using pyannote's building blocks and handling speakers manually.
3. In real-time mode, audio data not meeting the transcription timeout won't be transcribed.
4. Speechless batches will cause errors.
Owner

Add a link to the issue you've opened

@@ -1,6 +1,9 @@
from diart import PipelineConfig
from enum import Enum

TRANSCRIPTION_DEVICE = "cuda" # use 'cpu' if it doesn't work
COMPUTE_TYPE = "int8_float16" # use float32 with cpu
Owner

Maybe I'm missing something, but the comment seems off: it says float32, yet the value is int8_float16.

Collaborator Author

ethanzrd, Aug 10, 2023

If one is using "cpu" as the transcription device, the compute type should be float32. float16 wouldn't make much of a difference on CPU even if supported.
int8_float16 works well with cuda, which is why it is the default.
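
The pairing described above could also be made explicit in the config, so the comment no longer has to carry it. This is only a sketch: `DEFAULT_COMPUTE_TYPES` and `default_compute_type` are hypothetical names, not part of this PR, and the two device/compute-type pairs are simply the ones mentioned in this thread.

```python
# Illustrative sketch; the mapping and helper are assumptions, not the PR's code.
# Pairs come from the discussion: int8_float16 for cuda, float32 for cpu.
DEFAULT_COMPUTE_TYPES = {
    "cuda": "int8_float16",
    "cpu": "float32",
}


def default_compute_type(device: str) -> str:
    """Return the compute type suggested in this thread for the given device."""
    try:
        return DEFAULT_COMPUTE_TYPES[device]
    except KeyError:
        raise ValueError(f"unsupported transcription device: {device!r}") from None


TRANSCRIPTION_DEVICE = "cuda"  # switch to "cpu" if CUDA is unavailable
COMPUTE_TYPE = default_compute_type(TRANSCRIPTION_DEVICE)
```

With this shape, changing `TRANSCRIPTION_DEVICE` to `"cpu"` also switches `COMPUTE_TYPE` to `float32`, so the two settings cannot drift apart.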

saharmor
Owner

@ethanzrd I approved and merged, but please reply to my comments, or just go ahead and fix them if relevant.

saharmor merged commit f840fd1 into saharmor:main on Aug 10, 2023
3 participants