Sync with whisper.cpp v1.4.3 🚀
github-actions[bot] committed Nov 9, 2023
1 parent 9dd19cb commit 74dc0df
Showing 4 changed files with 18 additions and 21 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/publish-docker.yml
@@ -47,12 +47,12 @@ jobs:
run: |
source $GITHUB_ENV
# Always set the default tags
DEFAULT_TAGS="dublok/stt:${GITHUB_REF_NAME},dublok/stt:${SHA_SHORT}"
DEFAULT_TAGS="dublok/whisperdock:${GITHUB_REF_NAME},dublok/whisperdock:${SHA_SHORT}"
echo "TAGS=${DEFAULT_TAGS}" >> $GITHUB_ENV
# Check if GITHUB_REF_NAME starts with 'v' and if so, append 'latest'
if [[ $GITHUB_REF_NAME == v* ]]; then
LATEST_TAGS="${DEFAULT_TAGS},dublok/stt:latest"
LATEST_TAGS="${DEFAULT_TAGS},dublok/whisperdock:latest"
echo "TAGS=${LATEST_TAGS}" >> $GITHUB_ENV
fi
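
Review note: the tag logic in this hunk can be exercised locally before pushing. The following is a minimal sketch, not the workflow's actual code: the function name `compute_tags` and its two positional arguments are hypothetical stand-ins for `GITHUB_REF_NAME` and `SHA_SHORT`, and it prints the tag list instead of appending it to `$GITHUB_ENV`.

```bash
# Hypothetical helper mirroring the workflow step above.
# $1 stands in for GITHUB_REF_NAME, $2 for SHA_SHORT (both assumptions).
compute_tags() {
  ref_name="$1"
  sha_short="$2"
  image="dublok/whisperdock"

  # Always include the ref-name tag and the short-SHA tag.
  tags="${image}:${ref_name},${image}:${sha_short}"

  # Ref names starting with 'v' additionally receive the 'latest' tag.
  case "$ref_name" in
    v*) tags="${tags},${image}:latest" ;;
  esac

  printf '%s\n' "$tags"
}

# Sample calls with illustrative values: a version tag and a branch name.
compute_tags "v1.4.3" "74dc0df"
compute_tags "main" "74dc0df"
```

The sketch uses a POSIX `case` pattern rather than the workflow's `[[ ... == v* ]]` test so it also runs under plain `sh`; the matching behavior for a leading `v` is the same.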
2 changes: 1 addition & 1 deletion .github/workflows/sync-whisper.yml
@@ -111,7 +111,7 @@ jobs:
steps:
- name: 🛒 Checkout repository for sync
run: |
git clone https://x-access-token:${{ secrets.PAT }}@github.com/ErcinDedeoglu/stt.git .
git clone https://x-access-token:${{ secrets.PAT }}@github.com/ErcinDedeoglu/whisperdock.git .
- name: 🔄 Sync whisper.cpp repository to src/whisper
run: |
33 changes: 15 additions & 18 deletions README.md
@@ -1,14 +1,14 @@
# [Speech-to-Text (STT) Transcription Service 🎤](https://github.com/ErcinDedeoglu/stt)
# [WhisperDock - Speech-to-Text Service 🎤](https://github.com/ErcinDedeoglu/WhisperDock)

This repository hosts the Dockerized speech-to-text transcription service, which utilizes Whisper C++ alongside Python to provide an API for audio file transcription.

<div align="center"><img src="https://github.com/ErcinDedeoglu/stt/assets/6512072/0f63bc19-7662-4a03-b7bb-dea8bfbb44d7" width="400"></div>
<div align="center"><img src="https://github.com/ErcinDedeoglu/WhisperDock/raw/main/assets/logo.png" width="400"></div>

## Background and Motivation

In the rapidly advancing field of machine learning, access to efficient and robust tools for everyday applications is essential. Speech-to-text transcription is one of the areas that has seen significant improvements, but deploying these models quickly and efficiently remains a challenge. Whisper C++, a high-performance transcription tool, has emerged as a powerful option, yet it still requires a streamlined pathway to deployment.
Access to efficient and robust tools for everyday applications is essential in the rapidly advancing field of machine learning. Speech-to-text transcription is one of the areas that has seen significant improvements, but deploying these models quickly and efficiently remains a challenge. Whisper C++, a high-performance transcription tool, has emerged as a powerful option, yet it still requires a streamlined pathway to deployment.

This repository was created out of a necessity to bridge the gap between the development of speech-to-text models and their deployment in real-world applications. Many existing solutions require extensive setup, intricate knowledge of systems, and can be time-consuming to deploy, creating a barrier for developers, researchers, and businesses who want to integrate transcription capabilities into their services.
This repository was created out of a necessity to bridge the gap between the development of speech-to-text models and their deployment in real-world applications. Many existing solutions require extensive setup and intricate knowledge of systems and can be time-consuming to deploy, creating a barrier for developers, researchers, and businesses who want to integrate transcription capabilities into their services.

The Speech-to-Text Transcription Service aims to provide a fast, reliable, and easy-to-use solution for deploying Whisper C++ models. By containerizing the service with Docker, we significantly reduce the complexity of deployment and make it possible to launch a transcription service that is both scalable and accessible.

@@ -17,7 +17,7 @@ Here are some of the key motivations behind this project:
- **Speed of Deployment**: By providing a Dockerized solution, we enable rapid deployment of the transcription service, allowing users to go from zero to a fully functioning service in minutes.
- **Ease of Use**: The provided APIs and Docker setup are designed to be as simple as possible, requiring minimal configuration and allowing for easy integration into existing workflows.
- **Accessibility**: Making Whisper C++ easily deployable opens up more opportunities for developers and organizations of all sizes to utilize state-of-the-art transcription technology.
- **Continuous Integration and Delivery**: With GitHub Actions, updates and improvements are integrated seamlessly, ensuring that the service remains up-to-date with the latest advancements from the Whisper C++ repository.
- **Continuous Integration and Delivery**: With GitHub Actions, updates, and improvements are integrated seamlessly, ensuring the service remains up-to-date with the latest advancements from the Whisper C++ repository.

In contributing to this repository, I hope to empower individuals and organizations to harness the capabilities of Whisper C++ without the overhead of complex deployment processes, thus fostering innovation and development in the field of speech recognition.

@@ -31,14 +31,14 @@ For quick deployment, use the Docker images provided in the Docker registry.

For the latest stable version:
```bash
docker pull dublok/stt:latest
docker run -p 5000:5000 dublok/stt:latest
docker pull dublok/whisperdock:latest
docker run -p 5000:5000 dublok/whisperdock:latest
```

For the nightly build (unstable but with early access to new features):
```bash
docker pull dublok/stt:main
docker run -p 5000:5000 dublok/stt:main
docker pull dublok/whisperdock:main
docker run -p 5000:5000 dublok/whisperdock:main
```

The service should now be accessible at `http://localhost:5000`.
@@ -47,21 +47,19 @@ The service should now be accessible at `http://localhost:5000`.

1. Clone the repository:
```bash
git clone https://github.com/ErcinDedeoglu/stt
git clone https://github.com/ErcinDedeoglu/WhisperDock
```

2. Build the Docker image:
```bash
docker build -t stt-service .
docker build -t whisperdock .
```

3. Run the container:
```bash
docker run -p 5000:5000 stt-service
docker run -p 5000:5000 whisperdock
```

Certainly! Below is an updated README section that includes an example response from the API after submitting an audio file for transcription.

---

## API Usage
@@ -72,7 +70,7 @@ To transcribe audio, make a POST request to the `/transcribe` endpoint with the
curl -X POST -F 'file=@/path/to/your/audio.wav' http://localhost:5000/transcribe
```

Make sure your audio file is in WAV format with a sample rate of 16kHz.
Ensure your audio file is in WAV format with a sample rate of 16kHz.

### Example Response

@@ -84,7 +82,7 @@ Upon successful transcription, the service will return a JSON response containin
{
"start_time": "00:00:00.000",
"end_time": "00:00:03.000",
"text": "Welcome to our speech to text service."
"text": "Welcome to our speech-to-text service."
},
{
"start_time": "00:00:03.500",
@@ -135,5 +133,4 @@ This project uses GitHub Actions for continuous integration, which automates the

## License

This Speech-to-Text Transcription Service is made available under the [CC0 1.0 Universal](LICENSE) public domain dedication.

This Speech-to-Text Transcription Service is available under the [CC0 1.0 Universal](https://github.com/ErcinDedeoglu/WhisperDock/blob/main/LICENSE) public domain dedication.
Binary file added assets/logo.png