Create new service diarizer for diarization #3881
Conversation
Code Review
This pull request introduces a new diarizer service and its corresponding Helm chart. The overall structure is good, but there are several high- and critical-severity issues that should be addressed. My review focuses on improving the reliability and performance of the FastAPI service by making it non-blocking, reducing memory usage during file uploads, and hardening the Kubernetes configuration for robustness and reusability. I've also included suggestions to correct and simplify the Dockerfile.
def diarization(file: UploadFile = File(...)):
    print('diarization')
    return diarization_endpoint(file)
The diarization endpoint is synchronous (def), but it calls a long-running, blocking function (diarization_endpoint). This will block the Uvicorn event loop, making the entire service unresponsive to any other requests, including health checks from Kubernetes. This can lead to Kubernetes killing the pod while it's processing a valid request. To fix this, the endpoint should be async and the blocking call should be wrapped in fastapi.concurrency.run_in_threadpool to run it in a separate thread.
You'll need to import run_in_threadpool from fastapi.concurrency.
Suggested change:

async def diarization(file: UploadFile = File(...)):
    print('diarization')
    return await run_in_threadpool(diarization_endpoint, file)
References
- In an async application, avoid blocking the event loop with synchronous operations to prevent performance degradation and unresponsiveness. This principle applies to long-running blocking functions as well as synchronous HTTP clients.
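For context, a minimal self-contained sketch of the suggested fix. The import path for diarization_endpoint and the route path are illustrative assumptions, not taken from this PR:

from fastapi import FastAPI, File, UploadFile
from fastapi.concurrency import run_in_threadpool

# Hypothetical import; the PR does not show where diarization_endpoint lives.
from diarizer.endpoints import diarization_endpoint

app = FastAPI()

@app.post('/diarization')
async def diarization(file: UploadFile = File(...)):
    print('diarization')
    # run_in_threadpool offloads the blocking call to a worker thread,
    # so the event loop stays free to answer health checks and other requests.
    return await run_in_threadpool(diarization_endpoint, file)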
livenessProbe:
  tcpSocket:
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
The liveness, readiness, and startup probes are currently using tcpSocket, which only checks if the port is open. This doesn't guarantee the application is healthy and ready to serve traffic. Since a /health endpoint is available, it's better to use an httpGet probe to get a more accurate health status of the application. This change should be applied to livenessProbe, readinessProbe, and startupProbe for a more reliable service.
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
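For reference, the /health endpoint these probes target can be a trivial FastAPI route. This is a sketch of one plausible handler, not necessarily the one in this PR:

@app.get('/health')
async def health():
    # Cheap liveness signal: returns 200 OK whenever the event loop is responsive.
    return {'status': 'ok'}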
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: diarizer-backend-config
spec:
  healthCheck:
    checkIntervalSec: 10
    timeoutSec: 5
    port: {{ .Values.service.port }}
    type: HTTP
    requestPath: /health # Ensure this path returns 200 OK
---
apiVersion: v1
kind: Service
metadata:
  name: {{ include "diarizer.fullname" . }}
  labels:
    {{- include "diarizer.labels" . | nindent 4 }}
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
    cloud.google.com/backend-config: '{"default": "diarizer-backend-config"}' # Link to BackendConfig
spec:
  type: {{ .Values.service.type }}
  ports:
    - port: {{ .Values.service.port }}
      targetPort: {{ .Values.service.port }}
      protocol: TCP
      name: http
  selector:
    {{- include "diarizer.selectorLabels" . | nindent 4 }}
The BackendConfig resource name is hardcoded as diarizer-backend-config. This will cause naming conflicts if you try to install this Helm chart more than once in the same namespace. It's a best practice to template resource names using {{ include "diarizer.fullname" . }} to ensure they are unique per release. The annotation in the Service resource that refers to it should also be updated accordingly.
Suggested change:

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: {{ include "diarizer.fullname" . }}-backend-config
spec:
  healthCheck:
    checkIntervalSec: 10
    timeoutSec: 5
    port: {{ .Values.service.port }}
    type: HTTP
    requestPath: /health # Ensure this path returns 200 OK
---
apiVersion: v1
kind: Service
metadata:
  name: {{ include "diarizer.fullname" . }}
  labels:
    {{- include "diarizer.labels" . | nindent 4 }}
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
    cloud.google.com/backend-config: '{"default": "{{ include "diarizer.fullname" . }}-backend-config"}' # Link to BackendConfig
spec:
  type: {{ .Values.service.type }}
  ports:
    - port: {{ .Values.service.port }}
      targetPort: {{ .Values.service.port }}
      protocol: TCP
      name: http
  selector:
    {{- include "diarizer.selectorLabels" . | nindent 4 }}

FROM python:3.11-slim
# Python 3.11 on Debian 12 (Bookworm)

WORKDIR /app
ENV PATH="/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/venv/bin:$PATH"
ENV LD_LIBRARY_PATH="/usr/local/nvidia/lib:/usr/local/nvidia/lib64"

# Install CUDA Toolkit 12.6 Update 3
RUN apt-get update && apt-get -y install build-essential ffmpeg curl unzip wget && \
    wget https://developer.download.nvidia.com/compute/cuda/12.6.3/local_installers/cuda-repo-debian12-12-6-local_12.6.3-560.35.05-1_amd64.deb && \
    dpkg -i cuda-repo-debian12-12-6-local_12.6.3-560.35.05-1_amd64.deb && \
    cp /var/cuda-repo-debian12-12-6-local/cuda-*-keyring.gpg /usr/share/keyrings/ && \
    apt-get update && \
    apt-get -y install cuda-toolkit-12-6 && \
    rm -rf /var/lib/apt/lists/* cuda-repo-debian11-12-6-local_12.6.3-560.35.05-1_amd64.deb
Manually installing the CUDA toolkit inside a generic Python image is complex and can be brittle. A better practice is to use an official NVIDIA base image that comes with CUDA pre-installed. This simplifies the Dockerfile, reduces the chance of errors, and often results in a more optimized image. Consider using a base image from nvidia/cuda on Docker Hub, such as nvidia/cuda:<version>-cudnn-runtime-ubuntu<version>, and then installing Python and your dependencies on top of it.
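A rough sketch of that approach; the exact base-image tag, package list, and entrypoint module are illustrative assumptions, not tested against this PR:

FROM nvidia/cuda:12.6.3-cudnn-runtime-ubuntu22.04
# CUDA 12.6 and cuDNN are already present in the base image, so the
# manual toolkit installation becomes unnecessary.
RUN apt-get update && apt-get -y install --no-install-recommends \
        python3 python3-pip ffmpeg && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt
COPY . .
# Entrypoint is illustrative; adjust the module path to the actual app.
CMD ["python3", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]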
cp /var/cuda-repo-debian12-12-6-local/cuda-*-keyring.gpg /usr/share/keyrings/ && \
apt-get update && \
apt-get -y install cuda-toolkit-12-6 && \
rm -rf /var/lib/apt/lists/* cuda-repo-debian11-12-6-local_12.6.3-560.35.05-1_amd64.deb
There's a typo in the name of the .deb file being removed. It says debian11 but the downloaded file is for debian12. This causes the downloaded package file to not be cleaned up, which unnecessarily increases the final Docker image size.
rm -rf /var/lib/apt/lists/* cuda-repo-debian12-12-6-local_12.6.3-560.35.05-1_amd64.deb
try:
    # Save uploaded file
    with open(file_path, 'wb') as f:
        f.write(file.file.read())
The code f.write(file.file.read()) reads the entire uploaded file into memory before writing it to disk. This can lead to high memory consumption and potential out-of-memory errors for large audio files. It's better to stream the file content directly to disk in chunks. You can use shutil.copyfileobj for this.
Don't forget to add import shutil at the top of the file.
Suggested change:

shutil.copyfileobj(file.file, f)
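Put together, a minimal sketch of the streaming save, assuming file_path is constructed the same way as elsewhere in this PR:

import shutil
from fastapi import UploadFile

def save_upload(file: UploadFile, file_path: str) -> None:
    # copyfileobj streams the upload to disk in fixed-size chunks,
    # so peak memory use stays constant regardless of file size.
    with open(file_path, 'wb') as f:
        shutil.copyfileobj(file.file, f)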
lgtm @thainguyensunya |
Changes: