404 Session Not Found error when accessing Gradio via a proxy #6920
I am also experiencing this issue after bumping gradio from 3.50.2 to 4.12.0. I basically have a Gradio app deployed on a k8s cluster. Port-forwarding directly to the pod works as expected, but accessing it externally via emissary causes this same error. |
@abidlabs Do you have any ideas on what might be the issue and how to work around it, as it's blocking the major version upgrade for us? Is it related to the switch to using SSE by default in v4? Is there a way to disable it? I think this may be an important issue as more "production" Gradio apps being served on internal infra try upgrading to v4. |
Hi @shimizust! I think the guide on running Gradio behind a proxy should help. It uses nginx, but the same approach should apply to other proxies. |
Thanks @freddyaboulton, although I think this may be a different issue. My app is already running at the root of the domain. I think this has to do with session affinity and the use of SSE when the app is running on multiple pods. Once I configured session affinity at the ingress layer (or, if you are accessing within the cluster, at the service layer), I was able to get past the initial "An Unexpected Error Occurred" on first load of the app. However, I'm still getting these 404 Session Not Found errors while using the app: These go away if I decrease the number of pods to a single pod, which isn't ideal. I'd like to be able to put multiple pods behind a load balancer. Not sure if anyone has any insights. Again, this wasn't an issue with gradio 3.x. |
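[Editor's note] The comment above mentions configuring session affinity at the ingress layer without showing how. For the nginx ingress controller specifically, cookie-based affinity can be enabled with annotations alone; a minimal sketch, assuming the ingress-nginx controller (the resource name, cookie name, and max-age below are placeholder choices, not from this thread):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: gradio-app   # placeholder name
  annotations:
    # Route each client to the same backend pod via a sticky cookie
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "gradio-sticky"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "3600"
spec:
  # ... rules as in a normal Ingress ...
```

Cookie affinity is generally more reliable than client-IP affinity when clients sit behind shared NATs or corporate proxies.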
@abidlabs @freddyaboulton Also see same issues when using gradio 4.17.0 on k8 even though not trying to access it directly, just across pods. 3.50.2 worked perfectly in exact same setup. Probably will have to unfortunately revert again back to 3.50.2 (I've tried 4 times to upgrade :( ). Note that we use nginx perfectly fine on 4.17.0, so it's not just a proxy issue. |
I'm running with FastAPI too, with gradio queue and the Gradio app mounted on the FastAPI app. I set:

```yaml
sessionAffinity: ClientIP
sessionAffinityConfig:
  clientIP:
    timeoutSeconds: 3600  # 1 hour
```

I also tried with gunicorn, still no success and the same exception. Maybe there is a problem with my configuration; can you guys check as well? |
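[Editor's note] The `sessionAffinity` fragment in the comment above belongs under a Service's `spec`; for context, a minimal sketch with placeholder names:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: gradio-app   # placeholder name
spec:
  selector:
    app: gradio-app
  ports:
    - port: 80
      targetPort: 7860
  # Pin each client IP to one pod for up to an hour
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 3600
```

Note that ClientIP affinity is applied by kube-proxy for in-cluster traffic; requests entering through an ingress or external load balancer may still need cookie-based affinity configured at that layer, as later comments in this thread describe.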
@abidlabs Note that this is a regression; 3.50.2 worked fine. Should be fixed, I'd hope. I'm unable to upgrade to Gradio 4 due to this, even though all non-networking things are wonderful with Gradio 4. |
Looking into this! |
Collecting info and repro details. When things go bad on k8s, on 4.17.0 (before the nginx issue), this is one failure:
Another one is:
|
Could someone here please try installing this version of
|
I am also getting these "404: Session not found." errors all the time after upgrading to
I have tried this wheel, and it seems to make the exception go away, but I still get several "Connection errored out." popups in the UI if I refresh it a few times with F5/Ctrl+F5. This is the offending line in the traceback. I don't have a simple example to reproduce the issue, but it happens all the time in the dev branch of my project, which now uses Gradio 4.19. |
Thanks @oobabooga are you running behind a proxy as well? Also when you say that you see this error after upgrading to 4.19, what version were you upgrading from? I.e what was the latest version that did not have this issue for you? |
I am not using a proxy, just launching the server locally. For clarity, several error popups with the message "404: Session not found." appear in the UI when the stacktrace that I posted happens. The last gradio version I used was 3.50.2, and that issue never happened there. |
I found that if I comment my |
Ok, I think I know why it's happening on k8s, but not sure why it's happening for you @oobabooga. It seems like that's a separate issue. If you are able to put together a more self-contained repro, that would be very appreciated. |
I have been trying to come up with a minimal example to reproduce the issue, but it has been difficult. I did find that the same error has been happening in other repositories: daswer123/xtts-finetune-webui#7 It may be the case that the problem in this issue is not the use of a proxy itself, but the fact that events behind a proxy take longer to run, somehow triggering the error. |
Hi @abidlabs, thank you so much for the prompt response. It's me again; I had to change machines, but I have something that may help reproduce the issue. Please follow the steps below:

```python
import gradio as gr

# Function that echoes the text input
def echo_text(text):
    return text

# Function that displays the uploaded image
def show_image(image):
    return image

# Create the text interface
text_interface = gr.Interface(fn=echo_text, inputs="text", outputs="text")

# Create the image interface
image_interface = gr.Interface(fn=show_image, inputs="image", outputs="image")

# Combine both interfaces into a tabbed interface
tabbed_interface = gr.TabbedInterface([text_interface, image_interface], ["Text Echo", "Image Display"])

# Launch the app
tabbed_interface.launch(server_name="0.0.0.0", server_port=7860)
```
```dockerfile
# Use an official Python runtime as a parent image
FROM python:3.9-slim

# Set the working directory in the container
WORKDIR /usr/src/app

# Copy the current directory contents into the container at /usr/src/app
COPY . .

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Make port 7860 available to the world outside this container
EXPOSE 7860

# Define environment variable
ENV NAME World

# Run app.py when the container launches
CMD ["python", "app.py"]
```

You can run the Docker image locally to test as well, and it will work. Then tag and push that Docker image to whichever registry you'd like; I used ECR.
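[Editor's note] The build-tag-push step described above can be sketched as follows (the image name and registry URI are placeholders; for ECR you would also authenticate first, e.g. via `aws ecr get-login-password` piped into `docker login`):

```shell
# Build the image from the Dockerfile above
docker build -t gradio-repro:latest .

# Tag it for your registry (placeholder URI)
docker tag gradio-repro:latest <account>.dkr.ecr.<region>.amazonaws.com/gradio-repro:latest

# Push to the registry
docker push <account>.dkr.ecr.<region>.amazonaws.com/gradio-repro:latest
```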
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: whatever-you'd-like
  namespace: whatever-you'd-like
  labels:
    app: whatever-you'd-like
spec:
  replicas: 1
  selector:
    matchLabels:
      app: whatever-you'd-like
  template:
    metadata:
      labels:
        app: whatever-you'd-like
    spec:
      containers:
        - name: whatever-you'd-like
          image: <your-image-uri>  # use the image pushed to your container registry, as explained earlier
          ports:
            - containerPort: 7860
---
apiVersion: v1
kind: Service
metadata:
  name: whatever-you'd-like
  namespace: whatever-you'd-like
  labels:
    app: whatever-you'd-like
spec:
  selector:
    app: whatever-you'd-like
  ports:
    - protocol: TCP
      port: 80
      targetPort: 7860
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: whatever-you'd-like
  namespace: whatever-you'd-like
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
    - host: test.exampledomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: whatever-you'd-like
                port:
                  number: 80
```

Make sure to configure DNS for the domain you use, e.g. test.exampledomain.com.
You can do that directly in the requirements.txt file, build the image again, push it, and update the deployment.yml file with the new image URI (or whatever you use). Everything will work well after that. I have to say that at different stages of the deployment you hit the different errors many mentioned above. This is already long, so I can't reproduce them all step by step, but I've attached screenshots of some of them below. @abidlabs I hope this helps you get a picture of what might have changed to cause the problem. |
Subscribing to this issue. Hoping a fix can be issued soon since the newest version without this issue has known security vulnerabilities. |
@abidlabs Are there any updates on fixing this issue in versions after 4.16? |
Have you tried with the latest version ( |
I haven't tested anything because the only way to test on my end is to push it to prod, which I hesitate to do in case it breaks again. @abidlabs seemed to know exactly what my specific issue was in this case, as he mentioned something they changed in 4.17 that broke it for me. Maybe he can comment on whether 4.21 will work in my case, and then I'll try it out. |
@arian81 if my hypothesis is correct, then the fix should be out in 4.21, and I suggest giving it a shot. That being said, it looks like the issue is not resolved with multiple replicas, as @mgirard772 pointed out, so I'm still taking a look at how ECS works. |
@abidlabs For what it's worth, @edsna's suggestion of rolling back to 3.50.2 works, and that's what I have deployed in stage and production. Not sure exactly what breaking changes were made since then, though. I do want to update to a newer version to resolve the security vulnerabilities listed by dependabot prior to version 4.19.2, but I can't do that until this issue is resolved. |
Any update here? Is there anything I can do to help move us towards a solution? I ended up deploying 4.22.0 through to production due to security concerns, which has left my production deployment with just one replica. I'd strongly prefer to have this in HA at least in production, but in order to achieve that this issue needs to be resolved. |
I'm looking into this issue, currently with @edsna's repro. What would be helpful is if we had a repro that was fully local, e.g. using minikube. |
Can you try setting up sticky sessions (e.g. ensure that connections from the same client IP are routed to the same machine). Here's an example of how to do that: https://gist.github.com/fjudith/e8acc791f015adf6fd47e5ad7be736cb |
@abidlabs Good news. Per your second comment, I enabled stickiness in the target group for my ALB in AWS and added a replica in ECS, and that seems to have resolved the issue. For reference, here are the settings I used in the AWS Console: For those that use Terraform, you'll want to add a stickiness block like this into your target group definition:

```hcl
stickiness {
  type            = "lb_cookie"
  cookie_duration = 86400
  enabled         = true
}
```
|
Glad it's working @mgirard772 ! We should probably make a note of that in the deployment guide |
Awesome! I’ll add this to the docs, maybe in the nginx guide (did you have a different one in mind @freddyaboulton?) In the meantime, if anyone else is still facing this issue and the above solution doesn’t work for them, please let us know so that we can investigate further if needed |
Thanks for all the efforts to investigate the issue. For others' reference, if using Emissary-ingress (previously Ambassador) to route external requests to your k8s service, you can update the Emissary Mapping resource to inject a cookie to enable sticky sessions, like below. I was able to deploy multiple replicas without issue this way.

```yaml
apiVersion: getambassador.io/v2
kind: Mapping
metadata:
  name: gradio-ui-emissary-mapping
spec:
  ...
  service: http://<service>.<namespace>:7860
  load_balancer:
    policy: ring_hash
    cookie:
      name: emissary-sticky-cookie
      ttl: 7200s
```
|
I am still getting the error that @edsna got. I am using gradio==4.26.0 and gradio-client==0.15.1. I have built a Docker image of the Gradio app and hosted it on a cloud service. When I try to access it using a URL, my app keeps loading. |
Hi @SamridhiVaid are you running multiple replicas? Can you provide more details on your app and your deployment setup? Are there any Python or JS console logs? We'll need more information to help |
Hi. I am not running multiple replicas. I have two Docker containers hosted on a cloud service: one runs vLLM and the other the Gradio app. The textbox in the Gradio app should display the output of the LLM. I can verify that the LLM and Gradio are able to communicate with each other, but the problem is that Gradio doesn't display the output. I am not sure if I should just downgrade my Gradio version and check. |
Just confirming that this issue was also occurring with Azure App Service when deployed with multiple instances, but enabling session affinity fixed the issue immediately. |
I also have this issue with Azure Web App!
Describe the bug
I am running a Gradio application locally, where the click-event function of a button makes a `requests` call to a remote server and returns the result to a component.
Everything works fine, but if I turn on a proxy (Shadowsocks) to access the Gradio application, requests with short response times return normally, while requests that take longer raise exceptions.
Have you searched existing issues? 🔎
Reproduction
Screenshot
Logs
System Info
Severity
Blocking usage of gradio