Bug: wireguard - adding ipv6 rule: file exists #2521
Comments
@qdm12 is more or less the only maintainer of this project and works on it in his free time.
Same issue here recently; I can't trace exactly when it started, but I'm getting the same logs.
Oddly, it seems the IPv6 rule exists before Gluetun does anything; I'm not sure why, so let's try to find the cause first.
PS: in case this cannot be fixed, I can change the code to consider "file exists" as meaning the rule was created OK, but I would prefer to understand the root cause if possible, since this isn't normal behavior really. Also, I'm inclined to think this is a host system/kernel problem, since other users are running gluetun with IPv6 just fine.
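A minimal sketch of what that code change could look like, assuming a github.com/vishvananda/netlink-style API (the helper name addIPv6Rule is hypothetical, and this is not gluetun's actual code):

package rules

import (
	"errors"
	"fmt"

	"github.com/vishvananda/netlink"
	"golang.org/x/sys/unix"
)

// addIPv6Rule adds an IPv6 policy rule and treats EEXIST ("file exists")
// as success, since in that case the kernel already holds an identical rule.
func addIPv6Rule(rule *netlink.Rule) error {
	rule.Family = netlink.FAMILY_V6
	err := netlink.RuleAdd(rule)
	if err == nil || errors.Is(err, unix.EEXIST) {
		// The desired end state (rule present) is reached either way.
		return nil
	}
	return fmt.Errorf("adding ipv6 rule: %w", err)
}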
I went the extra yard (not mile yet 😄) to have an image tag for you to try.
Cool. Let me give it a go.
Result with LOG_LEVEL=debug
Result of the ip -6 rule command within the container
Result of podman run --rm --cap-add NET_ADMIN alpine:3.20 ip -6 rule
Just to add to the above info: at the moment, while Gluetun isn't connected to the rest of the containers I am running on Fedora, there is a VPN connected to the host of the containers via OPNsense WireGuard selective routing, so maybe that is affecting the containers. Additionally, I run ULA IPv6 addresses internally on my LAN, which is why you may see it succeed in the logs, but I am not 100% sure whether that affects it.
@Ttfgggf Wait, I'm a bit confused: why is the container not crashing in the last logs you shared, given they show that error?
Not sure, to be honest, but it has crashed. Could SELinux be affecting it? Right now nothing is connected to the gluetun container, but the machine hosting gluetun is using another AirVPN connection in the meantime, with a local ULA for IPv6 and an IPv4 address.
I'm running into the same issue. I tried the pr-2526 image but get the same behavior.
With provider: custom it fails straight away; when I set the provider to protonvpn, the VPN connects and everything works for between 5 and 20 minutes. qBittorrent can download at 200 Mbps in that time, then the VPN becomes unhealthy, restarts and 'bootloops' with the same iptables 'file exists' error, from which it never recovers (unless I manually restart the pod, after which it works again for some time before failing again).
Thanks @leovanalphen for trying that image! 👍
There is no fix in the image; it just logs the existing rules if adding a rule fails with that error. I've updated the pr-2526 image tag.
@qdm12 No worries, glad to be able to contribute in some way. Thank you for sharing your work with all of us. I just repulled pr-2526 and waited a couple of minutes for the VPN to become unhealthy. To my surprise, this time the HC kicked in, restarted the VPN and it came back up on the first try. My test transfer over the VPN just kept running with a barely noticeable temporary slowdown. So far it has recovered without issue three times. My logs are added below. I haven't changed anything in my setup other than repulling pr-2526. For completeness, I'm running on Kubernetes 1.30 with Talos as the underlying OS. The chart I'm using as a base is from truecharts; I just edited the image URL to point to pr-2526.
Thanks @leovanalphen, that's really not what I was expecting! Can anyone else try? FYI, that PR (file changes: https://github.com/qdm12/gluetun/pull/2526/files) does not really change anything except logging the rules when adding one fails. Kind of weird that it's fixed now.
I have no idea what's going on, but I noticed the VPN stopped working again a couple of days ago. I just redeployed (also recreating the container) and the VPN doesn't come up anymore, failing with the 'firewall rule already exists' error:
I'm trying to think whether I changed anything anywhere on my side, but I can't come up with anything that would influence this container in this way.
👍 My apologies, can you pull the pr-2526 image tag again?
@qdm12 I think the logs I posted above should contain that logging. I just repulled the pr-2526 tag, but when I start the container I get the same version string in the logs as in my post directly preceding this one.
Unless I am missing something, I do get a pull event (Successfully pulled image "qmcgaw/gluetun:pr-2526" in 21ms (21ms including waiting). Image size: 13660822 bytes.), so I am fairly certain it repulled and recreated. Edit: I wanted to add for completeness, though it might already be obvious to you, or I might be completely wrong, that if I disable port forwarding the container works fine, other than having to reconnect a bit more than I'd like. It seems to me the issue might be in the port forwarding code.
Please re-pull, I think? I just pulled it and the version log line shows the updated commit.
Noted, although port forwarding doesn't do anything with ip rules (it does with iptables chains/rules though), so that's unlikely, but let's see!
@qdm12 My apologies. It turns out Kubernetes does not actually repull images it has in the cache, even with imagePullPolicy: Always. With some help from Stack Overflow I figured out I can specify an image digest to force a repull, and that worked. Still learning new things every day... I also separated the gluetun image from my truecharts Helm qbittorrent deployment. I now start the gluetun container by itself, without Helm, using a Kubernetes manifest file and kubectl apply -f, since in the Helm deployment it is started as a sidecar container and the qbittorrent container has a bunch of volume mounts that I wasn't sure could influence the gluetun behavior. I added the pod specification I used to start the container, and the container logs, below. Pod Spec:
And the pod logs:
Hope this is what you're looking for.
Nice, thank you! Looking at the first 'crash':
It looks like both IPv4 AND IPv6 already have the rule.
One last thing, out of curiosity: can you try changing the entrypoint of Gluetun?
Finally, since those rules seem to be there no matter what, I pushed e92d07f to that PR image tag, where it now simply considers the 'file exists' error as a success when adding the rule.
Hey @qdm12 -- I have a belated response to #2471 (comment). My initial proposed solution did not hold, but I believe I may have discovered the cause. Something I had not realized before is that iptables entries are not ephemeral to a single pod (and the containers in it), but seem to be shared by all containers on the machine. I think the iptables rule conflict is being caused by two things:
The second one seems particularly tricky: even deleting the rule when the pod starts up is not guaranteed to fix the issue, because the first instance of the pod may delete the newer version of the rule created by the second pod before it is replaced. For my current workaround, I had to both add a post-start hook to remove the rule (to account for the first scenario) and change my deployment strategy to "Recreate" (to account for the second). I've been using the following deployment for a few weeks now (which has the bonus of automatically configuring the forwarded_port) and it has been working without issue:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: qbittorrent
  labels:
    app: qbittorrent
  annotations:
    network.beta.kubernetes.io/ipv6: "false"
spec:
  strategy:
    type: Recreate
  replicas: 1
  selector:
    matchLabels:
      app: qbittorrent
  template:
    metadata:
      labels:
        app: qbittorrent
    spec:
      containers:
        - name: qbittorrent
          image: linuxserver/qbittorrent:5.0.1-r0-ls362
          env:
            - name: PUID
              value: "1000"
            - name: GID
              value: "1000"
            - name: DOCKER_MODS
              value: ghcr.io/vuetorrent/vuetorrent-lsio-mod:latest
          ports:
            - containerPort: 8080
            - containerPort: 6881
              protocol: TCP
            - containerPort: 6881
              protocol: UDP
          volumeMounts:
            - name: qbittorent-config
              mountPath: /config
            - name: media
              mountPath: /media
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 1
              memory: 1Gi
          livenessProbe:
            httpGet:
              path: /#/
              port: 8080
            initialDelaySeconds: 300
            periodSeconds: 15
            failureThreshold: 2
        - name: vpn
          image: qmcgaw/gluetun:v3.39.0
          securityContext:
            capabilities:
              add:
                - NET_ADMIN
          env:
            - name: VPN_SERVICE_PROVIDER
              value: "protonvpn"
            - name: VPN_TYPE
              value: "wireguard"
            - name: WIREGUARD_PRIVATE_KEY
              valueFrom:
                secretKeyRef:
                  name: protonvpn-credentials
                  key: wiregaurd-private-key
            - name: SERVER_COUNTRIES
              value: <REDACTED>
            - name: PORT_FORWARD_ONLY
              value: "on"
            - name: VPN_PORT_FORWARDING
              value: "on"
            - name: TZ
              value: <REDACTED>
            # - name: LOG_LEVEL
            #   value: "debug"
          resources:
            requests:
              cpu: 250m
              memory: 1Gi
            limits:
              cpu: 500m
              memory: 1Gi
          lifecycle:
            postStart:
              exec:
                command: ["/bin/sh", "-c", "(ip rule del table 51820; ip -6 rule del table 51820) || true"]
          startupProbe:
            exec:
              command:
                - "/bin/sh"
                - "-c"
                - >
                  set -eu;
                  PORT=$(cat /tmp/gluetun/forwarded_port);
                  wget --header="Content-Type: application/x-www-form-urlencoded" \
                    --post-data='json={"listen_port": '$PORT'}' \
                    --output-document - \
                    http://localhost:8080/api/v2/app/setPreferences;
            periodSeconds: 30
            failureThreshold: 10
          livenessProbe:
            exec:
              command: ["/gluetun-entrypoint", "healthcheck"]
            initialDelaySeconds: 500
            periodSeconds: 30
            failureThreshold: 3
      volumes:
        - name: qbittorent-config
          nfs:
            server: <REDACTED>
            path: <REDACTED>
        - name: media
          nfs:
            server: <REDACTED>
            path: <REDACTED>

I believe the fix in your PR would help ignore the first scenario. I think it would probably still help with the second, as gluetun seems to attempt to recreate the rule if it is missing?
That's awesome, congratulations on digging this out! 🎖️
It doesn't really check for now; it only keeps the state of rules within itself, so it assumes the rule table is blank at start even if it's not.
Note it's an ip rule, not an iptables rule.
The PR would fix the first one, and a faster, more deterministic shutdown would also help (I'm working on this slowly, reworking all the 'run loops' within gluetun).
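For illustration, checking the kernel instead of assuming a blank state at start could look roughly like the sketch below, again assuming a github.com/vishvananda/netlink-style API rather than gluetun's real internals (ensureRule is a hypothetical helper):

package rules

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

// ensureRule lists the kernel's existing policy rules and only adds the given
// rule if no equivalent one (same table and priority) is already installed.
func ensureRule(rule *netlink.Rule) error {
	existing, err := netlink.RuleList(rule.Family) // e.g. netlink.FAMILY_V6
	if err != nil {
		return fmt.Errorf("listing ip rules: %w", err)
	}
	for _, r := range existing {
		if r.Table == rule.Table && r.Priority == rule.Priority {
			return nil // an equivalent rule already exists, nothing to do
		}
	}
	return netlink.RuleAdd(rule)
}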
Is this urgent?
None
Host OS
Fedora 40
CPU arch
x86_64
VPN service provider
AirVPN
What are you using to run the container
Podman
What is the version of Gluetun
Running version latest built on 2024-10-11T18:31:08.386Z (commit abe9dcb)
What's the problem 🤔
The problem is similar to the one in #1991.
I made a change to my Podman Quadlet file and it stopped working, although it was working before.
Share your logs (at least 10 lines)
Share your configuration