Bug: AIO Helm: notify-push stuck waiting due to unresponsive port 9001 on nextcloud #6334

@RAPHCVR

Bug: nextcloud-aio-notify-push stuck waiting as port 9001 on nextcloud-aio-nextcloud becomes unresponsive (AIO Helm Chart)

Area: AIO / Helm Chart / Kubernetes

Describe the bug

When running Nextcloud AIO deployed via the official Helm chart on Kubernetes, the nextcloud-aio-notify-push pod intermittently fails to start or run correctly. It gets stuck repeatedly logging "Waiting for Nextcloud to start...".

This appears to be caused by the nextcloud-aio-nextcloud pod/service becoming unresponsive on TCP port 9001 after running fine for a period (ranging from days to weeks). The start.sh script within the nextcloud-aio-notify-push container specifically waits for connectivity to $NEXTCLOUD_HOST (which resolves to the nextcloud-aio-nextcloud service) on port 9001 before proceeding:

# From notify-push start.sh
# Only start container if nextcloud is accessible
while ! nc -z "$NEXTCLOUD_HOST" 9001; do
    echo "Waiting for Nextcloud to start..."
    sleep 5
done
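
Because this loop has no upper bound, a notify-push pod in this state never fails visibly. It only logs forever. Below is a minimal sketch of a bounded alternative. It assumes bash and the coreutils `timeout` command are available, and it uses bash's /dev/tcp in place of `nc`. The function names and parameters are hypothetical and not part of AIO:

```bash
#!/usr/bin/env bash
# Hypothetical bounded replacement for the notify-push wait loop.
# Assumes bash (for /dev/tcp) and coreutils `timeout`; not the upstream script.

# port_open HOST PORT: succeed if a TCP connection opens within 3 seconds.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# wait_for_port HOST PORT MAX_WAIT INTERVAL
# Polls every INTERVAL seconds and returns 1 once roughly MAX_WAIT seconds of
# sleeping have passed without a successful connection (probe time not counted).
wait_for_port() {
  local host=$1 port=$2 max_wait=$3 interval=$4
  local waited=0
  until port_open "$host" "$port"; do
    if (( waited >= max_wait )); then
      echo "Gave up waiting for $host:$port after ${waited}s" >&2
      return 1
    fi
    echo "Waiting for Nextcloud to start..."
    sleep "$interval"
    (( waited += interval ))
  done
  return 0  # the port answered
}
```

Used as `wait_for_port "$NEXTCLOUD_HOST" 9001 300 5 || exit 1`, the container would exit non-zero after about five minutes. The failure would then show up in the pod status and restart count rather than only in an endless log loop. This does not fix the unresponsive port itself.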

While port 9001 is unresponsive, the main Nextcloud interface served by the same nextcloud-aio-nextcloud pod on port 9000 remains accessible and functional. However, features that rely on notify-push, such as Talk connections or calls, may fail.

Steps to reproduce

The issue is intermittent, making exact reproduction steps difficult, but the pattern is:

  1. Deploy Nextcloud AIO using the official Helm chart on a Kubernetes cluster.
  2. Ensure the notify-push component is enabled and deployed.
  3. The system runs correctly for an indeterminate amount of time (days or weeks).
  4. Eventually, the nextcloud-aio-notify-push pod (if restarted, or potentially during operation) starts logging "Waiting for Nextcloud to start...".
  5. Attempting to connect to the nextcloud-aio-nextcloud service/pod IP on port 9001 from another pod fails (e.g., nc -zv nextcloud-aio-nextcloud 9001 times out or the connection is refused).
  6. Attempting to connect to the nextcloud-aio-nextcloud service/pod IP on port 9000 succeeds (e.g., nc -zv nextcloud-aio-nextcloud 9000 reports success).
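
Steps 5 and 6 can be combined into one check that reports which of the two ports answers. The sketch below assumes bash and `timeout` are available in the debug pod. `compare_ports` is a hypothetical helper, and the 9000/9001 defaults are taken from this report:

```bash
#!/usr/bin/env bash
# Hypothetical helper: probe the main port (9000) and the notify-push port (9001)
# of the nextcloud-aio-nextcloud service and classify the combination.

# probe HOST PORT: succeed if a TCP connection opens within 3 seconds.
probe() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# compare_ports HOST [MAIN_PORT] [PUSH_PORT]
# Prints a one-line verdict; exit status is 0 only if PUSH_PORT answers.
compare_ports() {
  local host=$1 main=${2:-9000} push=${3:-9001}
  local main_ok=no push_ok=no
  probe "$host" "$main" && main_ok=yes
  probe "$host" "$push" && push_ok=yes
  case "$main_ok/$push_ok" in
    yes/yes) echo "OK: ports $main and $push reachable" ;;
    yes/no)  echo "MATCHES THIS ISSUE: port $main reachable, port $push not" ;;
    no/no)   echo "DOWN: neither port $main nor port $push reachable" ;;
    no/yes)  echo "UNEXPECTED: port $push reachable, port $main not" ;;
  esac
  [[ $push_ok == yes ]]
}
```

Running `compare_ports nextcloud-aio-nextcloud` from another pod separates this issue ("MATCHES THIS ISSUE") from a pod or Service that is down entirely ("DOWN").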

Expected behavior

The nextcloud-aio-nextcloud pod should consistently listen and respond on port 9001, allowing the nextcloud-aio-notify-push service to connect and function reliably.

Actual behavior

The process listening on port 9001 within the nextcloud-aio-nextcloud container appears to stop or crash intermittently, making the port unreachable and blocking nextcloud-aio-notify-push.

Log entries

  • nextcloud-aio-notify-push pod logs:

    Waiting for Nextcloud to start...
    Waiting for Nextcloud to start...
    [...]
    
  • Diagnostic commands (run from another pod in the cluster when the issue occurs):

    # Fails
    $ nc -zv nextcloud-aio-nextcloud 9001
    nc: connect to nextcloud-aio-nextcloud (10.x.x.x) port 9001 (tcp) failed: Connection timed out (or refused)
    
    # Succeeds
    $ nc -zv nextcloud-aio-nextcloud 9000
    Connection to nextcloud-aio-nextcloud (10.x.x.x) 9000 port [tcp/*] succeeded!

  • nextcloud-aio-nextcloud pod logs: Standard container logs (kubectl logs ...) captured at the time of the failure have not yet revealed a clear cause in the cases observed so far. Further investigation of the internal nextcloud.log within the data volume may be needed when the issue occurs. Otherwise, the startup sequence appears normal (example startup logs can be provided if needed).
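
The failure appears only after days or weeks. Recording when port 9001 stopped answering would make it easier to match against nextcloud.log. Below is a sketch of a logger that prints a line only when the state changes, meant to run from another pod. `monitor_port` and its arguments are hypothetical. It assumes bash, `timeout`, and a `date` that supports `-u` with a format string:

```bash
#!/usr/bin/env bash
# Hypothetical monitor: probe HOST:PORT every INTERVAL seconds and print a
# UTC-timestamped line only when the state changes (OPEN <-> CLOSED).
# monitor_port HOST PORT INTERVAL [ITERATIONS]   (ITERATIONS=0 runs forever)
monitor_port() {
  local host=$1 port=$2 interval=$3 iterations=${4:-0}
  local prev="" state i=0
  while :; do
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
      state=OPEN
    else
      state=CLOSED
    fi
    # Log transitions only, so the first failure time stands out.
    if [[ $state != "$prev" ]]; then
      printf '%s %s:%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$host" "$port" "$state"
      prev=$state
    fi
    (( ++i == iterations )) && break
    sleep "$interval"
  done
}
```

For example, `monitor_port nextcloud-aio-nextcloud 9001 30 | tee port-9001.log` would leave a single OPEN line followed, once the issue hits, by a CLOSED line carrying the time of failure.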

Environment

  • Installation method: Official Nextcloud AIO Helm Chart on Kubernetes.
  • Nextcloud Server version: 30.0.9 (the latest release at the time of reporting)
  • Relevant AIO Components: nextcloud-aio-nextcloud, nextcloud-aio-notify-push

Workaround

Deleting the nextcloud-aio-nextcloud pod (kubectl delete pod <pod-name> -n <namespace>) forces Kubernetes to recreate it. Upon restart, the process listening on port 9001 becomes available again, and nextcloud-aio-notify-push can successfully connect and start. This workaround is temporary, as the issue tends to reappear later.
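
Until the cause is known, this workaround could be automated. The sketch below assumes it runs somewhere with bash, `timeout`, and a `kubectl` permitted to delete pods in the namespace. The function name, thresholds, and pod/namespace values are hypothetical placeholders:

```bash
#!/usr/bin/env bash
# Hypothetical watchdog automating the manual workaround: if HOST:PORT fails
# MAX_FAILURES consecutive probes, delete POD so Kubernetes recreates it.
# Assumes bash, coreutils `timeout`, and kubectl with delete rights on pods.
# restart_if_unreachable HOST PORT POD NAMESPACE MAX_FAILURES INTERVAL
restart_if_unreachable() {
  local host=$1 port=$2 pod=$3 ns=$4 max_failures=$5 interval=$6
  local failures=0
  while (( failures < max_failures )); do
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
      echo "$host:$port reachable; nothing to do"
      return 0
    fi
    (( failures += 1 ))
    # Wait between probes, but not after the last one.
    (( failures < max_failures )) && sleep "$interval"
  done
  echo "$host:$port failed $failures consecutive probes; deleting pod $pod" >&2
  kubectl delete pod "$pod" -n "$ns"
}
```

It could be run periodically, e.g. `restart_if_unreachable nextcloud-aio-nextcloud 9001 <pod-name> <namespace> 3 10`. A recreated pod gets a new name, so a real setup would need to look up the current pod name before each run. This only repeats the temporary workaround and does not address the underlying cause.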

Potential Cause / Analysis

It seems likely that an internal process in the nextcloud-aio-nextcloud container, specifically the one handling connections on port 9001 (required by notify-push), becomes unstable and crashes or stops running after some time under conditions not yet identified. Identifying this process and the reason it fails is key.
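
When the issue occurs, one way to narrow this down is to check whether any socket in the pod's network namespace is still in LISTEN state on port 9001. If none is, the listening process has most likely stopped. If one is, the problem more likely lies elsewhere, for example in a process that is alive but no longer accepting connections, or in the network path. The sketch below parses the kernel's /proc/net/tcp and /proc/net/tcp6 tables (ports in hex, state 0A = LISTEN), so it does not rely on ss or netstat being present in the image. `listening_on` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if some socket in the given /proc/net/tcp-style
# tables is in LISTEN state (st == 0A) on PORT.
# listening_on PORT [FILE...]   (defaults to /proc/net/tcp and /proc/net/tcp6)
# Exit status: 0 = listening, 1 = not listening, 2 = no readable table.
listening_on() {
  local port_hex f
  local files=()
  port_hex=$(printf '%04X' "$1")   # the kernel prints ports as 4 uppercase hex digits
  shift
  (( $# )) || set -- /proc/net/tcp /proc/net/tcp6
  for f in "$@"; do
    [[ -r $f ]] && files+=("$f")
  done
  (( ${#files[@]} )) || return 2
  awk -v p="$port_hex" '
    # Skip the header line of every file; field 2 is local_address as HEXIP:HEXPORT.
    FNR > 1 {
      n = split($2, a, ":")
      if (a[n] == p && $4 == "0A") found = 1
    }
    END { exit (found ? 0 : 1) }
  ' "${files[@]}"
}
```

For example, copy the tables out with `kubectl exec <pod-name> -n <namespace> -- cat /proc/net/tcp > tcp.txt` (and likewise for tcp6), then run `listening_on 9001 tcp.txt tcp6.txt`. The inode column of a matching line could then be used to find the owning process.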

Related Information

This issue appears identical to the one reported in the Nextcloud Help forum:
https://help.nextcloud.com/t/nextcloud-stops-accepting-connections-on-port-9001-after-a-while-helm-chart/218295

Labels: 2. developing (Work in progress), bug (Something isn't working)
