Skip to content

Podman hangs, possible deadlock ? #16062

@lukasmrtvy

Description

@lukasmrtvy

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description
Seems that I am suffering from the very same problem as mentioned in #1574 ( 4y old ), but with the latest Podman version ( 4.2.0 ).

My scenario is a little bit different. I am running "custom scheduler" in a container with a mounted podman.socket and its running via systemd service. This scheduler is responsible for creating/managing/deleting ( there are also other operations like attaching, streaming logs, stats, etc involved ) and also for waiting for containers on behalf. Seems that when I restart this scheduler, Podman will become unavailable. Also using --userns=auto for containers created via "custom scheduler".

Steps to reproduce the issue:
( Not relevant due to its own business logic )

  1. systemctl start scheduler.service ( podman run -v "/run/podman/podman.sock:/run/podman/podman.sock" myscheduler )

  2. systemctl restart scheduler.service

Describe the results you received:
Podman is not accessible via CLI and API

Describe the results you expected:
Podman API is accessible via CLI and API

Additional information you deem important (e.g. issue happens only occasionally):

Related logs:
journald -u podman --no-pager | grep 04b5091d21ad06885e48915d9539997c1b58745a52e3178ce12995cf6b82f944

...
Oct 05 16:50:32 ip-10-2-22-82 podman[5526]: @ - - [05/Oct/2022:16:50:27 +0000] "POST /containers/04b5091d21ad06885e48915d9539997c1b58745a52e3178ce12995cf6b82f944/start HTTP/1.1" 204 0 "" ""
Oct 05 16:50:37 ip-10-2-22-82 podman[5526]: @ - - [05/Oct/2022:16:50:32 +0000] "GET /containers/04b5091d21ad06885e48915d9539997c1b58745a52e3178ce12995cf6b82f944/json HTTP/1.1" 200 9033 "" ""
Oct 05 17:00:32 ip-10-2-22-82 podman[5526]: @ - - [05/Oct/2022:16:50:22 +0000] "POST /containers/04b5091d21ad06885e48915d9539997c1b58745a52e3178ce12995cf6b82f944/attach?stream=true&stdout=true&stderr=true HTTP/1.1" 200 0 "" ""
Oct 05 17:00:32 ip-10-2-22-82 podman[5526]: 2022-10-05 17:00:32.984086144 +0000 UTC m=+1448.094378702 container died 04b5091d21ad06885e48915d9539997c1b58745a52e3178ce12995cf6b82f944 (image=XXXX.dkr.ecr.us-east-1.amazonaws.com/XXXX:0NkyJzxqJQWVq2aet, name=lucid_mclaren, health_status=)
Oct 05 17:00:52 ip-10-2-22-82 podman[5526]: @ - - [05/Oct/2022:16:50:32 +0000] "POST /containers/04b5091d21ad06885e48915d9539997c1b58745a52e3178ce12995cf6b82f944/wait HTTP/1.1" 200 30 "" "
...

ps -ef | grep 04b5091d21ad06885e48915d9539997c1b58745a52e3178ce12995cf6b82f944

...
root       13829       1  0 16:50 ?        00:00:00 /usr/bin/conmon --api-version 1 -c 04b5091d21ad06885e48915d9539997c1b58745a52e3178ce12995cf6b82f944 -u 04b5091d21ad06885e48915d9539997c1b58745a52e3178ce12995cf6b82f944 -r /usr/bin/crun -b /var/lib/containers/storage/overlay-containers/04b5091d21ad06885e48915d9539997c1b58745a52e3178ce12995cf6b82f944/userdata -p /run/containers/storage/overlay-containers/04b5091d21ad06885e48915d9539997c1b58745a52e3178ce12995cf6b82f944/userdata/pidfile -n lucid_mclaren --exit-dir /run/libpod/exits --full-attach -s -l k8s-file:/var/lib/containers/storage/overlay-containers/04b5091d21ad06885e48915d9539997c1b58745a52e3178ce12995cf6b82f944/userdata/ctr.log --log-level info --log-size-max 10485760 --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/run/containers/storage/overlay-containers/04b5091d21ad06885e48915d9539997c1b58745a52e3178ce12995cf6b82f944/userdata/oci-log --conmon-pidfile /run/containers/storage/overlay-containers/04b5091d21ad06885e48915d9539997c1b58745a52e3178ce12995cf6b82f944/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /run/containers/storage --exit-command-arg --log-level --exit-command-arg info --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/libpod --exit-command-arg --network-config-dir --exit-command-arg  --exit-command-arg --network-backend --exit-command-arg netavark --exit-command-arg --volumepath --exit-command-arg /var/lib/containers/storage/volumes --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mountopt=nodev,metacopy=on --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 04b5091d21ad06885e48915d9539997c1b58745a52e3178ce12995cf6b82f944
root       44396   13829  0 17:00 ?        00:00:00 /usr/bin/podman --root /var/lib/containers/storage --runroot /run/containers/storage --log-level info --cgroup-manager systemd --tmpdir /run/libpod --network-config-dir  --network-backend netavark --volumepath /var/lib/containers/storage/volumes --runtime crun --storage-driver overlay --storage-opt overlay.mountopt=nodev,metacopy=on --events-backend journald container cleanup 04b5091d21ad06885e48915d9539997c1b58745a52e3178ce12995cf6b82f944
...

lslocks | grep 44396

...
podman   5482 POSIX      WRITE  0     0   0 /var/lib/containers/storage/userns.lock
podman  44396 POSIX  64B WRITE* 0     0   0 /var/lib/containers/storage/storage.lock
podman   5482 POSIX  64B WRITE  0     0   0 /var/lib/containers/storage/storage.lock
podman   5482 POSIX  64B WRITE  0     0   0 /var/lib/containers/storage/overlay-layers/layers.lock
podman   5482 POSIX  64B WRITE  0     0   0 /var/lib/containers/storage/overlay-images/images.lock
...

ps -ef | grep 5482

root        5482       1 81 19:09 ?        00:17:37 /usr/bin/podman --log-level=info system service

strace podman ps

...
futex(0x56081b801868, FUTEX_WAIT_PRIVATE, 0, NULL

Output of podman version:

Client:       Podman Engine
Version:      4.2.0
API Version:  4.2.0
Go Version:   go1.18.4
Built:        Thu Aug 11 14:42:17 2022
OS/Arch:      linux/amd64

Output of podman info:

host:
  arch: amd64
  buildahVersion: 1.27.0
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.0-2.fc36.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: '
  cpuUtilization:
    idlePercent: 92.99
    systemPercent: 1.91
    userPercent: 5.09
  cpus: 16
  distribution:
    distribution: fedora
    variant: coreos
    version: "36"
  eventLogger: journald
  hostname: ip-10-2-22-82
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.18.18-200.fc36.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 123904901120
  memTotal: 133529747456
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.5-1.fc36.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.5
      commit: 54ebb8ca8bf7e6ddae2eb919f5b82d1d96863dea
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-0.2.beta.0.fc36.x86_64
    version: |-
      slirp4netns version 1.2.0-beta.0
      commit: 477db14a24ff1a3de3a705e51ca2c4c1fe3dda64
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 1h 25m 45.00s (Approximately 0.04 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 34
    paused: 0
    running: 0
    stopped: 34
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 1098974756864
  graphRootUsed: 12981538816
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 11
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.2.0
  Built: 1660228937
  BuiltTime: Thu Aug 11 14:42:17 2022
  GitCommit: ""
  GoVersion: go1.18.4
  Os: linux
  OsArch: linux/amd64
  Version: 4.2.0

Package info (e.g. output of rpm -q podman or apt list podman or brew info podman):

podman-4.2.0-2.fc36.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):
AWS EC2 instance, /var/lib/containers/storage/ mounted from EBS running Fedora CoreOS 36.20220820.3.0

Metadata

Metadata

Assignees

No one assigned

    Labels

    kind/bugCategorizes issue or PR as related to a bug.locked - please file new issue/PRAssist humans wanting to comment on an old issue or PR with locked comments.stale-issue

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions