Postgres STS fails to start - mkdir: cannot create directory ‘/bitnami/postgresql/data’ #522

Open
kbristow opened this issue Jul 11, 2023 · 11 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale

Comments

@kbristow

kbristow commented Jul 11, 2023

Expected Behavior

The Postgres instance created by the release manifests should start up successfully.

Actual Behavior

The Postgres pod does not become healthy. It fails and enters CrashLoopBackOff with the following error in the logs:

mkdir: cannot create directory ‘/bitnami/postgresql/data’: Permission denied

This appears similar to this issue: bitnami/charts#1210

Investigating the recommendation here, I added the following init container, which resolves the issue:

initContainers:
  - name: init-chmod-data
    image: docker.io/bitnami/bitnami-shell:11-debian-11-r130
    imagePullPolicy: "IfNotPresent"
    resources:
      limits: {}
      requests: {}
    command:
      - /bin/sh
      - -ec
      - |
        chown 1001:1001 /bitnami/postgresql
        mkdir -p /bitnami/postgresql/data
        chmod 700 /bitnami/postgresql/data
        find /bitnami/postgresql -mindepth 1 -maxdepth 1 -not -name "conf" -not -name ".snapshot" -not -name "lost+found" | \
          xargs -r chown -R 1001:1001
    securityContext:
      runAsUser: 0
    volumeMounts:
      - name: postgredb
        mountPath: /bitnami/postgresql

I adapted the above, with slight modifications, from the output of the following command:

helm template my-release oci://registry-1.docker.io/bitnamicharts/postgresql --set volumePermissions.enabled=true

Note that I am using EKS with EBS volumes and the following storage class, in case that is useful:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
allowVolumeExpansion: true
provisioner: ebs.csi.aws.com
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer

Steps to Reproduce the Problem

  1. Install Tekton Results (v0.7.0) on EKS 1.23 with an EBS volume
  2. Observe the error in the Postgres pod

Additional Info

  • Kubernetes version:

    Output of kubectl version:

    Server Version: v1.23.17-eks-c12679a

  • Tekton Pipeline version:

    v0.38.4
@kbristow kbristow added the kind/bug Categorizes issue or PR as related to a bug. label Jul 11, 2023
@xinnjie
Contributor

xinnjie commented Jul 12, 2023

Did you ever manually create a PVC to interact with Postgres? That could leave different file permissions behind, since the reclaim policy of your storage class is Retain.

If the data is only for testing, try:

  1. Delete the Postgres deployment (or the entire Results deployment).
  2. Delete the PV that Postgres used.
  3. Redeploy Postgres (or Results).

@kbristow
Author

I have tried the above, but the same issue occurs. I am happy to use the fix described in my issue. I wanted to raise it as a potential problem that others using Tekton Results may run into.

@xinnjie
Contributor

xinnjie commented Jul 12, 2023

I'm a little curious which difference in your environment makes you hit this problem.
Could you

  1. create a running pod that mounts the PVC Postgres used
  2. attach to the pod
  3. use ls -l /bitnami/postgresql to check who owns that directory?

It would be odd if the directory belonged to root, since the deployment configuration never uses root privileges. If it does, the cause could be the default file permissions of EBS volumes.
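A minimal sketch of such an inspection pod is shown below. The pod name, namespace, image, and claim name are all assumptions chosen for illustration and are not taken from the Results manifests; substitute the name of the PVC that the Postgres pod actually uses. Because EBS volumes are ReadWriteOnce, the pod has to be scheduled on the same node as Postgres, or Postgres has to be scaled down first.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pvc-inspector            # hypothetical name
  namespace: tekton-pipelines    # assumption: namespace where Results is installed
spec:
  restartPolicy: Never
  containers:
    - name: shell
      image: busybox:1.36        # assumption: any small image with a shell works
      command: ["sleep", "3600"] # keep the pod alive long enough to inspect the volume
      volumeMounts:
        - name: data
          mountPath: /bitnami/postgresql
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: postgres-data-pvc   # hypothetical; use the real PVC name
```

Once the pod is running, kubectl exec -n tekton-pipelines pvc-inspector -- ls -l /bitnami/postgresql lists the ownership of the mounted directory.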

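Another approach would be to modify the StorageClass mountOptions (note that uid and gid are not valid mount options for ext4, the default filesystem of the EBS CSI driver, so this only helps on filesystems that accept them):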

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
allowVolumeExpansion: true
provisioner: ebs.csi.aws.com
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
mountOptions:
  - uid=1001
  - gid=1001

@kbristow
Author

It looks like the directory is indeed owned by root, and that is the root cause of the issue:

$ ls -l /bitnami/postgresql
total 16
drwx------ 2 root root 16384 Jul 13 13:43 lost+found

I guess that is how EBS volumes are permissioned by default. Is that something you want to cater for in your default release manifest? Whilst I probably won't be using the Postgres created via the Results release manifest, for users who want to try Results it may be worthwhile adding something to handle this permission issue, just in case.

Either way, I am happy to close the issue from my side if there is nothing further you want me to test.

@xinnjie
Contributor

xinnjie commented Jul 14, 2023

The default Results release implicitly requires the right file permissions on the volume.

Whilst I probably won't be using the Postgres created via the Results release manifest, for users who want to try Results it may be worthwhile adding something to handle this permission issue, just in case.

Yes, I agree. This matters especially for users who want to try Results in an environment provided by a cloud provider, where the default file permissions vary depending on the storage they use.

It would be appreciated if you could open a PR that documents this potential permission issue and handles the permissions (of course, you could let me do it if you don't want to).

We could close this issue after merging that PR.

@kbristow
Author

I am not going to be around until next Wednesday, so I am happy for you to do the PR. To add: once you mentioned that Postgres runs as user 1001, I realised I could just set spec.template.spec.securityContext.fsGroup: 1001 on the Postgres StatefulSet, which also resolves my issue. That seems like a better solution and probably doesn't need any documentation changes either. Thoughts?
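For reference, the fsGroup setting mentioned above would sit in the Postgres StatefulSet roughly as in the fragment below. Only the field path and the value 1001 come from this thread; the rest of the StatefulSet is assumed to stay unchanged. With fsGroup set, the kubelet changes the group ownership of the mounted volume to that GID when the pod starts (for volume types and CSI drivers that support it), so no init container running as root is needed.

```yaml
# Fragment of the Postgres StatefulSet spec; all other fields are left as they are.
spec:
  template:
    spec:
      securityContext:
        fsGroup: 1001   # GID applied to the mounted volume; matches the Postgres user 1001
```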

@xinnjie
Contributor

xinnjie commented Jul 15, 2023

Ok let me do it.

@tekton-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@gerrnot

gerrnot commented Mar 25, 2024

The same happens for the logs PV:

{"level":"error","ts":1711353330.4722874,"caller":"zap/options.go:212","msg":"finished streaming call with code Unknown","grpc.auth_disabled":false,"grpc.start_time":"2024-03-25T07:55:29Z","system":"grpc","span.kind":"server","grpc.service":"tekton.results.v1alpha2.Logs","grpc.method":"UpdateLog","peer.address":"10.248.106.207:48550","grpc.user":"system:serviceaccount:tekton-pipelines:tekton-results-watcher","grpc.issuer":"https://kubernetes.default.svc.cluster.local","error":"failed to create directory /logs/yournamespace/4c90c662-6e12-3c8a-b6ef-4e8f3eb8b23f, mkdir /logs/yournamespace: permission denied","grpc.code":"Unknown","grpc.time_duration_in_ms":951,"stacktrace":"github.com/grpc-ecosystem/go-grpc-middleware/logging/zap.DefaultMessageProducer\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/logging/zap/options.go:212\ngithub.com/grpc-ecosystem/go-grpc-middleware/logging/zap.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/logging/zap/server_interceptors.go:61\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware/tags.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/tags/interceptors.go:39\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:58\ngoogle.golang.org/grpc.(*Server).processStreamingRPC\n\tgoogle.golang.org/grpc@v1.60.1/server.go:1673\ngoogle.golang.org/grpc.(*Server).handleStream\n\tgoogle.golang.org/grpc@v1.60.1/server.go:1787\ngoogle.golang.org/grpc.(*Server).serveStreams.func2.1\n\tgoogle.golang.org/grpc@v1.60.1/server.go:1016"}

@zbialik

zbialik commented Oct 1, 2024

I am not going to be around until next Wednesday, so I am happy for you to do the PR. To add: once you mentioned that Postgres runs as user 1001, I realised I could just set spec.template.spec.securityContext.fsGroup: 1001 on the Postgres StatefulSet, which also resolves my issue. That seems like a better solution and probably doesn't need any documentation changes either. Thoughts?

Can we get this implemented, please?
