
host 127.0.0.1:9000: server update failed with: open /usr/bin/.minio.check-perm: permission denied, do not restart the servers yet #2305

Closed
pschichtel opened this issue Aug 31, 2024 · 21 comments

Comments

@pschichtel
Contributor

I tried upgrading to MinIO RELEASE.2024-08-29T01-40-52Z (from RELEASE.2024-08-17T01-24-54Z) using operator 6.0.3 on my single-node home lab, now that #2229 has been fixed in recent images, only to hit a new issue: the operator seems unable to update the image because a permission check fails. It logs host 127.0.0.1:9000: server update failed with: open /usr/bin/.minio.check-perm: permission denied, do not restart the servers yet and also sets that as the tenant's currentState.

This currently prevents me from upgrading at all.

Expected Behavior

Operator upgrades the tenant.

Current Behavior

Tenant repeatedly switches between the states host 127.0.0.1:9000: server update failed with: open /usr/bin/.minio.check-perm: permission denied, do not restart the servers yet and Updating MinIO Version. The STS also hasn't been touched.
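The flip-flopping is also visible on the Tenant resource itself; as a small sketch (assuming the standard Tenant CRD, with <ns> and <tenant-name> as placeholders), it can be watched with:

# Watch the tenant's currentState while the operator keeps retrying the update.
kubectl -n <ns> get tenants.minio.min.io <tenant-name> \
  -o jsonpath='{.status.currentState}{"\n"}' --watch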

Possible Solution

No clue

Steps to Reproduce (for bugs)

  1. Deploy single node k0s with operator 6.0.3 and MinIO RELEASE.2024-08-17T01-24-54Z
  2. Update the Tenant to version RELEASE.2024-08-29T01-40-52Z

Context

This prevents me from upgrading.

Regression

Yes

Your Environment

  • Version used (minio-operator): 6.0.3
  • Environment name and version (e.g. kubernetes v1.17.2): v1.30.4+k0s
  • Server type and version: RELEASE.2024-08-17T01-24-54Z
  • Operating System and version (uname -a): Linux server 6.10.6-arch1-1 #1 SMP PREEMPT_DYNAMIC Mon, 19 Aug 2024 17:02:39 +0000 x86_64 GNU/Linux
  • Link to your deployment file: n/a
@pschichtel
Contributor Author

I can't find /usr/bin/.minio.check-perm in either the tenant or the operator containers, but I guess it might be temporary.

@jiuker
Contributor

jiuker commented Aug 31, 2024

Please share the tenant configuration.

@pschichtel
Contributor Author

@jiuker this is how I configure the tenant chart:

tenant:
  name: tenant-name
  image:
    tag: 'RELEASE.2024-08-17T01-24-54Z' # updated to 'RELEASE.2024-08-29T01-40-52Z'
  configuration:
    name: env-configuration
  configSecret:
    name: env-configuration
  pools:
  - servers: 1
    name: pool-0
    volumesPerServer: 1
    size: 200Gi
  metrics:
    enabled: true
  certificate:
    requestAutoCert: false
  env:
  - name: MINIO_DOMAIN
    value: "s3.example.org"
  - name: MINIO_BROWSER_REDIRECT_URL
    value: "https://console.s3.example.org"
  - name: MINIO_SERVER_URL
    value: "https://s3.example.org"
  - name: MINIO_PROMETHEUS_AUTH_TYPE
    value: public
  prometheusOperator: true
ingress:
  api:
    enabled: true
    host: s3.example.org
    tls:
    - hosts:
      - s3.example.org
      secretName: tenant-api-tls
  console:
    enabled: true
    host: console.s3.example.org
    tls:
    - hosts:
      - console.s3.example.org
      secretName: tenant-console-tls

@pschichtel
Contributor Author

I assume it attempts to create a file next to the minio binary in order to verify if copying in the new binary would succeed?
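If that guess is right, the probe amounts to something like the following sketch (an assumption about the behavior, not the actual minio or operator code):

# Assumed behavior only: create and remove a marker file next to the minio
# binary; if the directory is not writable by the MinIO user, refuse the update.
if touch /usr/bin/.minio.check-perm 2>/dev/null; then
  rm -f /usr/bin/.minio.check-perm
  echo "binary directory is writable, in-place update can proceed"
else
  echo "open /usr/bin/.minio.check-perm: permission denied, do not restart the servers yet" >&2
fi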

@jiuker
Contributor

jiuker commented Sep 1, 2024

Let me debug first

@harshavardhana
Member

I republished the containers and fixed it @pschichtel - someone moved the binaries to /usr/bin instead of /opt/bin and broke this.

@harshavardhana
Member

The tests we have for container upgrades are not handling this properly, and we need a more deterministic way of making sure the ServerUpdate() API works properly via the Operator.

// cc @jiuker @pjuarezd

@pschichtel
Contributor Author

@harshavardhana I just attempted the update again, but realized that unless you also re-released the old image it wouldn't help me, because the upgrade process starts in the old image. I've just manually patched the STS to use the new image, so the next upgrade should work fine then.

@harshavardhana
Member

@harshavardhana I just attempted the update again, but realized that unless you also re-released the old image it wouldn't help me, because the upgrade process starts in the old image. I've just manually patched the STS to use the new image, so the next upgrade should work fine then.

Correct, I did that for others who have not upgraded yet @pschichtel

@Vovcharaa

Vovcharaa commented Sep 2, 2024

@harshavardhana Can you provide instructions on how to recover minio from this state?
I can't just run chmod 1777 /usr/bin in the pod, because I would need to exec as root in this case.

@pschichtel
Contributor Author

pschichtel commented Sep 2, 2024

my approach will be:

  1. start the upgrade via operator (i.e. change the image tag in the tenant)
  2. manually replace the image.tag in the STS once the tenant is in updating state
  3. delete all pods of the STS (to avoid issues due to different versions acting at the same time)

The operator does not appear to interfere, as this is pretty much what it would do itself if the in-place upgrade process completed successfully; see the command sketch below.
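A hedged translation of those steps into kubectl commands; the StatefulSet name, container name, and pod label below are assumptions based on the operator's usual naming (and the tenant config above), so adjust them to your deployment:

# Step 2: point the StatefulSet at the fixed image once the tenant reports it
# is updating. The StatefulSet is usually named <tenant>-<pool-name> and the
# MinIO container is typically called "minio".
kubectl -n <ns> set image statefulset/tenant-name-pool-0 \
  minio=quay.io/minio/minio:RELEASE.2024-08-29T01-40-52Z

# Step 3: delete the tenant pods so they come back together on the new image
# (v1.min.io/tenant is the label the operator normally puts on tenant pods).
kubectl -n <ns> delete pods -l v1.min.io/tenant=tenant-name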

@Vovcharaa

@pschichtel After manually changing the image tag in the STS, the pods were replaced automatically by k8s, but after that the operator triggered pod replacement once again and the update completed successfully.

Thanks for the quick response!

@dharapvj

dharapvj commented Sep 5, 2024

I republished the containers and fixed it @pschichtel - someone moved the binaries to /usr/bin instead of /opt/bin and broke this.

To understand correctly: how do I get these new, republished containers?
I wanted to test the LDAP console login fix, but when I use RELEASE.2024-08-29T01-40-52Z my tenant gets into this state. I am not sure why my k8s environment didn't download the right image. Where has it been published? Quay? Docker Hub?

@pschichtel
Contributor Author

The image will not help you get out of the state; it only prevents you from getting into it when upgrading from a version that didn't have the issue yet.

This is what has been successfully used to get out of the state: #2305 (comment)

@huguesgr

Hi, can someone point me to the operator image that has the fix?

@pschichtel
Contributor Author

The issue was not fixed in the operator; instead, the minio image was fixed to match the operator's expectations. It has been fixed for a while now, so any of the recent minio versions and, I guess, any 6.x operator should work.

@fmt-Println-MKO

I did as suggested and the STS is at the latest release now; however, the operator says: Statefullset not controlled by operator.
Is there anything missing?

@ramondeklein
Contributor

It looks like the statefulset is not owned by the tenant resource. See this code. Can you run kubectl -n <ns> get sts <sts-name> and check whether the resource is owned by the tenant?
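Building on that, the owner reference can be inspected directly, e.g.:

# Print the StatefulSet's ownerReferences; for an operator-managed tenant this
# should list a Tenant object as the controlling owner.
kubectl -n <ns> get sts <sts-name> -o jsonpath='{.metadata.ownerReferences}'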

@dhess

dhess commented Oct 13, 2024

We're using the operator and tenant Helm charts with Argo CD. We just upgraded to v6.0.4 and hit this issue — how can we fix it?

@fmt-Println-MKO

I just deleted the StatefulSet; the operator recreates it, and this worked.
But take care not to delete the pods of the StatefulSet.
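For anyone who needs to do the same, kubectl can delete the StatefulSet without cascading to its pods; a sketch with placeholder names:

# Delete only the StatefulSet object and orphan its pods so they keep running
# while the operator recreates the StatefulSet (on older kubectl versions the
# equivalent flag is --cascade=false).
kubectl -n <ns> delete statefulset <sts-name> --cascade=orphan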

@dhess

dhess commented Oct 13, 2024

I just deleted the StatefulSet; the operator recreates it, and this worked. But take care not to delete the pods of the StatefulSet.

Thanks! That was a bit scary, but it does appear to have worked for us.
