chore: Remove hardcoded uid and gid #575

Merged
merged 4 commits into from
Jun 6, 2025

Conversation

lfrancke
Member

@lfrancke lfrancke commented May 31, 2025

Description

Part of stackabletech/issues#651

Remove the hardcoded uid and gid; they now default to the ones from the Docker images.
For 25.7 that means they might change from 1000/0.
See stackabletech/docker-images#916 for details.
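
For illustration only, here is a minimal Python sketch of what dropping the hardcoded IDs could look like on a pod-level security context. `runAsUser`, `runAsGroup`, and `fsGroup` are standard Kubernetes fields; the helper `drop_hardcoded_ids` and the "before" values are assumptions and are not taken from this PR's diff.

```python
def drop_hardcoded_ids(security_context: dict) -> dict:
    """Return a copy without runAsUser/runAsGroup so the image's own user applies."""
    return {
        key: value
        for key, value in security_context.items()
        if key not in ("runAsUser", "runAsGroup")
    }


# Hypothetical "before" context using the 1000/0 pair mentioned above.
before = {"runAsUser": 1000, "runAsGroup": 0, "fsGroup": 1000}
after = drop_hardcoded_ids(before)
print(after)  # {'fsGroup': 1000}
```

`fsGroup` is left untouched here only because the sketch targets uid and gid; whether the operator keeps it is not stated in the description.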

Definition of Done Checklist

  • Not all of these items are applicable to all PRs; the author should update this template to leave only the relevant boxes.
  • Please make sure all these things are done and tick the boxes.

Author

  • Integration tests passed (for non trivial changes)

Reviewer

  • Changelog updated

Acceptance

  • Proper release label has been added

@lfrancke lfrancke force-pushed the feat/hardcoded-uid-gid branch 2 times, most recently from 4c945f0 to eaffcd2 Compare May 31, 2025 17:05
@lfrancke lfrancke marked this pull request as ready for review May 31, 2025 17:15
@lfrancke lfrancke moved this to Development: Waiting for Review in Stackable Engineering May 31, 2025
@lfrancke lfrancke self-assigned this May 31, 2025
@lfrancke lfrancke enabled auto-merge May 31, 2025 17:38
@lfrancke lfrancke force-pushed the feat/hardcoded-uid-gid branch from eaffcd2 to 02ff8d9 Compare June 1, 2025 07:20
@razvan razvan self-requested a review June 2, 2025 09:32
@razvan razvan moved this from Development: Waiting for Review to Development: In Review in Stackable Engineering Jun 2, 2025
razvan
razvan previously approved these changes Jun 2, 2025
Member

@razvan razvan left a comment

@lfrancke lfrancke added this pull request to the merge queue Jun 2, 2025
@lfrancke lfrancke removed this pull request from the merge queue due to a manual request Jun 2, 2025
@lfrancke
Member Author

lfrancke commented Jun 2, 2025

Thanks! I've removed the PR from the merge queue until the test succeeds

@razvan
Member

razvan commented Jun 2, 2025

Problems on OpenShift:

$ make run-dev
...
$ ./scripts/run-tests --skip-delete --test-suite openshift --skip-operator spark-k8s --test spark-history-server_openshift-true_spark-3.5.5_s3-use-tls-true

The history STS is not created because:

  Warning  FailedCreate      18s (x16 over 3m2s)  statefulset-controller  create Pod spark-history-node-default-0 in StatefulSet spark-history-node-default failed error: pods "spark-history-node-default-0" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .spec.securityContext.fsGroup: Invalid value: []int64{1000}: 1000 is not an allowed group, provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "listener-scc": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "stackable-secret-operator-scc": Forbidden: not usable by user or serviceaccount, provider "hostpath-provisioner": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]
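
When reading a rejection like this, it can help to split the message per provider: most SCCs are simply not granted to the service account, and only one reports a field-level validation failure. Below is a minimal Python sketch, assuming the message layout shown above; `parse_scc_rejections` is a hypothetical helper, not part of any OpenShift tooling, and `sample` is an abbreviated version of the message.

```python
import re


def parse_scc_rejections(message: str) -> dict:
    """Map each SCC provider name to the reason it rejected the pod."""
    # Keep only the bracketed provider list at the end of the message.
    start = message.index("[provider") + 1
    body = message[start:message.rindex("]")]
    # Provider names may or may not be quoted, so split on the keyword itself.
    entries = re.split(r",\s*provider\s+", body.removeprefix("provider "))
    result = {}
    for entry in entries:
        name, _, reason = entry.partition(": ")
        result[name.strip('"')] = reason.strip()
    return result


sample = (
    'pods "spark-history-node-default-0" is forbidden: unable to validate '
    'against any security context constraint: '
    '[provider "anyuid": Forbidden: not usable by user or serviceaccount, '
    'provider restricted-v2: .spec.securityContext.fsGroup: '
    'Invalid value: []int64{1000}: 1000 is not an allowed group, '
    'provider "nonroot-v2": Forbidden: not usable by user or serviceaccount]'
)

# Show only the providers that were actually evaluated against the pod spec.
for name, reason in parse_scc_rejections(sample).items():
    if "not usable" not in reason:
        print(f"{name} -> {reason}")
```

In this message, `restricted-v2` is the only provider with a field-level reason: it rejects the fixed `fsGroup` of 1000, while every other SCC is unavailable to the service account.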

@razvan razvan self-requested a review June 2, 2025 13:24
@lfrancke
Member Author

lfrancke commented Jun 2, 2025

Thanks. This probably means all others are broken too. I don't have time today but I'll try to take a look tomorrow.

@razvan
Member

razvan commented Jun 2, 2025

Unrelated, but I thought I'd fix it here: f04723b

Fixes:

2025-06-02T12:50:09.487825Z ERROR stackable_operator::logging::controller: Failed to reconcile object controller.name="history.spark.stackable.tech" error=reconciler for object SparkHistoryServer.v1alpha1.spark.stackable.tech/spark-history.kuttl-test-included-lab failed error.sources=[failed to apply global RoleBinding, failed to apply patch, unable to patch resource "spark-history-rolebinding", ApiError: rolebindings.rbac.authorization.k8s.io "spark-history-rolebinding" is forbidden: user "system:serviceaccount:stackable-operators:spark-k8s-operator-serviceaccount" (groups=["system:serviceaccounts" "system:serviceaccounts:stackable-operators" "system:authenticated"]) is attempting to grant RBAC permissions not currently held:
{APIGroups:["security.openshift.io"], Resources:["securitycontextconstraints"], ResourceNames:["nonroot-v2"], Verbs:["use"]}: Forbidden (ErrorResponse { status: "Failure", message: "rolebindings.rbac.authorization.k8s.io \"spark-history-rolebinding\" is forbidden: user \"system:serviceaccount:stackable-operators:spark-k8s-operator-serviceaccount\" (groups=[\"system:serviceaccounts\" \"system:serviceaccounts:stackable-operators\" \"system:authenticated\"]) is attempting to grant RBAC permissions not currently held:\n{APIGroups:[\"security.openshift.io\"], Resources:[\"securitycontextconstraints\"], ResourceNames:[\"nonroot-v2\"], Verbs:[\"use\"]}", reason: "Forbidden", code: 403 }), rolebindings.rbac.authorization.k8s.io "spark-history-rolebinding" is forbidden: user "system:serviceaccount:stackable-operators:spark-k8s-operator-serviceaccount" (groups=["system:serviceaccounts" "system:serviceaccounts:stackable-operators" "system:authenticated"]) is attempting to grant RBAC permissions not currently held:
{APIGroups:["security.openshift.io"], Resources:["securitycontextconstraints"], ResourceNames:["nonroot-v2"], Verbs:["use"]}: Forbidden]
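
The error comes from Kubernetes' RBAC escalation prevention: a subject may only create or update a RoleBinding if it already holds every permission of the referenced role, or has the `bind` verb on that role. The sketch below is a deliberately simplified Python model of that coverage check, with no wildcard or aggregation handling; `PolicyRule`, `covers`, and `uncovered` are hypothetical names, and the "held" rule sets are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    api_groups: tuple
    resources: tuple
    verbs: tuple
    resource_names: tuple = ()  # empty means "all names", as in Kubernetes RBAC


def covers(held: PolicyRule, group: str, resource: str, verb: str, name: str) -> bool:
    """True if a single held rule grants the given (group, resource, verb, name)."""
    return (
        group in held.api_groups
        and resource in held.resources
        and verb in held.verbs
        and (not held.resource_names or name in held.resource_names)
    )


def uncovered(held_rules, requested: PolicyRule) -> list:
    """List the (group, resource, name, verb) tuples that no held rule covers."""
    missing = []
    for group in requested.api_groups:
        for resource in requested.resources:
            for verb in requested.verbs:
                # "" stands for "all names" when the request names no resource.
                for name in requested.resource_names or ("",):
                    if not any(covers(h, group, resource, verb, name) for h in held_rules):
                        missing.append((group, resource, name, verb))
    return missing


# The rule from the error message above.
requested = PolicyRule(
    api_groups=("security.openshift.io",),
    resources=("securitycontextconstraints",),
    verbs=("use",),
    resource_names=("nonroot-v2",),
)

print(uncovered([], requested))           # the operator holds nothing: rejected
print(uncovered([requested], requested))  # the operator holds the same rule: []
```

One way to satisfy the check is to grant the operator's own ClusterRole the same `use` permission. That is an assumption about the fix; the thread does not describe the commit's contents.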

@razvan
Member

razvan commented Jun 2, 2025

> Thanks. This probably means all others are broken too. I don't have time today but I'll try to take a look tomorrow.

You'd think but ... superset worked.

@razvan
Member

razvan commented Jun 2, 2025

> > Thanks. This probably means all others are broken too. I don't have time today but I'll try to take a look tomorrow.
>
> You'd think but ... superset worked.

I keep forgetting that make run-dev doesn't work with OKD because it doesn't understand Helm templates and doesn't assign SCCs to cluster roles. It just ignores them.

So actually, if installed properly and with the fix from above it should work.

https://testing.stackable.tech/job/spark-k8s-operator-it-custom/26/

@razvan
Member

razvan commented Jun 3, 2025

Update: same on OCP 4.17

Tests look better, but on OpenShift 4.18 this command

$ ./scripts/run-tests --test-suite openshift \
--skip-release \
--skip-delete \
--test pyspark-ny-public-s3-image_openshift-true_spark-3.5.5_ny-tlc-report-0.2.0

fails to create the driver with the message:

Warning  Failed          45s (x11 over 2m42s)  kubelet 

Error: container has runAsNonRoot and image will run as root

(pod: "pyspark-ny-public-s3-image-2a7ed79734e5583a-driver_kuttl-test-provenpegasus(1fa4b08a-88ad-4cd5-9d57-5c36998ca9cc)", container: job)

I cannot tell what the problem is or why only this particular test is affected.
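
For context, this message comes from the kubelet's non-root verification, which falls back to the image's own user when the container has no effective `runAsUser`. The Python sketch below paraphrases that decision logic in simplified form. It is a reading aid rather than the kubelet's actual code, and `verify_run_as_non_root` is a hypothetical name.

```python
from typing import Optional


def verify_run_as_non_root(
    run_as_non_root: Optional[bool],
    run_as_user: Optional[int],
    image_uid: Optional[int],
    image_username: str = "",
) -> Optional[str]:
    """Return an error string if the container would violate runAsNonRoot, else None."""
    if not run_as_non_root:
        return None  # the non-root policy was not requested
    if run_as_user is not None:
        # An explicit runAsUser takes precedence over whatever the image declares.
        if run_as_user == 0:
            return "container's runAsUser breaks non-root policy"
        return None
    if image_uid == 0:
        return "container has runAsNonRoot and image will run as root"
    if image_uid is None and image_username:
        return (
            f"container has runAsNonRoot and image has non-numeric user "
            f"({image_username}), cannot verify user is non-root"
        )
    return None


# With a hardcoded uid, the image's user is never consulted.
print(verify_run_as_non_root(True, 1000, image_uid=0))  # None
# Without it, an image whose user resolves to uid 0 is rejected.
print(verify_run_as_non_root(True, None, image_uid=0))
```

One possible explanation, not confirmed in this thread, is that the job image used by this test runs as root, which would only surface once the hardcoded uid is gone.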

@lfrancke
Member Author

lfrancke commented Jun 3, 2025

Thanks. I have more work to do. I'll look at it when I'm back.

@razvan
Member

razvan commented Jun 3, 2025

Update: 30f8f2b
Update: related PR to publish the example image: #579

The commit above fixes the last two remaining tests.

Tested with OCP 4.17

$ ./scripts/run-tests --test-suite openshift --skip-release --skip-delete --test logging_openshift-true_spark-3.5.5_ny-tlc-report-0.3.0
...
--- PASS: kuttl (467.21s)
    --- PASS: kuttl/harness (0.00s)
        --- PASS: kuttl/harness/logging_openshift-true_spark-3.5.5_ny-tlc-report-0.3.0 (466.17s)
PASS

$ ./scripts/run-tests --test-suite openshift --skip-release --skip-delete --test pyspark-ny-public-s3-image_openshift-true_spark-3.5.5_ny-tlc-report-0.3.0
...
--- PASS: kuttl (163.76s)
    --- PASS: kuttl/harness (0.00s)
        --- PASS: kuttl/harness/pyspark-ny-public-s3-image_openshift-true_spark-3.5.5_ny-tlc-report-0.3.0 (162.84s)
PASS

Note

The image oci.stackable.tech/stackable/ny-tlc-report:0.3.0 doesn't exist yet due to Harbor restrictions. Replace stackable with sandbox for now.

I will make a separate PR to publish oci.stackable.tech/stackable/ny-tlc-report:0.3.0 from GH actions.

razvan
razvan previously approved these changes Jun 3, 2025
Member

@razvan razvan left a comment

looks good now

@lfrancke lfrancke added this pull request to the merge queue Jun 6, 2025
@lfrancke lfrancke moved this from Development: In Review to Development: Done in Stackable Engineering Jun 6, 2025
Merged via the queue into main with commit 1e7686d Jun 6, 2025
17 checks passed
@lfrancke lfrancke deleted the feat/hardcoded-uid-gid branch June 6, 2025 08:29
Labels
None yet
Projects
Status: Development: Done
2 participants