Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor infra/gcp/... #516

Open
thockin opened this issue Dec 16, 2019 · 20 comments
Open

Refactor infra/gcp/... #516

thockin opened this issue Dec 16, 2019 · 20 comments
Labels
area/infra Infrastructure management, infrastructure design, code in infra/ kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra.

Comments

@thockin
Copy link
Member

thockin commented Dec 16, 2019

Right now it is set up as "concept-first". For example "ensure-staging" says "all of these are staging-like" and "ensure-prod" says "all of these are prod-like". That makes it hard to get a sense of what any one project has going on.

I propose to refactor it to "project-first". One list of projects and each one says "I am prod like" or "I am staging like". Then I could simply say ensure-project k8s-foo-bar and all of the properties would be asserted.

This is very close to terraform territory, but I don't know TF well enough to make the "utility" functions to not be so duplicated. @cblecker - is this worth pursuing?

@cblecker
Copy link
Member

Yes, it is absolutely worth pursuing. The bash ensure stuff is getting out of hand IMO.

@thockin
Copy link
Member Author

thockin commented Dec 16, 2019 via email

@rikatz
Copy link
Contributor

rikatz commented Jan 9, 2020

@thockin just taking a walk into the issues, maybe this is related: #523

I also saw the discussion into the mailing list, in my opinion Terraform is the best way to do this :)

I just don't think I have enough knowledge into that to help with the Terraform stuff, but anyway just putting the PR here (again) so we may have a follow up.

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 8, 2020
@bartsmykla
Copy link
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 8, 2020
@spiffxp spiffxp added the area/infra Infrastructure management, infrastructure design, code in infra/ label Apr 15, 2020
@spiffxp
Copy link
Member

spiffxp commented Apr 15, 2020

/area cluster-mgmt
/area cluster-infra
/kind cleanup

@k8s-ci-robot k8s-ci-robot added area/cluster-mgmt kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Apr 15, 2020
@spiffxp
Copy link
Member

spiffxp commented Apr 28, 2020

Having dipped my toes into adding to this mess:

  • I hated concept-first at first, and thought project-first was a great idea
  • I'm now less sure, as some concept span projects (image promotion needs access to all these, prow/boskos will need access to all those)
  • I suspect the parts of our shell scripts that are for loops could correspond well to terraform modules
  • I just need to say out loud that I am still a little freaked out by terraform. It's this whole other ecosystem that has churn and will need to be kept current. I wasn't confident enough to try migrating the google provider from 2.x to 3.x for aaa without maybe accidentally blowing away the cluster. OTOH as we write more bash in lib*.sh files we're also creating our own ecosystem with possibly inconsistent naming, lack of testing, etc.
  • I fell back to using a shell script for creating projects, but when I have time would be willing to see what rewriting that as terraform would be like

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 27, 2020
@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 26, 2020
@fejta-bot
Copy link

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Copy link
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@spiffxp
Copy link
Member

spiffxp commented Jan 23, 2021

/remove-lifecycle rotten
/lifecycle frozen
/priority important-longterm
I don't have the bandwidth for this, and this issue is maybe too broad to stay open, but I think the point stands that:

  • continuing with our bash as-is will deepen our tech debt
  • we aren't comfortable enough with our bash to allow automation to run it

Whatever we use, even if it's bash, we need:

  • tests to enforce conventions
  • tests that build trust in our ability to refactor
  • tests that build trust in our ability to have automation run this
  • confirmation of what changes will result

In an ideal world, we would have:

  • ability to reconcile audit output with infra/gcp configs
    • if audit reveals missing resources, create them
    • if audit reveals unknown resources, suggest deletion or new configs to add
  • automated deployment on PR merge (with reliable postsubmits / discoverable postsubmit results)

Even though I'm not a terraform native, to me this sounds really aligned with terraform:

  • modules for organization / re-use
  • plan / apply to build trust in automation
  • terratest or something similar to enforce conventions

There might also be a middle ground where we want some common patterns described in yaml instead, e.g.

  • staging-project -> results in kubernetes.io group, manifest file, gcp project, service accounts, iam permission changes, etc.
  • public-app -> results in kubernetes.io group, aaa namespace, manifests, etc.

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Jan 23, 2021
@spiffxp
Copy link
Member

spiffxp commented Feb 8, 2021

/reopen
d'oh, forgot this critical step

@hasheddan had also discussed possibly demoing crossplane for us (slack ref: https://kubernetes.slack.com/archives/CCK68P2Q2/p1611757501019900)

@k8s-ci-robot
Copy link
Contributor

@spiffxp: Reopened this issue.

In response to this:

/reopen
d'oh, forgot this critical step

@hasheddan had also discussed possibly demoing crossplane for us (slack ref: https://kubernetes.slack.com/archives/CCK68P2Q2/p1611757501019900)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Feb 8, 2021
@hasheddan
Copy link
Contributor

@spiffxp preparing whenisgood as we speak :)

@rikatz
Copy link
Contributor

rikatz commented Feb 8, 2021

this is going to be fun, using kubernetes to manage the kubernetes infrastructure :D

@spiffxp
Copy link
Member

spiffxp commented Jun 11, 2021

❤️ to @ameukam for cross-linking PR's I've been contributing to refactor the bash in infra/gcp

#2188 takes a tentative step toward using YAML instead of hardcoded bash variables / arrays as the source of our configuration data

@spiffxp
Copy link
Member

spiffxp commented Jun 11, 2021

An update on the current state of the bash in infra/gcp.

Over the past few months, as I've worked to reconcile inconsistencies or unmanaged resources discovered via our automated audit PRs, I've been trying to nudge the bash in a consistent direction.

The principles I've tried to follow are:

  • extract lib_foo.sh files for different GCP services, eg: lib_iam.sh for IAM, lib_gsm.sh for Google Secret Manager
  • try for some level of consistency in function naming:
    • ensure_[removed_]_{resource} for creation/deletion of resources
  • refactor ensure-foo.sh files:
    • pull everything into functions such that a main entrypoint at the bottom is responsible for kicking off execution
      • makes it easier to test specific parts of a script
      • makes it easier to reuse other functions (vs. relying on order of definitions)
    • write functions such that they can operate on a list of args (e.g. enable_services foo bar baz)
    • use arrays more often, and pass those arrays as lists of args
      • less noise, and support for comments, when doing complicated multi-line things
      • ability to dynamically set flags
  • scope the set of resources a script manages such that less-privileged-than-org-admin roles could run these scripts

ameukam added a commit to ameukam/k8s.io that referenced this issue Aug 30, 2021
Ref: kubernetes#516

Move every bash script used for GCP operations to a separated folder.

Signed-off-by: Arnaud Meukam <ameukam@gmail.com>
ameukam added a commit to ameukam/k8s.io that referenced this issue Aug 30, 2021
Ref: kubernetes#516

Move every bash script used for GCP operations to a separated folder.

Signed-off-by: Arnaud Meukam <ameukam@gmail.com>
ameukam added a commit to ameukam/k8s.io that referenced this issue Aug 30, 2021
Ref: kubernetes#516

Move every bash script used for GCP operations to a separated folder.

Signed-off-by: Arnaud Meukam <ameukam@gmail.com>
@spiffxp
Copy link
Member

spiffxp commented Sep 2, 2021

/milestone v1.23

@ameukam
Copy link
Member

ameukam commented Dec 14, 2021

/milestone v1.24

@k8s-ci-robot k8s-ci-robot modified the milestones: v1.23, v1.24 Dec 14, 2021
@ameukam ameukam removed this from the v1.24 milestone Mar 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/infra Infrastructure management, infrastructure design, code in infra/ kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra.
Projects
None yet
Development

No branches or pull requests

9 participants