[Proposal] Use an infrastructure-as-code solution to manage the open-telemetry github #1596

Open
jaronoff97 opened this issue Jul 14, 2023 · 33 comments
Labels
area/project-infra Non-GitHub project infra (DockerHub, etc.)

Comments

@jaronoff97

jaronoff97 commented Jul 14, 2023

Hello all. After some of the past week's challenges around repository maintenance, some operator contributors had the idea that we could use GitHub's official Terraform provider to provision and manage the various SIG GitHub groups, repositories, branch protection rules, etc. that are currently managed entirely manually via the community repo. This would allow the TC to simply approve and merge PRs that change the governance of the open-telemetry GitHub organization rather than needing to make a slew of manual changes.

Benefits

  • Organization-as-code for OpenTelemetry would enable self-describing SIGs and repos that are entirely clear to maintainers
  • TC can automatically ping code owners when repository maintenance is needed
  • Maintainers that need repository maintenance can make the changes themselves
  • TC members who are not familiar with the github UI can focus solely on the requested change
  • (I'm sure there are more; these are the ones I thought of initially)

Risks

  • Will take some time to migrate us to this new strategy (this can be done piecemeal through terraform by importing resources as needed)
  • Operational overhead
    • We will need someone to set up this terraform code including CI/CD
    • I think we should have the expertise within the organization to make this happen
  • If any of the repository information isn't meant to be public, this may be a challenge
    • I'm not sure what wouldn't be okay being public
  • Secret management will still need to be manual
    • This is no change from today, and we already don't have to update / manage these often right now

Design

  • Phase 1 (team membership)
    • Community would contain a new folder that holds its terraform configuration
    • All members of the org are imported via the Membership resource
    • CI/CD automatically applies the Terraform configuration in the repository so that the org's actual state is always in sync with the code
    • At this point, all membership requests can now be made solely via terraform
  • Phase 2 (SIGs)
    • A terraform module for OpenTelemetrySIG is created that can be used to create and manage the membership and repositories for a SIG
      • A module is used here to abstract the logical grouping of a SIG within the OpenTelemetry community
    • Each SIG's groups can be imported as teams and team members [doc]
  • Phase 3 (Repositories)
    • SIGs can initially opt in to importing their repository configuration to the SIG repository; some SIGs may choose to opt out at first if they have special use cases
    • The current state of the repository can be shown using the terraform state show command after import, which will help fill in the required Terraform fields
  • Phase 4 (Branch rules)
    • Branch protection rules are added to the SIG module
    • Existing protections are imported; otherwise the OTel-specified defaults are used
  • Phase 5 (All SIGs are moved)
    • At this point, we should have the ability to manage most of the capabilities that are requested for repo management
    • All SIGs and their repositories will be imported into Terraform, with any additional feature requirements being added prior to this
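
To make the SIG module idea from Phases 2 to 4 a bit more concrete, here is a minimal, tool-agnostic sketch in Python rather than Terraform. Everything in it is an assumption for illustration: the `Sig` class, the `<sig>-maintainers` / `<sig>-approvers` team naming, the `DEFAULT_PROTECTION` values, and the placeholder user and repository names. It only shows how one SIG declaration could expand into teams and per-repository branch protection, with settings imported from an existing repository taking precedence over an org-wide fallback.

```python
from dataclasses import dataclass, field

# Hypothetical org-wide fallback, used when a repo has no imported protection.
DEFAULT_PROTECTION = {"required_approvals": 1, "require_code_owner_review": True}


@dataclass
class Sig:
    """One SIG: its people and the repositories it owns (illustrative shape)."""

    name: str
    maintainers: list
    approvers: list
    repositories: list
    # Protection settings imported from existing repos, keyed by repo name.
    imported_protection: dict = field(default_factory=dict)

    def teams(self):
        """Expand the SIG into the teams that would be managed for it."""
        return {
            f"{self.name}-maintainers": sorted(self.maintainers),
            f"{self.name}-approvers": sorted(self.approvers),
        }

    def branch_protection(self):
        """Use imported settings where present, else the org-wide fallback."""
        return {
            repo: self.imported_protection.get(repo, DEFAULT_PROTECTION)
            for repo in self.repositories
        }


# Placeholder data, not a real SIG.
example = Sig(
    name="example",
    maintainers=["alice"],
    approvers=["carol", "bob"],
    repositories=["example-repo", "example-contrib"],
    imported_protection={
        "example-repo": {"required_approvals": 2, "require_code_owner_review": True}
    },
)

print(example.teams())
print(example.branch_protection())
```

In a real setup the same grouping would live in a Terraform (or similar) module; the point of the sketch is only that a SIG becomes a single declaration that reviewers can read in a PR.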

Please let me know if there are any steps missing from this list.

Alternatives Considered

  • Do nothing
    • We stick with what we have and we get more kerfuffles and more time spent on making repo changes in an opaque manner
  • Pulumi
    • Entirely valid solution for this
    • Has a github provider
    • Terraform was chosen purely from my own comfort level. I think we should go with whatever tool we feel is right; I don't have opinions one way or the other
  • Terraform Cloud / Pulumi Cloud / Spacelift
    • These cost money, and we should use their OSS stacks IMO. The benefits of the cloud providers are having a UI for approvals, but we don't really need that given github provides us that OOTB

Overall, I think having Terraform manage the OpenTelemetry GitHub state would allow for much faster and more reliable management for the TC, and I'm excited to hear the rest of the community's thoughts.

@jaronoff97 jaronoff97 changed the title [Proposal] Use IAAC to manage the open-telemetry github [Proposal] Use an infrastructure-as-code solution to manage the open-telemetry github Jul 14, 2023
@yurishkuro
Member

Big +1 for going to org-as-code, but I would like more concrete details in this proposal to understand the required effort to achieve that. Can you outline a solution better? "Using terraform" doesn't tell me much.

@jaronoff97
Author

Sure! I was holding off on describing how this would all work if there was strong opposition. I can write up some details soon and update my issue with them.

@trask
Member

trask commented Jul 14, 2023

👍👍

can you look at / summarize other implementation options besides https://registry.terraform.io/providers/integrations/github/latest/docs, e.g. https://github.com/apps/settings, or if there are others worth considering?

@jaronoff97
Author

@trask @yurishkuro I updated the description with your asks. Please let me know if you have other solutions in mind, or if there are other designs that may be more effective.

@svrnm
Member

svrnm commented Jul 17, 2023

I have some experience with the settings app that @trask suggested. It does a good job overall and is very convenient for maintainers, as they can make repo setting updates via a pull request (and if you combine it with CODEOWNERS you can require TC approval, etc.).

Another project that might be interesting to look into is Peribolos:

Peribolos allows the org settings, teams and memberships to be declared in a yaml file. GitHub is then updated to match the declared configuration.

There is also Peribolos as a Service:

If you ever wanted to manage your GitHub organization as code where everybody can simply open a PR and ask to create a team or make a repository, wait no more!

Credit for pointing me towards peribolos & peribolos as a service goes to my amazing colleague @lelia :-)

Edit: via https://docs.prow.k8s.io/docs/components/cli-tools/peribolos/:

Peribolos allows the org settings, teams and memberships to be declared in a yaml file. GitHub is then updated to match the declared configuration.

See the kubernetes/org repo, in particular the merge and update.sh parts of that repo for this tool in action.

Peribolos was the subject of a KubeCon talk: How Kubernetes Uses GitOps to Manage GitHub Communities at Scale

@justaugustus

I've written some ideas up in the past on org management with tools like peribolos: todogroup/governance#106 (comment)

@lmilbaum

Terraform works with a backend to store its state files. You might want to consider how to set it up such that it is accessible by whoever needs to work with the Terraform plan.

@jaronoff97
Author

@trask what are the next steps to get started on this?

@trask
Member

trask commented Aug 13, 2023

since github administration is owned by the @open-telemetry/technical-committee, we'll need their guidance on how they would like to move forward with this

@Aneurysm9
Member

Do we need to reconsider this approach in light of cncf/foundation#617?

@yurishkuro
Member

@Aneurysm9 I don't think so, we can still use Hashicorp tools internally if they are not part of the artifacts we release.

@jmacd
Contributor

jmacd commented Oct 11, 2023

@jaronoff97
This topic came up (again) in today's technical committee meeting. We want to enable progress and unblock this effort, so that we can begin treating GitHub access permissions as code inside the organization.

@EjiroLaurelD

Hello, my name is Laurel, an Outreachy applicant. I went through the comments on this issue and found the proposal very intriguing. I have experience building with Terraform and would love to contribute to this project in any way I can. What are the next steps for org-as-code, and how can I be a part of it, please?
Thank you for your time.

@austinlparker
Member

We should look at OpenTofu (https://opentofu.org/) for this in lieu of terraform. I think it's a good idea though, and the plan seems pretty straightforward.

@austinlparker
Member

In terms of CI/deployment runs, Spacelift offers a free plan that would probably work...

@svrnm
Member

svrnm commented Nov 14, 2023

I talked about this issue with @jaronoff97 a while ago, because I was looking into different alternatives to TF + github provider, i.e. there are

Compared to the Terraform Provider GitHub they all provide less functionality, but have some individual advantages, e.g. CLOWarden is cncf-owned (but still experimental) and Settings GitHub App "just" works by enabling it on a repository.

I wanted to call out those alternatives for completeness, but if the Terraform Provider for GitHub satisfies our needs, there is no strong objection from my side.

@jaronoff97
Author

I'm happy with any of the above solutions, @svrnm should we attend the next TC meeting and walk through the options?

@svrnm
Member

svrnm commented Nov 14, 2023

> I'm happy with any of the above solutions, @svrnm should we attend the next TC meeting and walk through the options?

I shared those alternatives to have them captured, but to me it looks like there is broad support for going with the TF + GH provider solution as you have outlined it initially. Based on @jmacd's comment ( #1596 (comment) ) I think everyone is happy if we proceed with what you proposed initially.

@austinlparker
Member

I've taken the liberty of putting together a spike on this so we can see what it'd look like.

@svrnm
Member

svrnm commented Nov 15, 2023

> I've taken the liberty of putting together a spike on this so we can see what it'd look like.

Nice, will take a look

> In terms of CI/deployment runs, Spacelift offers a free plan that would probably work...

There is also https://www.cncf.io/project-tools/, especially the cloud credits might be helpful here "That’s why CNCF has created the Cloud Credits program, focussed on the mutual success of projects and participating companies. To date, supporters like Google, AWS, Equinix, and GitHub have donated cloud credits"

@bogdandrutu
Member

Are we sure we want to use terraform? Are we ok with the new license?

@alolita
Member

alolita commented Nov 16, 2023

OpenTofu not TF. +100 for IaC for OTEL GH management.

@alolita
Member

alolita commented Nov 16, 2023

I support this initiative. Thanks for raising this @jaronoff97

@austinlparker
Member

The GC took a vote on this proposal and is unanimously in favor of continuing work on it. Let's keep working on the PR!

@trask
Member

trask commented May 8, 2024

just documenting another option for completeness: https://github.com/github/safe-settings

@austinlparker
Member

I wanted to flag https://github.com/cncf/clowarden (this has been mentioned elsewhere, but probably good to keep it in this issue) as an alternative we should strongly consider, especially since we now have cloud credits for running our own infrastructure.

@austinlparker
Member

Honestly, the only thing CLOWarden doesn't do out of the box is handle 1Password vaults. There's no reason we couldn't add that, though, and I'm not really sure how easy it would be to handle through OpenTofu anyway, since we don't have an SSO provider. We'll need to do manual reconciliations, but I think that would be straightforward enough to do through a CLOWarden feature. It's not blocking for now either way.

@austinlparker
Member

Wanted to summarize a discussion from the maintainers' call on 7/15.

  • There were some strong opinions against team/repo management taking place through IaC
  • Maintainers did not feel adequately consulted/aware of this issue
  • General support for org membership management through IaC

It was decided that SIGs should use this issue to discuss the scope of IaC management.

My position regarding centralizing repo/user/team membership in CLOWarden -

  • Currently, team and repo permission changes are not centrally managed. This makes it difficult to fully audit, or even be aware of, these changes, and there is no mechanism to ensure that the recorded state of team membership reflects reality.
  • While audit logs do exist for these purposes, it is difficult to narrowly scope permissions to see the full trail of a change, and the context is lost because there is no direct association between an auditable event (e.g., team member change) and the discussion that led to it.
  • Given the importance (and heightened focus) around supply chain security and supply chain attacks, it is our responsibility to not only ensure we have good controls on org and team membership, but it is also important that we adopt least privilege principles. It is not great, for example, to have so many people in the org that are capable of changing team membership.
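
As a rough illustration of the "state of team membership reflects reality" point, here is a minimal sketch of a drift check. It is written in plain Python and is not tied to CLOWarden, OpenTofu, or any other tool; the function name `membership_drift`, the team names, and the usernames are all made up for the example, and the `live` data is hand-written here rather than fetched from GitHub.

```python
def membership_drift(declared, live):
    """Compare declared team membership with what the org actually has.

    Both arguments map a team name to an iterable of usernames. The result
    maps each team that differs to the users missing from it and the users
    present without being declared.
    """
    drift = {}
    for team in sorted(set(declared) | set(live)):
        want = set(declared.get(team, ()))
        have = set(live.get(team, ()))
        if want != have:
            drift[team] = {
                "missing": sorted(want - have),
                "unexpected": sorted(have - want),
            }
    return drift


# Placeholder data: what the repository declares vs. what the org reports.
declared = {
    "example-approvers": ["bob", "carol"],
    "example-maintainers": ["alice"],
}
live = {
    "example-approvers": ["bob", "dave"],
    "example-maintainers": ["alice"],
}

report = membership_drift(declared, live)
print(report)
```

An "unexpected" entry is exactly the kind of change that today has no PR or discussion attached to it; with membership declared in the community repo, every entry would trace back to a reviewed change.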

While I respect that it is, potentially, less easy to make a one-line PR to community than it is to click a button in the GitHub UI, I tend to believe the tradeoffs are worth it.

@jaronoff97
Author

jaronoff97 commented Jul 15, 2024

IMO I would really appreciate IaC management. It would really help us understand our current approvers/maintainers, their permissions, and their repos. Furthermore, it would make changes much more self-serve and auditable, avoiding the need to bug GC/TC members. I would also eventually appreciate the ability to provision and own different pieces of infrastructure.

> While I respect that it is, potentially, less easy to make a one-line PR to community than it is to click a button in the GitHub UI, I tend to believe the tradeoffs are worth it.

I think the barrier being a PR isn't the end of the world given that they will have needed to make a PR prior to being a member or changing roles. We could also write some makefile automation to fill out issues for new users as well (automatically pulls the PRs they've made against otel repos).
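
A minimal sketch of the "pull the PRs they've made" part, assuming it would be a small script that a Makefile target calls. It only builds a URL for GitHub's issue search endpoint using the `org:`, `author:`, and `is:pr` search qualifiers and makes no network call; the function name `contributor_pr_search_url` and the `octocat` username are placeholders.

```python
from urllib.parse import urlencode


def contributor_pr_search_url(username, org="open-telemetry", per_page=30):
    """Build a GitHub search URL listing PRs a user has opened in an org."""
    # Search qualifiers are space-separated inside the single `q` parameter.
    query = f"org:{org} author:{username} is:pr"
    params = urlencode({"q": query, "per_page": per_page})
    return "https://api.github.com/search/issues?" + params


# Placeholder username; a membership issue template could embed this link.
url = contributor_pr_search_url("octocat")
print(url)
```

Fetching the URL and formatting the results into an issue body is left out on purpose, since that depends on how the membership templates end up looking.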

@svrnm
Member

svrnm commented Jul 15, 2024

> There were some strong opinions against team/repo management taking place through IaC

What kind of strong opinions against it? Can we get them shared here so that we can address them? +1 to what you both (@austinlparker + @jaronoff97) said; there are many good reasons for doing it through IaC, which is an improvement not only from an audit and security perspective but also from a community perspective (people can better see and recognize who is filling which role, etc.). Indeed, it should be more than "click a button", especially for maintainers. Since we already have a voting process that requires a PR, all we would do is move that PR somewhere else.

> Maintainers did not feel adequately consulted/aware of this issue

Here, too, I would like to understand how we could have made people better aware. This is maybe more a thing for SIG Contributor Experience and might require its own issue. To be honest, I am not 100% sure what the right way is to "consult or make maintainers aware": is it a community issue, the OTel Slack, the SIG maintainers meeting? Maybe this is something we also need to be more conscious about, with our community having reached the size it has today.

@austinlparker
Member

@svrnm I tried to capture those concerns in the comment above. Fundamentally, some individuals said that their existing workflow for team management worked for them, that they didn't like this change, and that they had not been consulted on the proposal.

@svrnm
Member

svrnm commented Jul 17, 2024

> @svrnm I tried to capture those concerns in the issue above; fundamentally, some individuals raised the issue that their existing workflow for team management worked for them and didn't like this change, nor were they consulted on the proposal.

Thanks for the clarification.

@mtwo
Member

mtwo commented Jul 22, 2024

Summarizing the discussion from the maintainers' call today: the JS SIG wants this; the Go SIG also wants it but would prefer not to be among the first wave of implementers.
